I had a read through this and I couldn't really tell if there was something novel here?
I understand that perturbations and generating new examples from labelled examples are a pretty normal part of the process when you only have a limited number of examples available.
The novelty is in applying two perturbations to the available unlabeled images and using them as part of training. That is different from what you are describing, which is applying augmentations to labeled images to increase the dataset size.
My immediate question was "how do you use unlabeled images for training?" But then I decided to read the paper :) The answer is:
Two different perturbations of the same image should get the same predicted label from the model, even if the model doesn't know what the correct label is. That signal can be used in training.
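To make that concrete, here's a rough sketch of the idea (not the paper's exact loss, and the numbers below are made up): the model's predicted class distributions for two perturbed versions of the same unlabeled image are pushed to agree, e.g. with a squared-difference penalty.

```python
import numpy as np

def consistency_loss(p1, p2):
    """Mean squared difference between the model's predicted class
    distributions for two perturbations of the same image.
    No ground-truth label is needed for this term."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return float(np.mean((p1 - p2) ** 2))

# Hypothetical 2-class predictions for two augmentations of one unlabeled image:
agree = consistency_loss([0.9, 0.1], [0.85, 0.15])  # views agree -> small penalty
disagree = consistency_loss([0.9, 0.1], [0.2, 0.8])  # views disagree -> large penalty
```

Minimizing this term trains the model to be invariant to the perturbations, which is where the unlabeled data contributes.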
What if the model's prediction is wrong with high confidence? What if the cat is labeled as a dog for both perturbations? Then wouldn't the system train against the wrong label?
Nope, because of the way it works. In the beginning, when the model is being trained on the labeled data, it will make many mistakes, so its confidence for either cat or dog will be low. In that case, the unlabeled data are not used at all.
As training progresses, the model becomes better on the labeled data, so it can start predicting with high confidence on unlabeled images that are easy, similar-looking, or from the same distribution as the labeled data. Those unlabeled images gradually start being used as part of training, and more and more unlabeled data get added as confidence grows.
The mathematics of the combined loss function and the curriculum-learning part of the paper formalize this.
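A minimal sketch of that gating mechanism (my own illustration, not the paper's exact formulation; the 0.95 threshold and the weight `lam` are placeholder values): the total loss is supervised cross-entropy on labeled data plus a pseudo-label term on unlabeled data that only counts predictions whose confidence clears a threshold.

```python
import numpy as np

def combined_loss(labeled_probs, labels, unlab_probs, threshold=0.95, lam=1.0):
    """Supervised cross-entropy plus a confidence-gated pseudo-label term.
    Early in training nothing clears the threshold, so unlabeled data
    contributes nothing; as confidence grows, more of it is used."""
    labeled_probs = np.asarray(labeled_probs, dtype=float)
    unlab_probs = np.asarray(unlab_probs, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Supervised term: -log p(correct class), averaged over labeled examples.
    sup = -np.mean(np.log(labeled_probs[np.arange(len(labels)), labels]))

    # Unsupervised term: treat the argmax as a pseudo-label, but only for
    # unlabeled examples where the model's max probability >= threshold.
    conf = unlab_probs.max(axis=1)
    mask = conf >= threshold
    if mask.any():
        pseudo = unlab_probs.argmax(axis=1)
        per_example = -np.log(unlab_probs[np.arange(len(pseudo)), pseudo])
        unsup = float(np.mean(mask * per_example))
    else:
        unsup = 0.0  # low confidence everywhere: unlabeled data ignored

    return float(sup + lam * unsup)
```

With only low-confidence unlabeled predictions the result equals the supervised loss alone; once a prediction clears the threshold, the pseudo-label term kicks in, which matches the "gradually added" behaviour described above.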