Yes, I think this is part of it -- when the model sees a new low-res image, it compares it to patterns that it has already seen to estimate what that location might look like at high-res.
The other important part is that the model takes many low-res images as input (up to 18, i.e. about three months of imagery) to produce each high-res image. If you down-sample an image by 2x via averaging, then offset the image by one pixel to the right and down-sample it again, and then repeat for two more offsets, the four down-sampled images together contain enough information to reconstruct the original. We want our ML model to attempt a similar reconstruction, but from actual low-res images. In practice the idea breaks down, since pixel values from a camera aren't a perfect average of the light reflected from each grid cell, and there are seasonal changes, clouds, and other dynamic factors. Still, given many aligned low-res captures with sub-pixel offsets, an ML model should be able to estimate reasonably well what the scene looks like at 2x or 4x higher resolution (the Satlas map shows a 4x attempt).

The model we've currently deployed is far from perfect at this, so there are open problems, like detecting where the model is likely making a mistake and helping it make the best use of the many low-res input images, and we're actively exploring how to improve on these.
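To make the shifted-down-sampling argument concrete, here is a small toy sketch in NumPy/SciPy (not the Satlas model; the image size, the 1-pixel offsets, and the `downsample` helper are just illustrative assumptions). It builds four shifted, 2x average-pooled copies of a random "high-res" image and recovers the original by solving the corresponding linear system with least squares:

```python
# Toy demo: four 1-pixel-shifted, 2x average-downsampled copies of an image
# contain enough information to reconstruct the original (exact setup here
# is illustrative, not the deployed model).
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
H = W = 16                      # toy "high-res" size
hi = rng.random((H, W))         # pretend high-res image

offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]  # the four pixel shifts

def downsample(img, dy, dx):
    """Shift by (dy, dx) with edge padding, then 2x average-pool."""
    shifted = np.pad(img, ((0, dy), (0, dx)), mode="edge")[dy:dy + H, dx:dx + W]
    return shifted.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

lows = [downsample(hi, dy, dx) for dy, dx in offsets]

# Build the sparse forward operator A so that A @ hi.ravel() stacks the
# four low-res images, then invert it with least squares.
n_low = (H // 2) * (W // 2)
A = lil_matrix((4 * n_low, H * W))
row = 0
for dy, dx in offsets:
    for i in range(H // 2):
        for j in range(W // 2):
            for u in range(2):
                for v in range(2):
                    y = min(2 * i + u + dy, H - 1)  # edge padding clamps
                    x = min(2 * j + v + dx, W - 1)
                    A[row, y * W + x] += 0.25       # each low-res pixel is a 2x2 average
            row += 1

b = np.concatenate([lo.ravel() for lo in lows])
recon = lsqr(A.tocsr(), b)[0].reshape(H, W)
print("max reconstruction error:", np.abs(recon - hi).max())  # should be ~0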
This shares ideas with burst super-resolution; see, e.g., Deep Burst Super-Resolution [https://arxiv.org/pdf/2101.10997.pdf].