A few months (weeks?) ago I would've said that this was already the case for language models. It's absolutely mind-blowing to me what is happening here, and the same goes for Stable Diffusion. Once DALL-E was out, I was sure there was no way anything like it could be run on consumer hardware. I'm very happy to be proven wrong.
In a way, though, things are still moving in this direction. Eight or so years ago it was more or less possible to train those models yourself to a useful degree, and I think we've now moved way past that being feasible.
Fortunately, there are still some ways to improve training efficiency and reduce model size by doing more guided attentional learning.
This should make it feasible to train models at least as good as the current batch (though the big players will probably use those same optimizations to create much better large models).