I wonder if one could build something like SETI@home, but for open source model training. Assuming the model fits on a gaming GPU, it's essentially distributed data-parallel training, just with much higher latency between training nodes.
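A minimal sketch of the data-parallel idea, under simplifying assumptions: every node holds a full copy of the model, computes gradients on its own data shard, and a coordinator averages those gradients before updating the weights (real systems would use an all-reduce and would have to tolerate stragglers over the internet). The model, loss, and data here are toy placeholders, not anything from an actual system.

```python
# Toy data-parallel training: a 1-D linear model y = w * x,
# with data sharded across two hypothetical volunteer nodes.

def local_grad(w, shard):
    # Gradient of mean squared error over this node's shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.01):
    # Each node computes a gradient on its local shard...
    grads = [local_grad(w, s) for s in shards]
    # ...and the coordinator averages them, as if the data were in one place.
    return w - lr * sum(grads) / len(grads)

# Data generated from y = 3x, split across two "nodes".
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
# w converges toward 3.0
```

The averaging step is the whole trick: because the mean of per-shard gradients equals the gradient over the pooled data, each node only ever needs its own shard. The hard part over the internet isn't the math, it's latency, dropped nodes, and trust.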
I wouldn’t discount the complexity of the code and development. The model architecture itself is incredibly complex, likely with tons of custom layers and tensor operators, along with all the custom tooling for data I/O, likely a custom optimization package for training, utilities for observability and diagnostics, and the actual configuration/orchestration of storage and compute resources…
And then there are the resources themselves, which enable them to iterate more quickly on building all of the above. Oh, and the training dataset.
In that way, it makes sense to just release it under a permissive license because there's still a massive cost to use it.