Fortunately, there are still ways to improve training efficiency and reduce model size through more guided attention learning.
This would make it feasible to train models at least as good as the current batch (though the big players will probably apply those same optimizations to build much better large models).