A few months (weeks?) ago I would've said that this already was the case for language models. It's absolutely mind-blowing to me what is happening here - same with Stable Diffusion. Once DALL-E was out, I was sure there was no way anything like this could ever run on consumer hardware. I'm very happy to be proven wrong.

In a way, though, things are still moving in this direction. Eight or so years ago it was still more or less possible to train models like these yourself to a useful degree; I think we've since moved well past the point where that's feasible.



LLaMA can be fine-tuned in hours on a consumer GPU, or in a free Colab with just 12GB of VRAM (soon 6GB with 4-bit training), using PEFT.

https://github.com/zphang/minimal-llama#peft-fine-tuning-wit...
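
To make that concrete, here's a minimal sketch of what LoRA fine-tuning with PEFT looks like, assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the checkpoint name and hyperparameters are illustrative, not taken from the linked repo:

    # A minimal sketch of LoRA fine-tuning with PEFT. The checkpoint name
    # and hyperparameters below are illustrative assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

    model_name = "decapoda-research/llama-7b-hf"  # hypothetical HF checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,   # 8-bit weights keep the 7B model within ~12GB VRAM
        device_map="auto",
    )
    model = prepare_model_for_int8_training(model)

    # LoRA freezes the base weights and trains small low-rank adapter
    # matrices instead, which is what makes single-GPU fine-tuning feasible.
    lora_config = LoraConfig(
        r=8,                                  # rank of the adapter matrices
        lora_alpha=16,                        # adapter scaling factor
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of all params

From there you'd run an ordinary transformers Trainer loop; only the adapter weights receive gradients, so the optimizer state stays tiny as well.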


Fortunately, there are still some possibilities for improving training efficiency and reducing model size, e.g. by doing more guided attentional learning.

This will make it feasible to train models at least as good as the current batch (though the big players will probably use those same optimizations to create much better large models).



