Hacker News

We wouldn't be able to run it ourselves if they hadn't trained it on 4000 GPUs for a month.


The cost of training is actually quite a bit less. Emad, the creator of SD stated this on Twitter:

"We actually used 256 A100s for this per the model card, 150k hours in total so at market price $600k"


Even if it were hard to train, you could make your own much more cheaply by fine-tuning a larger model.

Those are called "base models" (or "foundation models", if you're Stanford trying to co-opt the term).


Suppose one has an idea for a different architecture, functional form, etc. Assuming the receiving model is substantially smaller, so that the dominant computational cost is in the SD model, how long would effective knowledge distillation take on, say, a CPU?


That's called teacher–student learning. It could easily take weeks on a single machine, but renting more GPU time or getting free credits from somewhere is perfectly plausible.
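For anyone unfamiliar: the core of teacher–student distillation is training the small model to match the large model's temperature-softened output distribution rather than hard labels. A minimal sketch of that objective (Hinton-style soft targets; the temperature value and toy logits below are illustrative, not anything from SD specifically):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about near-miss classes.
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

# Toy check: a student that matches the teacher exactly has ~zero loss,
# while a mismatched (uniform) student has positive loss.
teacher = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(teacher, teacher))                    # ~0.0
print(distillation_loss(np.zeros((1, 3)), teacher))           # > 0
```

In practice you'd minimize this loss (often mixed with an ordinary task loss) over the student's parameters with SGD, querying the frozen teacher for logits on each batch.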


Christ, so what happens when Google throws a cheeky 10 million at a model?





