Hacker News

What about some kind of sharding, where parts of the computation could be executed in isolation for a longer period of time?


This is an ongoing research problem. OpenAI would certainly like to be able to use smaller GPUs, instead of having to fit the entire model into one.


GPT-3 does not fit in any single GPU that exists at present; it is already spread across multiple GPUs.
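A back-of-the-envelope sketch of why: assuming fp16 weights and an 80 GB GPU (both figures are assumptions, not from the thread), GPT-3's 175B parameters alone exceed any single card's memory.

```python
# Back-of-the-envelope: why GPT-3's weights alone exceed one GPU's memory.
PARAMS = 175e9          # GPT-3 parameter count
BYTES_PER_PARAM = 2     # assuming fp16 weights
GPU_MEMORY_GB = 80      # assuming an 80 GB GPU

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
min_gpus = -(-weights_gb // GPU_MEMORY_GB)  # ceiling division

print(f"weights: {weights_gb:.0f} GB, minimum GPUs: {min_gpus:.0f}")
# Weights alone need ~350 GB, i.e. at least 5 such GPUs,
# before counting activations, KV caches, or optimizer state.
```

This is only a lower bound on the GPU count; in practice the model is partitioned across many more devices to hold activations and serve requests in parallel.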



