This is not true. The OpenAI team trained only one full-sized GPT-3 model and conducted their hyperparameter sweeps on significantly smaller models (see: https://arxiv.org/abs/2001.08361). Because those sweeps ran on small models, the compute saved by skipping them is negligible relative to the cost of the full training run, and it does not significantly change the feasibility of the project.
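A back-of-envelope check makes the point concrete. Using the C ≈ 6·N·D approximation for training compute from the scaling-laws paper linked above, and GPT-3's published size (175B parameters, ~300B training tokens), even a generous hypothetical sweep over small models is a tiny fraction of the full run. The sweep sizes below (50 models, ~100M parameters, ~10B tokens each) are illustrative assumptions, not figures reported by OpenAI:

```python
def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute via C ~ 6 * N * D (Kaplan et al. 2020)."""
    return 6 * params * tokens

# Full GPT-3 run: 175B parameters, ~300B tokens (Brown et al. 2020).
full_run = train_flops(175e9, 300e9)

# Hypothetical sweep: 50 small models, ~100M params, ~10B tokens each.
# These counts are assumptions chosen for illustration.
sweep = 50 * train_flops(100e6, 10e9)

print(f"full run: {full_run:.2e} FLOPs")            # ~3.15e+23
print(f"sweep:    {sweep:.2e} FLOPs")               # ~3.00e+20
print(f"sweep as fraction of full run: {sweep / full_run:.4f}")
```

Under these assumptions the sweep costs roughly 0.1% of the full run, so reusing OpenAI's hyperparameters saves essentially nothing on the overall budget.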