It's meant in the literal sense, but with metaphorical hacksaws and duct tape.

Early on, some advanced LLM users noticed they could get better results by forcing the insertion of a word like "Wait," "Hang on," or "Actually," and then letting the model generate a few more paragraphs. This increased the chance of the model noticing a mistake it had made.
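Roughly, the trick looks like this (a minimal sketch using Hugging Face transformers; the model name, prompt, and token budgets are placeholders, not anyone's actual setup):

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "gpt2"  # placeholder; any causal LM works
  tok = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)

  prompt = "Q: What is 17 * 24?\nA: Let me work this out."
  ids = tok(prompt, return_tensors="pt").input_ids
  first_pass = model.generate(ids, max_new_tokens=80)

  # Instead of stopping here, splice in a self-correction cue and
  # keep decoding; the extra tokens give the model a chance to
  # notice and fix a mistake in its first attempt.
  continued = tok.decode(first_pass[0], skip_special_tokens=True) + "\n\nWait,"
  ids = tok(continued, return_tensors="pt").input_ids
  second_pass = model.generate(ids, max_new_tokens=80)
  print(tok.decode(second_pass[0], skip_special_tokens=True))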

Reasoning is basically this.



It's not just force-inserting a word. Reasoning is integrated into the training process of the model.


Not into the core foundation model. The foundation model still just predicts the next token in a static way. The reasoning is tacked on at the InstructGPT-style finetuning step, and it's done through prompt engineering, which is the shittiest way a model like this could have been built, and it shows.
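For what it's worth, the finetuning data for these models typically wraps the chain of thought in delimiter tokens, roughly like this (illustrative only; the exact tags and format vary by model):

  # Illustrative only: what one SFT example for a reasoning
  # finetune might look like. Delimiter tags vary by model.
  example = {
      "prompt": "What is 17 * 24?",
      "completion": (
          "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408. "
          "Wait, let me double-check: 408 / 24 = 17. Good.</think> "
          "The answer is 408."
      ),
  }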



