The other thing about chat: when we had these instruct models, the task of “complete this text, but in a nice or helpful way” was a pretty poorly defined task. That task is confusing both for the model and for the human who’s supposed to do the data labeling.
Whereas for chat, people had an intuitive sense of what a helpful robot should be like. So it was just much easier for people to get an idea of what the model was supposed to do. As a result, the model had a much more coherent personality and it was much easier to get pretty sensible behavior robustly.
FROM: Dwarkesh Patel, “John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI”
One of the other reasons we have a chat interface for LLMs and other large machine-learned models: it was simply easier for humans to evaluate chat as a form of output and interaction.
Josh Beckman