The other thing about chat: when we had these instruct models, the task of “complete this text, but in a nice or helpful way” was a pretty poorly defined task. That task is confusing both for the model and for the human who’s supposed to do the data labeling.
Whereas for chat, people had an intuitive sense of what a helpful robot should be like. So it was just much easier for people to get an idea of what the model was supposed to do. As a result, the model had a much more coherent personality and it was much easier to get pretty sensible behavior robustly.
FROM: Dwarkesh Patel, “John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI”
One of the other reasons we have a chat interface for LLMs and other large machine-learned models: it was simply easier for humans to evaluate chat as a form of output and interaction.
Josh Beckman