Recognizing this difference between base models and feedback-tuned models is important, because a preference-tuning step changes what the model is doing at a fundamental level. A pretrained base model is an epistemically calibrated world model. It’s epistemically calibrated in the sense that its output probabilities mirror the frequencies of concepts and styles in its training dataset: if 2% of all photos of waterfalls also contain rainbows, then roughly 2% of the waterfall photos the model generates will contain rainbows. And it’s a world model in the sense that what results from pretraining is a probabilistic model of observations of the world (its training dataset). Anything we can find in the training dataset, we can also expect to find in the model’s output space.
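To make the calibration point concrete, here is a toy sketch with made-up counts (plain Python, nothing to do with any real training pipeline): the maximum-likelihood estimate of a categorical feature is just its empirical frequency, so a model fit this way reproduces whatever rate the training set had.

```python
from collections import Counter

# Toy illustration with made-up counts: a maximum-likelihood model of a
# discrete feature simply reproduces that feature's frequency in the data.
training_photos = ["waterfall+rainbow"] * 2 + ["waterfall"] * 98

counts = Counter(training_photos)
total = sum(counts.values())

# The MLE for a categorical distribution is the empirical frequency.
model_probs = {label: n / total for label, n in counts.items()}

print(model_probs["waterfall+rainbow"])  # 0.02 -- rainbows appear 2% of the time, matching the data
```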
Once we subject the model to preference tuning, however, it transforms into something very different: a function that greedily and cleverly finds a way to reinterpret every input as a version of the request containing the elements it knows are most likely to earn a positive rating from a reviewer.
Preference-tuning methods like RLHF and DPO change the goal of a large neural network: from modeling its training data to targeting approval.
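To make that shift in objective concrete, here is a minimal sketch of the two losses side by side, assuming a PyTorch setup; the function names and toy numbers are illustrative, and `dpo_loss` follows the standard published DPO formulation rather than any particular model’s training code.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(logits, target_ids):
    """Base-model objective: maximize the likelihood of the training data
    (i.e., minimize cross-entropy against the observed tokens)."""
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Preference objective (DPO): increase the margin by which the policy
    prefers the reviewer-chosen completion over the rejected one, measured
    relative to the frozen reference (pretrained) model."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy tensors standing in for summed log-probabilities of whole completions.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-13.5]))
```

The first loss only asks the model to match the observed data; the second explicitly rewards moving toward whichever completion the reviewer preferred, which is the change in goal described above.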