When we build a preference dataset, what we should actually be asking is, “Is a world with a model trained on this dataset preferable to a world with a model trained on that dataset?” Of course, this is an intractable question to ask, because doing so would require somehow collecting human labels on every possible arrangement of a training dataset, leading to a combinatorial explosion of options. Instead, we approximate this by collecting human preference signals on each individual data point. But there’s a mismatch: just because humans prefer a more detailed image in one instance doesn’t mean that we’d prefer a world where every single image was maximally detailed.
FROM: thesephist.com, “Epistemic Calibration and Searching the Space of Truth”
Preference tuning pushes models away from being accurate reflections of reality. When we ask a human labeler to choose between one output and another, that choice is a poor proxy for the thing we actually want: an assessment of the overall direction the model is being pushed toward.
Josh Beckman
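
For concreteness, the per-datapoint approximation the excerpt describes is roughly what a standard pairwise reward-model objective encodes. Below is a minimal sketch (assuming PyTorch and a Bradley-Terry-style loss; the reward values are made up for illustration): the objective only ever sees isolated comparisons, never the dataset-level question of which trained model we would prefer.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(reward_chosen: torch.Tensor,
                             reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry-style loss over individual comparisons.

    Each term scores one isolated comparison ("A beat B"). Nothing in the
    objective asks whether a whole dataset of such wins produces a model
    we'd prefer overall.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical rewards a reward model might assign to chosen vs. rejected outputs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(pairwise_preference_loss(chosen, rejected))
```

The mismatch is visible in the objective itself: it aggregates independent pairwise wins, so “more detailed won this comparison” compounds toward “maximally detailed everywhere,” with no term that checks the world-level preference.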