Note on We Need to Tell People ChatGPT Will Lie to Them, Not Debate Linguistics via Simon Willison

More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface in the form of problems that Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs, and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated.

FROM:
Simon Willison
We Need to Tell People ChatGPT Will Lie to Them, Not Debate Linguistics

Notes
llm
2023, April 07, Friday
Permalink to 2023.NTE.398