In s1, when the LLM tries to stop thinking by emitting "</think>", they force it to keep going by replacing that token with "Wait". The model then begins to second-guess and double-check its answer. They do this to trim or extend thinking time (trimming is just abruptly inserting "</think>" to close the reasoning early).
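Here's a minimal sketch of what that budget-forcing loop might look like. It assumes a hypothetical `generate_next_token` streaming function and token-count budgets I made up for illustration; it is not the paper's actual code, just the shape of the trick.

```python
END_THINK = "</think>"
WAIT = "Wait"

def generate_with_budget(prompt, generate_next_token, min_tokens=512, max_tokens=4096):
    """Extend thinking by swapping </think> for 'Wait'; trim by forcing </think>.

    generate_next_token(text) -> str is a hypothetical streaming API that
    returns the model's next token given everything generated so far.
    """
    output = []
    while True:
        token = generate_next_token(prompt + "".join(output))

        if token == END_THINK and len(output) < min_tokens:
            # Model tried to stop thinking too early: replace the end marker
            # with "Wait" so it second-guesses and keeps reasoning.
            output.append(WAIT)
            continue

        if len(output) >= max_tokens:
            # Budget exhausted: trim by abruptly closing the thinking block.
            output.append(END_THINK)
            break

        output.append(token)
        if token == END_THINK:
            break

    return "".join(output)
```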
It’s really dumb, I love it.
I did this to myself today by typing out a full response to a colleague then stepping back and forcing myself to rethink it.
There are so many simple tricks still to be discovered with LLMs: here, an example of SFT (supervised fine-tuning) winning out over RLHF (reinforcement learning from human feedback).
Josh Beckman