
Note on Dragoncatcher: Ungrounded Thought via Robin Sloan

The overloading of common words is well underway: new language models have “thinking” modes, “reasoning” capabilities! What this means, in practice, is that they’ve learned to produce a special kind of text, the conversion of the linguistic if-then into a dynamo that spins and spins and, often, magically — yes, it is magical — produces useful results.

Here is one distinction among several: this process can only compound — the models can only “think” by spooling out more text — while human thinking often does the opposite: retreats into silence, because it doesn’t have words yet to say what it wants to say.

Human thinking often washes the dishes, then goes for a walk.

So, if you redefine “thinking” to mean “arriving at a solution through an iterative linguistic loop” … yes, that’s what these models do. That definition is IMHO pretty thin.

We talk about humans thinking harder, which is not the same as thinking longer. I think most people know from experience that thinking longer generally just makes you anxious. But that’s what the models do, and not only longer, but in parallel, all those step-by-step monologues spilling out simultaneously, somewhere in the dark of a data center. “Quantity has a quality all its own,” said Stalin, maybe … 

Well, okay — what does it mean for a human to think harder? Reasonable people will disagree (and in interesting ways) but, for my part, I think it means prospecting new analogies; pitching your inquiry out away from the gravitational attractors of protocol and cliché; turning the workpiece around to inspect it from new angles; and especially bringing more senses into the mix — grounding yourself in reality. You’ll note these moves are challenging or impossible for systems that operate only on/with/inside language.

A couple of years ago, when I wondered if language models are in hell, I expressed some hope about the richness of multimodal training. So far, this hasn’t panned out. Rather than images anchoring text in a richer, more embodied realm, the marriage seems to have gone the opposite direction. The models chop images into sequences of tokens — big bright pictures become spindly threads, a bit sad — and feed them in along with everything else.

We are going to lose this word — we might already have lost it — but/and we can put a marker down, a gravestone you might call it, for a kind of thinking that used to mean more than “more”.

Josh Beckman: https://www.joshbeckman.org/notes/954493879