The tech I am digging recently is a software framework called LangChain (here are the docs), which does something pretty straightforward: it makes it easy to call OpenAI’s GPT, say, a dozen times in a loop to answer a single question, and mix in queries to Wikipedia and other databases.

This is a big deal because of a technique called ReAct from a paper out of Princeton and Google Research (the ReAct website links to the Nov 2022 paper, sample code, etc).

ReAct looks innocuous but here’s the deal: instead of asking GPT to simply do smart-autocomplete on your text, you prompt it to respond in a thought/act/observation loop. So you ask GPT to respond like:

Thought: Let’s think step by step. I need to find out X and then do Y.

Act: Search Wikipedia for X

Observation: From the Wikipedia page I have learnt that …

Thought: So the answer is …

And it is allowed to repeat as many times as necessary, iterating towards its goal.

The clever bit is that, using LangChain, you intercept GPT when it starts a line with “Act:” and then you go and do that action for it, feeding the results back in as an “Observation” line so that it can “think” what to do next.
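To make that interception concrete, here is a minimal sketch of the loop in plain Python. It is not LangChain's actual API: `react_loop`, `find_action`, `fake_model` and `fake_run_action` are hypothetical names, and the "model" is a scripted stand-in so the example runs without calling GPT.

```python
import re
from typing import Callable, Optional

# Matches a line such as "Act: Search Wikipedia for X".
ACT_PATTERN = re.compile(r"^Act:\s*(.+)$", re.MULTILINE)


def find_action(model_output: str) -> Optional[str]:
    """Return the text of the first 'Act:' line, or None if there is none."""
    match = ACT_PATTERN.search(model_output)
    return match.group(1).strip() if match else None


def react_loop(
    question: str,
    model: Callable[[str], str],
    run_action: Callable[[str], str],
    max_steps: int = 5,
) -> str:
    """Alternate between the model and the action runner until no action is requested."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = model(transcript)
        transcript += output
        action = find_action(output)
        if action is None:
            break  # the model gave a final thought instead of acting
        # Perform the action on the model's behalf and feed the result back in.
        transcript += f"Observation: {run_action(action)}\n"
    return transcript


def fake_model(transcript: str) -> str:
    """Hypothetical stand-in for GPT: acts once, then concludes."""
    if "Observation:" not in transcript:
        return "Thought: I need to find out X.\nAct: Search Wikipedia for X\n"
    return "Thought: So the answer is the stub result.\n"


def fake_run_action(action: str) -> str:
    """Hypothetical stand-in for a real lookup."""
    return f"(stub result for '{action}')"


if __name__ == "__main__":
    print(react_loop("What is X?", fake_model, fake_run_action))
```

Swapping `fake_model` for a real completion call and `fake_run_action` for a real search is where a framework earns its keep; the `max_steps` cap is an assumption added here so a confused model cannot loop forever.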

The really clever bit is that, at the outset, you tell GPT what tools it has available, and how to access them. So it might have:

• Public databases like Wikipedia or IMDB or arXiv or company registers
• Proprietary databases like your internal HR system
• One-shot tools like a calculator, or a programming language
• Systems it can drive, not just query – like it could open and close windows on your computer, if you built an interface, or trundle a robot forward for a better view.
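One way to picture "telling GPT what tools it has" is a small registry that is both rendered into the prompt and used to dispatch actions. The sketch below is an illustration under assumptions, not how LangChain does it: the `Tool` type, the `tool[input]` action syntax, and the stubbed `wikipedia` tool are all made up for the example.

```python
import operator
import re
from typing import Callable, Dict, NamedTuple


class Tool(NamedTuple):
    description: str           # shown to the model in the prompt
    run: Callable[[str], str]  # called when the model picks this tool


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate a single 'a op b' expression, e.g. '6 * 7'."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


# Hypothetical registry; the wikipedia entry is a stub, not a real lookup.
TOOLS: Dict[str, Tool] = {
    "calculator": Tool("Evaluate 'a op b' arithmetic, e.g. 6 * 7", calculator),
    "wikipedia": Tool("Look up a topic (stubbed here)", lambda q: f"(stub summary of {q})"),
}


def build_prompt_header(tools: Dict[str, Tool]) -> str:
    """Describe the available tools so the model knows what it may ask for."""
    lines = ["You can use these tools by writing 'Act: <tool>[<input>]':"]
    for name, tool in tools.items():
        lines.append(f"- {name}: {tool.description}")
    return "\n".join(lines)


def dispatch(action: str, tools: Dict[str, Tool]) -> str:
    """Run an action of the assumed form 'tool[input]' and return its result."""
    match = re.fullmatch(r"(\w+)\[(.*)\]", action.strip())
    if not match or match.group(1) not in tools:
        return f"Unknown action: {action}"
    name, argument = match.groups()
    return tools[name].run(argument)


if __name__ == "__main__":
    print(build_prompt_header(TOOLS))
    print(dispatch("calculator[6 * 7]", TOOLS))
```

Returning an "Unknown action" string rather than raising is a deliberate choice in this sketch: the message can go back to the model as an observation, giving it a chance to correct itself.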

And this is wild.

Because now we have reasoning, goal-directed action, and tool use for AI.

It circumvents the problem of the language model “lying” (LLMs tend to be highly convincing confabulators) by giving it access to factual sources.

www.joshbeckman.org/notes/524990338