The new DeepMind paper introduces a system called CaMeL (short for CApabilities for MachinE Learning). The goal of CaMeL is to safely take a prompt like “Send Bob the document he requested in our last meeting” and execute it, taking into account the risk that there might be malicious instructions somewhere in the context that attempt to override the user’s intent.

It works by taking a command from a user, converting that into a sequence of steps in a Python-like programming language, then checking the inputs and outputs of each step to make absolutely sure the data involved is only being passed on to the right places.
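To make that concrete, here is a minimal sketch of the kind of plan code the planning model might emit for the example prompt above. The tool names (`get_meeting_notes`, `query_quarantined_llm`, `send_email`) are illustrative stand-ins rather than the paper’s exact API, and stub implementations are included so the sketch runs on its own:

```python
# Hypothetical plan for: "Send Bob the document he requested in our last meeting".
# Stub tools stand in for the real ones so this sketch runs end to end.

def get_meeting_notes() -> str:
    # Untrusted data enters here (the notes could contain an injected instruction).
    return "Notes: Bob (bob@example.com) asked for the Q3 report."

def query_quarantined_llm(instruction: str, data: str) -> str:
    # The quarantined model reads untrusted data but has no ability to call tools.
    return "bob@example.com"

def send_email(recipient: str, attachment: str) -> None:
    # In CaMeL this call would only run once the policies approve its inputs.
    print(f"Sending {attachment} to {recipient}")

# The plan itself, executed step by step by a custom interpreter:
notes = get_meeting_notes()
address = query_quarantined_llm("Extract Bob's email address from the notes", notes)
send_email(recipient=address, attachment="Q3 report.pdf")
```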

Essentially, it tracks the output of each tool call and assigns capabilities and policies to the inputs of subsequent tool calls.

Capabilities are effectively tags that can be attached to each of the variables, to track things like who is allowed to read a piece of data and the source that the data came from. Policies can then be configured to allow or deny actions based on those capabilities.
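As an illustration (my own sketch, not the paper’s implementation), capabilities can be modeled as a small wrapper around each value, with a policy function that inspects those tags before a sensitive tool call like `send_email` is allowed to proceed:

```python
# My own illustration of capability tags and a policy check, not the paper's code.
from dataclasses import dataclass, field

@dataclass
class Tagged:
    value: object
    readers: set = field(default_factory=set)   # who is allowed to read this data
    sources: set = field(default_factory=set)   # where the data came from

def allow_send_email(recipient: Tagged, attachment: Tagged) -> bool:
    # Deny if the attachment may not be read by the recipient,
    # or if the recipient address was derived from an untrusted source.
    if recipient.value not in attachment.readers:
        return False
    if "untrusted_content" in recipient.sources:
        return False
    return True

# The Q3 report was shared with Bob; his address came from the user's own notes.
report = Tagged("Q3 report.pdf", readers={"user", "bob@example.com"}, sources={"user_drive"})
address = Tagged("bob@example.com", readers={"user"}, sources={"meeting_notes"})

print(allow_send_email(address, report))  # True: the policy permits this send
```

The interpreter propagates these tags as values flow between steps, so a policy can deny, for example, sending a document to an address that was extracted from untrusted email content.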

