If you are a user of LLM systems that use tools (you can call them “AI agents” if you like) it is critically important that you understand the risk of combining tools with the following three characteristics. Failing to understand this can let an attacker steal your data.

The lethal trifecta of capabilities is:

Access to your private data—one of the most common purposes of tools in the first place! • Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM • The ability to externally communicate in a way that could be used to steal your data (I often call this “exfiltration” but I’m not confident that term is widely understood.)

If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.

The lethal trifecta (diagram). Three circles: Access to Private Data, Ability to Externally Communicate, Exposure to Untrusted Content.

The problem is that LLMs follow instructions in content

I think of it as: every message from a user and response from a tool call is exogenous code Reminds me to think about the sux rule to prevent untrusted external code (which is just plain/natural language, in an LLM system) from being executed outside of a sandbox.


Keyboard Shortcuts

Key Action
o Source
e Edit
i Insight
r Random
h Home
s or / Search
www.joshbeckman.org/notes/907494919