As your agent takes on more capabilities, its action space naturally grows more complex—in plain terms, the number of tools explodes. The recent popularity of MCP only adds fuel to the fire. If you allow user-configurable tools, trust me: someone will inevitably plug hundreds of mysterious tools into your carefully curated action space. As a result, the model is more likely to select the wrong action or take an inefficient path. In short, your heavily armed agent gets dumber.

A natural reaction is to design a dynamic action space—perhaps loading tools on demand using something RAG-like. We tried that in Manus too. But our experiments suggest a clear rule: unless absolutely necessary, avoid dynamically adding or removing tools mid-iteration. There are two main reasons for this:

  1. In most LLMs, tool definitions live near the front of the context after serialization, typically before or after the system prompt. So any change will invalidate the KV-cache for all subsequent actions and observations.
  2. When previous actions and observations still refer to tools that are no longer defined in the current context, the model gets confused. Without constrained decoding, this often leads to schema violations or hallucinated actions.

To solve this while still improving action selection, Manus uses a context-aware state machine to manage tool availability. Rather than removing tools, it masks the token logits during decoding to prevent (or enforce) the selection of certain actions based on the current context.

In practice, most model providers and inference frameworks support some form of response prefill, which allows you to constrain the action space without modifying the tool definitions. There are generally three modes of function calling (we’ll use the Hermes format from NousResearch as an example):

  • Auto: The model may choose to call a function or not. Implemented by prefilling only the reply prefix: <|im_start|>assistant
  • Required: The model must call a function, but the choice is unconstrained. Implemented by prefilling up to tool call token: <|im_start|>assistant
  • Specified: The model must call a function from a specific subset. Implemented by prefilling up to the beginning of the function name: <|im_start|>assistant{"name": “browser_

Using this, we constrain action selection by masking token logits directly. For example, when the user provides a new input, Manus must reply immediately instead of taking an action. We’ve also deliberately designed action names with consistent prefixes—e.g., all browser-related tools start with browser_, and command-line tools with shell_. This allows us to easily enforce that the agent only chooses from a certain group of tools at a given state without using stateful logits processors.

These designs help ensure that the Manus agent loop remains stable—even under a model-driven architecture.

This is a very interesting technique: force an assistant response to every user request, so as to keep open the possibility for removing tools/calls in future edits of the conversation.

Also from this post/overview:

the file system as the ultimate context


Keyboard Shortcuts

Key Action
o Source
e Edit
i Insight
r Random
h Home
s or / Search
www.joshbeckman.org/notes/920141445