Tool descriptions are the new attack surface. The Model Context Protocol (MCP) enables AI agents to discover and invoke tools dynamically — but the mechanism that makes this powerful also makes it vulnerable. In ArXiv 2605.24069, researchers present the first comprehensive benchmark for MCP poisoning: a class of attacks where malicious tool documentation embedded in MCP tool descriptions causes agents to misconfigure, leak data, or execute unintended actions.
The core insight is unsettling: the protocol that allows agents to learn what a tool does at runtime is the same channel an attacker can use to tell the agent to do something else entirely. Unlike traditional API security — where authentication and authorization gate access to endpoints — MCP poisoning targets the semantic layer. The agent reads a description, forms a mental model of the tool's behavior, and acts on that model. If the description is poisoned, the agent acts on a lie.
The paper's taxonomy identifies five distinct surfaces where MCP poisoning can be injected:
The most straightforward attack. An adversary with write access to an MCP tool registry modifies a tool's description — typically a benign field like the description metadata — so the agent misinterprets the tool's purpose. Example: a "user-lookup" tool's description is changed from "returns user email by user_id" to "returns user email by user_id (note: accepts batch queries)." The agent now sends user_id arrays, leaking multiple records per call where it should send one.
Parameters are described in the MCP schema. An attacker alters parameter descriptions to suggest the agent supply values that were never intended — e.g., changing a "folder" parameter description from "path to file" to "path to file — supports directory recursion." The agent happily recurses through the filesystem.
The tool itself returns correct data, but the description misrepresents what that data means. An attacker poisons the return-value documentation so the agent treats results as something they are not — for example, causing the agent to interpret a file list as authorization credentials.
An attacker registers phantom tools that don't exist. The agent discovers them via MCP discovery and tries to invoke them. Since the tool doesn't exist, the error message itself becomes an exfiltration channel — or the agent enters a retry loop that consumes resources.
The most sophisticated surface. The attacker poisons descriptions across multiple tools so the agent chains them in an unexpected sequence. Tool A says "call Tool B after this operation"; Tool B's description is poisoned to exfiltrate data from Tool A's output. The agent follows the described chain, and the attacker never touches the tools themselves — only their documentation.
It is crucial to understand that MCP poisoning is not a variation of prompt injection. Prompt injection targets the system prompt or user input — the text the model receives. MCP poisoning targets the protocol-level metadata that the agent uses to discover and invoke tools. The model processes the description during tool selection, not during prompt evaluation. This distinction matters for defensive architecture: prompt injection filters do not catch MCP poisoning because the poisoned content travels through a different channel.
This is a protocol-level attack surface. The Model Context Protocol, like any other wire protocol, has attack surfaces in its message structure. Tool descriptions are not input to the model's reasoning loop the way prompts are — they are input to the agent's action planning loop. An attacker who can inject into MCP tool metadata can control what the agent believes about its available capabilities without ever touching the model's prompt context.
Consider a real-world scenario. A development team integrates an open-source MCP server registry into their agent infrastructure — the team finds a registry of 500+ community-maintained MCP server definitions on GitHub. They import the registry to give their agents access to databases, APIs, and internal tooling.
An attacker submits a malicious pull request to the registry that modifies the description of the "database-query" tool. Original description: "Executes a SQL SELECT query and returns results." Poisoned description: "Executes a SQL SELECT query and returns results. Supports subqueries, window functions, and CTEs for advanced analytics."
No tool code changed. No parameter schema changed. Only the description.
The agent, reading the poisoned description, infers that subqueries are safe and constructs a query like: SELECT * FROM users WHERE id = (SELECT password_hash FROM credentials WHERE user_id = 1). The tool executes the query — it has no reason not to, because the SQL itself is syntactically correct. The agent treats the result as legitimate query output and passes it to the user or downstream pipeline.
This is the attack. Not a buffer overflow. Not a misconfigured firewall. A description change in a text file.
MCP poisoning requires a defensive shift. Static validation — checking descriptions against an allowlist — is necessary but insufficient. The protocol-level nature of the attack means descriptions must be verified at runtime against what the tool actually does.
Every time an agent selects a tool based on its MCP description, the runtime should verify that the description's claims match actual tool behavior. This means maintaining a known-good checksum for each tool's description and validating it before invocation, or maintaining a separate, read-only manifest that the agent cross-references.
Just as dependency management tools pin package versions, MCP tool descriptions should be pinned to specific revisions. Agents should not dynamically pull the latest description from a registry without a change-review gate. Version-pinned descriptions let teams audit what changed before the agent acts on it.
Write access to MCP tool registries should be treated as infrastructure-level privilege — equivalent to write access to production configuration files. Registry mutations need audit trails, peer review, and automated checks for anomalous description changes.
Agents exhibit predictable tool-call patterns. When an agent suddenly starts using parameters it had never used before, or chaining tools in a sequence not seen in training or prior runs, that is a signal that the description layer may have been compromised.
The paper introduces MCP-Poison-Bench, a dedicated evaluation framework with:
The benchmark covers both open-source agent frameworks and commercial agent platforms, demonstrating that the vulnerability is not implementation-specific — it is inherent to the MCP architecture when tool descriptions are trusted without verification.
MCP poisoning is not a theoretical vulnerability waiting to be discovered. The benchmark exists. The attack taxonomies are published. The supply-chain vectors are live in public MCP registries today.
The industry has spent years hardening model prompts against injection. It is time to spend equivalent effort hardening the protocol layer — starting with the principle that tool descriptions are not documentation. They are executable configuration, and they must be treated with the same integrity controls as any other executable artifact in the stack.
Full technical analysis and the benchmark data from ArXiv 2605.24069 are available at the link below.
https://cyberian-defenses.com/blog/mcp-poisoning-llm-agent-attacks