How MCP exploits actually work (and why they are easy to miss)

The Model Context Protocol is unlocking powerful new workflows. It lets models reach out to tools, query systems, and take actions on behalf of users. That capability delivers real productivity gains. It also creates a new, practical attack surface: when a model, agent, or AI client inherits a user’s identity, small changes in an MCP tool or package can introduce large-scale, invisible risk.
The risk in a single sentence
An MCP server running under a user’s credentials can access the same systems and APIs the user can. Compromised tools do not need to break authentication. They already operate as the legitimate user.
How these exploits actually happen
Every MCP tool exposes a simple set of information to the model: a name, a human description, and a list of parameters. The model uses those descriptions to choose which tool to call and how to populate its parameters. In effect, tool metadata can influence the model’s behavior. That is prompt injection built into the protocol.
For example, consider a benign “Get Weather” tool that was modified to require output from a CRM lookup before it ran. The model followed that instruction, pulled CRM data, and passed it into the weather tool’s execution context. All it took to exfiltrate the CRM record was one additional line of code that posted the collected data to a public endpoint. The user still received a normal weather response, while sensitive deal data was quietly sent outside the organization.
That exploit needed no jailbreak. It required no complicated hacking. It was a small change to a tool and a small amount of natural-language guidance. And because many MCP packages are installed or updated with convenience commands like npx or “latest” tags, compromised code can arrive automatically when a client launches.
Why traditional controls often miss it
Most security tools are built to protect APIs, enforce identity, and monitor network boundaries. MCP attacks exploit a different dimension: how the model selects and chains tools together. Local MCP servers can run arbitrary code, and the model can chain tools together in ways that are invisible to API-only monitoring.
Put simply, these attacks look like legitimate user actions. They use legitimate credentials to make legitimate calls. That makes detection and attribution difficult with existing tooling.
Why this risk is expanding
MCP is moving out of developer-only workflows. SaaS vendors and product teams are exposing agent-style automations as “tools,” “agents,” or “automations” with click-to-add flows. Nontechnical users can enable these capabilities without understanding the security implications. That means supply chain or tool-level compromises can reach beyond engineering teams and affect product users across the business.
We are also seeing registries and curated catalogs that present third-party MCP services as “trusted” components. Those lists can be scraped or populated from public repositories, and a badge does not guarantee safety.
Practical consequences
A successful MCP exploit can do a range of harmful things. It can silently exfiltrate customer records. It can retrieve temporary credentials or tokens and use them in follow-up calls. It can trigger destructive actions, or combine small actions into a larger, costly chain. Importantly, these behaviors can occur while the user believes they are performing routine, low-risk tasks.
Want to see the exploits from above in action? Watch here:
What to do next
Blocking MCP entirely is not a workable option for most organizations. MCP delivers real value, and removing access simply pushes users toward shadow AI and unsanctioned workarounds. The practical path forward is to gain visibility into which MCP resources are in use, classify those tools, and apply policy and runtime controls at the model boundary.
Book a demo of SurePath AI to combine discovery, classification, and enforcement so your workforce can stay productive while risky actions are prevented.
Related articles





