In late December 2025, a single operator pointed Claude Code at 10 Mexican government agencies and a financial institution, walked out with 150 gigabytes of sensitive data, and watched Claude flag a SCADA interface as a high-value target on its own, without ever being asked to look for OT systems. The model scoped the engagement, prioritized targets, and executed the intrusion end-to-end.
That’s the version of Claude your teams are now wiring into production. The same agentic features that compress a week of developer work into an afternoon also compress a months-long intrusion into a single weekend, and they do it with whatever credentials, tools, and data your environment hands over.
If you’re a CISO or risk leader evaluating Claude deployments, your teams are already using it. You need visibility into that usage and controls that keep pace with the model’s expanding capabilities.
This article examines Claude AI security risks across prompt injection attacks, agentic capabilities, Shadow AI, and regulatory exposure. It also outlines what is required for safe enterprise deployment.
Key takeaways
- Enterprise risk from Claude grows with the system’s reach: connected tools, internal data, user permissions, and business processes all broaden the potential impact beyond the model itself.
- The threat isn’t hypothetical. Published testing, disclosed vulnerabilities in Claude-related tooling, and reported attack activity show that Claude can be abused in real operating environments.
- Autonomous agents and unsanctioned Claude use create blind spots for security teams, especially when actions happen with inherited credentials and outside approved monitoring channels.
- Reducing exposure requires enterprise-side guardrails around the model, including prompt and response inspection, sensitive data protections, control over agent actions, and auditable oversight aligned with major security and privacy frameworks.
What Claude AI security risks look like in the enterprise
Claude AI security risks in enterprise environments include the technical vulnerabilities, operational exposures, and compliance gaps that arise when organizations deploy or encounter Anthropic’s Claude models across their workforce and systems.
That includes documented real-world exploitation alongside expanding agentic functionality. Claude Code operates in developer terminals with the same permissions as the invoking user, and the Model Context Protocol (MCP) connects Claude to external tools, databases, and APIs.
Documented vulnerabilities and real-world attacks against Claude
Claude’s security exposure is already documented in model behavior, surrounding tooling, and observed misuse. The pattern runs from model-level weaknesses through ecosystem vulnerabilities into confirmed exploitation by both nation-state and criminal actors.
Prompt injection failure rates
Prompt injection is the starting point because it targets a core weakness in how language models handle instructions. In Anthropic’s published testing, Claude Opus 4.6 showed low prompt-injection attack success rates under the evaluated conditions, with some benchmarks reporting 0.0% even across 200 attempts with safeguards active.
The underlying constraint remains. The UK National Cyber Security Centre has noted that LLMs don’t always clearly distinguish between data and instructions, making prompt injection very difficult to completely mitigate through model training alone.
Critical CVEs across Claude’s ecosystem
That model-level constraint matters even more once Claude is connected to tools. CVE-2025-49596 (CVSS 9.4) exposed a remote code execution path in Anthropic’s MCP Inspector. A zero-click RCE vulnerability (CVSS 10.0) in Claude Desktop Extensions could be triggered through a Google Calendar event when a user delegated calendar management. The issue stemmed from MCP-based systems chaining tools without enforcing security boundaries.
Confirmed nation-state and criminal exploitation
Those weaknesses are no longer theoretical. Anthropic reported that in mid-September 2025 it detected a Chinese state-sponsored group using Claude Code to attempt multi-stage attacks against roughly 30 global targets, with AI performing 80% to 90% of the work and human operators stepping in at only 4 to 6 critical decision points per campaign.
On the criminal side, Anthropic’s August 2025 threat report disclosed a Claude Code extortion campaign targeting at least 17 organizations across healthcare, emergency services, and government, with ransom demands exceeding $500,000.
Claude Code automated reconnaissance, harvested credentials, and penetrated networks, while Claude itself decided which data to exfiltrate, analyzed stolen financial records to set ransom amounts, and generated the extortion notes displayed on victim machines.
Why agentic capabilities and shadow AI amplify every Claude AI security risk
Those documented issues become harder to manage when Claude is connected to enterprise tools or used outside approved channels. Agents inherit enterprise credentials and act autonomously, while shadow AI usage leaves much of that activity outside security visibility.
Agentic AI turns Claude into an operational system
Connected tools are what turn Claude from a chatbot into an operational system. With a chatbot, the main path runs from user input to model output. Agentic systems expand that path across memory stores, connected tools, and the external environment. Adversaries can target all of those components, using attack vectors that operate across system components rather than along a single path.
That broader attack surface runs heavily through tool access, and MCP is the primary channel through which Claude gains that access. The Cloud Security Alliance characterizes MCP’s risks as inherent by default, not incidental to poor implementation.
Snyk’s ToxicSkills study found 36.82% of agent skills contained at least one security flaw. Trail of Bits researchers demonstrated that multi-agent systems create opportunities for privilege escalation when high-privilege agents trust unvalidated agent outputs.
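One way to picture a mitigation for the privilege-escalation pattern above is to authorize each tool call against the intersection of permissions held by every agent in the delegation chain, so a high-privilege agent cannot act as a deputy for a lower-privilege one. The following is a minimal sketch under stated assumptions: the agent names, tool names, and the `PERMISSIONS` table are hypothetical and are not part of MCP or any Claude API.

```python
from dataclasses import dataclass

# Hypothetical permission table: agent name -> tools it may invoke.
PERMISSIONS: dict[str, frozenset[str]] = {
    "triage-agent": frozenset({"read_ticket"}),
    "ops-agent": frozenset({"read_ticket", "restart_service", "read_secrets"}),
}


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    # Every agent that handled the request, from originator to executor.
    chain: tuple[str, ...]


def effective_permissions(chain: tuple[str, ...]) -> frozenset[str]:
    """Intersect the permissions of every agent in the delegation chain."""
    if not chain:
        return frozenset()
    allowed = PERMISSIONS.get(chain[0], frozenset())
    for agent in chain[1:]:
        # Unknown agents contribute an empty set, which denies everything.
        allowed &= PERMISSIONS.get(agent, frozenset())
    return allowed


def is_authorized(request: ToolRequest) -> bool:
    """Allow the call only if every agent in the chain could make it alone."""
    return request.tool in effective_permissions(request.chain)


direct = ToolRequest("read_secrets", ("ops-agent",))
relayed = ToolRequest("read_secrets", ("triage-agent", "ops-agent"))
print(is_authorized(direct), is_authorized(relayed))
```

In this sketch the relayed request is denied even though the executing agent holds the permission, because the originating agent does not. A real deployment would also need a trustworthy way to record the chain, which this example simply assumes.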
Shadow Claude usage creates the governance gap
Shadow Claude use is a meaningful blind spot for many security teams, and it sits outside the tool-access risks above. While security teams focus on sanctioned deployments, unauthorized AI use remains widespread in the enterprise.
IBM’s 2025 Cost of a Data Breach Report found that organizations with high levels of Shadow AI incurred breach costs approximately $670,000 higher than those with low or no Shadow AI. Individual Claude Pro subscriptions can enable unmonitored autonomous agents with enterprise credentials to operate outside SIEM, DLP, and other traditional monitoring tools.
Building runtime AI risk management for Claude deployments
Anthropic’s framework splits responsibility across four layers: model, harness, tools, and environment. Anthropic is responsible for the security of the model and platform, while customers are responsible for securing their own use and applications. The deploying organization therefore owns the other three layers, along with the risks that live there: system prompts, governance gaps, supply chain compromise, overly permissive access, and sensitive data exposure.
Because you own the harness, tool, and environment layers, you need controls that manage Claude safely at runtime. A runtime defense layer can operate outside the model’s processing loop, applying controls at the network level before prompts reach models and before responses reach users or downstream systems.
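To make the idea of a control that sits outside the model on both legs of a request more concrete, here is a minimal sketch of a tokenization round trip: sensitive values are swapped for opaque tokens before the prompt leaves, and restored in the response before it reaches the user. Everything here is an assumption for illustration, including the `guarded_call` wrapper, the email-only pattern, and the in-memory vault. It is not a description of any vendor's implementation, and a production system would need far broader detection.

```python
import re
import uuid
from typing import Callable

# Illustrative pattern only: matches simple email addresses.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize(text: str, vault: dict[str, str]) -> str:
    """Replace each email with an opaque token and remember the mapping."""
    def swap(match: re.Match) -> str:
        token = f"<TOK_{uuid.uuid4().hex[:8]}>"
        vault[token] = match.group(0)
        return token
    return EMAIL.sub(swap, text)


def detokenize(text: str, vault: dict[str, str]) -> str:
    """Restore original values for any tokens the model echoed back."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model call so the raw sensitive value never reaches the model."""
    vault: dict[str, str] = {}              # stays on the enterprise side
    safe_prompt = tokenize(prompt, vault)   # applied before the model sees it
    response = model(safe_prompt)
    return detokenize(response, vault)      # applied before the user sees it


seen_by_model: list[str] = []


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call; records what it was sent.
    seen_by_model.append(prompt)
    return "Echo: " + prompt


print(guarded_call("Contact alice@example.com today", fake_model))
print(seen_by_model[0])
```

The stand-in model only ever sees the token, while the caller gets the original value back. The same wrapper position is where prompt and response inspection would go.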
The sections below cover the two pieces that runtime defense has to hold together: alignment with the major security frameworks your auditors already reference, and the compliance obligations that fall on you the moment Claude touches regulated data. From there, we break down the five control areas this all maps to in practice.
Framework alignment starts with NIST and OWASP
NIST and OWASP both place AI security controls outside the model itself, which sets the direction for any Claude deployment. NIST AI 100-2e2025 directs application designers to “design systems with the assumption that prompt injection attacks are possible if a model is exposed to untrusted input sources.”
OWASP’s LLM Top 10 recommends human-in-the-loop checkpoints for high-stakes agent actions and emphasizes that authorization should be enforced in downstream systems rather than by the LLM itself. The preliminary draft of NIST IR 8596 states as a general consideration that a human is assigned responsibility for the actions of an AI system.
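The two recommendations above, human-in-the-loop checkpoints and authorization enforced downstream, can be sketched together. In this hedged example the check lives in the function that performs the action, so it holds regardless of what the model was prompted to do. The `HIGH_STAKES` set, the `ActionRequest` shape, and the action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of actions treated as high-stakes in this sketch.
HIGH_STAKES = {"delete_records", "wire_transfer"}


class ApprovalRequired(Exception):
    """Raised when a high-stakes action arrives without human approval."""


@dataclass(frozen=True)
class ActionRequest:
    action: str
    requested_by: str                   # human the agent is acting for
    approved_by: Optional[str] = None   # human approver, if any


def execute(request: ActionRequest) -> str:
    """Downstream enforcement point: the check lives here, not in the prompt."""
    if request.action in HIGH_STAKES and request.approved_by is None:
        raise ApprovalRequired(f"{request.action} needs human approval")
    return f"executed {request.action} for {request.requested_by}"


print(execute(ActionRequest("read_report", "dana")))
try:
    execute(ActionRequest("wire_transfer", "dana"))
except ApprovalRequired as exc:
    print("blocked:", exc)
print(execute(ActionRequest("wire_transfer", "dana", approved_by="lee")))
```

Because the gate raises instead of silently skipping the action, the calling agent has to surface the request to a person before retrying.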
Compliance gaps require enterprise-side controls
The same model-external approach applies to compliance, and it spans the EU AI Act, GDPR, and HIPAA at once.
Under the EU AI Act, enterprises deploying Claude for high-risk use cases such as HR screening and credit assessment face conformity-assessment and related compliance obligations. The full obligations for high-risk AI systems apply from August 2, 2026.
Under the GDPR, a Data Protection Impact Assessment is required for processing that is likely to pose a high risk to individuals’ rights and freedoms. This may apply to enterprise AI deployments depending on the specific use case and processing involved.
For HIPAA-covered entities, Claude Enterprise requires explicit administrator activation for BAA coverage, and you must verify which Claude features are in scope before processing PHI.
What enterprise controls must cover
Claude’s shared responsibility model leaves five control areas with the enterprise:
- Input security: Organizations should implement defenses against prompt injection before models process untrusted instructions.
- Data protection: Runtime tokenization can protect sensitive information such as PII, credentials, and source code before it reaches external AI systems. This preserves workflows while reducing leakage risk.
- Runtime defense: Bidirectional defense should inspect both prompts and responses. Risk can enter or leave the system.
- Agentic controls: Pre-execution protection, identity attribution tying every agent action to a human, MCP visibility, and tool authorization policies help govern actions across both agents and conversations.
- Audit and compliance: Immutable audit trails of every interaction with full attribution support investigations, regulatory reporting, and board oversight.
Those controls cover the enterprise-owned harness, tool, and environment layers.
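For the audit and compliance control in the list above, one common approximation of an immutable trail is a hash chain, where each record's hash covers the previous record. The sketch below is tamper-evident, not truly immutable, and its field names (`user`, `agent`, `action`) are assumptions chosen to illustrate attribution of agent actions to a human.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record's content."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_entry(log: list[dict], user: str, agent: str, action: str) -> None:
    """Append an attributed record whose hash covers the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"user": user, "agent": agent, "action": action}
    log.append({**record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; an edited or reordered entry breaks it.

    Truncating the tail is NOT detected without an external anchor.
    """
    prev_hash = GENESIS
    for entry in log:
        record = {key: entry[key] for key in ("user", "agent", "action")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, record):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, "dana", "claude-code", "read_file")
append_entry(log, "dana", "claude-code", "run_tests")
print(verify(log))

log[0]["action"] = "delete_repo"  # simulate tampering with an earlier entry
print(verify(log))
```

In practice the latest hash would be anchored somewhere the log writer cannot modify, such as write-once storage, so that truncation is also detectable.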
WitnessAI addresses these requirements through three core modules. Observe discovers AI applications, agents, and MCP servers across the network. Control enforces policies using intent-based classification, with ML models that analyze conversations and context rather than keywords, and four enforcement actions (allow, warn, block, route) instead of legacy binary allow/block decisions.
Protect provides bidirectional runtime defense, applying controls to prompts before models process them and filtering responses before delivery. WitnessAI applies real-time data redaction and tokenization as part of its protection features.
The platform catalogs 4,000+ AI applications, secures more than 250,000 employees, operates across 40+ countries, and delivers consistent protection across 100+ LLM types. It operates without endpoint clients or browser extensions, covering AI traffic across native applications, embedded copilots, IDEs, and agent API calls.
For Claude Code and MCP deployments specifically, this provides visibility into agent tool activity and MCP server connections that many traditional security tools were not originally designed to govern. InComm Payments’ CISO described the result: “We’re reducing risk while maximizing our productivity because of WitnessAI.”
Securing Claude with the right controls in place
Claude AI security risks are well-documented and grow alongside the model’s expanding capabilities. The pattern is a clear progression, from model-level prompt injection weaknesses to agents that can identify operational technology targets on their own.
Many organizations moving Claude from pilot to production implement a runtime defense layer consisting of network-level visibility, intent-based policies, bidirectional defense, and audit trails to address the responsibilities that fall outside the model provider.
WitnessAI provides security and AI teams with a single platform for visibility, governance, policy enforcement, and runtime guardrails across human and digital workforces. Book a demo to see how the platform applies to your Claude deployment.