The way LLMs are designed and how they work opens up vulnerabilities and risks that traditional security tools were never built to handle.
The consequences are already showing up across enterprises: proprietary data exfiltration, unauthorized actions triggered by manipulated model outputs, and compliance gaps that existing security stacks can’t close. 29% of cybersecurity leaders said their organizations had experienced an attack on enterprise GenAI infrastructure.
That’s why the Open Worldwide Application Security Project (OWASP) built a dedicated Top 10 risk framework for LLMs. This article walks through all ten risks and outlines the defense patterns that move organizations from awareness to operational confidence.
Key Takeaways
- LLMs introduce security risks that traditional tools were not designed to address on their own, creating critical gaps in visibility, context, and runtime control.
- The OWASP Top 10 for LLMs lists ten distinct failure modes, from prompt injection to supply chain vulnerabilities and unbounded consumption.
- Attackers can chain these risks together for greater impact. A prompt injection can trigger excessive agency, making isolated fixes insufficient.
- Effective defense requires layered, runtime-aware controls that operate across the AI ecosystem, combining visibility, bidirectional inspection of prompts and responses, policy-driven governance, and least-privilege enforcement for agentic systems.
Why Traditional Security Falls Short for LLMs
LLMs break the assumptions behind traditional application security. With a conventional web app vulnerability, your team can find the bug, push a patch, and confirm the fix. LLM vulnerabilities don’t work that way. They emerge from natural-language inputs that look exactly like normal usage, and from training data your team may never fully audit.
For security leaders, this creates a practical coverage gap:
- Traditional controls like WAFs, DLP, CASB, and SAST/DAST weren’t built to understand conversational intent. A WAF can block a known injection string, but it can’t evaluate whether a 200-word paragraph is trying to manipulate a model’s reasoning.
- The hardest risks emerge at runtime, when models process untrusted inputs, generate outputs, and trigger downstream actions. This is where visibility, intent understanding, and real-time policy enforcement become critical, forming the foundation of emerging AI runtime security and governance systems.
- Agentic workflows widen the gap further. When AI agents read files, call tools, query databases, or invoke APIs, the attack surface expands well beyond a single prompt-response exchange.
The Full OWASP Top 10 for LLMs
Each entry below represents a distinct failure mode, but attackers often chain them together: a prompt injection that triggers excessive agency, or a poisoned dataset that enables misinformation, amplifying the impact well beyond any single vulnerability.
LLM01: Prompt Injection
Prompt injection is an attack in which an adversary crafts input that an LLM interprets as a new instruction rather than content to process.
It exploits a fundamental design trait: LLMs handle instructions and data in the same channel, with no hard technical boundary between them, so the model follows the injected instruction because it can’t tell the difference.
Three variants define the attack surface:
- Direct injection targets the model through its primary input field, instructing it to override its original behavior or persona. The injected instruction is syntactically indistinguishable from a legitimate one, so the model complies.
- Indirect injection embeds malicious instructions in documents, emails, or web pages that the model reads as part of its normal workflow. The attacker never interacts directly with the application’s input layer, making detection significantly harder.
- Multimodal injection hides instructions in images processed alongside text, an emerging variant that standard text-based defenses don’t address.
Prompt injection cannot be fully eliminated through model-level fixes alone, because it exploits how LLMs process natural language. Mitigation requires runtime controls, guardrails, and policy enforcement layered around the model.
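As a rough illustration of a runtime control that sits outside the model, the Python sketch below screens retrieved content before it is placed into a prompt. It is a minimal sketch, not a complete defense: the pattern list, the function names, and the `<untrusted>` delimiter are all hypothetical choices made for this example, and keyword heuristics like these are easy to evade, so they would be one layer among several.

```python
import re

# Hypothetical heuristics: phrases that often show up in injection attempts.
# A real deployment would rely on far more than a keyword list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the |your )?(system prompt|instructions)", re.I),
    re.compile(r"reveal (the |your )?system prompt", re.I),
]


def screen_untrusted_text(text: str) -> list[str]:
    """Return the patterns that matched, so a policy layer can decide what to do."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]


def wrap_as_data(text: str) -> str:
    """Mark retrieved content as data. This is a hint to the model, not a hard boundary."""
    return f"<untrusted>\n{text}\n</untrusted>"


def build_prompt(user_question: str, retrieved_doc: str) -> str:
    """Assemble a prompt, refusing content that trips the screening heuristics."""
    hits = screen_untrusted_text(retrieved_doc)
    if hits:
        # Enforce the decision outside the model instead of trusting it to resist.
        raise ValueError(f"retrieved content blocked by policy: {hits}")
    return (
        "Answer using only the document below.\n"
        f"{wrap_as_data(retrieved_doc)}\n"
        f"Question: {user_question}"
    )
```

The point of the design is where the decision is made: the block happens before the model sees the text, so a successful manipulation of the model cannot undo it.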
LLM02: Sensitive Information Disclosure
Sensitive information disclosure refers to an LLM exposing confidential data it should never reveal.
Two distinct sources define the risk:
- A model may reproduce fragments of sensitive content from its training data.
- Employees may submit confidential material to external AI providers without recognizing the exposure.
The primary controls are training data sanitization, output filtering for PII and proprietary content, and real-time tokenization that redacts sensitive data before it reaches any third-party model. All three depend on a more fundamental capability: visibility into which AI tools employees are using and what data those tools receive.
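The tokenization control described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: it covers only email addresses and US-style Social Security numbers, the regular expressions are simplified, and the `tokenize` and `detokenize` names and the in-memory vault are hypothetical. A production system would need broader detectors and a protected token store.

```python
import re
import uuid

# Simplified detectors, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def tokenize(prompt: str) -> tuple[str, dict[str, str]]:
    """Swap sensitive values for opaque tokens; return the redacted text and the vault."""
    vault: dict[str, str] = {}

    def _swap(kind: str):
        def repl(match: re.Match) -> str:
            token = f"[{kind}_{uuid.uuid4().hex[:8]}]"
            vault[token] = match.group(0)  # the original value never leaves this process
            return token
        return repl

    redacted = EMAIL_RE.sub(_swap("EMAIL"), prompt)
    redacted = SSN_RE.sub(_swap("SSN"), redacted)
    return redacted, vault


def detokenize(response: str, vault: dict[str, str]) -> str:
    """Restore original values in the model's response, inside the trust boundary."""
    for token, original in vault.items():
        response = response.replace(token, original)
    return response
```

Only the redacted string would be sent to the external provider; the vault stays local and is used to restore values in the response.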
LLM03: Supply Chain
Supply chain risk arises because LLM deployments depend on complex chains of external components: foundation model providers, third-party datasets, fine-tuning services, inference APIs, and open-source model weights.
A compromised component anywhere in that chain can introduce vulnerabilities that persist invisibly into production. The model itself can’t be audited the way source code can, so these issues are harder to catch.
Practically, this means requiring documented origins for models before they enter the pipeline, tracking AI components as you would any software dependency, and monitoring production for anomalous behavior that pre-deployment review missed.
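One way to track model artifacts like any other dependency is to pin their hashes and verify them before loading. The sketch below assumes a hypothetical manifest that maps file names to expected SHA-256 digests; the manifest format and function names are illustrative, not a standard.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the names of artifacts that are missing or do not match their pinned hash."""
    failures = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

A hash check confirms that the file is the one that was reviewed. It says nothing about whether the reviewed file was trustworthy, which is why the provenance and runtime monitoring steps above still apply.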
LLM04: Data and Model Poisoning
Data and model poisoning attacks manipulate the data used to train, fine-tune, or guide a model, introducing backdoors or biases that only surface under specific conditions.
The scope extends across pre-training datasets, fine-tuning data, RAG knowledge bases, and agentic pipelines — anywhere data touches the model.
Each stage has a different exposure profile:
- Pre-training poisoning is the hardest to detect because the influence gets baked directly into the model’s weights. Bad data from external or unverified sources becomes part of how the model thinks, with no straightforward way to extract it after the fact.
- Fine-tuning poisoning hits closer to home for enterprises customizing models for specific tasks. An attacker who can influence the external datasets used in fine-tuning can alter model behavior in targeted, hard-to-spot ways.
- RAG and embedding poisoning is the most immediate risk for most enterprise teams. An attacker can manipulate what the model retrieves and presents as grounded information, without ever touching the model itself. (LLM08 below covers the underlying vector store vulnerabilities in detail.)
LLM05: Improper Output Handling
Improper output handling turns the model’s responses into an attack vector when downstream systems accept them without validation.
Unvalidated outputs can trigger cross-site scripting, server-side request forgery, privilege escalation, and code execution in systems that treat model-generated content as trusted input.
The mitigation is straightforward: treat all LLM output as untrusted by default, apply encoding and sanitization before rendering in web interfaces, and validate responses against expected schemas before any downstream system acts on them.
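Both halves of that advice, encoding before rendering and validating before acting, can be shown in a short Python sketch. The action names and the two-field command shape are assumptions invented for this example; the escaping uses the standard library's `html.escape`.

```python
import html
import json

# Hypothetical set of actions a downstream ticketing system accepts.
ALLOWED_ACTIONS = {"create_ticket", "close_ticket"}


def render_safe(model_text: str) -> str:
    """Escape model output before placing it in an HTML page."""
    return html.escape(model_text)


def parse_action(model_json: str) -> dict:
    """Validate a model-produced JSON command against the expected shape before acting on it."""
    data = json.loads(model_json)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"action", "ticket_id"}:
        raise ValueError("unexpected fields in model output")
    if not isinstance(data["action"], str) or data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {data['action']!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(data["ticket_id"], int) or isinstance(data["ticket_id"], bool):
        raise ValueError("ticket_id must be an integer")
    return data
```

Anything that fails validation is rejected rather than repaired, which keeps the downstream system from acting on output it cannot fully account for.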
LLM06: Excessive Agency
Excessive agency describes LLM-based systems that have more capabilities or permissions than their task requires, and then take damaging actions in response to manipulated outputs.
OWASP identifies three root causes:
- Excessive functionality
- Excessive permissions
- Excessive autonomy over high-impact actions
Agent actions can be irrevocable. An agent with write access to financial systems, code repositories, or communications platforms can execute unauthorized actions at machine speed before any human can intervene.
A successful prompt injection that reaches such an agent compounds the damage further, directing it to act using its own legitimate permissions in ways the operator never intended. Least-privilege scoping and independent authorization checks in downstream systems are the primary mitigations.
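Least-privilege scoping and an independent authorization check might look something like the following Python sketch. The tool names, the `ToolPolicy` structure, and the approval flag are hypothetical; the idea being illustrated is deny-by-default plus a human gate on high-impact actions, enforced outside the model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolPolicy:
    allowed: bool = False
    needs_human_approval: bool = False


# Hypothetical policy table: only the tools this agent's task requires.
POLICIES = {
    "search_docs": ToolPolicy(allowed=True),
    "send_payment": ToolPolicy(allowed=True, needs_human_approval=True),
}


def authorize(tool: str, approved_by_human: bool = False) -> bool:
    """Decide whether a tool call may proceed. Unknown tools are denied by default."""
    policy = POLICIES.get(tool, ToolPolicy())
    if not policy.allowed:
        return False
    if policy.needs_human_approval and not approved_by_human:
        return False
    return True


def run_tool(tool: str, fn: Callable[[], str], approved_by_human: bool = False) -> str:
    """Execute a tool only after the policy check passes."""
    if not authorize(tool, approved_by_human):
        raise PermissionError(f"tool call denied: {tool}")
    return fn()
```

Because the check runs in the orchestration layer, an injected instruction can ask the agent to call `send_payment`, but the call still stops until a person approves it.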
LLM07: System Prompt Leakage
System prompt leakage exposes the hidden instructions that define a model’s behavior, role, and constraints.
Extracted system prompts can reveal proprietary business logic, API credentials, internal architecture details, and security policies.
Leakage typically results from direct elicitation or from prompt injection that causes the model to surface prompt content in its responses. OWASP recommends against storing credentials or authorization logic in system prompts and favors external enforcement systems for behavior controls.
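As one example of external enforcement, a response filter can check whether a reply contains verbatim fragments of the system prompt before it is returned. The sketch below is a hypothetical helper with an arbitrary 40-character window; it catches only near-verbatim leakage and would miss paraphrases, so it complements keeping secrets out of the prompt rather than replacing that practice.

```python
def leaks_system_prompt(response: str, system_prompt: str, window: int = 40) -> bool:
    """True if any `window`-character slice of the system prompt appears verbatim in the response."""
    # Normalize whitespace and case so trivial reformatting does not hide a match.
    normalized_response = " ".join(response.split()).lower()
    normalized_prompt = " ".join(system_prompt.split()).lower()
    if not normalized_prompt:
        return False
    if len(normalized_prompt) <= window:
        return normalized_prompt in normalized_response
    return any(
        normalized_prompt[i:i + window] in normalized_response
        for i in range(len(normalized_prompt) - window + 1)
    )
```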
LLM08: Vector and Embedding Weaknesses
Vector and embedding weaknesses are vulnerabilities in the retrieval infrastructure underlying RAG systems, the layer that LLM04’s RAG poisoning exploits.
In practice, this plays out in three ways:
- Corpus poisoning: an attacker injects malicious content into a vector database so it gets retrieved during legitimate queries, redirecting model behavior at the retrieval layer before generation even begins.
- Embedding manipulation: an attacker degrades retrieval quality or introduces misleading similarity relationships, causing harmful or inaccurate content to surface in response to normal queries.
- Multi-tenant vector store leakage: when access controls on a shared embedding index are insufficient, one tenant’s queries can pull back another tenant’s sensitive data.
Access controls on embedding indexes should be as rigorous as those on the models they support.
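For the multi-tenant case, the key design choice is to apply the tenant filter before similarity ranking rather than after. The Python sketch below uses a toy in-memory index to show the idea; the `Chunk` type and `search` function are hypothetical and stand in for whatever filtering mechanism your vector store provides.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    tenant_id: str
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[Chunk], query_embedding: list[float], tenant_id: str, k: int = 3) -> list[Chunk]:
    """Filter by tenant first, then rank, so other tenants' chunks can never be returned."""
    candidates = [c for c in index if c.tenant_id == tenant_id]
    candidates.sort(key=lambda c: cosine(c.embedding, query_embedding), reverse=True)
    return candidates[:k]
```

The tenant ID should come from the authenticated session, never from the prompt or from model output.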
LLM09: Misinformation
Misinformation is a model-level failure in which LLMs generate and propagate false information with apparent confidence.
Unlike simple hallucination, this is a systemic risk: confident, well-written outputs that are factually wrong can drive real-world decisions before anyone catches the error.
This matters most in financial analysis and legal and regulatory work, where incorrect outputs compound quickly. Output validation and human verification before AI-generated content drives any binding decision are the minimum controls.
LLM10: Unbounded Consumption
Unbounded consumption covers attacks on the operational resources behind LLM applications: compute capacity, API budgets, and access to the model itself. The results include denial of service, runaway costs, and unauthorized model replication, especially in pay-per-use cloud environments.
Key attack vectors include Denial of Wallet (DoW), in which excessive queries drive up operational costs, and resource overload, where malicious inputs overwhelm compute capacity. Mitigation includes rate limiting, execution timeouts, usage monitoring, and budget thresholds for API consumption.
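Rate limiting and budget thresholds can be combined in a small gate placed in front of the model API. The Python sketch below uses a token bucket for request rate and a simple spend counter for the budget; the class names, limits, and per-request cost are illustrative assumptions, and a real deployment would track these per user or per API key in shared storage.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_sec` tokens per second."""

    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class BudgetGuard:
    """Refuse requests once cumulative spend would exceed the configured limit."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.limit_usd:
            raise RuntimeError("budget threshold reached; request refused")
        self.spent_usd += cost_usd
```

The rate limiter bounds how fast requests arrive, while the budget guard bounds total spend, so a slow but persistent Denial of Wallet attempt is still capped.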
How to Build a Stronger LLM Security Posture
Across all ten risks, the defensive pattern is the same: don’t ask the model to protect itself. Keep your control points outside the model, inspect what goes in and what comes out, and limit what every connected system is allowed to do.
That principle translates into four practical layers:
- Visibility and discovery. You can’t secure what you can’t see. Teams need to know which AI tools and models are in use across the organization, who is using them, and what data is flowing through them before addressing any individual risk.
- Input and output inspection. Bidirectional defense, inspecting both prompts and responses in real time, prevents prompt injections from reaching the model and stops harmful, sensitive, or hallucinated outputs from reaching users or downstream systems.
- Policy-driven governance. Context-aware policies let security teams define what AI interactions are acceptable without blocking legitimate work: what data can be shared with external models, what actions agents can take, and what outputs require human review.
- Least privilege and action controls for agentic systems. When models can call tools, query databases, or trigger workflows, pre-execution authorization and response validation become essential. An agent that goes wrong doesn’t wait for approval, so your controls can’t either.
No single tool covers every risk on this list. But the closer your enforcement sits to the point of interaction, the more of these risks you address simultaneously.
Start Securing Your LLM Deployments
Understanding these ten risks is the necessary first step. But awareness alone doesn’t close the gap between knowing what can go wrong and having the controls in place to prevent it.
WitnessAI is a unified AI security and governance platform purpose-built for enterprises deploying LLMs and AI agents. It delivers all four layers — visibility, bidirectional inspection, policy enforcement, and agentic action controls — from a single point of interaction between employees, AI tools, and the systems those tools can act on.
It doesn’t replace your existing security stack; it closes the gap left by traditional tools when AI enters the workflow. If your team is evaluating how to secure LLM deployments or trying to get ahead of risks you’re already seeing, explore how WitnessAI can help you move from AI hesitation to AI confidence.