The National Security Agency (NSA), together with partner agencies, has just published comprehensive guidance on protecting the data used in artificial intelligence systems. The guidance arrives at a critical time: organizations are adopting AI rapidly while struggling to secure the massive datasets and applications that power these systems.
The guidance focuses on three major risk areas: compromised data supply chains, intentionally corrupted training data, and gradual data drift that degrades model performance over time. Each risk requires different protection strategies and continuous vigilance.
The Data Supply Chain Problem
Most organizations don’t create all their training data from scratch. They rely on third-party datasets, pre-trained models, and web-scraped information. This creates multiple points where attackers can insert malicious content.
The guidance cites research showing that attackers can poison popular web-scale training datasets for as little as $60 by purchasing expired domains referenced in the data². When AI systems train on this poisoned data, they learn incorrect or dangerous behaviors that persist into production.
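As a concrete illustration (not drawn from the NSA document itself), here is a minimal Python sketch of basic supply-chain hygiene: checking a downloaded training shard against a published checksum and rejecting records whose source domains are not on an allowlist. The file name, domains, and digest workflow are placeholder assumptions, not a prescribed process.

```python
import hashlib
from urllib.parse import urlparse

TRUSTED_DOMAINS = {"example-datasets.org"}   # assumed allowlist of dataset sources

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file in streaming fashion."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_source(url: str) -> bool:
    """Reject records whose source domain is not on the allowlist."""
    return urlparse(url).hostname in TRUSTED_DOMAINS

print(verify_source("https://example-datasets.org/shard-000.jsonl"))    # True
print(verify_source("https://expired-domain-now-attacker.com/x.json"))  # False

# Before training, compare sha256_of("train_shard_000.jsonl") against the
# digest published by the dataset maintainer and refuse to train on a mismatch.
```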
Discovering and Cataloging AI Usage
The NSA emphasizes that organizations must maintain comprehensive visibility into all AI systems and data flows. This starts with discovering every AI application in use across the enterprise. Many organizations find they’re using far more AI tools than they realized, often without proper oversight or security controls.
Organizations should deploy modern AI security platforms that can automatically detect and catalog AI applications across the network, including those embedded in other software or accessed through native desktop applications. This discovery process reveals not just which tools exist, but how employees actually use them and what data they share. Without this foundational visibility, organizations cannot implement meaningful security controls.
The guidance specifically calls for maintaining an AI catalog that documents all models, applications, and data sources. This inventory becomes the foundation for risk assessments, compliance reporting, and security policy enforcement. Organizations that skip this step often find themselves playing catch-up after a security incident reveals unknown AI usage.
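To make the idea of an AI catalog concrete, here is a minimal, hypothetical sketch of what one inventory entry might capture. The field names and example values are illustrative assumptions, not a schema from the NSA guidance.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json

@dataclass
class AICatalogEntry:
    """One record in a hypothetical enterprise AI inventory."""
    name: str
    vendor: str
    model: str
    data_sources: list[str] = field(default_factory=list)
    data_classification: str = "internal"   # e.g. public / internal / confidential
    owner: str = ""
    last_reviewed: str = str(date.today())

catalog = [
    AICatalogEntry(
        name="Support chatbot",
        vendor="ExampleAI",              # assumed vendor name
        model="example-chat-v2",         # assumed model identifier
        data_sources=["ticket history", "product docs"],
        data_classification="confidential",
        owner="customer-success",
    ),
]

print(json.dumps([asdict(e) for e in catalog], indent=2))
```

Even a simple inventory like this gives risk assessments and compliance reporting something concrete to anchor to.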
Protecting Sensitive Data in Real Time
One of the NSA’s core recommendations involves preventing sensitive data from reaching AI systems in the first place. The guidance extensively covers data masking, encryption, and access controls as essential protective measures.
Organizations should look for AI security platforms that automatically detect and protect sensitive information before it reaches AI models. This happens through real-time tokenization, where sensitive data gets replaced with non-sensitive placeholders while maintaining the data’s utility for AI processing. For example, when an employee pastes customer records into an AI tool, the system should automatically mask social security numbers, credit card details, and other sensitive information.
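A simplified sketch of what that tokenization step might look like appears below. The regular expressions, token format, and in-memory "vault" are illustrative assumptions; production systems rely on far broader detection (validators, ML classifiers) and managed token stores.

```python
import re
import uuid

# Simple patterns for two common identifiers; real deployments use far
# broader detection than two regexes.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def tokenize(text: str, vault: dict[str, str]) -> str:
    """Replace sensitive values with opaque tokens before the text reaches an AI model."""
    def _swap(kind: str):
        def _inner(match: re.Match) -> str:
            token = f"<{kind}_{uuid.uuid4().hex[:8]}>"
            vault[token] = match.group(0)   # keep the original so responses can be re-identified
            return token
        return _inner

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(_swap(kind), text)
    return text

vault: dict[str, str] = {}
prompt = "Customer 123-45-6789 paid with card 4111 1111 1111 1111."
print(tokenize(prompt, vault))   # SSN and card number are replaced with tokens
```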
Implementing Comprehensive Audit Trails
The guidance makes clear that organizations must maintain detailed records of all AI data interactions. This goes beyond simple logging to include comprehensive audit trails showing who accessed what data, when they accessed it, and how AI systems processed that information.
These audit trails serve multiple purposes. They support compliance with regulations like GDPR and CCPA, enable forensic investigations after security incidents, and help organizations understand their AI usage patterns. The NSA emphasizes that these logs must be tamper-proof and cryptographically signed to maintain their integrity.
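To illustrate the tamper-evidence requirement, here is a minimal sketch of a hash-chained, HMAC-signed audit log. The key handling is deliberately simplified; a real deployment would pull the signing key from a KMS or HSM rather than hard-code it.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"replace-with-a-managed-secret"   # assumption: sourced from a KMS/HSM in practice

def append_entry(log: list[dict], user: str, action: str, resource: str) -> None:
    """Append an audit record chained to the previous one and HMAC-signed."""
    prev_sig = log[-1]["signature"] if log else ""
    entry = {
        "timestamp": time.time(),
        "user": user,
        "action": action,
        "resource": resource,
        "prev_signature": prev_sig,   # hash chaining makes silent edits detectable
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    log.append(entry)

audit_log: list[dict] = []
append_entry(audit_log, "analyst@example.com", "prompt_submitted", "claims-assistant")
print(audit_log[0]["signature"][:16], "...")
```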
Organizations should look for tools that integrate with existing security information and event management (SIEM) systems to provide unified visibility across traditional IT and AI-specific activities. This integration allows security teams to correlate AI-related events with other security data, improving threat detection and response capabilities.
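As a rough illustration of that integration, the sketch below forwards a structured AI-activity event to a syslog collector in front of a SIEM. The hostname, port, and event fields are assumptions for the example, not any specific vendor's schema.

```python
import json
import logging
import logging.handlers

# Assumption: the SIEM (or a collector in front of it) accepts syslog on UDP 514.
handler = logging.handlers.SysLogHandler(address=("siem.example.internal", 514))
logger = logging.getLogger("ai-activity")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

event = {
    "event_type": "ai_prompt",
    "user": "analyst@example.com",
    "application": "example-chat-v2",   # assumed model identifier from the catalog
    "sensitive_data_masked": True,
}
logger.info(json.dumps(event))   # lands alongside other security telemetry for correlation
```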
Managing Access Through Context-Aware Controls
The NSA guidance strongly emphasizes role-based access controls and the principle of least privilege. However, traditional access controls often fail in AI environments where the same user might need different permissions based on their current task.
AI security platforms should offer context-aware access controls that adapt based on what users are trying to accomplish. For instance, a financial analyst might freely use AI for market research but face restrictions when attempting to process customer payment data. This nuanced approach maintains security without creating productivity bottlenecks that drive users to bypass controls entirely.
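The toy policy function below illustrates the idea; the roles, tasks, and rules are invented for the example and are not drawn from the NSA guidance.

```python
from dataclasses import dataclass

@dataclass
class Request:
    role: str         # who is asking
    task: str         # what they are doing, e.g. "market_research"
    data_class: str   # classification of the data involved

# Hypothetical policy: role alone is not enough; task and data class both matter.
def allow(req: Request) -> bool:
    if req.data_class == "payment_data":
        return False   # payment data never goes to external AI tools
    if req.role == "financial_analyst" and req.task == "market_research":
        return True
    return req.data_class == "public"

print(allow(Request("financial_analyst", "market_research", "public")))         # True
print(allow(Request("financial_analyst", "customer_billing", "payment_data")))  # False
```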
The guidance also highlights the importance of dynamic data classification. As data flows through AI systems, its sensitivity level might change based on aggregation, inference, or combination with other information. Security platforms must continuously reassess data classification and adjust protections accordingly.
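One simple way to picture dynamic classification: the sensitivity of an aggregate is at least the highest sensitivity of its parts, and aggregation itself can push it higher. The sketch below encodes that rule with an assumed four-level ladder; the levels and the aggregation heuristic are illustrative only.

```python
# Assumed sensitivity ladder; combining fields can only raise, never lower, the level.
LEVELS = ["public", "internal", "confidential", "restricted"]

def combined_classification(field_levels: list[str]) -> str:
    """Classification of an aggregate is the highest of its parts, possibly higher."""
    level = max(LEVELS.index(l) for l in field_levels)
    # Aggregation itself can increase sensitivity: several quasi-identifiers
    # together may re-identify a person even if each is only 'internal'.
    if field_levels.count("internal") >= 3:
        level = max(level, LEVELS.index("confidential"))
    return LEVELS[level]

print(combined_classification(["public", "internal"]))                # internal
print(combined_classification(["internal", "internal", "internal"]))  # confidential
```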
Detecting and Preventing Data Tampering
The NSA identifies several sophisticated attack vectors targeting AI training data. Adversarial actors might inject subtle changes that cause models to learn incorrect behaviors, attempt to extract training data through model inversion attacks, or poison datasets to compromise future models.
Organizations need anomaly detection capabilities that identify suspicious patterns in data before it affects AI systems. This includes statistical analysis to spot outliers, validation against known-good baselines, and continuous monitoring for unexpected changes. The guidance recommends ensemble methods that combine multiple detection techniques for more robust protection.
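For a sense of what such detection can look like, here is a minimal sketch combining a z-score outlier check with a baseline-drift check and a simple ensemble vote. The thresholds, baseline, and data are illustrative assumptions, not values from the guidance.

```python
import statistics

def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[int]:
    """Flag indices sitting more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1e-9
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

def baseline_drift(values: list[float], baseline_mean: float, tolerance: float = 0.2) -> bool:
    """Compare a new batch against a known-good baseline mean."""
    return abs(statistics.fmean(values) - baseline_mean) > tolerance * abs(baseline_mean)

def ensemble_flag(values: list[float], baseline_mean: float) -> bool:
    """Simple ensemble: hold the batch if either detector fires."""
    return bool(zscore_outliers(values)) or baseline_drift(values, baseline_mean)

batch = [0.98, 1.02, 1.01, 0.97, 8.4]           # one injected outlier
print(ensemble_flag(batch, baseline_mean=1.0))  # True -> quarantine the batch for review
```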
When anomalies are detected, platforms should provide immediate alerts and automated responses. This might include quarantining suspicious data, rolling back to previous versions, or temporarily blocking specific AI interactions while security teams investigate. The key is catching these issues before they propagate through your AI systems.
Building Integration with Existing Security Infrastructure
The NSA emphasizes that AI security shouldn’t exist in isolation. Organizations have invested heavily in security infrastructure, and AI protection should enhance these investments rather than replace them.
Look for platforms that integrate seamlessly with existing tools. This includes proxy servers and secure web gateways from vendors like Zscaler and Palo Alto Networks, identity management systems like Microsoft Entra, and SIEM platforms for centralized monitoring. These integrations enable organizations to extend their current security policies to AI usage without learning entirely new systems.
The guidance also stresses the importance of maintaining vendor independence. Organizations should avoid lock-in to specific AI providers and maintain the flexibility to switch between models and platforms as their needs evolve. Security platforms that work across multiple AI providers support this flexibility while maintaining consistent protection.
Looking Forward: Continuous Improvement
The NSA’s guidance provides a comprehensive framework, but AI threats evolve. Organizations must treat AI data security as an ongoing process rather than a one-time implementation.
Most importantly, organizations need solutions that balance security with usability. The most sophisticated protections become worthless if users bypass them because of complexity or lost productivity. The best platforms provide strong security through design that enhances rather than hinders AI adoption. Look for platforms that let you enable enterprise AI safely.
Want to learn how WitnessAI can help? Schedule a demo with an AI security expert today.
References
- NSA et al. (2025). “AI Data Security: Best Practices for Securing Data Used to Train & Operate AI Systems.” National Security Agency. Link
- Carlini, N. et al. (2023). “Poisoning Web-Scale Training Datasets is Practical.” arXiv:2302.10149. Link
- NIST. (2024). “NIST AI 100-1: Artificial Intelligence Risk Management Framework.” Link