Connect light

Training Your LLM Dragons—Why DSPM is the Key to AI Security

Share with your network!

The transformative potential of AI comes at a price. Because it’s complex and relies on sensitive data, it’s a prime target for bad actors. Notably, two AI implementations—custom large language models (LLMs) and tools like Microsoft Copilot—pose unique challenges for most organizations. 

Custom LLMs often need to be trained extensively on an organization’s data. This creates risks that the data will be embedded into models. And Microsoft Copilot integrates with enterprise applications and processes. So, if it’s not governed properly, then personal, financial and proprietary data might get exposed.  

To prevent data from being exposed and ensure compliance, organizations need to take a robust approach to security when it comes to their AI implementations. What follows are some tips for securing LLMs and AI tools like Copilot as well as details about how data security posture management (DSPM) can help.  

What is DSPM—and why is it critical for AI implementations? 

Data security posture management (DSPM) is both a strategy and set of tools. Its role is to discover, classify and monitor valuable and sensitive data as well as user access. It does this across an organization’s cloud and on-premises environments.  

For AI implementations like custom LLMs and Microsoft Copilot, DSPM is crucial for ensuring that sensitive or regulated data is properly governed. This reduces the risk of data leaking or being misused. 

Here are some key threats to AI implementations: 

  • Prompt injection attacks. Crafty prompts can trick models into indirectly disclosing sensitive data. This enables bad actors to bypass traditional security measures. 
  • Training data poisoning. Threat actors can embed sensitive or biased data into training sets. This can lead to unethical or insecure model outputs. 
  • Data leakage in outputs. Poorly configured models may inadvertently expose private data during user interactions or as part of their outputs. 
  • Compliance failures. AI systems that mishandle regulated data risk steep fines under laws like GDPR, CCPA or HIPAA. When this happens, customer trust is lost. 

Use case 1: securing custom LLMs 

Custom LLMs allow organizations to fine-tune AI models to meet their specific business needs. However, they also create significant risks. Sensitive data can enter the model during training or through other interactions, which can lead to data being disclosed inadvertently.  

Custom LLMs can introduce these risks: 

  • Sensitive data being embedded in models during training  
  • Inadvertent data leakage in model outputs 
  • Compliance failures if regulated data, like personally identifiable information (PII), is mishandled 
  • Security vulnerabilities that lead to training data poisoning or prompt injection attacks 

These risks highlight why it’s so important to audit training data, monitor data flows and enforce strict access controls. 

Tips for securing custom LLMs 

  • Audit and sanitize training data 
    • Regularly review data sets. Look for sensitive or regulated data before using that data in training. 
    • Anonymize data with masking or encryption techniques. This will help to protect PII and other critical data. 
  • Monitor data lineage 
    • Use tools like Proofpoint to map how data flows from ingestion to model training and outputs. 
    • Ensure traceability to maintain compliance and quickly address vulnerabilities. 
  • Set strict access controls 
    • Enforce role-based permissions for data scientists and engineers who are interacting with training data sets. 
    • Limit access to sensitive data sets to only those who absolutely need it. 
  • Proactively monitor outputs 
    • Analyze model responses to ensure that they don’t reveal sensitive data. This is particularly important after updates or retraining cycles. 

How Proofpoint helps 

The Proofpoint DSPM solution can automatically discover sensitive data across cloud environments and classify it. This gives you comprehensive visibility into both structured and unstructured data sources.  

Proofpoint provides a complete lineage view. It illustrates how sensitive data flows through various stages, including where it comes from, how it’s connected to data sets, where it’s involved in training pipelines, and where it’s integrated into custom AI models. This detailed view enables you to trace the movement of sensitive data, stay compliant with regulations like GDPR and CCPA, and build trust with your users.  

Plus, Proofpoint proactively notifies you if sensitive data is being used inappropriately—whether it’s being used in training data, model responses or user interactions. As a result, potential risks can be addressed immediately.  

Use case 2: mitigating risks in Microsoft Copilot 

Microsoft Copilot delivers responses that are accurate and contextually relevant. It does this through a process called grounding. By accessing Microsoft Graph and the semantic index, grounding pulls context from across your applications to generate more specific and tailored prompts for its LLM. While this improves the quality of responses, it also increases the chances that data will leak or be misused.  

Copilot implementations introduce these risks: 

  • Data leakage if sensitive files or emails are improperly governed 
  • Misuse of confidential data if role-based access controls are inadequate 
  • Exposure of regulated data if sensitivity labels are not consistently applied 

Tips for securing Copilot implementations 

  • Enforce sensitivity labels 
    • Map sensitive data to Microsoft Information Protection (MIP) labels to ensure that access is properly restricted. 
    • Assign labels consistently across files and applications to govern the data that Copilot can access. 
  • Curate approved data sources 
    • Consider using a curated set of approved SharePoint sites or data sets for Copilot to minimize exposing data that’s not vetted. 
    • Ensure all the data sets that are included are sanitized for sensitive or regulated content. 
  • Monitor prompt behavior and outputs 
    • Log and analyze prompts to identify unusual or malicious behavior. 
    • Use tools to monitor Copilot’s outputs and flag sensitive data in real time. 
  • Limit access by role 
    • Configure Copilot’s access so that it’s based on user roles to ensure employees only see the data that’s relevant to their responsibilities. 

How Proofpoint helps 

Proofpoint DSPM integrates seamlessly with Microsoft MIP labels. This means that Proofpoint can map discovered data classes to existing sensitivity labels, which enhances how sensitive data is classified and governed. It also ensures that access controls and compliance requirements are consistently enforced across environments.  

Proofpoint identifies potential risks that are tied to sensitive outputs, like data that has been surfaced through Copilot interactions. By analyzing sensitive data flows and monitoring outputs, Proofpoint can detect and alert teams when an attempt to access it is not authorized—even when it stems from a sophisticated scenario like an unauthorized prompt.  

Proofpoint enables you to take a proactive approach to securing data. As a result, you can maintain robust data governance across all your AI-driven tools.   

Tips for building a secure AI framework 

Regardless of the use case, a proactive and layered approach is essential to securing AI infrastructure. Here’s a summary of the five steps that organizations should take: 

  1. Discover and classify sensitive data. Use automated tools to identify PII, intellectual property and regulated data across your cloud and on-premises environments. 
  2. Ensure data lineage visibility. Track how sensitive data moves through your AI workflows, from ingestion to model training and beyond. 
  3. Establish role-based access controls. Limit access to sensitive data and ensure that permissions align with your employees’ responsibilities. 
  4. Audit and anonymize data. Sanitize training data sets and ensure that outputs don’t disclose sensitive data. 
  5. Continuously monitor interactions. Track user inputs, model prompts and outputs to identify and mitigate risks as they arise. 

Proofpoint helps mitigate AI security risks 

AI is a transformative tool. However, because it relies on sensitive data it creates unique challenges for security teams. By adopting a structured approach to securing AI infrastructure, you can unlock the potential of custom LLMs and tools like Copilot. And you can do so without compromising your data integrity, violating compliance rules or losing customer trust. 

Proofpoint DSPM helps organizations secure their AI infrastructure by: 

  • Automatically discovering and classifying sensitive data across cloud and on-premises environments 
  • Mapping data lineage so that you can see how data flows into and out of AI systems 
  • Integrating with tools like Microsoft MIP labels for enhanced data governance 
  • Proactively identifying risks and notifying teams of unauthorized access or sensitive data usage 

For a deeper dive into these strategies—and to see live demos showing how Proofpoint can help—watch the full webinar: "Training Your LLM Dragons: Why DSPM is Foundational for Every AI Initiative."