Guarding against AI’s internal threats

The rapid proliferation of Large Language Models (LLMs) across industries is evident, with many companies developing proprietary LLMs for Generative AI applications.

A recent survey by Gartner, Inc. revealed that 55% of corporations are currently testing or deploying LLM projects, a figure projected to grow swiftly. However, organisations should exercise caution and carefully evaluate the associated risks before hastily adopting this technology.

While the unfolding technological revolution is undoubtedly exciting, it’s crucial to address identity security concerns and establish new frameworks. One key principle I propose is: Consider your LLM a potential security threat at all times.

And here’s why:

The intrinsic risks of Large Language Models

Despite the surge in LLM research, with over 3,000 papers published in the last year alone, a consensus on secure development and seamless integration of LLMs into existing systems remains elusive.

LLMs can be easily manipulated to produce inaccurate outputs with minor prompt alterations. Beyond unreliability, they can introduce significant identity security vulnerabilities to the systems they’re integrated with.

These vulnerabilities manifest in various ways. Primarily, in their current form, LLMs are susceptible to “jailbreaking” – where attackers can manipulate them to behave in unintended or harmful ways. A recent study by EPFL researchers demonstrated a near 100% success rate in jailbreaking leading models using a combination of known techniques. This represents just the beginning, as new attack methods and jailbreaking strategies continue to emerge in monthly publications.

The consequences of LLM jailbreaking vary in severity based on context. In milder cases, compromised LLMs might provide instructions for illicit activities against their intended policies. While undesirable, Simon Willison characterises this as a “screenshot attack”: the model’s misbehaviour has limited impact, because even if it is publicised or misused, the information it reveals is generally already available online.

The stakes increase dramatically with more capable LLMs that can execute database queries, make external API calls, or access networked machines. In such scenarios, manipulating LLM behaviour could allow attackers to use the model as a springboard for malicious activities.

A paper presented at BlackHat Asia this year highlights this risk: 31% of examined code bases contained remote code execution (RCE) vulnerabilities introduced by LLMs. This means attackers could potentially execute arbitrary code using natural language inputs alone.

Considering LLMs’ vulnerability to manipulation and their potential to compromise their operating environment, it’s crucial to adopt an “assume breach” approach when designing system architecture. This mindset involves treating the LLM as if it’s already compromised by an attacker and implementing protective identity security measures accordingly.

Minimising the risks

It is crucial to first cultivate an understanding that the LLMs integrated into our systems cannot be inherently trusted. We can then combine traditional identity security expertise with experience in organisational LLM integration to follow general guidelines that minimise the risks associated with LLM deployments.

Firstly, never use an LLM as a security boundary. Only provide the LLM with capabilities you intend it to use, and do not rely on alignment or system prompts to enforce security measures. Secondly, adhere to the principle of least privilege: grant the LLM only the minimum access required to perform its designated task, since any additional access could be exploited by attackers to infiltrate a company’s technological infrastructure.
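To make this concrete, here is a minimal Python sketch of least-privilege tool dispatch. The tool names, registry and execute_tool_call helper are illustrative assumptions rather than any particular vendor’s API; the point is that the allowlist is enforced in application code, not in the system prompt.

```python
# A minimal sketch of enforcing least privilege in application code rather
# than in the system prompt. Tool names and data are illustrative only.

def lookup_order_status(user_id: str, order_id: str) -> str:
    # Read-only query, scoped to the authenticated user (stubbed for brevity).
    return f"Order {order_id} for user {user_id}: shipped"

# Only tools registered here can ever run, regardless of what the model asks for.
TOOL_REGISTRY = {
    "lookup_order_status": lookup_order_status,
}

def execute_tool_call(tool_name: str, arguments: dict, user_id: str):
    """Run a tool requested by the LLM only if it is explicitly allowed."""
    if tool_name not in TOOL_REGISTRY:
        # The model asked for a capability it was never granted; treat this as
        # a potential manipulation attempt rather than a feature request.
        raise PermissionError(f"Tool '{tool_name}' is not permitted")
    # Scope the call to the authenticated user so the model never holds
    # credentials broader than the task at hand.
    return TOOL_REGISTRY[tool_name](user_id=user_id, **arguments)

# Example: a legitimate request succeeds; an unregistered tool would be refused.
print(execute_tool_call("lookup_order_status", {"order_id": "A123"}, user_id="u42"))
```

Refusing unknown tool names outright also gives you a natural place to log and alert on suspected manipulation attempts.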

Approach the LLM as you would any employee or end user: restrict its actions to only those essential for completing the assigned job. It is also important to implement thorough output sanitisation. Before utilising any LLM-generated output, validate or sanitise it, including removing potential XSS payloads in the form of HTML tags or markdown syntax. Sanitise training data as well, so that attackers cannot coax the model into leaking sensitive information that was inadvertently included in it.
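As one possible illustration of output sanitisation, the following standard-library-only Python sketch escapes raw HTML and strips markdown links that use risky URL schemes before the text reaches a web UI. The function name and regex are assumptions for this example; production systems would typically pair a vetted sanitisation library with a strict Content Security Policy.

```python
import html
import re

def sanitise_llm_output(text: str) -> str:
    """Neutralise common XSS vectors before rendering LLM output in a web UI."""
    # Escape raw HTML so injected <script> tags or event handlers render as
    # inert text instead of executing in the browser.
    text = html.escape(text)
    # Strip markdown links/images whose target uses a risky URL scheme,
    # keeping only the visible link text.
    text = re.sub(
        r"!?\[([^\]]*)\]\(\s*(?:javascript|data|vbscript):[^)]*\)",
        r"\1",
        text,
        flags=re.IGNORECASE,
    )
    return text

# Example: the img tag is escaped and the javascript: link is removed.
print(sanitise_llm_output('See [docs](javascript:doEvil) <img src=x onerror=alert(1)>'))
```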

If code execution by the LLM is necessary, employ sandboxing techniques. This limits the LLM’s access to specific system resources, thereby mitigating the risk of errors or malware affecting the broader system in the event of a cyberattack. By following these guidelines, organisations can better protect themselves while harnessing the power of LLMs in their operations.
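To illustrate the sandboxing guideline, here is a minimal, Unix-only Python sketch that runs LLM-generated code in a separate interpreter with CPU, memory and time limits. The helper name and the specific limits are assumptions; a production deployment would more likely rely on containers, seccomp profiles or a dedicated code-execution service.

```python
import os
import resource
import subprocess
import sys
import tempfile

def run_llm_code_sandboxed(code: str, timeout_s: int = 5) -> str:
    """Execute LLM-generated Python in a constrained child process (Unix only)."""
    def _limit_resources():
        # Cap CPU time and address space before the child starts executing.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 ** 2, 512 * 1024 ** 2))

    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores env vars and user site-packages
            capture_output=True,
            text=True,
            timeout=timeout_s,             # hard wall-clock limit
            preexec_fn=_limit_resources,
        )
        return result.stdout
    finally:
        os.unlink(path)

# Example: harmless generated code runs; a runaway loop would be killed.
print(run_llm_code_sandboxed("print(2 + 2)"))
```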

Neutralise from within

LLMs undoubtedly present remarkable capabilities and opportunities, but we must not underestimate their vulnerability to manipulation. It’s vital to design systems with the assumption that LLMs could be compromised, treating them as potential identity security threats. The key takeaway is to approach LLMs with the same caution and strategic thinking you’d apply to a potential cyberattacker. By adopting this new paradigm, you can navigate the integration of LLMs into your systems more safely, avoiding many of the associated security risks. This mindset is essential for harnessing the power of LLMs while maintaining robust system security.


About the Author

Shaked Reiner is Principal Cyber Researcher at CyberArk Labs. CyberArk is the global leader in Identity Security. Centered on privileged access management, CyberArk provides the most comprehensive security offering for any identity – human or machine – across business applications, distributed workforces, hybrid cloud workloads and throughout the DevOps lifecycle. The world’s leading organizations trust CyberArk to help secure their most critical assets. For over a decade CyberArk has led the market in securing enterprises against cyber attacks that take cover behind insider privileges and attack critical enterprise assets. Today, only CyberArk is delivering a new category of targeted security solutions that help leaders stop reacting to cyber threats and get ahead of them, preventing attack escalation before irreparable business harm is done.

Featured image: Adobe Stock
