In recent years, the principle of safety by design has taken root in cybersecurity. Rather than writing code with vulnerabilities and then searching for them, either by hand or through automated mitigation solutions, it is much better to use languages which simply lack the vocabulary to define those bugs in the first place. A large majority of existing vulnerabilities can be eliminated simply by switching to more modern languages like Rust, as opposed to various decades-old C variants.
Just as these changes gain momentum in the traditional software world, however, the concurrent rise of AI is thoroughly undermining that principle. LLMs with sufficient permissions can inherently produce undefined behavior. In some cases, these vulnerabilities can be solved via prompt context, verification by other LLMs, or perhaps hard-coded responses to specific failure cases. While sometimes these systems might be considered safe enough for practical use, due to the architecture of transformers, their failure modes can never be fully modeled. LLM safety means safety by statistics, rather than by design.
Although these dangers were already well known, they were demonstrated for the first time in June with the disclosure of EchoLeak. For some background, Microsoft’s rollout of Copilot was controversial from the beginning due to its broad and not entirely transparent default permissions, which most notably included periodic searchable screenshots on Copilot+ PCs. Among many other features, it also automatically scans users’ email inboxes, allowing it to summarize long threads.
Thus, EchoLeak requires the attacker only to send an email to the target, who does not even need to open it or click on anything – making it a “zero-click” vulnerability, one of the highest forms of the art. The email contains malicious instructions, weaponing the LLM against itself to help the attackers search for the most sensitive information in its context window before it exfiltrates it, bypassing several of these non-deterministic filters in the process. The bug was apparently never used in the wild, and the specific vulnerability has since been patched, however the implications for similar attacks are far-reaching.
EchoLeak makes use of prompt injection, in which a user convinces a model to ignore the instructions provided by the system. The well-known instruction to “ignore all previous instructions” is the simplest example of this class of attack. The difficulty in defending against prompt injections is that no input sanitation currently exists for LLM inputs. Unlike databases, which can take measures to ensure that the engine reads inputs like “drop table” as a true input rather than a control command, LLMs analyze all text on an equal basis, potentially leading to confusion within the model, which can be exploited by an attacker.
In fact, comprehensive solutions may be developed for prompt sanitation relatively soon. A January paper by several German researchers translates hard system prompts into what could be called a secret language. AI systems in general compress massive amounts of data, leading to collisions in their embedding of features, so text which appears as gibberish to a human can carry meaning for AI. Because we cannot trace the individual paths of all neurons within the brain simultaneously, human intelligence lacks this unintended feature, akin to seeing a specific screen of static and receiving an intended meaning, but it has been documented in diverse forms of neural nets.
With a solution like this one, a model would not confuse a malicious user instruction it receives in English with the hard prompt, which from its perspective might manifest as something more like a telepathic signal. It is still debatable whether a solution along these lines could ever be called “security by design,” considering that it is still based on a black-box LLM, but it does move systems incrementally closer toward that ideal.
Supposing that solution was perfected and expanded, space for attacks like EchoLeak would be narrowed considerably, yet that specific bug just marks a first attempt. The entire universe of LLM vulnerabilities is still an unbounded space. Prompt Injection is just one of the 2025 OWASP top 10 for LLMs, a list put together by the Open Web Application Security Project to categorize various classes of vulnerabilities across a wide range of security topics. It is probably the most important attack method on that list, but there are many others. Excessive Agency is likely to become a large and multifaceted problem over the next couple of years with the growth of agentic AI.
Going forward, the integration of AI systems will look less like cybersecurity, and more like an employee onboarding process. It will be prudent not to put them in charge of sensitive systems on their first day, but impossible to avoid trusting them at all, without missing out on the greatest technological revolution at least since the advent of the internet.