Skip to main content
Mallory
Mallory

OpenAI's Ongoing Defense Against Prompt Injection Attacks in ChatGPT Atlas

ChatGPT AtlasOpenAIAI-driven securityprompt injectionproactive defenseattack vectorautomated testingattack surfaceevolving threatsmalicious instructionsagentic browsersecurity updatebrowser agentsred teaminginternal testing
Updated December 25, 2025 at 02:00 AM3 sources

Get Ahead of Threats Like This

Know if you're exposed — before adversaries strike.

OpenAI has implemented an automated attacker system to proactively test and strengthen the security of ChatGPT Atlas, its agentic web browser, against prompt injection attacks. These attacks involve embedding malicious instructions into content that the AI agent processes, potentially causing it to act against the user's interests. The company acknowledges that the very features making agentic browsers powerful also introduce persistent vulnerabilities, and that complete protection from prompt injection is unlikely. OpenAI's approach leverages AI-driven red teaming to rapidly identify and address new attack vectors, aiming to stay ahead of evolving threats.

A recent security update to Atlas was prompted by the internal discovery of a new class of prompt injection attacks using this automated red-teaming system. The attack surface for browser agents is broad, as they can interact with untrusted content from emails, documents, social media, and web pages, increasing the risk of harmful actions such as forwarding sensitive information or altering cloud files. OpenAI emphasizes that defending against prompt injection will be a continuous effort, likening it to an arms race similar to combating online scams, and stresses the importance of a rapid response loop to reduce real-world risks over time.

Related Entities

Organizations

Related Stories

Prompt Poaching and Injection Threats in AI Browser Extensions and Agents

Prompt Poaching and Injection Threats in AI Browser Extensions and Agents

Browser extensions, particularly those from web analytics companies like Similarweb, have been found to engage in 'prompt poaching' by capturing and exfiltrating user conversations with AI chat platforms. The Similarweb extension, installed by over a million users, was discovered to collect not only clickstream data but also sensitive AI prompts and responses, significantly escalating privacy risks. This data collection is often enabled through remote configuration updates that allow the extension to scrape targeted web pages and monitor user interactions with AI tools, raising concerns about the exploitation of browser extensions as a vector for harvesting private information. In parallel, OpenAI has responded to the growing threat of prompt injection attacks against its ChatGPT Atlas browser agent by deploying new model-level and system-level defenses. Prompt injection attacks involve embedding malicious instructions in web content to manipulate AI agents into performing unintended actions, such as exfiltrating sensitive data. OpenAI's update includes automated red-teaming using reinforcement learning to proactively identify and mitigate sophisticated prompt injection techniques, highlighting the evolving security landscape for AI-powered browser tools and the need for robust defenses against both extension-based data harvesting and adversarial prompt manipulation.

2 months ago

AI Prompt Injection and Data Leakage Vulnerabilities in OpenAI's ChatGPT and Atlas Browser

Tenable Research has identified seven novel vulnerabilities and attack techniques in OpenAI's ChatGPT, including indirect prompt injections, exfiltration of user data, and bypasses of safety mechanisms in the latest GPT-5 model. These vulnerabilities allow attackers to manipulate the large language model (LLM) through crafted inputs, potentially leading to the theft of private information from user memories and chat histories, even when users simply interact with ChatGPT. The research highlights that hundreds of millions of users could be at risk, as attackers can exploit these weaknesses to bypass safeguards and extract sensitive data without user awareness. The release of OpenAI's ChatGPT Atlas, an AI-powered browser that remembers user activities and acts autonomously, further amplifies these concerns. Security experts warn that features such as persistent memory and autonomous actions increase the attack surface, making the browser susceptible to prompt injection and other AI-specific vulnerabilities. The implications for enterprise security and privacy are significant, as these AI-driven tools become more integrated into business processes, necessitating new approaches to identity management, access controls, and oversight to mitigate the risks posed by advanced AI-enabled attacks.

4 months ago

Prompt Injection and Browser-Based AI Security Risks

The launch of ChatGPT Atlas, an AI-powered web browser with agentic capabilities, has raised significant concerns about prompt injection attacks. As browsers become more integrated with large language models (LLMs), attackers can exploit both direct and indirect prompt injection techniques to manipulate AI agents, potentially causing them to divulge sensitive information or perform unintended actions. The accessibility of such agentic browsers, combined with their ability to automate complex tasks, amplifies the risk landscape for organizations adopting these technologies. Security experts warn that the browser now represents a critical control point for AI security, as it serves as the main interface between users and generative AI systems. The rapid increase in GenAI browser traffic has led to a surge in data security incidents, including inadvertent exposure of confidential information through LLM prompts. Traditional network security measures are often insufficient to address these browser-borne threats, making it imperative for organizations to reassess their security strategies and implement controls specifically designed to mitigate risks associated with AI-powered browsers and prompt injection attacks.

3 months ago

Get Ahead of Threats Like This

Mallory continuously monitors global threat intelligence and correlates it with your attack surface. Know if you're exposed — before adversaries strike.