OpenAI's Ongoing Defense Against Prompt Injection Attacks in ChatGPT Atlas
OpenAI has implemented an automated attacker system to proactively test and strengthen the security of ChatGPT Atlas, its agentic web browser, against prompt injection attacks. These attacks involve embedding malicious instructions into content that the AI agent processes, potentially causing it to act against the user's interests. The company acknowledges that the very features making agentic browsers powerful also introduce persistent vulnerabilities, and that complete protection from prompt injection is unlikely. OpenAI's approach leverages AI-driven red teaming to rapidly identify and address new attack vectors, aiming to stay ahead of evolving threats.
A recent security update to Atlas was prompted by the internal discovery of a new class of prompt injection attacks using this automated red-teaming system. The attack surface for browser agents is broad, as they can interact with untrusted content from emails, documents, social media, and web pages, increasing the risk of harmful actions such as forwarding sensitive information or altering cloud files. OpenAI emphasizes that defending against prompt injection will be a continuous effort, likening it to an arms race similar to combating online scams, and stresses the importance of a rapid response loop to reduce real-world risks over time.

Get ahead of threats like this
Mallory correlates global threat intelligence with your attack surface — know if you’re exposed before adversaries strike.
How this story unfolded
3 events from the most recent confirmed update back to the earliest known activity.
OpenAI says prompt injection will remain an ongoing risk for Atlas
OpenAI publicly acknowledged that agentic browsers such as Atlas are inherently vulnerable to prompt injection and similar attacks, and that the problem is unlikely to be fully eliminated. The company said it expects to keep rapidly mitigating and strengthening defenses over time as Atlas becomes a more valuable target.
OpenAI releases Atlas security update with stronger prompt-injection defenses
OpenAI released a security update for ChatGPT Atlas that added adversarially trained models and enhanced safeguards to reduce prompt injection risk. The update was part of the company's response to newly identified attack techniques against the browser agent.
OpenAI uses automated red teaming to uncover new Atlas prompt injections
OpenAI developed an LLM-based automated attacker using reinforcement learning to test its ChatGPT Atlas browser agent and identified a new class of prompt injection attacks. The testing revealed sophisticated multi-step attack paths that had not previously been found by human red teamers or external reports.
Related entities
Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.
Sources
3 references tracked. Mallory keeps watching after this page renders.
ChatGPT Atlas Under Guard: OpenAI Fortifies Browser Agent Against “Prompt Injection” Attacks
securityonline.info
Open sourceHow OpenAI is defending ChatGPT Atlas from attacks now - and why safety's not guaranteed
zdnet.com
Open sourceOpenAI Will Forever Fight Prompt Injection Attacks
bankinfosecurity.com
Open sourceSee the full picture, correlated to your attack surface.
Map indicators from this story to your assets and identify affected systems in minutes.
Every observed campaign, victim, and pivot linked to actors named in this story.
Malware, exploits, and IOCs connected to the activity described here.
YARA, Sigma, and Snort rules deployed to your SIEM as soon as they’re published.
Get matching new stories delivered to your team as they break — not the next morning.
Ask questions about this story and take action on the answers.


