Skip to main content
Live Webinar with SANS (June 25)— Agentic CTI Automation for Fun & ProfitRegister Free
Mallory
Back to intelligence
ai-platform-securitywidely-deployed-product-advisory

OpenAI's Ongoing Defense Against Prompt Injection Attacks in ChatGPT Atlas

Updated 3mo agoFirst seen Dec 23, 20253 sources

OpenAI has implemented an automated attacker system to proactively test and strengthen the security of ChatGPT Atlas, its agentic web browser, against prompt injection attacks. These attacks involve embedding malicious instructions into content that the AI agent processes, potentially causing it to act against the user's interests. The company acknowledges that the very features making agentic browsers powerful also introduce persistent vulnerabilities, and that complete protection from prompt injection is unlikely. OpenAI's approach leverages AI-driven red teaming to rapidly identify and address new attack vectors, aiming to stay ahead of evolving threats.

A recent security update to Atlas was prompted by the internal discovery of a new class of prompt injection attacks using this automated red-teaming system. The attack surface for browser agents is broad, as they can interact with untrusted content from emails, documents, social media, and web pages, increasing the risk of harmful actions such as forwarding sensitive information or altering cloud files. OpenAI emphasizes that defending against prompt injection will be a continuous effort, likening it to an arms race similar to combating online scams, and stresses the importance of a rapid response loop to reduce real-world risks over time.

Share:
OpenAI's Ongoing Defense Against Prompt Injection Attacks in ChatGPT Atlas
Stay ahead

Get ahead of threats like this

Mallory correlates global threat intelligence with your attack surface — know if you’re exposed before adversaries strike.

EVENT TIMELINE

How this story unfolded

3 events from the most recent confirmed update back to the earliest known activity.

3 EVENTS
Dec 23, 20256mo ago

OpenAI says prompt injection will remain an ongoing risk for Atlas

OpenAI publicly acknowledged that agentic browsers such as Atlas are inherently vulnerable to prompt injection and similar attacks, and that the problem is unlikely to be fully eliminated. The company said it expects to keep rapidly mitigating and strengthening defenses over time as Atlas becomes a more valuable target.

OpenAI releases Atlas security update with stronger prompt-injection defenses

OpenAI released a security update for ChatGPT Atlas that added adversarially trained models and enhanced safeguards to reduce prompt injection risk. The update was part of the company's response to newly identified attack techniques against the browser agent.

OpenAI uses automated red teaming to uncover new Atlas prompt injections

OpenAI developed an LLM-based automated attacker using reinforcement learning to test its ChatGPT Atlas browser agent and identified a new class of prompt injection attacks. The testing revealed sophisticated multi-step attack paths that had not previously been found by human red teamers or external reports.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

2 LINKEDOpen in app
Organizations
2 linked
OpenaiZiff Davis
The operational view lives in Mallory

See the full picture, correlated to your attack surface.

This page covers what’s public. Mallory adds the parts that aren’t — which of your assets are affected, which threat actors are using it right now, which detections to deploy, and what to do next.
Exposure mapping

Map indicators from this story to your assets and identify affected systems in minutes.

Threat actor evidence

Every observed campaign, victim, and pivot linked to actors named in this story.

Associated malware

Malware, exploits, and IOCs connected to the activity described here.

Detection signatures

YARA, Sigma, and Snort rules deployed to your SIEM as soon as they’re published.

Scheduled alerts

Get matching new stories delivered to your team as they break — not the next morning.

AI threads

Ask questions about this story and take action on the answers.

OpenAI's Ongoing Defense Against Prompt Injection Attacks in ChatGPT Atlas | Mallory