privacy-surveillance-policy | ai-platform-security | cybersecurity-regulation

Privacy Risks and Data Handling in Large Language Models and ChatGPT

Updated 3mo ago | First seen Oct 20, 2025 | 2 sources

Recent research and analysis have highlighted significant concerns regarding the privacy risks associated with large language models (LLMs) and popular AI applications such as ChatGPT. A comprehensive study conducted by researchers from Carnegie Mellon University and Northeastern University reviewed over 1,300 academic papers on AI and machine learning privacy published between 2016 and 2025. The study found that the vast majority of research has focused narrowly on training data leakage and direct chat exposure, with 92 percent of papers addressing only these two areas. This leaves a critical gap in understanding and mitigating more subtle privacy risks, such as inference attacks, context leakage through LLM agents, and the aggregation of user data at scale. The researchers argue that privacy risks extend throughout the entire lifecycle of LLMs, from data collection and processing to deployment and user interaction. They also point out systemic barriers in the research community, including a lag between technological advances and policy development, and a cultural bias that undervalues privacy concerns involving human factors.

Meanwhile, practical guidance for end users is becoming increasingly important as AI applications like ChatGPT collect a wide range of personal and technical information, including account details, device data, and potentially sensitive user inputs. OpenAI, the developer of ChatGPT, maintains region-specific privacy policies, but both versions allow for extensive data collection by default. Users can take steps to limit the use of their data for model training, such as configuring privacy settings and enabling features like Temporary Chats. There are also risks associated with connecting third-party services to ChatGPT, which can increase the exposure of personal data. Users are advised to manage AI memory, disable unnecessary integrations, and secure their accounts to reduce the risk of unauthorized access.

The lack of cross-disciplinary collaboration between AI, policy, and human-computer interaction fields further exacerbates the challenge of addressing these privacy risks. As LLMs become more integrated into daily life and business operations, organizations and individuals must be proactive in understanding and mitigating the full spectrum of privacy threats. The research underscores the need for a broader approach to privacy that goes beyond technical fixes and includes policy, design, and user education. Both academic and practical perspectives agree that current privacy protections are insufficient for the evolving landscape of AI-driven applications. The findings call for increased attention to underexplored areas of privacy risk, more robust privacy controls for end users, and greater alignment between technological development and regulatory frameworks. Ultimately, safeguarding privacy in the age of LLMs requires a holistic strategy that addresses both technical and human factors across the entire AI ecosystem.


EVENT TIMELINE

How this story unfolded

2 events from the most recent confirmed update back to the earliest known activity.

Oct 20, 2025 | 8mo ago

Kaspersky publishes guidance on ChatGPT privacy settings

Kaspersky published a blog post explaining privacy and security settings in ChatGPT, reflecting ongoing efforts to help users manage data exposure in consumer AI tools.

Research highlights mismatch in AI privacy focus

A Help Net Security report said much of AI privacy research is focused in the wrong direction, framing a broader concern about how privacy risks in large language model systems are being studied and addressed.
