Privacy Risks and Data Handling in Large Language Models and ChatGPT
Recent research and analysis have highlighted significant privacy risks in large language models (LLMs) and popular AI applications such as ChatGPT. A study by researchers from Carnegie Mellon University and Northeastern University reviewed more than 1,300 academic papers on AI and machine learning privacy published between 2016 and 2025. It found that research has concentrated narrowly on training data leakage and direct chat exposure: 92 percent of the papers addressed only these two areas. That leaves a critical gap in understanding and mitigating subtler risks, such as inference attacks, context leakage through LLM agents, and the aggregation of user data at scale.
The researchers argue that privacy risks extend across the entire LLM lifecycle, from data collection and processing to deployment and user interaction. They also point to systemic barriers in the research community, including a lag between technological advances and policy development and a cultural bias that undervalues privacy concerns involving human factors. The lack of collaboration across AI, policy, and human-computer interaction research compounds the problem.
Meanwhile, practical guidance for end users is becoming more important because AI applications like ChatGPT collect a wide range of personal and technical information, including account details, device data, and potentially sensitive user inputs. OpenAI, the developer of ChatGPT, maintains separate privacy policies for different regions, but each permits extensive data collection by default. Users can limit how their data is used for model training, for example by adjusting ChatGPT's privacy settings and using Temporary Chats. Connecting third-party services to ChatGPT carries additional risk because it can expose more personal data. Users are advised to manage ChatGPT's memory feature, disable integrations they do not need, and secure their accounts against unauthorized access.
As LLMs become more integrated into daily life and business operations, organizations and individuals need to understand and mitigate the full spectrum of privacy threats. The research calls for a broader approach to privacy that goes beyond technical fixes to include policy, design, and user education. Taken together, the academic findings and the practical guidance suggest that current privacy protections have not kept pace with AI-driven applications. The findings point to three needs: more attention to underexplored areas of privacy risk, more robust privacy controls for end users, and closer alignment between technological development and regulatory frameworks.

How this story unfolded
Two events, listed from the most recent confirmed update back to the earliest known activity.
Kaspersky publishes guidance on ChatGPT privacy settings
Kaspersky published a blog post explaining privacy and security settings in ChatGPT, reflecting ongoing efforts to help users manage data exposure in consumer AI tools.
Research highlights mismatch in AI privacy focus
A Help Net Security report argued that much AI privacy research is focused in the wrong direction, raising a broader concern about how privacy risks in large language model systems are being studied and addressed.


