Mallory

Privacy Risks and Data Handling in Large Language Models and ChatGPT

Updated October 20, 2025 at 01:00 PM · 2 sources


Recent research and analysis have highlighted significant concerns about the privacy risks associated with large language models (LLMs) and popular AI applications such as ChatGPT. A comprehensive study by researchers from Carnegie Mellon University and Northeastern University reviewed more than 1,300 academic papers on AI and machine-learning privacy published between 2016 and 2025. It found that research has focused narrowly on training-data leakage and direct chat exposure, with 92 percent of papers addressing only these two areas. This leaves a critical gap in understanding and mitigating subtler privacy risks, such as inference attacks, context leakage through LLM agents, and the aggregation of user data at scale. The researchers argue that privacy risks extend throughout the entire lifecycle of LLMs, from data collection and processing to deployment and user interaction. They also point to systemic barriers in the research community, including a lag between technological advances and policy development, and a cultural bias that undervalues privacy concerns involving human factors.

Meanwhile, practical guidance for end users is becoming increasingly important as AI applications like ChatGPT collect a wide range of personal and technical information, including account details, device data, and potentially sensitive user inputs. OpenAI, the developer of ChatGPT, maintains region-specific privacy policies, but both regional versions permit extensive data collection by default. Users can limit the use of their data for model training by configuring privacy settings and enabling features such as Temporary Chats. Connecting third-party services to ChatGPT carries additional risk, since each integration widens the exposure of personal data. Users are advised to manage AI memory, disable unnecessary integrations, and secure their accounts to reduce the risk of unauthorized access.
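The user-facing controls above operate at the account level, but the underlying principle of minimizing what leaves the device can also be applied before a prompt is ever submitted. The snippet below is a minimal illustration, not an OpenAI feature: it sketches a hypothetical pre-submission step that redacts common PII patterns from a prompt before it is sent to any chat service. The pattern set is deliberately simplistic; a production system would use a dedicated PII detector.

```python
import re

# Hypothetical patterns for common PII categories; regexes only
# illustrate the idea of a pre-submission scrub.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace detected PII with typed placeholders before the
    text leaves the user's machine for a chat service."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Email me at jane.doe@example.com or call 555-867-5309."
print(scrub(prompt))  # Email me at [EMAIL] or call [PHONE].
```

The placeholders preserve enough structure for the model to produce a useful answer while keeping the raw identifiers out of the provider's logs and any training pipeline.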
The lack of cross-disciplinary collaboration between AI, policy, and human-computer interaction fields further exacerbates the challenge of addressing these privacy risks. As LLMs become more integrated into daily life and business operations, organizations and individuals must be proactive in understanding and mitigating the full spectrum of privacy threats. The research underscores the need for a broader approach to privacy that goes beyond technical fixes and includes policy, design, and user education. Both academic and practical perspectives agree that current privacy protections are insufficient for the evolving landscape of AI-driven applications. The findings call for increased attention to underexplored areas of privacy risk, more robust privacy controls for end users, and greater alignment between technological development and regulatory frameworks. Ultimately, safeguarding privacy in the age of LLMs requires a holistic strategy that addresses both technical and human factors across the entire AI ecosystem.


Related Stories

Security Risks and Privacy Challenges of Large Language Models in AI Systems

Large language models (LLMs) present a dual-use dilemma in cybersecurity, as their capabilities can be leveraged for both defensive and offensive purposes. Security researchers have identified purpose-built malicious LLMs, such as WormGPT and KawaiiGPT, which are designed to facilitate cybercrime by generating convincing phishing content and rapidly producing or modifying malicious code. The thin line between beneficial and harmful use of LLMs is defined largely by developer intent and the presence or absence of ethical safeguards, raising concerns about the proliferation of offensive AI tools in the threat landscape. In addition to malicious use, LLMs face significant challenges in maintaining privacy and security due to contextual integrity failures and regulatory-driven censorship. Research from Microsoft highlights the need for AI agents to respect contextual privacy norms, as current models may inadvertently leak sensitive information. Meanwhile, the DeepSeek-R1 model demonstrates how geopolitical censorship mechanisms can introduce security flaws, such as insecure code generation and broken authentication, especially when handling politically sensitive prompts. These issues underscore the urgent need for robust privacy controls and security-aware development practices in the deployment of LLM-powered systems.
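The contextual-integrity concern raised in the Microsoft research can be made concrete with a small sketch. The tool names, field lists, and norms below are hypothetical; the point is only that an agent forwards each downstream tool the minimum context its role requires, rather than passing the full user profile to every integration.

```python
# Toy illustration of contextual integrity for an LLM agent:
# each downstream tool receives only the context fields that the
# norm for its role permits. All names here are hypothetical.
CONTEXT_NORMS = {
    "calendar_tool": {"name", "timezone"},
    "payments_tool": {"name", "billing_address"},
}

def share_context(user_context: dict, tool: str) -> dict:
    """Return only the fields permitted for this tool; an
    unrecognized tool gets nothing by default."""
    allowed = CONTEXT_NORMS.get(tool, set())
    return {k: v for k, v in user_context.items() if k in allowed}

user = {
    "name": "Alice",
    "timezone": "UTC",
    "billing_address": "1 Main St",
    "medical_notes": "private",
}
print(share_context(user, "calendar_tool"))
# {'name': 'Alice', 'timezone': 'UTC'}
```

Defaulting to an empty set for unknown tools makes the norm allowlist-based: a newly connected integration leaks nothing until someone deliberately decides which fields are appropriate for it.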

3 months ago

Privacy Concerns Over AI Training Data and Chatbot Adoption Risks

The rapid adoption of generative AI chatbots, such as ChatGPT, is transforming both consumer and enterprise environments, with significant growth in usage and market value. These chatbots are being used for a wide range of applications, from customer service to code generation and mental health support. However, their increasing prevalence raises concerns about risks such as hallucinations, dangerous suggestions, and the need for robust guardrails to ensure safe deployment and use. Simultaneously, privacy concerns have emerged regarding how major technology companies, like Google, may use personal data to train AI models. Google recently denied allegations that it analyzes private Gmail content to train its Gemini AI model, following a class action lawsuit and public confusion over changes in Gmail's smart features settings. The company clarified that while smart features have existed for years, Gmail content is not used for AI model training, and any changes to terms or policies would be communicated transparently. These developments highlight the ongoing tension between AI innovation, user privacy, and the need for clear communication about data usage.

3 months ago

Enterprise Security Risks and Criminal Abuse of Large Language Models

The widespread integration of large language models (LLMs) into enterprise environments is introducing new security risks at every layer of the technology stack. Security leaders are being urged to rethink traditional trust boundaries, as LLMs can alter assumptions about data handling, application behavior, and internal controls. Key risks include prompt injection, sensitive data leakage through inputs and outputs, and fragmented ownership of LLM-related security responsibilities. Experts emphasize the need to treat LLMs as untrusted compute and to enforce explicit policy and validation layers, rather than relying solely on prompt engineering or fine-tuning. Meanwhile, cybercriminals are actively exploiting the popularity of LLMs by selling discounted access to mainstream AI tools such as ChatGPT, Perplexity, and Gemini on underground forums. These tools are being used by threat actors for a range of malicious activities, including phishing, reconnaissance, and automating cybercrime operations. The criminal use of LLMs lowers the barrier to entry for less-skilled attackers and enables more efficient execution of threat campaigns, highlighting the dual challenge of securing enterprise LLM deployments while monitoring their abuse in the cybercriminal ecosystem.
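The recommendation to treat LLMs as untrusted compute implies parsing and validating model output like any external input before acting on it. The following sketch assumes a hypothetical application in which the model proposes actions as JSON; the allowlist and field checks stand in for the explicit policy and validation layer the experts describe, rather than any specific product's API.

```python
import json

# Hypothetical allowlist: the only actions the application will
# execute, regardless of what the model emits.
ALLOWED_ACTIONS = {"lookup_ticket", "summarize_logs"}

def validate_action(raw_model_output: str) -> dict:
    """Parse and vet a model-proposed action, rejecting anything
    outside the explicit policy instead of trusting the prompt."""
    try:
        proposal = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"unparseable model output: {exc}")
    action = proposal.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action {action!r} not permitted by policy")
    if not isinstance(proposal.get("args"), dict):
        raise ValueError("args must be an object")
    return proposal

# A prompt-injected response proposing an unlisted action is blocked
# at the validation layer, not by the prompt itself:
try:
    validate_action('{"action": "delete_all_tickets", "args": {}}')
except ValueError as e:
    print("blocked:", e)
```

Because enforcement happens outside the model, a successful prompt injection can at worst make the model propose a disallowed action; it cannot make the application execute one.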

3 months ago
