Mallory

AI Safety Concerns Around Copilot and ChatGPT Content Controls

copilot · safety controls · chatgpt · content moderation · generative ai · content filters · enterprise ai · access controls · ai governance · data exposure · oversharing
Updated March 17, 2026 at 06:00 AM · 2 sources

Recent reporting highlights AI safety and governance risks in mainstream generative AI tools, with concerns spanning both enterprise and consumer use. Gartner warned that Microsoft 365 Copilot can amplify existing data exposure problems by making over-shared SharePoint and Microsoft 365 content easier to discover, and also flagged the risk of users distributing inaccurate or culturally inappropriate output without proper review. The guidance emphasized enabling Microsoft’s filters, tightening document permissions, and training users to validate generated content before sharing it.

Separate reporting on OpenAI’s ChatGPT described internal opposition to expanded “adult mode” capabilities, with former safety personnel reportedly warning that age-gating and content controls were not reliable enough to prevent minors from accessing prohibited material. The article also cited prior filter failures that allegedly allowed graphic erotic content outside intended policy boundaries. Both reports point to a broader governance issue: organizations and platform providers are struggling to keep content moderation, access controls, and user safeguards aligned with rapidly expanding AI functionality.

Related Stories

OpenAI Adds ChatGPT Lockdown Mode and Elevated Risk Labels to Reduce Prompt-Injection Exfiltration


OpenAI introduced **Lockdown Mode** and **Elevated Risk** labels in *ChatGPT* to reduce exposure to **prompt injection** and related data-exfiltration risks when AI features interact with external systems. Lockdown Mode is positioned as an optional, advanced setting for higher-risk users and environments (notably *ChatGPT Enterprise*, *Edu*, *for Healthcare*, and *for Teachers*) that restricts tool access and limits how ChatGPT can reach outside systems. Reported controls include disabling or constraining capabilities attackers could abuse via conversations or connected apps, and limiting browsing so that no live network requests leave OpenAI-controlled infrastructure (browsing is constrained to cached content). Admins can enable the setting via workspace controls and apply additional restrictions through dedicated roles, while Elevated Risk labels provide in-product warnings and guidance for features that increase risk when connecting to apps or the web, including across *ChatGPT*, *ChatGPT Atlas*, and *Codex*.

Separate research highlighted how AI assistants with web-browsing/URL-fetching features can be abused as stealthy **command-and-control (C2) relays**, demonstrating a technique against **Microsoft Copilot** and **xAI Grok** that tunnels operator commands and victim data through legitimate AI web interfaces and can work without an API key or registered account.

In parallel, the **European Parliament** reportedly disabled built-in AI tools on lawmakers’ work devices due to cybersecurity and privacy concerns about uploading sensitive correspondence to third-party cloud AI providers and uncertainty about what data is shared and retained. Other referenced material focused on general productivity customization of ChatGPT via “Custom Instructions,” rather than a specific security event or disclosure.
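The access-restriction pattern described for Lockdown Mode can be illustrated generically. The sketch below is not OpenAI's implementation; every name and policy value here is an assumption. It shows the core idea of a lockdown toggle: an allowlist of assistant tool calls that shrinks when the stricter mode is active.

```python
# Illustrative sketch only; NOT OpenAI's implementation.
# Tool names and policy sets are assumptions for demonstration.

LOCKDOWN_ALLOWED_TOOLS = {"search_cached"}  # e.g. browsing limited to cached content
DEFAULT_ALLOWED_TOOLS = {"search_cached", "fetch_url", "run_connector"}

def allowed_tools(lockdown_enabled: bool) -> set[str]:
    """Return the set of tools an assistant session may invoke under the active policy."""
    return LOCKDOWN_ALLOWED_TOOLS if lockdown_enabled else DEFAULT_ALLOWED_TOOLS

def gate_tool_call(tool_name: str, lockdown_enabled: bool) -> bool:
    """Reject any tool call outside the active allowlist; the essence of a lockdown mode."""
    return tool_name in allowed_tools(lockdown_enabled)
```

The design choice worth noting is that lockdown is a strict subset of the default policy, so enabling it can only remove capabilities, never add them.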

3 weeks ago
AI Content Licensing, Data Control, and Abuse Risks in the Generative AI Ecosystem


Several organizations moved to reshape how generative AI systems access and monetize online content amid escalating bot scraping and data-use disputes. **Cloudflare** acquired **Human Native**, an AI data marketplace focused on converting unstructured media into licensed datasets, and positioned the deal alongside controls such as *AI Crawl Control* and *Pay Per Crawl* that let site owners block crawlers, require payment, or manage inclusion in AI datasets. Cloudflare also highlighted plans to expand its *AI Index* pub/sub approach to reduce inefficient crawling and referenced **x402** as a potential machine-to-machine payments protocol.

Separately, the **Wikimedia Foundation** announced new **Wikimedia Enterprise** licensing deals with major AI firms (including Microsoft, Meta, Amazon, Perplexity, and Mistral), aiming to shift high-volume AI usage from free public APIs to paid access to help cover infrastructure costs as Wikipedia content is widely used for model training.

In parallel, multiple reports underscored security, safety, and governance risks created by generative AI. **Kaspersky** described how exposed databases tied to AI image-generation services, combined with the ease of creating convincing non-consensual nude imagery, can enable **AI-driven sextortion**, expanding victimization to anyone with publicly available photos. Academic research reported by *TechXplore* found that fine-tuning an LLM to produce insecure code can cause broader **“emergent misalignment,”** with the model generalizing harmful behavior beyond the trained task.

Another *TechXplore* report summarized a proposed legal framework on liability for **AI-generated child sexual abuse material (CSAM)**, emphasizing that users are typically the primary perpetrators but that developers and operators may face criminal exposure if they knowingly enable misuse without countermeasures. A *CyberScoop* analysis additionally warned that AI citation behavior can normalize **foreign influence** when credible sources are paywalled or block crawlers, making state-aligned propaganda disproportionately “available” to models and therefore more likely to be cited.
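The paid crawl controls described above build on the long-standing Robots Exclusion Protocol. As a minimal illustration (not tied to any product mentioned here), a site owner can already signal that AI training crawlers should stay out via `robots.txt`; the user-agent tokens below are ones their operators publish (OpenAI's GPTBot, Common Crawl's CCBot, and Google-Extended for AI training use):

```text
# Illustrative robots.txt: opt out of common AI training crawlers
# while leaving ordinary indexing untouched.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
```

Note that `robots.txt` is advisory: only compliant crawlers honor it, which is precisely the gap that network-level enforcement such as Cloudflare's crawler blocking aims to close.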

2 months ago
AI Feature Rollouts and Data-Handling Risks in Consumer and Developer Tools


Mozilla said an upcoming *Firefox* release will add centralized controls to disable generative-AI capabilities, including a single **“Block AI enhancements”** toggle intended to prevent current and future AI features (and related prompts) from being enabled in the desktop browser. The controls are expected to allow per-feature management of AI functions such as translations, PDF image alt-text generation, AI-assisted tab grouping, link previews, and sidebar chatbot access.

Separately, OpenAI announced product changes around its developer and ChatGPT ecosystems, including a Mac-only *Codex* app positioned as a multi-agent “command center” with sandboxing intended to limit file writes and network access, and plans to retire **GPT-4o** and several other ChatGPT models as usage shifts to **GPT-5.2**.

In parallel, a security warning highlighted a report alleging two widely used AI coding assistants were **exfiltrating all ingested code to China**, underscoring the need for enterprise controls over AI developer tools, data residency, and code/IP handling.
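For users who do not want to wait for the centralized toggle, some of these switches are already reachable through Firefox preferences. The sketch below is a hedged example for a `user.js` file; the preference names are assumptions based on recent Firefox releases and may differ by version, so verify them in `about:config` before relying on them:

```js
// Hedged sketch: disable Firefox generative-AI features via user.js.
// Preference names are assumptions; confirm in about:config for your version.
user_pref("browser.ml.enable", false);        // on-device ML features (e.g. alt-text generation)
user_pref("browser.ml.chat.enabled", false);  // AI chatbot sidebar
```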

1 month ago

Get Ahead of Threats Like This

Mallory continuously monitors global threat intelligence and correlates it with your attack surface. Know if you're exposed — before adversaries strike.