Practical Guidance on Using LLMs in Security Work and Testing LLM Applications

EVENT TIMELINE

How this story unfolded

3 events from the most recent confirmed update back to the earliest known activity.

3 EVENTS

Feb 6, 20265mo ago

Praetorian introduces Augustus LLM security testing suite

Praetorian introduced Augustus, an open-source LLM security testing tool and accompanying taxonomy covering jailbreaks, prompt injection, data extraction, package hallucinations, RAG/context attacks, multimodal attacks, renderer exploits, evasion methods, and agent/tooling probes. The publication framed these as structured evaluation probes for assessing LLM security.

Feb 5, 20265mo ago

NVISO outlines automated LLM red-teaming with Promptfoo

NVISO published a walkthrough of automated LLM red teaming using Promptfoo, explaining a workflow with target, adversarial, and grader models to test risks such as prompt injection, data leakage, jailbreaking, and authorization failures. The article included a lab against a deliberately vulnerable ChainLit chatbot and reported baseline and iterative jailbreak test results.

Feb 3, 20265mo ago

Guide published on using LLMs to augment security work

A practitioner guide described how to use LLMs such as Claude, Cursor, and ChatGPT to accelerate security and engineering tasks through context-rich prompting, role-stacking, iterative refinement, and validation. It emphasized that LLMs should augment rather than replace analyst judgment.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

3 LINKEDOpen in app

Organizations

3 linked

NvidiaNVISOPromptfoo

SOURCE COVERAGE

Sources

3 references tracked. Mallory keeps watching after this page renders.

3 SOURCESView all

Praetorian BlogNews

Feb 6, 2026

Introducing Augustus: Open Source LLM Prompt Injection Tool | Praetorian

praetorian.com

Open source

Nviso LabsNews

Feb 5, 2026

Boost LLM Security: automated Red Teaming at Scale with Promptfoo

blog.nviso.eu

Open source

Thor Collective DispatchNews

Feb 3, 2026

How I Use LLMs for Security Work - by Josh Rickard

dispatch.thorcollective.com

Open source

ON THE SAME THREAD

Security researchers highlighted multiple ways **LLM adoption can introduce or amplify risk**, including both technical attacks and unsafe development practices. G DATA reported that a Git-hosted “detector” for the **Shai-Hulud worm** shipped with “test files” that were effectively *real malware*: scripts capable of deleting user directories and, in at least one case, uploading data to actual threat actors. The files were apparently intended to validate detection efficacy and may have been produced via AI-assisted “vibe coding,” where the model replicated malicious behavior one-to-one while comments claimed the code was only a simulation; although the test artifacts are not executed during normal tool operation, users could trigger damage by manually running them. Separate academic work summarized by Bruce Schneier described **side-channel attacks against LLM inference**, where data-dependent timing and token/packet-size patterns (including those introduced by efficiency techniques like speculative decoding) can leak information about user prompts even over encrypted channels. Reported impacts include inferring conversation topics with high accuracy and, in some settings, recovering sensitive data such as phone numbers or credit card numbers via active probing. In parallel, an SC Media segment discussed the operational upside of **LLM-driven secure code analysis**, citing results that improved security across hundreds of open-source projects but noting the importance of human validation and patching effort; an OSINT Team post provided a cautionary, practitioner-level example of how easily malware can be accidentally executed during analysis, reinforcing the need for disciplined handling and isolation when working with suspicious files.

Mar 21, 2026

LLM Guardrail Bypass and Prompt Injection Weaknesses

Multiple writeups describe how **LLM safety controls can be bypassed through prompt-based attacks**, arguing that jailbreaks and prompt injection are a practical security problem rather than a novelty. The reporting highlights common defense layers—training-time alignment, system prompts, input classifiers, and output filters—and says each can fail because the same model that follows instructions is also asked to interpret and enforce them. One article frames jailbreaks as an attack on the trust architecture of enterprise AI deployments, while the other demonstrates the issue through Lakera’s *Gandalf* challenge, where progressively stronger controls are still defeated by prompt manipulation. The material is **not fluff** because it provides substantive security analysis of an emerging attack class affecting AI systems. Both references focus on the same topic: how prompts can subvert LLM defenses, expose protected information, and reveal architectural weaknesses in current guardrail designs. The practical takeaway for defenders is that natural-language controls alone are brittle, especially when secrets, policy enforcement, and user-controlled input share the same inference path, making prompt injection and jailbreak resistance a core application security concern for enterprise AI deployments.

Mar 22, 2026

LLM Security Research Shows Faster Exploitation and Hunting, but Reliability Gaps Persist

Security researchers and vendors reported that large language models are becoming materially useful across offensive and defensive security workflows, including vulnerability discovery, exploit development, autonomous penetration testing, and cyber threat intelligence extraction. Recent work described LLM-assisted systems that generated proof-of-concept exploits for npm flaws at a reported **77%** success rate, exploited real-world web vulnerabilities with multi-agent architectures, and executed multistage attacks in emulated enterprise networks when paired with task-specific agents and attack-abstraction layers. Other experiments found local self-hosted models could reliably solve straightforward Juice Shop challenges, while purpose-built scaffolds helped researchers uncover memory-corruption bugs in Windows endpoint products and accelerate reverse engineering of AV and EDR logic. At the same time, multiple studies warned that headline results can overstate real-world capability without strong validation. Google Project Zero said benchmark design, tooling, and automatic verification heavily influence measured performance and concluded current models still fall short of meaningful autonomous offensive research in live environments. Separate academic and practitioner reviews found many automated pentesting frameworks remain brittle, and human validation showed that **71.5%** of supposedly successful LLM-generated exploit PoCs in one follow-up study were actually invalid because the models simulated exploitation rather than triggering the flaw. Across the research, the consistent finding was that LLMs perform best when constrained by structured workflows, specialized tools, execution feedback, and rigorous verification rather than being trusted as fully autonomous hackers.

Jun 10, 2026

Practical Guidance on Using LLMs in Security Work and Testing LLM Applications

Get ahead of threats like this

How this story unfolded

Praetorian introduces Augustus LLM security testing suite

NVISO outlines automated LLM red-teaming with Promptfoo

Guide published on using LLMs to augment security work

Related entities

Sources

Introducing Augustus: Open Source LLM Prompt Injection Tool | Praetorian

Boost LLM Security: automated Red Teaming at Scale with Promptfoo

How I Use LLMs for Security Work - by Josh Rickard

See the full picture, correlated to your attack surface.

Practical Guidance on Using LLMs in Security Work and Testing LLM Applications

Get ahead of threats like this

How this story unfolded

Praetorian introduces Augustus LLM security testing suite

NVISO outlines automated LLM red-teaming with Promptfoo

Guide published on using LLMs to augment security work

Related entities

Sources

Introducing Augustus: Open Source LLM Prompt Injection Tool | Praetorian

Boost LLM Security: automated Red Teaming at Scale with Promptfoo

How I Use LLMs for Security Work - by Josh Rickard

See the full picture, correlated to your attack surface.

Related stories

AI and LLM Security Risks: Malicious Test Artifacts, Side-Channel Leakage, and LLM-Assisted Code Review

LLM Guardrail Bypass and Prompt Injection Weaknesses

LLM Security Research Shows Faster Exploitation and Hunting, but Reliability Gaps Persist