As AI systems get embedded into critical infrastructure, the attack surface grows — and so does the tooling available to security researchers. This post documents the most relevant GitHub repositories for AI red teaming, LLM jailbreaking, and malware ML research, organized by risk level and use case.

⚠️ Warning: Some repositories listed here contain actual malware samples or tools capable of bypassing AI safety systems. Use only in isolated, sandboxed environments.


🔴 High Risk — Potential Malware / Offensive Tools

These repos contain either real malware samples or tools that can actively bypass AI safety mechanisms.

llm-attacks/llm-attacks

Implements the GCG (Greedy Coordinate Gradient) attack — universal adversarial suffixes that can bypass safety filters on virtually any LLM. Published alongside a peer-reviewed research paper, this is one of the most cited offensive LLM tools in existence.

Risk: Direct safety bypass capability on production LLMs.


tml-epfl/llm-adaptive-attacks

ICLR 2025 paper — “Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks.” Contains jailbreak artifacts and scripts targeting GPT-4, Claude, Gemini, and other frontier models.

Risk: Ready-to-run jailbreak artifacts for major LLMs.


jiep/offensive-ai-compilation

Compilation of offensive AI/ML techniques covering:

  • Model stealing via prediction APIs
  • Hyperparameter theft
  • Web-scale training data poisoning
  • Adversarial example generation

Risk: Comprehensive offensive ML playbook.


cyber-research/APTMalware

3,500+ real malware samples linked to 12 Advanced Persistent Threat (APT) groups. Used for benchmarking ML-based malware authorship attribution. Contains actual executables.

Risk: Real malware — sandbox only, never execute directly.


iosifache/DikeDataset

Labeled collection of benign and malicious PE and OLE files for training AI malware classifiers. Contains actual malicious binaries alongside benign samples.

Risk: Real malware binaries — handle with care.


🟠 Research — AI Red Teaming Datasets

Prompt and behavior datasets for evaluating LLM safety mechanisms. Generally safer to work with than the above, but still potentially sensitive.

Comprehensive red teaming prompt dataset across 8 AI safety risk categories:

  • Jailbreaking (system prompt injection, roleplay, hypothetical scenarios)
  • Harmful content generation
  • Privacy violations
  • Bias and fairness
  • Cybersecurity threats
  • Misinformation
  • Illegal activities
  • Psychological manipulation

Cited in 2025 research. Good baseline for safety evaluations.


verazuo/jailbreak_llms

Official repo for ACM CCS 2024. Contains 15,140 real ChatGPT prompts collected Dec 2022 to Dec 2023, including 1,405 confirmed jailbreak prompts — the largest in-the-wild jailbreak dataset publicly available.


JailbreakBench/jailbreakbench

NeurIPS 2024 Datasets and Benchmarks Track. The JBB-Behaviors dataset provides 200 distinct benign and misuse behaviors for tracking progress in jailbreak generation and defense over time.


DAMO-NLP-SG/multilingual-safety-for-LLMs

ICLR 2024 — The MultiJail dataset. Studies how multilingual inputs can bypass safety mechanisms that were primarily trained on English. A significant gap in most current LLM safety training.


allenai/wildjailbreak

262,000 vanilla and adversarial prompt-response pairs from the WildTeaming framework (Allen AI). Mines human-devised jailbreak tactics to programmatically generate diverse adversarial inputs. Primarily used for safety training rather than attacking.


🟡 Tools — Red Teaming Frameworks

Azure/PyRIT

Microsoft’s Python Risk Identification Tool for generative AI. Provides a systematic, repeatable framework for red teaming LLMs. Widely adopted in enterprise AI security assessments.


leondz/garak

Think of it as nmap for LLMs. Probes for hallucination, data leakage, prompt injection, jailbreaks, and a growing list of vulnerability classes. Actively developed and one of the most production-ready scanners available.


sherdencooper/GPTFuzz

Fuzzing-based jailbreak generation. Mutates seed prompts automatically to produce novel jailbreaks at scale. Useful for generating large test suites without manual effort.


confident-ai/deepteam

Open-source LLM red teaming framework released November 2025. Implements vulnerability classes including:

  • Prompt injection
  • PII leakage
  • Hallucinations
  • Encoding obfuscations
  • Multi-turn jailbreaks

promptfoo/promptfoo

Open-source LLM testing and red teaming tool with CI/CD integration. Supports automated vulnerability scanning and regression testing — good for catching regressions after model updates.


kurogai/100-redteam-projects

100 red team project ideas ranging from beginner to advanced. Useful for building out lab environments and developing systematic offensive skills.


🔵 Curated Lists — Start Here for Research

Repository Focus
user1342/Awesome-LLM-Red-Teaming Best overall — frameworks, playgrounds, attack toolkits
corca-ai/awesome-llm-security LLM security focus, actively maintained
yueliu1999/Awesome-Jailbreak-on-LLMs SOTA jailbreak methods, papers, code
wearetyomsmnv/Awesome-LLMSecOps LLM SecOps, updated Dec 2025
cybersecurity-dev/awesome-malware-datasets Malware ML datasets collection
shramos/Awesome-Cybersecurity-Datasets Broad cybersecurity dataset list
jivoi/awesome-ml-for-cybersecurity ML applied to security research

Summary

The AI red teaming ecosystem has matured significantly. The tooling now rivals traditional offensive security in sophistication — with automated jailbreak generation, systematic probing frameworks, and large-scale benchmark datasets all publicly available.

If you’re building a red team practice around AI systems, start with garak for scanning, PyRIT for structured assessments, and the JailbreakBench dataset for benchmarking your defenses.


All tools and datasets listed here are intended for security research and defensive purposes. Use responsibly.