As AI systems get embedded into critical infrastructure, the attack surface grows — and so does the tooling available to security researchers. This post documents the most relevant GitHub repositories for AI red teaming, LLM jailbreaking, and malware ML research, organized by risk level and use case.
⚠️ Warning: Some repositories listed here contain actual malware samples or tools capable of bypassing AI safety systems. Use only in isolated, sandboxed environments.
🔴 High Risk — Potential Malware / Offensive Tools
These repos contain either real malware samples or tools that can actively bypass AI safety mechanisms.
llm-attacks/llm-attacks
Implements the GCG (Greedy Coordinate Gradient) attack — universal adversarial suffixes that can bypass safety filters on virtually any LLM. Published alongside a peer-reviewed research paper, this is one of the most cited offensive LLM tools in existence.
Risk: Direct safety bypass capability on production LLMs.
tml-epfl/llm-adaptive-attacks
ICLR 2025 paper — “Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks.” Contains jailbreak artifacts and scripts targeting GPT-4, Claude, Gemini, and other frontier models.
Risk: Ready-to-run jailbreak artifacts for major LLMs.
jiep/offensive-ai-compilation
Compilation of offensive AI/ML techniques covering:
- Model stealing via prediction APIs
- Hyperparameter theft
- Web-scale training data poisoning
- Adversarial example generation
Risk: Comprehensive offensive ML playbook.
cyber-research/APTMalware
3,500+ real malware samples linked to 12 Advanced Persistent Threat (APT) groups. Used for benchmarking ML-based malware authorship attribution. Contains actual executables.
Risk: Real malware — sandbox only, never execute directly.
iosifache/DikeDataset
Labeled collection of benign and malicious PE and OLE files for training AI malware classifiers. Contains actual malicious binaries alongside benign samples.
Risk: Real malware binaries — handle with care.
🟠 Research — AI Red Teaming Datasets
Prompt and behavior datasets for evaluating LLM safety mechanisms. Generally safer to work with than the above, but still potentially sensitive.
navirocker/llm-red-teaming-dataset
Comprehensive red teaming prompt dataset across 8 AI safety risk categories:
- Jailbreaking (system prompt injection, roleplay, hypothetical scenarios)
- Harmful content generation
- Privacy violations
- Bias and fairness
- Cybersecurity threats
- Misinformation
- Illegal activities
- Psychological manipulation
Cited in 2025 research. Good baseline for safety evaluations.
verazuo/jailbreak_llms
Official repo for ACM CCS 2024. Contains 15,140 real ChatGPT prompts collected Dec 2022 to Dec 2023, including 1,405 confirmed jailbreak prompts — the largest in-the-wild jailbreak dataset publicly available.
JailbreakBench/jailbreakbench
NeurIPS 2024 Datasets and Benchmarks Track. The JBB-Behaviors dataset provides 200 distinct benign and misuse behaviors for tracking progress in jailbreak generation and defense over time.
DAMO-NLP-SG/multilingual-safety-for-LLMs
ICLR 2024 — The MultiJail dataset. Studies how multilingual inputs can bypass safety mechanisms that were primarily trained on English. A significant gap in most current LLM safety training.
allenai/wildjailbreak
262,000 vanilla and adversarial prompt-response pairs from the WildTeaming framework (Allen AI). Mines human-devised jailbreak tactics to programmatically generate diverse adversarial inputs. Primarily used for safety training rather than attacking.
🟡 Tools — Red Teaming Frameworks
Azure/PyRIT
Microsoft’s Python Risk Identification Tool for generative AI. Provides a systematic, repeatable framework for red teaming LLMs. Widely adopted in enterprise AI security assessments.
leondz/garak
Think of it as nmap for LLMs. Probes for hallucination, data leakage, prompt injection, jailbreaks, and a growing list of vulnerability classes. Actively developed and one of the most production-ready scanners available.
sherdencooper/GPTFuzz
Fuzzing-based jailbreak generation. Mutates seed prompts automatically to produce novel jailbreaks at scale. Useful for generating large test suites without manual effort.
confident-ai/deepteam
Open-source LLM red teaming framework released November 2025. Implements vulnerability classes including:
- Prompt injection
- PII leakage
- Hallucinations
- Encoding obfuscations
- Multi-turn jailbreaks
promptfoo/promptfoo
Open-source LLM testing and red teaming tool with CI/CD integration. Supports automated vulnerability scanning and regression testing — good for catching regressions after model updates.
kurogai/100-redteam-projects
100 red team project ideas ranging from beginner to advanced. Useful for building out lab environments and developing systematic offensive skills.
🔵 Curated Lists — Start Here for Research
| Repository | Focus |
|---|---|
| user1342/Awesome-LLM-Red-Teaming | Best overall — frameworks, playgrounds, attack toolkits |
| corca-ai/awesome-llm-security | LLM security focus, actively maintained |
| yueliu1999/Awesome-Jailbreak-on-LLMs | SOTA jailbreak methods, papers, code |
| wearetyomsmnv/Awesome-LLMSecOps | LLM SecOps, updated Dec 2025 |
| cybersecurity-dev/awesome-malware-datasets | Malware ML datasets collection |
| shramos/Awesome-Cybersecurity-Datasets | Broad cybersecurity dataset list |
| jivoi/awesome-ml-for-cybersecurity | ML applied to security research |
Summary
The AI red teaming ecosystem has matured significantly. The tooling now rivals traditional offensive security in sophistication — with automated jailbreak generation, systematic probing frameworks, and large-scale benchmark datasets all publicly available.
If you’re building a red team practice around AI systems, start with garak for scanning, PyRIT for structured assessments, and the JailbreakBench dataset for benchmarking your defenses.
All tools and datasets listed here are intended for security research and defensive purposes. Use responsibly.