AI REKT — every AI security disaster, catalogued

2025 – 2026

everything that went wrong

OpenAI had to rotate code-signing certificates across four platforms. The reason: a supply chain worm called Shai-Hulud reached their npm packages. The same worm poisoned 84 TanStack versions in 6 minutes, spread to PyPI through Mistral and UiPath, and swallowed the entire @antv charting ecosystem (639 versions in one hour). TeamPCP breached GitHub from the inside: one poisoned VS Code extension on an employee laptop, 3,800 internal repositories out the door, listed on BreachForums for $50K. The same group hit Microsoft that week, with three poisoned versions of the official Azure durabletask SDK on PyPI in 35 minutes. North Korea’s Lazarus Group stole $1.5 billion from Bybit in a single transaction. ShinyHunters took 275 million student records from Canvas. Nitrogen ransomware grabbed 8TB of Apple and NVIDIA schematics from Foxconn. DOGE connected the federal personnel database to the open internet. The NSA chief told the Senate that Anthropic’s Mythos broke into “almost all” classified systems in hours. These are 491 incidents from five months. Every one sourced, graphed, and scored.

491 Incidents Catalogued

7.3 Peak Magnitude

92 CVEs Tracked

Changelog

What was added and when. Sorted by date added to the catalog, not incident date. New entries appear at the top even if the incident happened earlier.

The feed

All incidents are sourced from CVEs, security blogs, and public disclosures.


It's getting worse

Each dot is an incident. Y-axis is magnitude on a log₁₀ scale of estimated GDP impact in USD. Each +1.0 = 10× more economic damage. See methodology & reference points.

Legend: Critical, High, Medium, Trend.

The leaderboard

Top 10 by magnitude.

The chain reaction

One misconfigured CI/CD pipeline kicked off a seven-month cascade across ten projects and two ecosystems.

By attack type

Incidents sorted by attack type.

What to do about it

Pin everything

MCP servers, VS Code extensions, npm packages. Pin versions to SHAs, not tags. The hackerbot-claw campaign force-pushed 75 of 76 Trivy version tags. Tags lie.
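One way to check the "SHAs, not tags" rule on a GitHub Actions workflow is sketched below. It is a rough text scan, not a YAML parser, and it only understands `owner/repo@ref` references; the `example-org/scanner` action in the usage note is hypothetical.

```python
import re
from typing import List, Tuple

# Captures the action name and the ref after "@" on a `uses:` line.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
# A full commit SHA is 40 lowercase hex characters.
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text: str) -> List[Tuple[str, str]]:
    """Return (action, ref) pairs whose ref is not a full 40-character commit SHA."""
    findings = []
    for action, ref in USES_RE.findall(workflow_text):
        if not FULL_SHA_RE.fullmatch(ref):
            findings.append((action, ref))
    return findings
```

Run against a workflow that mixes `actions/checkout@v4` with `example-org/scanner@<40-hex SHA>`, only the tag-pinned checkout line is reported. A tag can be force-pushed to a different commit; a full SHA cannot.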

Sandbox your agents

Your coding assistant does not need prod credentials. The OpenClaw agent deleted a live inbox because someone gave it write access to a mailbox for a "review" task. Apply least privilege to every agent connection.
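Least privilege for an agent connection can be as simple as a deny-by-default allowlist keyed by task. The sketch below is illustrative only: the task names and scope strings (`mail.read`, `mail.label`, `mail.delete`) are made up and do not correspond to any real mail API.

```python
from typing import Dict, FrozenSet

# Hypothetical task-to-scope allowlist; names are illustrative, not a real API.
TASK_SCOPES: Dict[str, FrozenSet[str]] = {
    "review": frozenset({"mail.read"}),
    "triage": frozenset({"mail.read", "mail.label"}),
}


def is_allowed(task: str, requested_scope: str) -> bool:
    """Deny by default: a scope is allowed only if the task's allowlist names it."""
    return requested_scope in TASK_SCOPES.get(task, frozenset())
```

Under this mapping a "review" task can read a mailbox but a request for `mail.delete` is refused, and an unknown task gets nothing.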

Log what agents do

Log what the agent did, not what you asked it to do. The Meta agent gave bad engineering advice and the resulting config change sat in production for two hours before monitoring picked up the anomalous access.
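A minimal sketch of logging at the point of execution rather than at the prompt: wrap every tool the agent can call so that the actual arguments and outcome are recorded. `AUDIT_LOG` is an in-memory stand-in for a real append-only sink, and `update_config` is a hypothetical tool.

```python
import functools
import time
from typing import Any, Callable, Dict, List

AUDIT_LOG: List[Dict[str, Any]] = []  # stand-in for an append-only log sink


def audited(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Record every call to `tool` with its real arguments and outcome."""
    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        entry: Dict[str, Any] = {
            "tool": tool.__name__,
            "args": repr(args),
            "kwargs": repr(kwargs),
            "ts": time.time(),
        }
        try:
            result = tool(*args, **kwargs)
            entry["outcome"] = "ok"
            entry["result"] = repr(result)
            return result
        except Exception as exc:
            entry["outcome"] = "error"
            entry["error"] = repr(exc)
            raise
        finally:
            # Runs on success and failure, so no call goes unrecorded.
            AUDIT_LOG.append(entry)
    return wrapper


@audited
def update_config(key: str, value: str) -> str:
    """Hypothetical tool an agent might call."""
    return f"{key}={value}"
```

Because the record is written in `finally`, a call that raises still leaves an entry, which is the case monitoring most needs to see.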

Rotate credentials constantly

Every supply chain attack here harvested API keys and tokens. The LiteLLM malware fired on every Python script, whether or not it imported LiteLLM. Short-lived tokens limit the blast radius.
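To make the blast-radius point concrete, here is a toy expiring token: an HMAC-signed `subject:expiry` string that stops verifying once its lifetime ends. This is an illustration of the idea under assumed names (`issue_token`, `verify_token`), not a recommendation to roll your own; in practice the short-lived credentials would come from your platform's token service.

```python
import hashlib
import hmac
import time
from typing import Optional


def _sign(secret: bytes, payload: str) -> str:
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(secret: bytes, subject: str, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Return 'subject:expiry:signature', valid for ttl_seconds."""
    issued = time.time() if now is None else now
    payload = f"{subject}:{int(issued) + ttl_seconds}"
    return f"{payload}:{_sign(secret, payload)}"


def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> bool:
    """Accept only tokens with a valid signature that have not yet expired."""
    current = time.time() if now is None else now
    try:
        subject, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return False  # not enough fields
    expected = _sign(secret, f"{subject}:{expires}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    return expires.isdigit() and current < int(expires)
```

A token harvested from a build log is only useful until its expiry, so a five-minute lifetime caps what a stolen copy can do.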

Require human approval for destructive actions

Delete, send, publish, pay. These verbs should require a human confirmation step. The Claude Opus incident (9 seconds from prompt to DROP TABLE) happened because there was no approval gate between intent and execution.
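An approval gate can sit between the agent's intent and the call that carries it out. The sketch below uses the four verbs named above; `gated_execute`, `ApprovalDenied`, and the `approve` callback are hypothetical names, and mapping a raw command such as a SQL statement onto a verb is left out.

```python
from typing import Callable

# The verbs named in the text; extend to match your own tools.
DESTRUCTIVE_VERBS = {"delete", "send", "publish", "pay"}


class ApprovalDenied(Exception):
    """Raised when a human declines a destructive action."""


def gated_execute(verb: str, target: str,
                  action: Callable[[], str],
                  approve: Callable[[str, str], bool]) -> str:
    """Run `action` only if the verb is harmless or a human approves it."""
    if verb.lower() in DESTRUCTIVE_VERBS and not approve(verb, target):
        raise ApprovalDenied(f"{verb} on {target} was not approved")
    return action()
```

In a real deployment `approve` would block on a human response (a chat prompt, a ticket, a CLI confirmation) instead of returning immediately.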

Read the OWASP checklists

The OWASP Top 10 for LLM Applications and the Top 10 for Agentic Applications cover the vulnerability classes behind most incidents on this page. Start there.

Harden your CI/CD

The hackerbot-claw → Trivy → LiteLLM → Mini Shai-Hulud chain started with one pull_request_target workflow misconfiguration. Set permissions: read-all. Don't echo untrusted input into shell commands.

Red-team your AI integrations

Hidden markdown, invisible Unicode, poisoned tool descriptions. All published, all with working PoCs. Test your integrations with adversarial inputs before someone else does.
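As a starting point for the invisible-Unicode case, the sketch below flags every character in Unicode category Cf (format characters), which covers zero-width spaces and joiners, bidirectional controls, and the tag block. The function name `find_invisible` is an assumption of this example.

```python
import unicodedata
from typing import List, Tuple


def find_invisible(text: str) -> List[Tuple[int, str]]:
    """Return (index, codepoint) for each Unicode format character (category Cf)."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]
```

Run it over tool descriptions, README files, and any text an agent ingests. Expect some false positives: zero-width joiners are legitimate in emoji sequences and in several scripts, so treat a hit as a prompt to look, not as proof of an attack.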

Devin can do this for you

Devin can run dependency audits, harden your CI/CD config, review MCP server setups, and test for prompt injection. It finds the problems described on this page and writes the fixes.


Methodology: the magnitude scale

Magnitude = log₁₀(estimated economic impact in USD) − 2. Each +1.0 on the scale means 10× more economic damage. Estimates combine direct financial losses, remediation costs, business disruption, and downstream cascading effects.
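The formula is easy to check against the reference points. A small sketch, with the function names chosen for this example and the result rounded to one decimal place to match the tables:

```python
import math


def magnitude(impact_usd: float) -> float:
    """Magnitude = log10(estimated impact in USD) - 2, rounded to one decimal."""
    if impact_usd <= 0:
        raise ValueError("impact must be positive")
    return round(math.log10(impact_usd) - 2, 1)


def impact_from_magnitude(m: float) -> float:
    """Inverse of the scale: estimated impact in USD for a given magnitude."""
    return 10 ** (m + 2)
```

For example, `magnitude(10e9)` gives 8.0 (NotPetya), `magnitude(162e6)` gives 6.2 (Target breach), and `impact_from_magnitude(3.0)` gives 100000.0, so a magnitude-3 incident corresponds to roughly $100K.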

Reference points: cyber

Incident	Year	Est. Impact	Magnitude
NotPetya	2017	~$10B	8.0
Log4Shell (remediation)	2021	~$10B	8.0
CrowdStrike outage	2024	~$5.4B	7.7
WannaCry	2017	~$4–8B	7.7
SolarWinds	2020	~$1–5B	7.2
Colonial Pipeline	2021	~$1–2B	7.1
Equifax	2017	$700M	6.8
Heartbleed (remediation)	2014	~$500M	6.7
Target breach	2013	$162M	6.2

Reference points: crypto

Incident	Year	Est. Impact	Magnitude
FTX collapse	2022	~$8.7B	7.9
Bybit hack	2025	$1.46B	7.2
Ronin Bridge	2022	$625M	6.8
Poly Network	2021	$611M	6.8
Mt. Gox	2014	$450M	6.7
Wormhole	2022	$326M	6.5

Incidents on this page range from magnitude 2.3 (OpenClaw inbox deletion) to 7.3 (Outsider Enterprise AI-powered phishing network, $1.9B cumulative). For context, CrowdStrike sits at 7.7 and NotPetya at 8.0. The AI-specific peak of 7.3 is up from 6.3 (Mini Shai-Hulud).
