Observed2026-05-18updated 2026-05-288 min read
A four-cell Purple Games pulse tested Watcher/Hunter/Responder defenders against the Kings baseline. The signal is useful and uncomfortable: role-specialized blue teams often saw much more, but still failed to turn detection into containment.
Observed2026-04-306 min read
A founder-researcher note on why Purple Games is moving from brittle point counters to hybrid AI judge panels for agentic cyber defense benchmarks.
Observed2026-05-079 min read
Eight matches on Opus 4.7 vs GPT-5.5 against an AI cyber range. Headline finding: defenders detect but rarely respond. Then match 17 broke the headline. The story of running 8 paid AI-vs-AI cyber matches, calling a finding, then having the eighth match rewrite it.