Observed

Articles

Founder-researcher notes from the build. More honest than a press release, less formal than the methodology wiki, and always tied back to what the current evidence can actually support.

latest

Observed2026-05-18updated 2026-05-288 min read

The First Multi-Blue Pulse: Better Detection, Still Not Enough Defense

A four-cell Purple Games pulse tested Watcher/Hunter/Responder defenders against the Kings baseline. The signal is useful and uncomfortable: role-specialized blue teams often saw much more, but still failed to turn detection into containment.

Observed2026-04-306 min read

Predicates Can't Grade Art. Agentic Cyber Defense Needs Judges.

A founder-researcher note on why Purple Games is moving from brittle point counters to hybrid AI judge panels for agentic cyber defense benchmarks.

Observed2026-05-079 min read

Frontier Defenders Are Bimodal — Pre-Match Utilization Decides the Match

Eight matches on Opus 4.7 vs GPT-5.5 against an AI cyber range. Headline finding: defenders detect but rarely respond. Then match 17 broke the headline. The story of running 8 paid AI-vs-AI cyber matches, calling a finding, then having the eighth match rewrite it.