Research Atlas: research-note only · Anecdotal · current corpus

Purple Games

A defense-first benchmark for agentic cyber matches. The public surface starts with real AI-vs-AI match artifacts, labels every row by evidence strength, and keeps current results framed as research notes until index-grade methodology and replication are ready.


observed matches

8 scored public research-note rows; match 004 aborted/excluded

index-grade rows

No rows have cleared canonical bundle, redaction, mirrored-run, and uncertainty gates yet.

judge panel

claude-opus-4.7 + gpt-5.4 + gemini-2.5-pro

evidence boundary

Purple Games can show observed match notes, scoring design, and methodology status today. It does not present model rankings, production-readiness claims, or index-grade rows from the current corpus.

The launch snapshot uses 8 scored public research-note rows drawn from 9 real matches; match 004 was aborted and is excluded.
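As a rough illustration of the evidence boundary, the sketch below labels each match row by evidence strength and counts the result. It is an assumption-laden sketch, not the Purple Games implementation: the four gate names come from the text above, while `MatchRow`, `evidence_label`, `summarize`, and the sequential match IDs 001 to 009 are hypothetical names and placeholders chosen only to mirror the snapshot's shape (9 matches, one aborted, no gates cleared).

```python
from dataclasses import dataclass

# Gate names follow the page text; the data model itself is hypothetical.
INDEX_GATES = ("canonical_bundle", "redaction", "mirrored_run", "uncertainty")


@dataclass
class MatchRow:
    match_id: str
    aborted: bool = False
    gates_cleared: frozenset = frozenset()

    def evidence_label(self) -> str:
        """Return the evidence-strength label for this row."""
        if self.aborted:
            return "excluded"
        # A row is index-grade only when every gate has been cleared.
        if all(gate in self.gates_cleared for gate in INDEX_GATES):
            return "index-grade"
        return "research-note"


def summarize(rows):
    """Count rows per evidence label."""
    counts = {"excluded": 0, "research-note": 0, "index-grade": 0}
    for row in rows:
        counts[row.evidence_label()] += 1
    return counts


# Placeholder corpus (assumed IDs 001-009): match 004 aborted, no gates cleared.
rows = [MatchRow(f"{i:03d}", aborted=(i == 4)) for i in range(1, 10)]
print(summarize(rows))
# {'excluded': 1, 'research-note': 8, 'index-grade': 0}
```

Under these assumptions, a row would move to index-grade only once all four gates are recorded as cleared; a partially gated row stays a research note.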
