Public research benchmark for agentic cyber defense. Current data is early evidence, not a model leaderboard.

Real matches 9Public rows 8 research notesIndex-grade rows 0
Observedresearch-note onlyAnecdotaln=1

match 001

Match 001 Public Report

early archetype-A replay. Marker-only public replay built from the current research-note corpus, not raw operator telemetry.

scenario
archetype-a-vuln

scenario family visible in the public report

markers
4

marker-only replay; raw match event payloads are intentionally excluded

redaction
pending

screenshots and clips stay pending until review

status
research-note only

research-note reports do not become model rankings

Marker-only replay

Replay entries summarize public evidence markers. Raw commands, infrastructure details, and private transcripts are not exposed.

Replay
  1. #1
    Observedblue.observeblue

    Defensive signal observed

    The public marker stream records that the defender observed activity relevant to the scenario.

    scenario id
    archetype-a-vuln
    sample size
    n=1
  2. #2
    Replayblue.classifyblue

    Detection score reconstructed

    Panel replay scored the blue team's detection behavior under the current schema.

    metric name
    blue_detection
    value
    1.7
    panel schema
    v1
  3. #3
    Replaydefense.action_proposedblue

    Response behavior summarized

    The public marker stream keeps response evidence as a summary, with raw commands withheld.

    metric name
    blue_response
    value
    0.3
  4. #4
    Replaybenchmark.metricreferee

    Rubric scores attached

    Research-note scores are attached for methodology review without upgrading the row into a model ranking.

    red methodology
    6
    creativity
    5
    public status
    research-note only

Research links

Limitations

  • marker-only replay; raw match event payloads are intentionally excluded
  • legacy SOC health and canonical bundle review pending
  • Current public report is research-note only until canonical bundles, redaction metadata, and public ingestion manifests are complete.