Reports & Baselines
Every AXIS run produces a report with full scoring breakdowns and interaction transcripts. Baselines let you snapshot scores and detect regressions over time.
Understanding Reports
Every run automatically saves a report to .axis/reports/. Each report is a
directory containing a manifest and per-scenario result files.
.axis/reports/{reportId}/
report.json # Manifest with summary + metadata
report.html # Visual report (after scoring)
scenarios/{key}/{agent}.json # Full result with transcript + scores
scenarios/{key}/{agent}.raw.ndjson # Raw agent stdout
scenarios/{key}/{agent}.sparse-index.txt # Compressed transcript for scoring
The manifest
report.json contains the run metadata and a summary of every scenario/agent result:
the composite AXIS Result, per-dimension scores, token usage, duration, and any error messages.
This is the file you read when scripting against AXIS output.
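As a sketch of scripting against the manifest: the snippet below fabricates a stand-in report.json with a hypothetical shape (the `summary.composite` field is illustrative, not the documented schema) and extracts one value the way a CI script might. A real script should use jq or a JSON parser rather than sed.

```shell
# Create a stand-in manifest with an assumed shape.
mkdir -p .axis/reports/demo
cat > .axis/reports/demo/report.json <<'EOF'
{"summary": {"composite": 87}}
EOF
# Pull the composite score out with sed (fine for a demo; use jq in practice).
score=$(sed -n 's/.*"composite": *\([0-9]*\).*/\1/p' .axis/reports/demo/report.json)
echo "composite=$score"
```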
Scenario files
Each {agent}.json file under scenarios/ contains the full result
for one scenario/agent combination: the complete interaction transcript, judge evaluations,
per-interaction signal scores, and the rubric assessment.
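A minimal sketch of walking the per-scenario result files, assuming only the `scenarios/{key}/{agent}.json` layout shown above (the report ID and file contents here are dummy placeholders):

```shell
# Recreate the documented layout with a dummy result file.
mkdir -p .axis/reports/demo2/scenarios/hello-world
echo '{}' > .axis/reports/demo2/scenarios/hello-world/claude-code.json
# Walk every scenario/agent result: directory name = scenario key,
# file name (minus .json) = agent.
for f in .axis/reports/demo2/scenarios/*/*.json; do
  scenario=$(basename "$(dirname "$f")")
  agent=$(basename "$f" .json)
  echo "scenario=$scenario agent=$agent"
done
```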
Viewing Reports
# List all reports
npx @netlify/axis reports
# View the latest report summary
npx @netlify/axis reports latest
# View a specific scenario detail
npx @netlify/axis reports latest hello-world
# Filter by agent
npx @netlify/axis reports latest --agent claude-code
HTML reports
Open the visual report in your browser for the richest view:
npx @netlify/axis reports latest --html
The HTML report includes:
- Composite and per-dimension score breakdowns with visual indicators.
- The full interaction transcript with tool calls and results.
- Judge evaluations for each rubric check and interaction signal.
- Score insights identifying the weakest signals for low-scoring dimensions.
JSON output
For scripting and CI integration, use --json to get machine-readable output:
npx @netlify/axis reports latest --json
Baselines
Baselines snapshot your scores at a point in time. You compare future runs against a baseline to detect regressions: scores that dropped by more than the noise tolerance (1 point).
Setting a baseline
# Save from the latest report
npx @netlify/axis baseline set
# Save with a name (for multiple baselines)
npx @netlify/axis baseline set v1.0
# Save from a specific report
npx @netlify/axis baseline set --from 20260415-143022
Comparing against a baseline
# Compare during a run (automatic)
npx @netlify/axis run --compare-baseline
# Compare explicitly after a run
npx @netlify/axis baseline compare
# Compare against a named baseline
npx @netlify/axis baseline compare v1.0
The comparison shows deltas for each score. Score changes within the noise tolerance (1 point or less) are reported as unchanged. Regressions are highlighted, and the command exits with code 1 if any are detected.
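The unchanged/regression rule can be sketched as follows, assuming the documented 1-point tolerance (`baseline` and `current` are example scores, not real AXIS output):

```shell
# Classify a score delta against the 1-point noise tolerance.
baseline=85
current=83
delta=$((current - baseline))
abs=${delta#-}            # strip a leading minus sign to get the magnitude
if [ "$abs" -le 1 ]; then
  echo "unchanged"        # within noise tolerance
elif [ "$delta" -lt 0 ]; then
  echo "regression"       # this is what makes the compare command exit 1
else
  echo "improvement"
fi
```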
When to set baselines
- After establishing a good score: Run your scenarios, review the results, and if you are satisfied, save the baseline. This becomes your quality floor.
- After intentional changes: If you change your project structure, APIs, or agent configuration and scores change as expected, update the baseline to reflect the new normal.
- Named baselines for releases: Use named baselines (baseline set v2.0) to track scores across major versions.
Managing baselines
# List all baselines
npx @netlify/axis baseline list
# View baseline contents
npx @netlify/axis baseline show
# Delete a baseline
npx @netlify/axis baseline delete v1.0
CI Integration
AXIS is designed to run in CI environments. The key patterns:
- --json: Machine-readable output to stdout. No live terminal display, no color codes. Suitable for piping to other tools or saving as artifacts.
- --compare-baseline: Exits with code 1 if regressions are detected. Use this as a CI gate: the build fails if agent experience degrades.
- --concurrency: Control resource usage in constrained CI environments.
- API keys via environment: Pass ANTHROPIC_API_KEY, CODEX_API_KEY, or GEMINI_API_KEY as CI secrets.
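Outside GitHub Actions, the same gate is just an exit-code check. In this sketch, `sh -c 'exit 1'` stands in for `npx @netlify/axis run --compare-baseline`, which exits 1 when regressions are detected:

```shell
# Generic CI gate: fail the job when the compare step exits non-zero.
if sh -c 'exit 1'; then
  echo "no regressions"
else
  echo "regression detected - failing build"
  gate_status=1
fi
echo "gate status: ${gate_status:-0}"
```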
GitHub Actions example
# GitHub Actions example
- name: Run AXIS tests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: npx @netlify/axis run --json --compare-baseline
Report Storage
Baselines are stored in .axis/baselines/ and designed to be checked into version
control so your team shares the same regression thresholds.
Reports and cached skills should not be committed:
# .gitignore
.axis/reports/
.axis/skills-cache/