# Quick Start
Get AXIS running in your project. This guide walks through creating a config, writing your first scenario, running it, and understanding the results.
## Prerequisites
- Node.js 18 or later.
- An API key for at least one supported agent (for example, `ANTHROPIC_API_KEY` for Claude Code).
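Agent credentials are read from the environment. For Claude Code, for example, export `ANTHROPIC_API_KEY` in the shell you will run AXIS from (the value below is a placeholder):

```bash
# Placeholder value; substitute your real key.
export ANTHROPIC_API_KEY="sk-ant-placeholder"

# Confirm the key is visible to child processes such as the AXIS CLI.
test -n "$ANTHROPIC_API_KEY" && echo "key is set"
```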
## 1. Create a Config File
Add an `axis.config.json` to your project root. At minimum, specify where your scenarios live and which agents to run.
```json
{
  "scenarios": "./scenarios",
  "agents": ["claude-code"]
}
```

See Configuration Reference for the full set of options, including scoring weights, MCP servers, and custom agents.
## 2. Write a Scenario
Create a `scenarios/` directory and add your first scenario as a JSON file. Each scenario needs three things: a `name`, a `prompt` (the task for the agent), and a `rubric` (how to judge whether it succeeded).
```json
{
  "name": "Create a greeting file",
  "prompt": "Create a file called hello.txt with the content 'Hello from AXIS'.",
  "rubric": [
    { "check": "File hello.txt exists in the workspace", "weight": 0.5 },
    { "check": "File contains exactly 'Hello from AXIS'", "weight": 0.5 }
  ]
}
```
Save this as `scenarios/hello-world.json`. The filename (without `.json`) becomes the scenario key used in reports and CLI commands.
A few things that make scenarios work well:
- **Specific prompts**: tell the agent exactly what to do. Vague prompts lead to inconsistent results.
- **Observable rubric checks**: each check should describe something a judge can verify from the transcript and workspace state.
- **Weighted checks**: distribute weight based on importance. If the file existing matters more than its content, weight it higher.
See Writing Scenarios for a deeper guide on prompts, rubrics, setup/teardown, and examples.
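The example rubric's weights sum to 1. Whether AXIS requires or normalizes this is covered in Writing Scenarios; assuming your rubrics follow the same convention, a quick sanity check keeps a typo from silently skewing scores. The heredoc just recreates the example scenario; skip it if you already saved the file.

```bash
# Write the example scenario (skip if you already created it).
mkdir -p scenarios
cat > scenarios/hello-world.json <<'EOF'
{
  "name": "Create a greeting file",
  "prompt": "Create a file called hello.txt with the content 'Hello from AXIS'.",
  "rubric": [
    { "check": "File hello.txt exists in the workspace", "weight": 0.5 },
    { "check": "File contains exactly 'Hello from AXIS'", "weight": 0.5 }
  ]
}
EOF

# Sum each scenario's rubric weights.
node -e '
const fs = require("fs");
for (const f of fs.readdirSync("scenarios").filter(f => f.endsWith(".json"))) {
  const s = JSON.parse(fs.readFileSync("scenarios/" + f, "utf8"));
  const total = s.rubric.reduce((sum, c) => sum + c.weight, 0);
  console.log(f, "weights sum to", total);
}'
# → hello-world.json weights sum to 1
```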
## 3. Run It
```bash
npx @netlify/axis run
```

AXIS spawns the agent in an isolated workspace, captures the full interaction transcript, scores the result against your rubric, and displays a summary in your terminal.
### What to Expect
The terminal displays a live progress view while the agent runs. You will see each scenario/agent combination with its current status (running, scoring, done, or failed) and a live token counter showing how many tokens the agent has consumed.
Once scoring completes, AXIS prints a summary table showing:
- The composite AXIS Result (0 to 100) for each scenario/agent pair.
- Breakdowns for each of the four scoring dimensions: Goal Achievement, Environment, Service, and Agent.
- Score insights for any dimension that scored below 75, identifying the weakest signal.
## 4. View the Report
Every run saves a report to `.axis/reports/`. You can revisit it at any time.
```bash
# View the latest report summary
npx @netlify/axis reports latest

# Open the HTML report in your browser
npx @netlify/axis reports latest --html

# Get JSON output for scripting
npx @netlify/axis reports latest --json
```

The HTML report includes the full scoring breakdown, interaction transcript, and judge evaluations. See Reports & Baselines for details on report contents and storage.
## 5. Interpret Your Results
The AXIS Result is a composite of four dimensions, each measuring a different aspect of the agent's interaction with your system.
| Dimension | What It Tells You |
|---|---|
| Goal Achievement | Did the agent complete the task? Scored against your rubric checks. |
| Environment | How well did shell commands, file operations, and dev tools work? |
| Service | How effectively were APIs, MCP tools, and external services used? |
| Agent | How well did the agent reason and plan? Were its actions necessary and well-scoped? |
A score of 50 represents median performance. Scores above 75 are good; above 90 is excellent. See Scoring Framework for the full explanation of how each dimension is calculated and why.
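How AXIS actually combines the four dimensions is defined in Scoring Framework; purely to build intuition about reading a summary table, here is a sketch with hypothetical dimension scores and an assumed equal-weight average (the real composite may weight dimensions differently):

```bash
# Hypothetical dimension scores from a run (not real AXIS output).
GOAL=82; ENV=74; SERVICE=90; AGENT=68

# Assumption: a plain equal-weight average, for illustration only.
echo $(( (GOAL + ENV + SERVICE + AGENT) / 4 ))
# → 78 (shell integer division; 78.5 rounds down)
```

With these hypothetical numbers, Environment (74) and Agent (68) fall below the 75 threshold, so the summary would include score insights for both dimensions.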
## 6. Set a Baseline
Once you have a run you are satisfied with, save it as a baseline. Future runs can diff against it to detect regressions.
```bash
# Save the latest report as a baseline
npx @netlify/axis baseline set

# Compare future runs automatically
npx @netlify/axis run --compare-baseline
```

The comparison exits with code 1 if any regressions are detected, making it suitable for CI gating. See Reports & Baselines for baseline workflows.
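The exit code is all CI needs to key on. A minimal sketch of the gating pattern, using a stand-in function so the snippet runs anywhere (substitute the real `npx @netlify/axis run --compare-baseline` in your pipeline):

```bash
# Stand-in for `npx @netlify/axis run --compare-baseline`; here we
# simulate a run that detected a regression (exit code 1).
axis_compare() { return 1; }

if axis_compare; then
  STATUS="pass"
else
  STATUS="fail"   # in CI, fail the build here
fi
echo "baseline comparison: $STATUS"
# → baseline comparison: fail
```

In most CI systems no wrapper is needed at all: run the command directly and the nonzero exit code fails the step.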
Add `.axis/reports/` and `.axis/skills-cache/` to your `.gitignore`. Baselines (`.axis/baselines/`) are designed to be committed so the whole team shares the same regression thresholds.
## Next Steps
- **Scoring Framework**: how the four dimensions are calculated, what signals drive each score, and why the scoring works the way it does.
- **Writing Scenarios**: how to write effective prompts, design rubrics, and use setup/teardown actions.
- **Execution & Agents**: how AXIS runs scenarios, supported and custom agents, and workspace isolation.