# Quick Start
Get AXIS running in your project. This guide walks through creating a config, writing your first scenario, running it, and understanding the results.
## Prerequisites
- Node.js 18 or later.
- An API key for at least one supported agent (for example, `ANTHROPIC_API_KEY` for Claude Code).
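Agent credentials are read from the environment. For Claude Code, for example, export `ANTHROPIC_API_KEY` in the shell you will run AXIS from (the value below is a placeholder):

```bash
# Placeholder value; substitute your real key.
export ANTHROPIC_API_KEY="sk-ant-placeholder"

# Confirm the key is visible to child processes such as the AXIS CLI.
test -n "$ANTHROPIC_API_KEY" && echo "key is set"
```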
## 1. Create a Config File
Add an `axis.config.json` to your project root. At minimum, specify where your scenarios live and which agents to run.
```json
{
  "scenarios": "./scenarios",
  "agents": ["claude-code"]
}
```

See Configuration Reference for the full set of options, including scoring weights, MCP servers, and custom agents.
## 2. Write a Scenario
Create a `scenarios/` directory and add your first scenario as a JSON file. Each scenario needs three things: a `name`, a `prompt` (the task for the agent), and a `rubric` (how to judge whether it succeeded).
```json
{
  "name": "Create a greeting file",
  "prompt": "Create a file called hello.txt with the content 'Hello from AXIS'.",
  "rubric": [
    { "check": "File hello.txt exists in the workspace", "weight": 0.5 },
    { "check": "File contains exactly 'Hello from AXIS'", "weight": 0.5 }
  ]
}
```
Save this as `scenarios/hello-world.json`. The filename (without `.json`) becomes the scenario key used in reports and CLI commands.
A few things that make scenarios work well:
- **Specific prompts**: tell the agent exactly what to do. Vague prompts lead to inconsistent results.
- **Observable rubric checks**: each check should describe something a judge can verify from the transcript and workspace state.
- **Weighted checks**: distribute weight based on importance. If the file existing matters more than its content, weight it higher.
See Writing Scenarios for a deeper guide on prompts, rubrics, setup/teardown, and examples.
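The example rubric's weights sum to 1. Whether AXIS requires or normalizes this is covered in Writing Scenarios; assuming your rubrics follow the same convention, a quick sanity check keeps a typo from silently skewing scores. The heredoc just recreates the example scenario; skip it if you already saved the file.

```bash
# Write the example scenario (skip if you already created it).
mkdir -p scenarios
cat > scenarios/hello-world.json <<'EOF'
{
  "name": "Create a greeting file",
  "prompt": "Create a file called hello.txt with the content 'Hello from AXIS'.",
  "rubric": [
    { "check": "File hello.txt exists in the workspace", "weight": 0.5 },
    { "check": "File contains exactly 'Hello from AXIS'", "weight": 0.5 }
  ]
}
EOF

# Sum each scenario's rubric weights.
node -e '
const fs = require("fs");
for (const f of fs.readdirSync("scenarios").filter(f => f.endsWith(".json"))) {
  const s = JSON.parse(fs.readFileSync("scenarios/" + f, "utf8"));
  const total = s.rubric.reduce((sum, c) => sum + c.weight, 0);
  console.log(f, "weights sum to", total);
}'
# → hello-world.json weights sum to 1
```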
## 3. Run It
```bash
npx @netlify/axis run
```

AXIS spawns the agent in an isolated workspace, captures the full interaction transcript, scores the result against your rubric, and displays a summary in your terminal.
### What to Expect
The terminal displays a live progress view while the agent runs. You will see each scenario/agent combination with its current status (running, scoring, done, or failed) and a live token counter showing how many tokens the agent has consumed.
Once scoring completes, AXIS prints a summary table showing:
- The composite AXIS Result (0 to 100) for each scenario/agent pair.
- Breakdowns for each of the four scoring dimensions: Goal Achievement, Environment, Service, and Agent.
- Score insights for any dimension that scored below 75, identifying the weakest signal.
## 4. View the Report
Every run saves a report to `.axis/reports/`. You can revisit it at any time.
```bash
# View the latest report summary
npx @netlify/axis reports latest

# Open the HTML report in your browser
npx @netlify/axis reports latest --html

# Get JSON output for scripting
npx @netlify/axis reports latest --json
```

The HTML report includes the full scoring breakdown, interaction transcript, and judge evaluations. See Reports & Baselines for details on report contents and storage.
## 5. Interpret Your Results
The AXIS Result is a composite of four dimensions, each measuring a different aspect of the agent's interaction with your system.
| Dimension | What It Tells You |
|---|---|
| Goal Achievement | Did the agent complete the task? Scored against your rubric checks. |
| Environment | How well did shell commands, file operations, and dev tools work? |
| Service | How effectively were APIs, MCP tools, and external services used? |
| Agent | How well did the agent reason and plan? Were its actions necessary and well-scoped? |
A score of 50 represents median performance. Scores above 75 are good; above 90 is excellent. See Scoring Framework for the full explanation of how each dimension is calculated and why.
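How AXIS actually combines the four dimensions is defined in Scoring Framework; purely to build intuition about reading a summary table, here is a sketch with hypothetical dimension scores and an assumed equal-weight average (the real composite may weight dimensions differently):

```bash
# Hypothetical dimension scores from a run (not real AXIS output).
GOAL=82; ENV=74; SERVICE=90; AGENT=68

# Assumption: a plain equal-weight average, for illustration only.
echo $(( (GOAL + ENV + SERVICE + AGENT) / 4 ))
# → 78 (shell integer division; 78.5 rounds down)
```

With these hypothetical numbers, Environment (74) and Agent (68) fall below the 75 threshold, so the summary would include score insights for both dimensions.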
## 6. Set a Baseline
Once you have a run you are satisfied with, save it as a baseline. Future runs can diff against it to detect regressions.
```bash
# Save the latest report as a baseline
npx @netlify/axis baseline set

# Compare future runs automatically
npx @netlify/axis run --compare-baseline
```

The comparison exits with code 1 if any regressions are detected, making it suitable for CI gating. See Reports & Baselines for baseline workflows.
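The exit code is all CI needs to key on. A minimal sketch of the gating pattern, using a stand-in function so the snippet runs anywhere (substitute the real `npx @netlify/axis run --compare-baseline` in your pipeline):

```bash
# Stand-in for `npx @netlify/axis run --compare-baseline`; here we
# simulate a run that detected a regression (exit code 1).
axis_compare() { return 1; }

if axis_compare; then
  STATUS="pass"
else
  STATUS="fail"   # in CI, fail the build here
fi
echo "baseline comparison: $STATUS"
# → baseline comparison: fail
```

In most CI systems no wrapper is needed at all: run the command directly and the nonzero exit code fails the step.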
Add `.axis/reports/` and `.axis/skills-cache/` to your `.gitignore`. Baselines (`.axis/baselines/`) are designed to be committed so the whole team shares the same regression thresholds.
## Next Steps
- **Scoring Framework**: how the four dimensions are calculated, what signals drive each score, and why the scoring works the way it does.
- **Writing Scenarios**: how to write effective prompts, design rubrics, and use setup/teardown actions.
- **Execution & Agents**: how AXIS runs scenarios, supported and custom agents, and workspace isolation.