AXIS - Agent Experience Index Score

AXIS is both an open scoring framework for measuring agent experience (AX) and a CLI tool that implements it. Think Lighthouse, but instead of scoring user experience, AXIS scores agent experience.

Why AX Matters

The web has Lighthouse. APIs have contract testing. Performance has k6. But there is no standardized way to answer: "How well does my system work when an AI agent tries to use it?"

Without measurement, you are guessing. Maybe agents struggle with your project because the directory structure is confusing. Maybe they waste tokens calling an API that returns unhelpful errors. Maybe they succeed at the task but take three times longer than they should. You cannot fix what you cannot see. AX gives you the same visibility into agent interactions that UX metrics give you for human users.

Our Approach

AXIS is built on two core beliefs.

Measure each dimension independently. Agent experience is not a single number. When a score drops, you need to know why. Is your API slow? Is the agent making unnecessary calls? Is the project structure confusing? AXIS scores four independent dimensions so you can pinpoint the problem. A low Service score tells you your APIs are the bottleneck. A low Environment score tells you your filesystem or tooling is tripping agents up. Generic pass/fail testing cannot surface these signals.

Test against real agent behavior. It is not enough to validate that your system publishes the right config files or follows a protocol spec. What matters is whether agents actually discover and use what you provide. AXIS runs real agents against real scenarios and observes what happens: which tools they call, which files they read, which APIs they hit. This tells you what agents do, not what they could do in theory.
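To make the observation-based approach concrete, here is a minimal sketch of the kind of interaction record it produces. The type and field names below are illustrative assumptions for explanation only, not the actual AXIS transcript format.

```typescript
// Illustrative only: a minimal shape for observed agent interactions.
// The names (kind, target, timestamp) are assumptions, not the AXIS schema.
type InteractionKind = "tool_call" | "file_read" | "file_write" | "shell" | "http";

interface Interaction {
  kind: InteractionKind; // what the agent did
  target: string;        // tool name, file path, or URL it acted on
  timestamp: number;     // when it happened, for ordering and duration
}

// A scenario run yields an ordered transcript of everything the agent did.
// Scoring works from this record, not from a static check of config files.
type Transcript = Interaction[];
```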

The Scoring Framework

At its core, AXIS defines a standard way to measure agent experience across four independent dimensions. Any tool, platform, or CI system can implement this framework to produce comparable AX measurements.

- Goal Achievement (40%): Did the agent complete the task? Evaluated against rubric criteria you define for each scenario (see the sketch after this list).
- Environment (20%): How well did the agent use the OS, filesystem, and dev tools? Measures the quality of shell commands, file operations, git usage, and build tools.
- Service (20%): How effectively did the agent use external services? Evaluates API calls, MCP tools, network requests, and third-party integrations.
- Agent (20%): How well did the agent reason and self-organize? Covers planning, task management, tool discovery, and metacognitive behavior.
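For the Goal Achievement dimension, a scenario pairs a task with rubric criteria you define. The following sketch shows one way such a definition could be expressed; the interfaces and field names are assumptions for illustration, not the AXIS scenario schema.

```typescript
// Illustrative only: one way a scenario and its rubric could be expressed.
interface RubricCriterion {
  description: string; // what "done" looks like for this check
  weight: number;      // relative importance within the scenario
}

interface Scenario {
  name: string;
  prompt: string;              // the task handed to the agent
  rubric: RubricCriterion[];   // criteria Goal Achievement is judged against
}

// Hypothetical scenario: names, prompt, and criteria are made up for the example.
const addHealthEndpoint: Scenario = {
  name: "add-health-endpoint",
  prompt: "Add a /health endpoint that returns a 200 with build metadata.",
  rubric: [
    { description: "Endpoint responds with HTTP 200", weight: 2 },
    { description: "Response includes the current build version", weight: 1 },
    { description: "Existing tests still pass", weight: 1 },
  ],
};
```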

These four dimensions combine into a single 0 to 100 AXIS Result. The framework specifies what signals feed each dimension, how interactions are categorized, and how the composite score is calculated. See Scoring Framework for full details on the signals and scoring logic.
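As a sketch of how the published weights combine the four dimension scores into the composite, assuming each dimension is already scored on a 0 to 100 scale (how each dimension score is derived from raw signals is defined in the Scoring Framework, not reproduced here):

```typescript
// Dimension scores on a 0-100 scale; the weights are the ones stated above.
interface DimensionScores {
  goalAchievement: number; // 40%
  environment: number;     // 20%
  service: number;         // 20%
  agent: number;           // 20%
}

// Weighted average producing the 0-100 composite AXIS Result.
function axisResult(s: DimensionScores): number {
  return (
    0.4 * s.goalAchievement +
    0.2 * s.environment +
    0.2 * s.service +
    0.2 * s.agent
  );
}

// Example: strong task completion but weak service usage still drags the
// composite down, and the per-dimension scores show exactly where.
axisResult({ goalAchievement: 90, environment: 80, service: 55, agent: 75 });
// => 78
```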

The CLI Tool

The @netlify/axis package is an easy-to-use implementation of the scoring framework that helps you get started quickly and maintain good AX over time. Because the scoring framework is open and modular, you can also build your own tooling on top of it. With the AXIS CLI, you can run agent scenarios, capture transcripts, score the results, and produce reports.

Agent Support

AXIS ships with built-in support for Claude Code, Codex, Gemini, Goose, and more. You can also bring your own agent using the custom agent API. See Execution & Agents for the full list and details.