zaira_standard

The measure of
agent-readiness

v0.9 March 2026 71 criteria 12 modules open methodology

Developer tools have a new class of customer. Already today, AI coding agents are choosing and using tools at a higher rate than humans.

Is your tool ready for them?

Read the Standard v0.9 Get Certified →

01 — at_a_glance

The Standard at a glance

The Zaira Standard defines what "Agent Ready" means: 71 criteria across all domains of a tool's function, based on broad industry research, delivered in an open methodology.

What's measured

Base Standard applies to all tools. Additional modules activate based on tool complexity.

Base Standard 15 criteria

Complexity modules (5) 36 criteria

Domain modules (6) 20 criteria

Total 71 criteria

How it's scored

Scoring gradients are built to reflect realistic use cases and how agents work today.

0

Not present

1

Basic

2

Solid

3

Exemplary

Where you stand

The Zaira Score is the weighted percentage across all applicable modules.

Agent Ready ≥60%

All MUST criteria pass

Agent Native ≥80%

All MUSTs ≥2, zero gaps

The best tools are independently evaluated. See what it takes to get Zaira Certified.

02 — how_it_works

One standard, right-sized for every tool

"Developer tools" is a broad category where different tools have different needs. Agent-readiness looks very different for a simple utility library vs. a payment processing API. The Standard is intentionally modular: a base evaluation applies to every tool, with additional complexity modules activating based on what your tool actually does. You're only measured on what's relevant.

15

criteria

Base Standard

Applies to every tool, always. The foundation everything else builds on.

Documentation quality, machine-readable formats, error handling, versioning, security posture, maintenance signals, licensing.

Complexity modules activate based on what the tool does

	Module	#	When	Coverage
`PI`	Programmatic Interface	16	Tool exposes an API, SDK, or MCP server	SDK quality, API design, type safety, error contracts
`NS`	Network Service	8	Tool is a hosted or remote service	Uptime, error responses, rate limiting, latency
`WO`	Write Operations	4	Tool creates, modifies, or deletes data	Idempotency, rollback, conflict handling
`AU`	Authentication	4	Tool requires credentials or tokens	Key management, token flows, non-interactive auth
`CLI`	CLI	4	Tool has a command-line interface	Exit codes, machine-readable output, scriptability

Domain modules criteria specific to a tool's domain

	Module	#	When	Coverage
`PM`	Payments	4	Tool processes payments	Test mode parity, webhook verification, PCI scope
`CM`	Communications	3	Tool sends messages or notifications	Delivery status, templates, channel abstraction
`DB`	Databases	4	Tool stores or queries data	Migrations, connection pooling, schema introspection
`HI`	Hosting	3	Tool deploys or hosts applications	Deploy automation, environment parity, scaling
`AP`	Auth Providers	3	Tool manages user identity	Non-interactive flows, MFA, session management
`FL`	Frameworks	3	Tool scaffolds or structures applications	Scaffolding, conventions, plugin architecture

How modules activate

Your tool's characteristics determine which modules apply.

Complex tool: payment API

Has an SDK, runs as a network service, handles write operations, requires authentication, operates in the payments domain.

Base (15) + PI (16) + NS (8) + WO (4) + AU (4) + PM (4) = 51 criteria

Simple tool: open-source library

No API, no network calls, no auth. A library you install and call. Only the Base Standard applies.

Base (15) = 15 criteria

Want to know what this looks like
in practice?

The Best Practices Guide walks through every criterion with concrete and realistic implementation patterns with real examples.

Best Practices Guide →

03 — the_research

Built on evidence, not opinion

The Zaira Standard wasn't designed in a vacuum. Every criterion traces back to published research, real-world incident data, or empirical measurement of how agents actually interact with developer tools. The findings shaped what we measure and why.

40+

academic research sources

1,400+

MCP servers analyzed

400+

tools vulnerability-tested

844K+

llms.txt files investigated

Examples from the research

Each finding below directly shaped a criterion in the Standard.

Finding

43% of MCP servers have command injection vulnerabilities. 84% attack success rate via prompt injection against major coding agents.

standard response

Safety criteria are MUST gates. A tool that scores perfectly on documentation but fails safety cannot earn any designation. Security is non-negotiable.

Finding

Structured error messages improve agent recovery by 4.75x. 97.1% of MCP tool descriptions have quality defects. Adding examples boosts success from 11% to 76%.

standard response

Error handling and tool description quality are weighted as Critical (2x) in scoring. These aren't nice-to-haves. The data shows they're the difference between an agent that recovers and one that fails.

Finding

Good documentation improves agent accuracy by 91%. But more documentation is not better. Accuracy degrades 30%+ when relevant information sits in the middle of large context.

standard response

Documentation criteria evaluate structure and extractability, not volume. The Standard rewards answer-first layouts, semantic sections, and machine-readable formats over comprehensive reference dumps.

Additional metrics and research findings can be found throughout the Standard and Best Practices Guide.

04 — governance

Open and stable

The Zaira Standard is public and stable. Revisions are made only when durable shifts in agent-tool interaction warrant them, not in response to short-term trends. Certified tools receive at least 30 days advance notice before any version change takes effect, giving vendors time to adapt ahead of the transition.

The first version is purposefully v0.9 to signify the expected tuning based on pilot evaluation feedback before v1.0 is published.

current_version

v0.9 March 2026

Read the Standard v0.9

The measure ofagent-readiness

The Standard at a glance

What's measured

How it's scored

Where you stand

One standard, right-sized for every tool

Base Standard

How modules activate

Complex tool: payment API

Simple tool: open-source library

Want to know what this looks likein practice?

Built on evidence, not opinion

Examples from the research

Open and stable

The measure of
agent-readiness

Want to know what this looks like
in practice?