Your agent says "done." But do you know how it fails?
Not whether it fails. How.
verify() runs your agent's edits against filesystem reality and tells you what the agent got wrong — the file, the line, the expected value, the actual value. No LLM in the verification path. The answer is not "probably."
Over time, these measurements accumulate into a reliability profile: how this agent fails on this codebase. What it hallucinates. Where it drops edits. Which patterns it repeats.
No other tool builds this model. Linters know your code has problems. Tests know your code produces wrong output. Verify knows why the agent was wrong.
Using verify? We'd love to hear what you're building. Join the discussion
npx @sovereign-labs/verify demo

Three failure modes your current stack misses:
The agent claims it saved a file. It didn't. Verify checks the filesystem.
Without verify:
Agent says: "Report saved successfully."
$ ls reports/weekly.md
ls: cannot access 'reports/weekly.md': No such file or directory
With verify:
Trace 1: Agent claims completion without creating the file.
[FAIL] Filesystem gate: reports/weekly.md does not exist.
Trace 2: Injecting constraints and re-running. Agent creates the file.
[PASS] All gates passed (12 checks)
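The core idea of the filesystem gate needs no model in the loop: trust the disk, not the agent's claim. A minimal sketch of that idea (the function name and return shape are illustrative, not verify's internals):

```javascript
import { existsSync, statSync } from 'node:fs';

// Hypothetical sketch of a filesystem gate: an agent's "saved successfully"
// claim only passes if the file actually exists on disk and is non-empty.
function filesystemGate(claimedPath) {
  if (!existsSync(claimedPath)) {
    return { pass: false, reason: `${claimedPath} does not exist` };
  }
  if (statSync(claimedPath).size === 0) {
    return { pass: false, reason: `${claimedPath} exists but is empty` };
  }
  return { pass: true };
}
```

The check is deterministic and cheap, which is why it can run on every claim.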
The agent writes valid CSS targeting a selector that doesn't exist. Verify knows what's actually in your code.
Without verify:
$ grep '.profile-nav' server.js # CSS rule exists
$ grep -c 'class="profile-nav"' server.js # 0 — element doesn't exist
With verify:
Trace 1: Agent uses selector .profile-nav
[FAIL] Grounding: .profile-nav does not exist in source
Trace 2: Agent uses a.nav-link — exists in reality.
[PASS] All gates passed (12 checks)
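Grounding a class selector against source is, at its simplest, a scan for that class in the markup. A rough sketch of the idea, assuming class selectors and double-quoted `class` attributes only (verify's real grounding covers CSS, HTML, routes, and DB schema):

```javascript
// Hypothetical grounding sketch: a .foo selector is only valid if some
// element in the source actually carries the class "foo".
function selectorIsGrounded(selector, html) {
  if (!selector.startsWith('.')) return true; // sketch handles class selectors only
  const cls = selector.slice(1);
  // Look for the class name inside any class="..." attribute.
  const re = /class\s*=\s*"([^"]*)"/g;
  for (const match of html.matchAll(re)) {
    if (match[1].split(/\s+/).includes(cls)) return true;
  }
  return false;
}
```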
The agent completed the task. But it also quietly changed your config. Verify catches the undeclared mutation.
Without verify:
$ diff config.json.orig config.json
- "darkMode": true
+ "darkMode": false
- "analytics": false
+ "analytics": true
With verify:
Trace 1: Agent edits server.js and config.json.
[FAIL] Containment: 2 undeclared file mutations detected
Trace 2: Agent edits server.js only.
[PASS] All gates passed (11 checks)
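Containment can be sketched as a before/after hash comparison: snapshot every watched file before the agent runs, re-hash afterwards, and flag anything that changed without being declared. The helper names below are illustrative:

```javascript
import { createHash } from 'node:crypto';
import { readFileSync } from 'node:fs';

// Hypothetical containment sketch: detect mutations the agent never declared.
const hashFile = (p) => createHash('sha256').update(readFileSync(p)).digest('hex');

function snapshot(paths) {
  return new Map(paths.map((p) => [p, hashFile(p)]));
}

function undeclaredMutations(before, declaredFiles) {
  const out = [];
  for (const [path, oldHash] of before) {
    if (!declaredFiles.includes(path) && hashFile(path) !== oldHash) {
      out.push(path); // changed on disk, but not in the agent's declared edit set
    }
  }
  return out;
}
```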
Run all three: npx @sovereign-labs/verify demo --scenario=liar|world|drift
import { verify } from '@sovereign-labs/verify';
const result = await verify(edits, predicates, { appDir: './my-app' });
// result.success → true/false
// result.attestation → human-readable summary
// result.narrowing → what to try next (on failure)

26 checks run in sequence. First failure stops the pipeline and tells you exactly what went wrong.
- Can the edit be applied? Does the search string exist in the file?
- Is the edit safe? No XSS, no SQL injection, no leaked secrets, no broken accessibility.
- Did the edit work? CSS selector has the right value. HTTP endpoint returns 200. Database column exists. File was created.
- Did the edit break anything else? Health checks pass. File integrity holds. Config is consistent.
On failure: returns the problem + what to try next. On repeat failure: learns from mistakes — attempt N+1 won't repeat attempt N's error.
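The fail-fast sequencing described above can be sketched in a few lines; the gate objects and result shape here are illustrative, not verify's actual API:

```javascript
// Hypothetical sketch of a fail-fast gate pipeline: run checks in order,
// stop at the first failure, and report which gate failed and why.
function runPipeline(gates, input) {
  for (const gate of gates) {
    const result = gate.check(input);
    if (!result.pass) {
      return { success: false, failedGate: gate.name, reason: result.reason };
    }
  }
  return { success: true, checks: gates.length };
}
```

Stopping at the first failure keeps the error report small and actionable: one problem, one fix, one retry.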
npm install @sovereign-labs/verify
# or
bun add @sovereign-labs/verify

import { verify } from '@sovereign-labs/verify';
const result = await verify(
// Edits: search-and-replace mutations
[
{ file: 'server.js', search: 'color: blue', replace: 'color: red' },
{ file: 'server.js', search: 'Hello', replace: 'Welcome' },
],
// Predicates: what should be true after the edits
[
{ type: 'css', selector: 'h1', property: 'color', expected: 'red' },
{ type: 'content', file: 'server.js', pattern: 'Welcome' },
{ type: 'http', path: '/health', method: 'GET', expect: { status: 200 } },
],
// Config
{ appDir: './my-app' }
);
if (result.success) {
console.log(result.attestation);
} else {
console.log(result.narrowing.resolutionHint);
}

verify() is a single pass. govern() wraps it in a convergence loop — ground reality, plan, verify, narrow, retry. The agent learns from every failure.
import { govern } from '@sovereign-labs/verify';
const result = await govern({
appDir: './my-app',
goal: 'Change the button color to orange',
maxAttempts: 3,
// Your agent — one method: plan
agent: {
plan: async (goal, context) => {
// context.grounding — CSS, HTML, routes, DB schema
// context.narrowing — what failed last time and why
// context.constraints — what's banned and why (K5)
return {
edits: [{ file: 'style.css', search: 'blue', replace: 'orange' }],
predicates: [{ type: 'css', selector: '.btn', property: 'color', expected: 'orange' }],
};
},
},
});
if (result.success) {
console.log(`Converged in ${result.attempts} attempt(s)`);
} else {
console.log(`Stopped: ${result.stopReason}`);
// 'exhausted' | 'stuck' | 'empty_plan_stall' | 'approval_aborted'
}

npx @sovereign-labs/verify init # Create .verify/check.json
npx @sovereign-labs/verify check # Run verification
npx @sovereign-labs/verify demo # See what it catches
npx @sovereign-labs/verify ground # Scan CSS/HTML/routes
npx @sovereign-labs/verify self-test # Run 2,800+ scenario harness
git diff | npx @sovereign-labs/verify check --diff # Pipe git diff

{
"mcpServers": {
"verify": {
"command": "npx",
"args": ["@sovereign-labs/verify", "mcp"]
}
}
}

Tools: verify_ground, verify_read, verify_submit
Multiple agents editing the same codebase? Verify them in sequence — each agent sees the filesystem the previous agent left behind.
import { verifyBatch } from '@sovereign-labs/verify';
const result = await verifyBatch([
{ agent: 'planner', edits: [...], predicates: [...] },
{ agent: 'coder', edits: [...], predicates: [...] },
], { appDir: './my-app', stopOnFailure: true });

If Agent A's changes invalidate Agent B's predicates, the grounding gate catches it. No new infrastructure — the existing gates handle multi-agent conflicts naturally.
The checks are domain-agnostic:
- File system agents — move, rename, organize files
- Infrastructure agents — don't delete the production database
- Communication agents — message the right channel, no forbidden content
- Document agents — don't overwrite the wrong cells
When a PR contains .sql migration files, verify also runs the migration verification pipeline — a separate set of gates that parse the migration with libpg-query, replay the schema from prior migrations on the base branch, and check the new migration against that schema.
The first shipped rule is DM-18 (NOT NULL without safe preconditions): ADD COLUMN x NOT NULL without a DEFAULT, or SET NOT NULL on a nullable column with no default. Both will fail on any non-empty production table — the classic 3am migration failure.
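For intuition only, the DM-18 shape can be approximated at the string level. verify itself parses migrations with libpg-query and replays prior schema; this regex sketch ignores all of that (for example, it flags every SET NOT NULL, while the real rule checks whether the column already has a default):

```javascript
// Rough, illustrative approximation of the DM-18 shape — not verify's parser.
function looksLikeDm18(sql) {
  const addsNotNull = /ADD\s+COLUMN\s+[^;]*\bNOT\s+NULL\b/i.test(sql);
  const hasDefault = /ADD\s+COLUMN\s+[^;]*\bDEFAULT\b/i.test(sql);
  const setNotNull = /SET\s+NOT\s+NULL/i.test(sql); // real rule also checks prior schema
  return (addsNotNull && !hasDefault) || setNotNull;
}
```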
Measured precision: 19 true positives, 0 false positives across 761 production migrations from cal.com, formbricks, and supabase. See MEASURED-CLAIMS.md for full methodology and reproduction steps.
DM-18 is the first vertical of verify's three-vertical product strategy (code-edit verification, database migration verification, HTTP contract verification). Eight other migration shapes (DM-01..05 grounding, DM-15..17, DM-19 safety) are implemented and shipping in CI as warnings while they're calibrated against the corpus. See the Database Migration Failures section of FAILURE-TAXONOMY.md for the full shape catalog.
Findings can be acknowledged in the migration file with -- verify: ack DM-XX <reason> to downgrade them to warnings (audit trail) rather than blocks.
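The ack comment is a plain SQL line comment, so a tool (or a reviewer's script) can extract acknowledgements mechanically. A sketch of parsing that format, with an assumed output shape:

```javascript
// Hypothetical parser for the "-- verify: ack DM-XX <reason>" comment format.
function parseAcks(migrationSql) {
  const acks = [];
  const re = /--\s*verify:\s*ack\s+(DM-\d+)\s+(.*)/g;
  for (const m of migrationSql.matchAll(re)) {
    acks.push({ rule: m[1], reason: m[2].trim() });
  }
  return acks;
}
```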
We scanned every PR in the AIDev-POP dataset — 33,056 real pull requests from 5 AI coding agents across 2,807 popular open-source repos. Deterministic pipeline, $0 cost, no LLM calls.
High-confidence structural finding rates:
| Agent | PRs | Finding Rate | Top Issue |
|---|---|---|---|
| Devin | 4,800 | 8.2% | Unbounded queries |
| Claude Code | 457 | 8.5% | Path/permission |
| Copilot | 4,496 | 4.8% | Path/permission |
| Cursor | 1,539 | 4.4% | Unbounded queries |
| Codex | 21,764 | 1.9% | Unbounded queries |
3.4% of all agent PRs have high-confidence structural issues that existing CI doesn't catch. See METHODOLOGY.md for full details.
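The 3.4% headline is the PR-weighted mean of the per-agent rates in the table above, which can be verified in a few lines:

```javascript
// Recomputing the headline rate from the table (PR-weighted average).
const agents = [
  { prs: 4800, rate: 0.082 },  // Devin
  { prs: 457, rate: 0.085 },   // Claude Code
  { prs: 4496, rate: 0.048 },  // Copilot
  { prs: 1539, rate: 0.044 },  // Cursor
  { prs: 21764, rate: 0.019 }, // Codex
];
const totalPrs = agents.reduce((sum, a) => sum + a.prs, 0);      // 33,056
const flagged = agents.reduce((sum, a) => sum + a.prs * a.rate, 0);
const overall = flagged / totalPrs;                              // ≈ 0.034
```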
- uses: Born14/verify@v0.8.2

Runs verify on every PR. Posts gate results as a comment. Three modes:
- Structural (default, free) — diff-only analysis, no API key needed
- Intent — extracts predicates from PR title/description (Gemini, OpenAI, or Anthropic)
- Staging — Docker build + runtime verification
- FAILURE-TAXONOMY.md — Reference catalog of failure shapes verify's gates can detect, with calibration status per section. Includes the new Database Migration Failures section (DM-01..19).
- MEASURED-CLAIMS.md — DM-18 measured precision (19 TP / 0 FP / 761 migrations) with full methodology and reproduction steps. The first shape in the taxonomy with a published false-positive rate.
- REFERENCE.md — Gates, predicates, configuration, CLI, fault management
- HOW-IT-WORKS.md — Architecture, the 8-stage autonomous loop, migration verification pipeline
- METHODOLOGY.md — AIDev-POP scan methodology and reproducibility (separate from migration corpus methodology, which is in MEASURED-CLAIMS.md)
- PARITY-GRID.md — 8×10 capability × failure class coverage matrix
- ASSESSMENT.md — What verify is and isn't
- ROADMAP.md — Current state and priorities
- GLOSSARY.md — Terms and definitions
MIT