Make your AI coding agent test its own work. Automatically.
Quick Start · The Problem · Testing Pyramid · Installation · Experiment · Languages · Copy-Paste Prompt
Get CTRL running in any project in one command:
```bash
# Universal installer (auto-detects language)
curl -fsSL https://github.com/henrino3/ctrl/master/scripts/ctrl-bootstrap.sh | bash

# Or via npx (Node.js projects)
npx close-the-loop init
```

That's it. Your AI agent now knows how to test its own code. ✅
AI coding agents ship broken code because they never check if it works.
Traditional software testing assumes a human will run the tests. But when an AI agent writes code autonomously — in Cursor, Copilot, Claude Code, Codex, or OpenClaw — nobody is checking. The agent writes, commits, and moves on.
> "Code works well with AI because it's verifiable. You can compile it, run it, test it. That's the loop. You have to close the loop." — Peter Steinberger, creator of OpenClaw, who ships code he doesn't read and merged 600 commits in a single day
CTRL closes that loop. It gives your agent the instructions, commands, and CI pipeline to verify its own work — without a human in the middle.
```
┌──────────────┐
│ 🐳 Docker    │  Cold-start deployment
│    Tests     │  5-10 min · CI/CD
├──────────────┤
│ 🌐 Live      │  Real 3rd-party APIs
│    Tests     │  5-30 min · Pre-release
├──────────────┤
│ 🔗 E2E       │  Routes, auth, DB state
│    Tests     │  1-5 min · Before push
├──────────────┤
│ ⚡ Unit      │  Pure logic, validation
│    Tests     │  < 1 sec · Every commit
└──────────────┘
```
| Layer | What it tests | Speed | Cost | Frequency |
|---|---|---|---|---|
| ⚡ Unit | Pure functions, validation, parsing, security | < 1s | Free | Every commit |
| 🔗 E2E | API routes, auth flows, DB queries, webhooks | 1-5 min | Free | Before push |
| 🌐 Live | Real APIs (Stripe, OpenAI, etc.), rate limits | 5-30 min | $ | Pre-release |
| 🐳 Docker | Full cold-start in clean container | 5-10 min | Free | CI/CD |
Start with Unit. It gives you 80% of the value at < 1% of the cost.
Works with any language — JS/TS, Python, Go, Rust, PHP, Ruby, Java, C#.
```bash
curl -fsSL https://github.com/henrino3/ctrl/master/scripts/ctrl-bootstrap.sh | bash
```

📦 What it creates
The installer auto-detects your project's language and generates:
| File | Purpose |
|---|---|
| `AGENTS.md` | Build, test, and gate commands for your AI agent |
| `TESTING.md` | What to test, what NOT to test, testing conventions |
| `copilot-instructions.md` | Anti-redundancy rules, colocated test pattern |
| `.cursorrules` / `.clauderc` | Editor-specific agent instructions |
| `.github/workflows/ctrl.yml` | CI/CD pipeline that enforces the gates |
```bash
npx close-the-loop init
```

Or run the bootstrap script from a clone:

```bash
git clone https://github.com/henrino3/ctrl.git
cd ctrl
./scripts/ctrl-bootstrap.sh /path/to/your/project --mode mvp
```

⚙️ Manual package.json setup
```json
{
  "scripts": {
    "test:unit": "vitest run",
    "test:e2e": "playwright test",
    "test:live": "vitest run --config vitest.live.config.ts",
    "test:docker": "./scripts/ctrl-docker-smoke.sh",
    "ctrl:gate": "npm run build && npm run test:unit && npm run test:e2e",
    "ctrl:full": "npm run ctrl:gate && npm run test:live && npm run test:docker"
  },
  "ctrl": {
    "mode": "production"
  }
}
```

Two gates keep your agent honest:
```bash
# ⚡ Fast gate — run before every push
npm run ctrl:gate   # build + unit + e2e

# 🏭 Full gate — run before release
npm run ctrl:full   # gate + live + docker
```

For Python: pytest, tox, etc. are configured automatically by the installer.
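The two gates can also be expressed as plain data, so local scripts and CI share one definition. A hypothetical TypeScript sketch, not CTRL's actual implementation (the command strings mirror the package.json scripts above):

```typescript
// Hypothetical sketch: the fast and full gates as plain data.
type Gate = "gate" | "full";

const FAST_GATE = ["npm run build", "npm run test:unit", "npm run test:e2e"];
const FULL_GATE = [...FAST_GATE, "npm run test:live", "npm run test:docker"];

export function gateCommands(gate: Gate): string[] {
  // The full gate is a strict superset of the fast gate:
  // everything fast, plus live APIs and the Docker cold-start.
  return gate === "full" ? FULL_GATE : FAST_GATE;
}
```

A runner would then execute `gateCommands(mode)` in order and stop at the first non-zero exit code.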
Not every project needs the same rigor.
| | MVP Mode | Production Mode |
|---|---|---|
| Use for | Demos, prototypes, rapid validation | Customer-facing, revenue-generating |
| Unit tests | Recommended | Mandatory |
| E2E tests | Optional | Mandatory |
| Coverage | None required | 60%+ critical paths |
| Gate | Build must pass | All gates must pass |
Rule of thumb: If failure would damage relationships, revenue, or reputation → Production. Otherwise → MVP.
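The mode distinction can be encoded so CI can branch on it. A hypothetical helper mirroring the table above (the names are illustrative, not part of CTRL's API):

```typescript
// Hypothetical sketch mirroring the MVP vs Production table.
type Mode = "mvp" | "production";

export function requiredChecks(mode: Mode): string[] {
  // MVP: only the build must pass; unit tests are recommended, not enforced.
  if (mode === "mvp") return ["build"];
  // Production: all gates are mandatory, plus a coverage floor on critical paths.
  return ["build", "test:unit", "test:e2e", "coverage>=60%"];
}
```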
Set the mode:
```json
{ "ctrl": { "mode": "mvp" } }
```

We didn't just build this — we proved it catches bugs.
- Project: Entity (Next.js + TypeScript monorepo)
- Tests written: 64 across 3 files
- Execution time: < 500ms
- Wrote comprehensive tests for 2 modules (64 tests)
- Verified all pass ✅
- Introduced 6 deliberate bugs — logic inversions, missing cases, off-by-ones, and a critical security bypass
- Ran tests to see what gets caught
| # | Bug Introduced | Type | Caught? |
|---|---|---|---|
| 1 | `blog` type returns `'prd'` | Logic inversion | ✅ |
| 2 | Henry agent detection removed | Missing case | ✅ |
| 3 | Tag filter `>2` → `>3` | Off-by-one | ❌ |
| 4 | Path traversal bypass | 🚨 Security critical | ✅ |
| 5 | `'secret'` removed from redaction list | Missing config | ✅ |
| 6 | `assertSourceEnabled` inverted | Logic inversion | ✅ |
The tests caught a critical security vulnerability (a path traversal bypass) that would have shipped silently without CTRL.
The one miss: a subtle off-by-one where we didn't test the boundary value. Lesson: always test boundaries.
The universal installer auto-detects and configures:
| Language | Test Runner | Gate Command |
|---|---|---|
| JavaScript / TypeScript | Vitest, Jest, Mocha | `npm run ctrl:gate` |
| Python | pytest, unittest, tox | `pytest && tox` |
| Go | go test | `go test ./...` |
| Rust | cargo test | `cargo test` |
| PHP | PHPUnit | `phpunit` |
| Ruby | RSpec, Minitest | `rspec` / `rails test` |
| Java | JUnit, Maven, Gradle | `mvn test` / `gradle test` |
| C# / .NET | xUnit, NUnit | `dotnet test` |
| File | Purpose | Why It Matters |
|---|---|---|
| AGENTS.md | Build, test, and dev commands | Agent knows exactly what commands to run |
| TESTING.md | What to test, conventions, anti-patterns | Agent knows how to write good tests |
| copilot-instructions.md | Colocated tests, anti-redundancy | Agent doesn't duplicate or skip |
| package.json / config | Gate scripts + mode setting | CI enforces the rules automatically |
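For reference, a minimal sketch of what a generated workflow like `.github/workflows/ctrl.yml` might contain for a Node project (illustrative only; the installer's actual output depends on your stack and mode):

```yaml
name: CTRL Gate
on: [push, pull_request]

jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Fast gate: build + unit + e2e must pass on every push and PR
      - run: npm run ctrl:gate
```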
Peter Steinberger's OpenClaw has 1,376 test files and a 21KB `AGENTS.md`. The pattern works at scale.
1. Colocated tests → `source.test.ts` sits next to `source.ts`
2. Close the loop → Write code → run tests → fix failures → don't ask the human
3. Full gate on push → build + lint + test must all pass
4. Anti-redundancy → Search for existing helpers before creating new ones
Copy-paste this into any AI coding agent's system prompt:
📋 Click to expand the full prompt
```text
You are working on my project which has a [YOUR STACK].

I want to implement the "Close the Loop" methodology, where you (the agent)
test and verify your own work autonomously via CLI commands, without needing
a human to check every change.

TESTING STRUCTURE
- Write a test file for every file you create or modify
- Colocate tests next to the source file they test
- Name them: source.test.ts next to source.ts
- Start with unit tests only

CLOSE THE LOOP
- After writing any code, run the tests via CLI before considering the task done
- If tests fail, fix the code and run again — do not ask me to check
- Only report back when tests are passing

COMMANDS TO RUN
- After writing code: run tests, linter, type checker
- Before any PR: run full gate (build + lint + test)
- Never push failing code

ANTI-REDUNDANCY
- Before creating any helper or utility, search for existing ones first
- If a function already exists, import it — do not duplicate it
- Extract shared test fixtures into test-helpers files when used in 3+ tests
```
CTRL works with any agent that reads instruction files:
| Agent / Editor | How It Works |
|---|---|
| OpenClaw / Pi | Reads AGENTS.md automatically. Geordi adapter runs ctrl:gate after every task. |
| Cursor | Reads .cursorrules for project-level instructions. |
| Claude Code | Reads AGENTS.md and .clauderc in the project root. |
| GitHub Copilot | Reads copilot-instructions.md for workspace rules. |
| Codex CLI | Reads AGENTS.md for build/test commands. |
- Speed matters. If tests take > 1 minute, the loop is too slow. Ours run in < 500ms.
- Quality > Quantity. 64 tests missed one bug because we didn't test boundaries. Edge cases are the differentiator.
- Colocated = discovered. When tests sit next to source files, agents naturally find and run them.
- Security bugs get caught. The most important bug in our experiment (path traversal) was caught instantly.
- It's not magic. It works because code is verifiable. The agent can objectively check if its work is correct.
This methodology was researched after Henry Mascot and Kinan Zayat reverse-engineered Peter Steinberger's approach by studying the OpenClaw codebase. Peter runs 3-8 AI agents in parallel and merged 600 commits in a single day — every agent writes tests, runs them, and only reports back when everything passes.
- 📝 Peter Steinberger — "Just Talk To It"
- 🎙️ The Pragmatic Engineer Podcast
- 📖 Blog Post — CTRL Testing Pyramid
- 🦞 OpenClaw GitHub
- 📋 Peter's AGENTS.md gist
Built by the Enterprise Crew 🚀
Ada 🔮 · Spock 🖖 · Scotty 🔧
MIT License · superada.ai