Skip to content

chubes4/wp-codebox

Repository files navigation

WP Codebox

WP Codebox unlocks secure WordPress code execution from anywhere. Run agents, accept untrusted patches, evaluate plugins, reproduce bugs, or experiment freely — every sandbox is a disposable WordPress Playground that can't touch its caller. Your host can be a CLI, CI job, mobile app, Node service, WordPress plugin, or anything else that can shell out or hit an API.

WordPress has historically lacked a clean scratch space for code execution. Modern dev workflows assume one — Node has npm install per project, Python has venvs, containers have ephemeral filesystems. WordPress Playground finally provides that primitive: real WordPress, PHP-in-WASM, fully ephemeral, no host filesystem access except via declared mounts. WP Codebox wraps Playground into a usable runtime contract so any product — WordPress or not — can offer code execution against a real WordPress instance without risking the caller.

WP Codebox is the runtime boundary for agent-built or workflow-built outputs. It is not the agent framework, the review UI, the deploy system, or the production site mutator. The WordPress plugin in this repo is one optional host adapter (useful when the host is a WordPress site); the core CLI/runtime works anywhere node can run.

Any host: CLI, CI, mobile, Node service, WP plugin, GitHub Action, ...
  -> WP Codebox
    -> disposable WordPress Playground sandbox
      -> mounted inputs, plugins, tools, and optional agent stack
      -> controlled commands or agent task
      -> artifact bundle
  -> review, replay, apply, export, or discard outside the sandbox

Product Use Cases

What you can build on top of WP Codebox:

  • Agentic coding against a WordPress site. Let users describe a change in chat — from any host: a WordPress plugin, a mobile app, a desktop tool, a Slack/Discord bot. Dispatch a sandbox with the target site's stack mounted, capture an artifact with a live Playground preview URL, open a PR via mounted GitHub tooling. The contributor never needs shell access.
  • Agent training and evaluation. Run the same WordPress task side by side across multiple models in isolated Playground workspaces. Capture each model's output, grade against hidden quality checks, and produce per-model PRs as review surface. See wp-gym.
  • Long-running terrariums. Boot a Playground that an agent evolves over time — software, content, configuration — with day-cycle automation driven from CI. See world-of-wordpress.
  • Static-site / WordPress-import factories. Generate raw HTML/CSS sites in CI, validate them via Playground + WordPress import, post Playground preview links as PR evidence. See wp-site-generator.
  • Untrusted patch evaluation. Plugin and theme authors can accept community-submitted patches, run them in a sandbox, capture artifacts (diffs, test results, screenshots), and review before merging. The reviewing tool can be anything.
  • "Try it in a sandbox first." Before installing a plugin, theme, or update on a production site, run it in a disposable Playground and see what happens.
  • Reproduction harness for bug reports. Ship a recipe with an issue so any contributor can replay the bug deterministically in a clean WordPress instance. The replay can be triggered from a CLI, a CI job, or a maintainer's IDE plugin.
  • Hosting provider integrations. "Test this change in a sandbox" buttons in admin dashboards or hosting panels.
  • CI/CD safety net. Dispatch a sandbox from a GitHub Action or other CI runner to evaluate a PR's runtime behavior against real WordPress before merging.
  • Mobile / native dev tools. A mobile coding assistant or desktop dev tool can drive sandboxes for real WordPress execution without bundling PHP locally.
  • Education. Real WordPress instances per student per exercise, fully disposable.
  • Security research. Detonate suspicious plugins, themes, or patches in isolation.

Runtime Capabilities

What WP Codebox provides for product use cases:

  • Run a PHP or WP-CLI probe against mounted WordPress code.
  • Execute a WordPress Ability inside a disposable Playground runtime.
  • Run repeatable workspace recipes that mount plugins, seed workspaces, and capture outputs.
  • Drive stateful runtime episodes with reset, step, observe, snapshot, artifact, and close operations.
  • Launch sandboxed Data Machine / Agents API coding-agent tasks from the CLI or WordPress ability surface.
  • Fan out several task descriptions into separate isolated sandboxes.
  • Produce artifact bundles — patches, diffs, test results, live Playground preview URLs — that a parent product can review, replay, apply, or discard.

Why A WordPress Plugin?

The WordPress plugin is useful when the host experience should live inside a WordPress site. It turns WP Codebox into a WordPress-native control plane for reviewed coding workflows without asking users to run a terminal.

Good fits include:

  • Letting non-technical site owners request, review, and approve bounded coding changes from a WordPress UI or chat surface.
  • Running a managed coding community where users submit tasks and maintainers approve sandboxed artifacts before anything reaches real code.
  • Supporting WordPress contributors with disposable reproduction, patch, review, and test-result artifacts that can be shared before commit or PR creation.

These are host-product use cases. The core runtime still works anywhere the CLI can run.

Repo Components

These are local workspace components in this repo. The CLI and WordPress plugin now have package artifact validation, but publication and release automation are still explicit release-manager steps.

  • packages/runtime-core: backend-agnostic runtime interfaces and shared types.
  • packages/runtime-playground: WordPress Playground backend adapter.
  • packages/cli: source for the local wp-codebox CLI used through npm run wp-codebox.
  • packages/wordpress-plugin: parent-site WordPress ability surface.

Local Setup

npm install
npm run build
npm run check

npm run check runs the TypeScript build, policy validation smoke test, WordPress plugin smoke test, and a real Playground-backed CLI smoke test.

Distribution Artifacts

The CLI package is prepared as @chubes4/wp-codebox-cli and exposes the wp-codebox binary from packages/cli/dist/index.js.

npm run build
npm pack --workspace @chubes4/wp-codebox-cli --dry-run --json

The WordPress plugin zip is built from packages/wordpress-plugin with only the installable plugin files under a top-level wp-codebox/ directory.

npm run package:wordpress-plugin
unzip -Z1 packages/wordpress-plugin/dist/wp-codebox.zip

npm run package-distribution-smoke validates both artifact shapes. It checks that the CLI pack includes package.json, README.md, and compiled dist/ files without TypeScript source, then builds the WordPress plugin zip and checks that it contains the plugin bootstrap, README, and PHP sources without package metadata or generated artifacts.

Versioning and release policy:

  1. Release the workspace packages together from one git tag so @chubes4/wp-codebox-cli, @chubes4/wp-codebox-core, and @chubes4/wp-codebox-playground stay on the same version.
  2. Keep packages/wordpress-plugin/wp-codebox.php Version: aligned with the package version used for the matching plugin zip.
  3. Treat the npm package and plugin zip as one release unit: publish the CLI, build the plugin zip from the same commit, and attach the zip to the release.
  4. Use conventional semver: patch for fixes and docs-only distribution updates, minor for new commands or artifact fields, major for runtime contract breaks.

Install notes by environment:

  1. Self-hosted WordPress control planes should install the CLI on the same host that runs PHP, install packages/wordpress-plugin/dist/wp-codebox.zip as the parent-site plugin, then set wp_codebox_bin to the resolved wp-codebox binary path. Component paths can be supplied through wp_codebox_component_paths or the matching filter.
  2. Studio or local development environments can run from a checkout with npm install, npm run build, and npm run wp-codebox -- ...; install the plugin zip into the local parent site and point wp_codebox_bin at either the global binary or the checkout wrapper command used by that site.
  3. Hosted control planes should provide WP Codebox as managed infrastructure: deploy the vetted CLI package and plugin zip from the same release, configure the binary and component paths centrally with options or filters, and expose only the parent-site abilities to products.

CLI binary discovery from the plugin is intentionally host-configurable. The runner resolves the binary from ability input first, then wp_codebox_bin, then the wp_codebox_bin filter; multisite installs read the option from network options because the executable path is host-level configuration.

Release checklist:

  1. Run npm run check from a clean checkout.
  2. Review npm pack --workspace @chubes4/wp-codebox-cli --dry-run --json before publishing the CLI package.
  3. Build packages/wordpress-plugin/dist/wp-codebox.zip with npm run package:wordpress-plugin and inspect unzip -Z1 packages/wordpress-plugin/dist/wp-codebox.zip.
  4. Confirm package and plugin versions are aligned on the release commit.
  5. Install the CLI in the target environment and configure the WordPress plugin wp_codebox_bin option or filter to the resolved wp-codebox binary path.
  6. Install the plugin zip on the parent site and run the WordPress plugin smoke or equivalent ability registration check in that environment.

Quick Start

Run PHP inside a disposable WordPress Playground runtime with a local plugin mounted:

npm run wp-codebox -- run \
  --mount ./examples/simple-plugin:/wordpress/wp-content/plugins/simple-plugin \
  --command wordpress.run-php \
  --arg code-file=./examples/simple-plugin/probe.php \
  --artifacts ./artifacts \
  --json

Expected shape:

{
  "success": true,
  "runtime": {
    "backend": "wordpress-playground",
    "status": "destroyed"
  },
  "execution": {
    "command": "wordpress.run-php",
    "exitCode": 0
  },
  "artifacts": {
    "id": "artifact-bundle-sha256-...",
    "contentDigest": "...",
    "directory": "./artifacts/runtime-...",
    "manifestPath": "./artifacts/runtime-.../manifest.json",
    "blueprintAfterPath": "./artifacts/runtime-.../blueprint.after.json",
    "capturedMountsPath": "./artifacts/runtime-.../files/mounted-files.json",
    "diffsPath": "./artifacts/runtime-.../files/diffs.json",
    "changedFilesPath": "./artifacts/runtime-.../files/changed-files.json",
    "patchPath": "./artifacts/runtime-.../files/patch.diff",
    "testResultsPath": "./artifacts/runtime-.../files/test-results.json"
  }
}

WP Codebox boots Playground lazily on the first command, captures artifacts after execution, and disposes the runtime when the run completes.

For interactive review, pass --preview-hold <duration> to keep the live Playground server available briefly after artifact capture. The command emits artifacts.preview.url and files/review.json includes a matching top-level preview object with lifecycle and expiry details.

npm run wp-codebox -- run \
  --mount ./examples/simple-plugin:/wordpress/wp-content/plugins/simple-plugin \
  --command wordpress.run-php \
  --arg code-file=./examples/simple-plugin/probe.php \
  --artifacts ./artifacts \
  --preview-hold 15m \
  --json

The v1 preview is a held live Playground runtime. When --preview-hold is omitted, the preview field still records the URL observed during capture, but the runtime is destroyed on command completion and the URL is marked expired-on-completion. Artifact replay from blueprint.after.json remains partial and is a separate future preview mode.

When a caller exposes the local Playground through a tunnel or proxy, pass --preview-public-url <url> to report that public URL in artifacts.preview.url, metadata.json, and files/review.json. WP Codebox also passes the same URL to Playground as site-url, so WordPress-generated links can align with the public preview where Playground supports that option. The local Playground URL remains recorded as preview.localUrl.

Remote preview access still requires an external tunnel or proxy. WP Codebox does not claim true bind-host support: a --preview-bind style option depends on upstream WordPress Playground exposing a host/bind API. Track upstream support in WordPress/wordpress-playground#3681.

Runtime Episodes

Use createRuntimeEpisode() when a caller needs a stateful sandbox loop instead of a one-shot command or recipe. The episode wrapper is generic: it records reset observations, step executions, optional per-step observations, snapshots, and artifact bundles without knowing benchmark, reward, or scenario semantics.

import { createRuntimeEpisode } from "@chubes4/wp-codebox-core"
import { createPlaygroundRuntimeBackend } from "@chubes4/wp-codebox-playground"

const episode = await createRuntimeEpisode(
  {
    runtime: {
      backend: "wordpress-playground",
      environment: { kind: "wordpress", version: "7.0", blueprint: { steps: [] } },
      policy: {
        network: "deny",
        filesystem: "readwrite-mounts",
        commands: ["wordpress.wp-cli", "wordpress.run-php"],
        secrets: "none",
        approvals: "never",
      },
    },
    stepObservation: { type: "runtime-info" },
  },
  createPlaygroundRuntimeBackend(),
)

await episode.step({ command: "wordpress.wp-cli", args: ["command=post list"] })
const artifacts = await episode.collectArtifacts({ includeLogs: true })
const trace = await episode.trace()
await episode.close()

Products such as eval harnesses can project this generic episode trace into their own action, observation, reward, and report schemas outside WP Codebox.

CLI Commands

commands

Discover the supported runtime and recipe commands without launching Playground.

npm run wp-codebox -- commands --json

JSON output uses wp-codebox/command-catalog/v1 and includes each command id, description, accepted args, known output shape, and runtime policy requirement. Human output is a concise command list.

schema recipe

Print the JSON Schema for wp-codebox/workspace-recipe/v1 without reading a recipe or launching Playground.

npm run wp-codebox -- schema recipe --json

The schema covers runtime, inputs.mounts, inputs.workspaces, inputs.extra_plugins / inputs.extraPlugins, inputs.secretEnv, workflow.steps, and artifacts.

run

Run one command in a disposable runtime.

npm run wp-codebox -- run \
  --mount <host-path>:<sandbox-path>[:readonly|readwrite] \
  --command <command> \
  --arg <key=value> \
  --preview-public-url <public-tunnel-url> \
  --json

Supported runtime commands today:

  • inspect-mounted-inputs: list mounted input entries from inside Playground.
  • wordpress.run-php: run PHP; accepts code=<php> or code-file=<path>.
  • wordpress.wp-cli: run WP-CLI; accepts command='wp option get home' or plain args.
  • wordpress.ability: execute a registered WordPress Ability; accepts name=<ability> and optional JSON input=<object>.

wordpress.run-php loads /wordpress/wp-load.php by default. Use --arg bootstrap=none for raw PHP.

wordpress.wp-cli automatically enables Playground's wp-cli extra library when the command is allowed by runtime policy.

WP Codebox defaults to WordPress 7.0 because the agent and AI plugin stacks need the modern WordPress AI surface. Override with --wp trunk, --wp nightly, or another supported Playground version.

--preview-public-url is metadata and site-url alignment only; it does not make Playground listen on a public interface. Use a tunnel/proxy for remote access.

recipe validate

Validate a workspace recipe without launching Playground.

npm run wp-codebox -- recipe validate \
  --recipe ./examples/recipes/simple-plugin.json \
  --json

Validation checks schema, source paths, extra plugin entrypoints, workspace seeds, supported workflow commands, JSON ability inputs, and command arguments.

Use recipe validate when you only need a pass/fail contract for authoring or CI. It does not resolve the execution plan beyond validation summaries.

recipe-run

Run a repeatable recipe.

npm run wp-codebox -- recipe-run \
  --recipe ./examples/recipes/simple-plugin.json \
  --preview-hold 15m \
  --preview-public-url https://example-tunnel.test/ \
  --json

Recipes are JSON declarations for a sandbox setup plus workflow steps. They can mount existing directories, create disposable plugin/theme workspaces, activate extra plugins, allow-list selected secret environment variable names, and capture the output as artifacts.

Pass --dry-run --json to validate the same recipe and emit the resolved plan without booting Playground, creating temp workspaces, mounting files, executing commands, or writing artifacts:

npm run wp-codebox -- recipe-run \
  --recipe ./examples/recipes/simple-plugin.json \
  --dry-run \
  --json

Dry-run output uses wp-codebox/recipe-run-dry-run/v1 and includes resolved mounts, planned workspaces, extra plugins, workflow steps with parsed and resolved command args, allowed secret environment variable names without values, and per-step policy status. Use recipe-run without --dry-run when you want the real Playground-backed execution and artifact bundle.

Mount entries may include opaque metadata that is preserved in runtime observations, artifact provenance, and captured mount files. Product-specific callers can use this to map sandbox paths back to source repositories without WP Codebox knowing about the caller's deployment environment:

{
  "source": "/srv/site/wp-content/plugins/example",
  "target": "/wordpress/wp-content/plugins/example",
  "mode": "readwrite",
  "metadata": {
    "kind": "component",
    "slug": "example-plugin",
    "repo": "org/example-plugin",
    "default_branch": "main",
    "repo_root_relative_to_mount": ".",
    "editable": true
  }
}

Example recipes:

  • examples/recipes/simple-plugin.json: mount and probe the fixture plugin.
  • examples/recipes/wp-cli.json: prove WP-CLI commands mutate the same runtime observed by later steps.
  • examples/recipes/seeded-plugin-workspace.json: create a disposable plugin scaffold, mutate it, and capture diffs.
  • examples/recipes/datamachine-agent-bundle.json: mount Agents API and Data Machine, then import a bundle through wordpress.ability.

Supported workspace seeds:

  • plugin_scaffold: creates <slug>.php and README.md, mounted by default at /wordpress/wp-content/plugins/<slug>.
  • theme_scaffold: creates style.css, index.php, and README.md, mounted by default at /wordpress/wp-content/themes/<slug>.
  • directory: copies seed.source into a disposable workspace and requires an explicit sandbox target.

Bench Recipes

Run wordpress.bench from a recipe workflow to execute plugin tests/bench/*.php workloads in a disposable WP Codebox runtime and emit the Homeboy-compatible BenchResults envelope.

npm run wp-codebox -- recipe-run \
  --recipe ./examples/recipes/bench-plugin.json \
  --json

Each workload file returns a callable. The callable may return numeric metrics directly or a payload with metrics and metadata keys. The recipe output reports duration percentiles, custom metric aggregates, peak memory, runtime artifacts, and the parsed benchResults object in JSON output when a single wordpress.bench step runs.

agent-runtime-probe

Boot a sandbox with Agents API, Data Machine, and Data Machine Code mounted, then verify the stack loads.

npm run wp-codebox -- agent-runtime-probe \
  --agents-api ../agents-api \
  --data-machine ../data-machine \
  --data-machine-code ../data-machine-code \
  --json

agent-sandbox-run

Run one natural-language task through a sandboxed agent stack.

npm run wp-codebox -- agent-sandbox-run \
  --agents-api ../agents-api \
  --data-machine ../data-machine \
  --data-machine-code ../data-machine-code \
  --agent sandbox-agent \
  --task "Add a Dry Rub filter to the wing locations map" \
  --provider openai \
  --model gpt-5.5 \
  --json

Useful options:

  • --provider-plugin <path>: mount an AI provider plugin. Repeatable.
  • --secret-env <NAME>: expose a parent process environment variable by name, such as --secret-env GITHUB_TOKEN. Repeatable. Values are read from the process environment and are not accepted in JSON payloads.
  • --mount <host:vfs[:mode]>: mount extra task inputs.
  • --session-id <id>: continue an existing sandbox conversation session.
  • --max-turns <n>: bound the agent loop.

--code and --code-file still exist on the CLI for operator/debug use after the agent stack boots. They are not accepted by the parent-site wp-codebox/run-agent-task ability.

agent-sandbox-batch

Run several task descriptions, one isolated sandbox per task, with bounded concurrency.

npm run wp-codebox -- agent-sandbox-batch \
  --agents-api ../agents-api \
  --data-machine ../data-machine \
  --data-machine-code ../data-machine-code \
  --task "Fix issue A" \
  --task "Investigate issue B" \
  --concurrency 2 \
  --json

Use --tasks-json or --tasks-file when the task list is generated by another system.

Artifact Bundles

Artifact capture is owned by WP Codebox because WP Codebox knows what was mounted, what executed, and what must survive teardown.

Current bundles include:

  • manifest.json: artifact index with content types and the content digest used for the bundle id.
  • metadata.json: runtime, policy, mounts, and caller metadata.
  • blueprint.after.json: partial WordPress Playground replay blueprint for captured text files.
  • blueprint.after-notes.json: replay limitations and next capture targets.
  • events.jsonl, commands.jsonl, observations.jsonl: runtime evidence streams.
  • logs/runtime.log, logs/commands.log: human-readable logs.
  • files/mounts.json: mounted input list.
  • files/mounted-files.json: captured readwrite mount files with size, SHA-256, target path, and replayability metadata.
  • files/changed-files.json: canonical changed-files manifest for review and apply-back consumers.
  • files/patch.diff: canonical combined text patch for changed readwrite mounts that declare a baseline.
  • files/test-results.json: normalized test-results artifact with schema, summary counts, suites, and raw log references. When WP Codebox has not run test-aware commands, the artifact is present with status: "unknown", zero counts, an empty suites array, and pointers to raw command logs instead of inferred pass/fail data.
  • files/review.json: frontend-oriented review payload with summary, progress labels, changed file labels, evidence links, and approval actions.
  • files/diffs.json: diff index for readwrite mounts that declare a baseline.
  • files/diffs/<mount>.patch: unified text diff from a seeded baseline to the sandbox output.
  • files/mounts/<index>/...: copied file contents from readwrite mounts.

metadata.json points to the canonical changed-files, patch, test-results, review, and mount-diff artifact paths under artifacts. It also includes provenance derived from data WP Codebox already has: task input/context where available, WP Codebox runtime version, WordPress version, mounted component/mount metadata, and agent/provider/model fields passed to the sandbox runner. files/diffs/<mount>.patch remains available for per-mount detail; files/patch.diff is the combined review/apply-back patch surface.

files/test-results.json

files/test-results.json is the normalized contract for future test-aware commands. The artifact exists even when no command produced structured test output, so artifact consumers can read one stable path and treat status: "unknown" as "no structured test result was captured."

{
  "schema": "wp-codebox/test-results/v1",
  "status": "unknown",
  "summary": { "total": 0, "passed": 0, "failed": 0, "skipped": 0, "unknown": 0 },
  "suites": [],
  "rawLogReferences": [
    { "path": "commands.jsonl", "kind": "commands-jsonl" },
    { "path": "logs/commands.log", "kind": "commands-log" }
  ]
}

Artifact bundle ids are content-addressed for the apply-back contract. The runtime writes manifest.id as artifact-bundle-sha256-<digest>, where <digest> is SHA-256 over the exact bytes of files/changed-files.json and files/patch.diff with the wp-codebox/artifact-content/v1 domain separator. The same value is exposed as manifest.contentDigest.value, metadata.contentDigest.value, the CLI artifacts.contentDigest field, and files/review.json evidence. Approval and apply-back consumers must recompute it before trusting an approved artifact.

files/review.json

files/review.json is the frontend contract for chat and owner review flows. It is derived from canonical artifacts and should be safe for a generic frontend to render without parsing unified diffs.

{
  "schema": "wp-codebox/artifact-review/v1",
  "artifactId": "artifact-bundle-...",
  "provenance": {
    "task": { "kind": "agent-sandbox-run", "input": "Add a Dry Rub filter..." },
    "runtime": {
      "backend": "wordpress-playground",
      "version": "0.0.0",
      "wordpressVersion": "7.0"
    },
    "agent": { "agent": "sandbox-agent", "provider": "openai", "model": "gpt-5.5" },
    "mounts": [
      {
        "type": "directory",
        "source": "/path/to/data-machine-code",
        "target": "/wordpress/wp-content/plugins/data-machine-code",
        "mode": "readonly",
        "metadata": { "kind": "component", "slug": "data-machine-code" }
      }
    ]
  },
  "summary": "Sandbox produced changes in 1 file.",
  "stats": { "added": 1, "modified": 0, "deleted": 0, "total": 1 },
  "changedFiles": [
    {
      "path": "/wordpress/wp-content/plugins/example/generated.txt",
      "status": "added",
      "label": "added generated.txt",
      "mountTarget": "/wordpress/wp-content/plugins/example",
      "relativePath": "generated.txt"
    }
  ],
  "progress": [
    { "type": "boot", "label": "Spinning up a test copy of your site..." },
    { "type": "artifact", "label": "Saving the result for review..." },
    { "type": "complete", "label": "Ready for your review." }
  ],
  "actions": [
    { "kind": "approve", "label": "Approve all changes", "requiresApprovedFiles": true },
    { "kind": "approve-files", "label": "Approve selected files", "requiresApprovedFiles": true },
    { "kind": "discard", "label": "Discard changes" },
    { "kind": "iterate", "label": "Request changes" }
  ],
  "evidence": {
    "patch": "files/patch.diff",
    "patchSha256": "...",
    "artifactContentDigest": "...",
    "changedFiles": "files/changed-files.json",
    "testResults": "files/test-results.json"
  },
  "riskFlags": []
}

Review actions are declarative. Frontends call wp-codebox/apply-approved-artifact with artifact_id and an explicit approved_files[] list for approve actions, call wp-codebox/discard-artifact for discard, and start a new sandbox task for iterate/request-changes flows.

Binary files and oversized files are copied when allowed by capture limits but are not embedded into blueprint.after.json. Database exports, option diffs, uploaded media, active plugin/theme state, screenshots, parsed test command output, and redaction guarantees are still future artifact targets.

WordPress Plugin

The WordPress plugin registers parent-site abilities:

  • wp-codebox/run-agent-task
  • wp-codebox/run-agent-task-batch
  • wp-codebox/list-artifacts
  • wp-codebox/get-artifact
  • wp-codebox/discard-artifact
  • wp-codebox/apply-approved-artifact
  • wp-codebox/stage-artifact-apply

These abilities shell out to the local wp-codebox CLI, boot disposable Playground sandboxes, mount the configured agent stack, invoke the sandbox agent through agents/chat, and return artifact metadata.

wp-codebox/run-agent-task accepts the stable task input contract below. Existing callers that still send task as a string are normalized to the same shape with goal populated from task. It rejects raw code and code_file fields so frontend/chat callers cannot pass arbitrary PHP through the product ability path. Operators can still use CLI debug commands directly when they need raw PHP probes.

{
  "schema": "wp-codebox/task-input/v1",
  "goal": "Add a focused product feature and return a reviewable patch.",
  "target": {
    "kind": "plugin",
    "path": "wp-content/plugins/example"
  },
  "allowed_tools": ["workspace.read", "workspace.write", "tests.run"],
  "expected_artifacts": ["patch", "tests", "review"],
  "policy": {
    "applyBack": "reviewed"
  },
  "context": {
    "issue": "https://github.com/chubes4/wp-codebox/issues/29"
  }
}

target.kind is caller-defined but should use repo, site, plugin, or theme when possible. allowed_tools and expected_artifacts are advisory contract fields for the product caller and sandboxed agent loop; the host runner still decides the private CLI invocation and returns the normalized task_input alongside the string task it passed into the current CLI bridge.

Component paths can come from ability input, the wp_codebox_component_paths option, or the wp_codebox_component_paths filter.

Expected component keys:

  • agents_api
  • data_machine
  • data_machine_code
  • provider_plugins (optional list)

The CLI binary can come from ability input, the wp_codebox_bin option, or the wp_codebox_bin filter.

Data Machine Code is a mounted coding-tools component inside the sandbox. It provides workspace/file/GitHub tools to the sandboxed agent. WP Codebox owns the parent-site ability surface, sandbox lifecycle, and artifact capture boundary.

Apply-back is intentionally not part of run-agent-task. Sandbox execution returns proposed outputs and evidence. list-artifacts, get-artifact, and discard-artifact manage captured artifact bundles under the configured artifact root. apply-approved-artifact validates artifact_id plus an explicit approved_files[] list against canonical changed-files.json, recomputes the artifact content digest from changed-files.json and the exact patch.diff the reviewer approved, and delegates to the wp_codebox_apply_approved_artifact filter. PR creation, direct deploy, package export, and bot identity policy live in adapters behind that reviewed boundary.

When Data Machine is present, stage-artifact-apply stages that same apply input as a Data Machine pending action with kind wp_codebox_apply_back. Its preview includes files/review.json, canonical changed files, normalized test results, and the explicit approved file list. Accepting the pending action resolves through Data Machine's generic resolver and calls the existing apply-approved-artifact path; rejecting it leaves the artifact untouched. This keeps approval lifecycle in Data Machine without making WP Codebox depend on Homeboy or any site-specific apply adapter.

Runtime Policy

RuntimePolicy declares the intended sandbox boundary and command allow-list.

const policy = {
  network: "deny",
  filesystem: "readwrite-mounts",
  commands: ["wordpress.run-php"],
  secrets: "none",
  approvals: "never",
}

Current enforcement:

Field Status
commands Enforced by assertRuntimeCommandAllowed() before backend execution.
network Shape validated; real network enforcement is backend/control-plane work.
filesystem Shape validated; mount/write enforcement is backend/control-plane work.
secrets Shape validated; selected env var injection is allow-list based today.
approvals Shape validated; product approval UX is still separate work.

Boundaries

WP Codebox owns:

  • Disposable runtime lifecycle.
  • Mounting inputs into Playground.
  • Controlled command execution.
  • Runtime event and artifact capture.
  • Parent-site ability surface for launching sandbox tasks.

WP Codebox does not own:

  • Agent identity, sessions, or model loop internals. Agents API and Data Machine own those.
  • Model provider authentication. Provider plugins and parent control planes own credentials.
  • Production mutation or deploy. Apply-back must be separate and reviewed.
  • CI/eval orchestration. Homeboy, wp-gym, or other consumers can invoke WP Codebox.
  • Frontend review UX. WP Codebox should produce renderable artifacts for those UIs.

Near-Term Gaps

  • Define redaction guarantees.
  • Define multi-user sandbox session lifecycle, retention, quotas, cancellation, and audit records.
  • Define reviewed apply-back adapters for bot-authored PRs, direct apply, and package export.
  • Add visual previews, parsed test command output, and richer risk flags to frontend review payloads.

Development Notes

  • Keep the runtime contract consumer-agnostic. Data Machine, Homeboy, Studio, wp-gym, and WordPress.com are consumers or mounted tools, not owners of the core artifact contract.
  • Prefer small seams: runtime lifecycle, command handlers, artifact capture, recipes, WordPress integration, and apply-back should stay separate.
  • When adding a new command or artifact type, update this README and npm run check.

About

Disposable WordPress Playground sandboxes for securely isolated agentic workflows.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors