Expose active workspace identity (repo, handle, branch) to AI execution context #423

@chubes4

Problem

Data Machine Code (DMC) already tracks workspace identity end-to-end:

  • `WorktreeContextInjector` persists per-worktree metadata (`repo`, `branch`, `handle`, `path`, `origin_site`, `origin_agent`, `origin_user`, `origin_task`) in the `datamachine_worktree_metadata` site option and the `WorktreeInventoryRepository` table
  • Workspace abilities (`datamachine/workspace-*`) take a workspace `handle` input and resolve it through `WorkspaceAliasResolver` against that metadata
  • The CI runner (`homeboy-extensions/datamachine-agent-ci.yml`) configures a `runner_workspace` block describing the target repo to clone for an agent run

But none of this surfaces in AI execution payloads. When an AI step fires during a CI-driven agent run (e.g. `Automattic/docs-agent` against `Extra-Chill/extrachill-artist-platform`), neither the AI directives nor the agent's tool calls can ask "which repo are we documenting?", because that identity lives only in DMC's workspace layer and never enters Data Machine's (DM's) `engine_data` or directive payload.

Concrete consumer

`Extra-Chill/extrachill-docs#33` (just merged as #35) registers a `docs` agent execution mode that injects platform-wide voice rules into the AI's system context whenever `agent_modes: ['docs']` is active. The voice rules are network-wide and ship as-is.

The next layer — per-target context — needs to tell the agent "you are currently documenting the Artist Platform; your readers are musicians and artist managers" vs "you are currently documenting the Newsletter; your readers are subscribers managing their preferences." That per-target message is straightforward to compose from `runner-configs/platform-map.yml` in `extrachill-docs`, but the directive callback needs to know which repo the current run is against.

Today there is no public path to that knowledge in PHP. The `runner_workspace` config exists only in YAML/JSON at the homeboy-extensions layer; it never enters PHP runtime state.

Proposal

Expose the active workspace identity in Data Machine's `engine_data` at agent-run start so that any directive, ability, or tool call can read it.

Shape

When a workspace is bound to an AI execution context (CI runner setup, chat session referencing a workspace, system task running against a worktree), DMC writes a structured `active_workspace` entry into the job's `engine_data`:

```json
{
  "active_workspace": {
    "handle": "extrachill-artist-platform@docs-agent-run-12345",
    "repo": "extrachill-artist-platform",
    "owner": "Extra-Chill",
    "full_name": "Extra-Chill/extrachill-artist-platform",
    "branch": "docs-agent-run-12345",
    "path": "/var/lib/datamachine/workspace/extrachill-artist-platform@docs-agent-run-12345",
    "origin_site": "...",
    "task_url": "..."
  }
}
```

Directives and abilities read it via the standard `$engine->get('active_workspace')` accessor that already exists on the EngineData object (see `GitHubAbilities` reading `run_artifact_egress_policy` the same way).
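A consumer read might look roughly like the sketch below. `EngineDataStub` is a hypothetical stand-in for DM's EngineData object (only the `get()` accessor mentioned above is assumed), and `describe_active_workspace()` is an illustrative helper, not an existing function.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for DM's EngineData object. Only a get() accessor
 * is assumed; the real class may differ.
 */
final class EngineDataStub {
    /** @param array<string, mixed> $data */
    public function __construct( private array $data ) {}

    public function get( string $key ): mixed {
        return $this->data[ $key ] ?? null;
    }
}

/**
 * Illustrative helper: returns a one-line description of the bound
 * workspace, or null when the job has no workspace.
 */
function describe_active_workspace( EngineDataStub $engine ): ?string {
    $workspace = $engine->get( 'active_workspace' );

    if ( ! is_array( $workspace ) || empty( $workspace['full_name'] ) ) {
        return null; // No workspace bound: the consumer should no-op.
    }

    $branch = $workspace['branch'] ?? 'unknown';

    return sprintf( 'Active repo: %s (branch %s)', $workspace['full_name'], $branch );
}

// Usage with the example values from the shape above.
$engine = new EngineDataStub( [
    'active_workspace' => [
        'full_name' => 'Extra-Chill/extrachill-artist-platform',
        'branch'    => 'docs-agent-run-12345',
    ],
] );
echo describe_active_workspace( $engine ), PHP_EOL;
```

Returning null for an unbound job keeps consumers safe on runs that have no workspace at all.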

Where the write happens

The natural injection point is wherever the CI runner currently configures `runner_workspace` to seed the worktree. Today that's the homeboy-extensions workflow constructing a runner config JSON. DMC needs a bootstrap hook that:

  1. Reads `runner_workspace` from the runner config (or any equivalent bootstrap argument when the runtime isn't CI — e.g. chat or system tasks)
  2. Resolves the matching `WorktreeContextInjector` metadata record (or constructs one for the bound workspace)
  3. Writes the `active_workspace` entry into engine_data via `EngineData::merge` at job start

Concretely, this is probably a new `inc/Runtime/WorkspaceBootstrap.php` class in DMC that hooks into job creation (the Action Scheduler new-job event, or DM's `datamachine_job_created` action if one exists; if not, add one) and writes the entry when a workspace is in scope.
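The mapping from a `runner_workspace` payload to the `active_workspace` shape could be sketched as a pure function. The input keys (`full_name`, `branch`, `path`) are assumptions, since the actual `runner_workspace` schema is not shown in this issue, and the `repo@branch` handle format is inferred from the example entry above.

```php
<?php
declare(strict_types=1);

/**
 * Builds the active_workspace entry from a runner_workspace payload.
 * Input keys ('full_name', 'branch', 'path') are assumed for this sketch.
 *
 * @param array<string, mixed> $runner_workspace
 * @return array<string, string>|null Null when no usable workspace is described.
 */
function build_active_workspace( array $runner_workspace ): ?array {
    $full_name = (string) ( $runner_workspace['full_name'] ?? '' );
    $branch    = (string) ( $runner_workspace['branch'] ?? '' );

    // Expect exactly "owner/repo"; anything else means no workspace is bound.
    $parts = explode( '/', $full_name );
    if ( 2 !== count( $parts ) || '' === $parts[0] || '' === $parts[1] || '' === $branch ) {
        return null;
    }

    [ $owner, $repo ] = $parts;

    return [
        'handle'    => $repo . '@' . $branch, // Assumed handle format: repo@branch.
        'repo'      => $repo,
        'owner'     => $owner,
        'full_name' => $full_name,
        'branch'    => $branch,
        'path'      => (string) ( $runner_workspace['path'] ?? '' ),
    ];
}
```

In a real implementation the handle and path would more likely come from the resolved worktree metadata record than be derived here.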

Generic, not docs-specific

The entry shape and the writer should be generic — nothing about docs, documentation, voice rules, or any consumer-specific concern leaks into DMC. `active_workspace` is just "here's the workspace this run is bound to," available to any consumer.

This issue is explicitly NOT about layering docs/voice/audience concerns into DMC. Those stay in their own plugin. DMC's job is to surface the workspace identity DMC already knows about.

Filter for extension

A filter like `datamachine_code_active_workspace` lets other plugins enrich the entry without forking DMC. Example downstream uses:

  • `extrachill-docs` reads `active_workspace.full_name`, looks up the matching `platform-map.yml` entry, and stacks per-target context onto its existing `docs` mode guidance
  • A future security ability uses `active_workspace.owner` to scope tool access
  • Audit logging includes the workspace identity in every recorded tool call
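An enrichment callback on the proposed filter might look like the sketch below. The callback name and the `origin_site` context key are hypothetical; `add_filter()` is the standard WordPress registration call, guarded here so the snippet also runs outside WordPress.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical enrichment callback for the proposed filter. It fills
 * 'origin_site' from the bootstrap context when the entry lacks one.
 *
 * @param array<string, mixed> $entry   The active_workspace entry.
 * @param array<string, mixed> $context Bootstrap context (keys assumed).
 * @return array<string, mixed>
 */
function example_fill_origin_site( array $entry, array $context ): array {
    if ( empty( $entry['origin_site'] ) && ! empty( $context['origin_site'] ) ) {
        $entry['origin_site'] = (string) $context['origin_site'];
    }

    return $entry; // Filter callbacks must return the (possibly modified) value.
}

// Inside WordPress, register with two accepted arguments ($entry, $context).
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'datamachine_code_active_workspace', 'example_fill_origin_site', 10, 2 );
}
```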

Why this belongs in DMC

DMC already owns:

  • Workspace registry (`WorkspaceRepositoryLifecycle`, `Workspace.php`)
  • Worktree metadata (`WorktreeContextInjector`)
  • Inventory storage (`WorktreeInventoryRepository`)
  • Alias resolution (`WorkspaceAliasResolver`)
  • The integration with the CI runner (`runner_workspace` JSON is consumed by DMC-supplied tools like `workspace_worktree_add`)

Surfacing the active workspace into engine_data closes the loop between DMC's workspace-side knowledge and DM's AI-side runtime. No other plugin has the right combination of metadata and lifecycle hooks to do this.

DM core stays generic. Consumers of `active_workspace` read it through the existing `EngineData` accessor; they don't need to know it came from DMC.

Implementation sketch

  • New file: `inc/Runtime/WorkspaceBootstrap.php` — single static class with a `bootstrap_for_job( int $job_id, array $context ): void` method that builds the `active_workspace` entry from available metadata and calls `EngineData::merge`
  • Trigger: hook into job-creation flow. If DM exposes a `datamachine_job_created` action, use it. If not, this issue depends on adding one to DM (small upstream change)
  • Runner config consumption: when the CI driver builds the WP runtime, it currently writes a runner config JSON containing `runner_workspace`. That JSON is read by the ci-driver fixture. The fixture should call `WorkspaceBootstrap::bootstrap_for_job` with the `runner_workspace` payload converted into the `active_workspace` shape
  • Filter: `apply_filters( 'datamachine_code_active_workspace', $entry, $context )` before persisting
  • Tests: unit tests for the shape; an end-to-end test that runs a fake job with a bound workspace and asserts the entry lands in engine_data
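The bullets above could fit together roughly as follows. This is a sketch under stated assumptions, not the proposed implementation: the class is named `WorkspaceBootstrapSketch` to mark it as illustrative, the third `$persist` parameter is added only so the example runs without DM (in DMC it would wrap `EngineData::merge`, whose signature is not shown here), and the `active_workspace` context key is assumed.

```php
<?php
declare(strict_types=1);

final class WorkspaceBootstrapSketch {
    /**
     * Writes the active_workspace entry for a job, or no-ops when no
     * workspace is bound.
     *
     * @param array<string, mixed> $context Assumed to carry 'active_workspace'.
     * @param callable             $persist Hypothetical writer: fn( int $job_id, array $data ): void.
     */
    public static function bootstrap_for_job( int $job_id, array $context, callable $persist ): void {
        $entry = $context['active_workspace'] ?? null;

        if ( ! is_array( $entry ) || empty( $entry['handle'] ) ) {
            return; // No workspace in scope: leave engine_data untouched.
        }

        // Let other plugins enrich the entry before it is persisted.
        if ( function_exists( 'apply_filters' ) ) {
            $entry = apply_filters( 'datamachine_code_active_workspace', $entry, $context );
        }

        if ( ! is_array( $entry ) ) {
            return; // A misbehaving filter returned a non-array; skip the write.
        }

        $persist( $job_id, [ 'active_workspace' => $entry ] );
    }
}

// Usage with an in-memory store standing in for engine_data.
$store   = [];
$persist = function ( int $job_id, array $data ) use ( &$store ): void {
    $store[ $job_id ] = array_merge( $store[ $job_id ] ?? [], $data );
};

WorkspaceBootstrapSketch::bootstrap_for_job( 7, [ 'active_workspace' => [ 'handle' => 'repo@branch' ] ], $persist );
WorkspaceBootstrapSketch::bootstrap_for_job( 8, [], $persist ); // No workspace bound: no-op.
```

Injecting the writer also makes the "no-ops otherwise" behavior easy to assert in a unit test.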

Acceptance criteria

  • `active_workspace` shape documented in the `inc/Runtime/WorkspaceBootstrap.php` docblock
  • Bootstrap fires at job start when a workspace is bound, no-ops otherwise
  • CI runner path (homeboy-extensions `runner_workspace` → DMC bootstrap) works end-to-end
  • Chat and system task paths (when an explicit handle is provided) also populate `active_workspace`
  • `datamachine_code_active_workspace` filter documented and tested
  • At least one downstream consumer demonstrated — likely `extrachill-docs` reading `full_name` to stack per-target context onto its `docs` mode
  • No DM core changes required, OR if a `datamachine_job_created` hook is needed, that's filed as a tiny upstream DM PR first
  • No regression: jobs without bound workspaces still run cleanly

Non-goals

  • Layering docs / voice / audience concerns into DMC. Stay generic.
  • Mutating workspace state from the directive layer. `active_workspace` is read-only context.
  • Persisting `active_workspace` past job completion. It's per-job runtime context.
  • Cross-job workspace sharing. One workspace identity per job.

Downstream blockers

This issue is the missing leg for per-target context in `Extra-Chill/extrachill-docs#33` (merged in #35 as platform-wide voice only). Once this lands, extrachill-docs adds a second callback on `datamachine_agent_mode_docs` at a later priority that reads `active_workspace.full_name`, looks up `runner-configs/platform-map.yml`, and stacks per-target context onto the same mode.
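The per-target lookup could be as small as the sketch below. `$platform_map` stands in for the parsed `platform-map.yml`, whose real structure is not shown in this issue; the `name` and `readers` keys and the function name are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Composes per-target guidance for the docs mode.
 *
 * @param array<string, array{name: string, readers: string}> $platform_map
 *        Stand-in for the parsed platform-map.yml, keyed by full repo name (assumed).
 * @param array<string, mixed>|null $active_workspace Entry read from engine_data.
 */
function compose_target_context( array $platform_map, ?array $active_workspace ): string {
    $full_name = (string) ( $active_workspace['full_name'] ?? '' );

    if ( '' === $full_name || ! isset( $platform_map[ $full_name ] ) ) {
        return ''; // Unknown or unbound target: add nothing beyond the platform-wide rules.
    }

    $target = $platform_map[ $full_name ];

    return sprintf(
        'You are currently documenting the %s; your readers are %s.',
        $target['name'],
        $target['readers']
    );
}

// Usage with an illustrative map entry.
$map = [
    'Extra-Chill/extrachill-artist-platform' => [
        'name'    => 'Artist Platform',
        'readers' => 'musicians and artist managers',
    ],
];
echo compose_target_context( $map, [ 'full_name' => 'Extra-Chill/extrachill-artist-platform' ] ), PHP_EOL;
```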
