
feat(engine): add datamachine_engine_snapshot filter at job init #2071

Merged
chubes4 merged 1 commit into main from feat-engine-snapshot-filter
May 18, 2026

@chubes4 chubes4 commented May 18, 2026

Adds a one-line extension point to `RunFlowAbility::execute()` so downstream plugins can enrich the engine snapshot at job initialization without forking DM or building parallel persistence machinery.

Motivation

DM core has no way for an external plugin to inject runtime context into a job's `engine_data` at the moment the job is created. Today the only public path is the `initial_data` input on `run-flow` — which is fine when the caller knows the context, but useless for cases where the context lives in a third plugin that the caller doesn't know about.

Concrete driver: Data Machine Code already tracks every workspace and worktree's identity (`WorktreeContextInjector`), but that knowledge never reaches AI directives, abilities, or tool calls during a job because there's no point in the run-flow pipeline where DMC can stamp "hey, this job is bound to `extrachill-artist-platform@docs/agent-run-123`" into `engine_data`. See Extra-Chill/data-machine-code#423.

What this adds

One `apply_filters()` call at the right spot:

```php
$filtered_snapshot = apply_filters( 'datamachine_engine_snapshot', $engine_snapshot, $job_id, $flow, $pipeline );
if ( is_array( $filtered_snapshot ) ) {
	$engine_snapshot = $filtered_snapshot;
}
```

Fires immediately before `datamachine_set_engine_data()`. Filters receive the assembled snapshot, the job ID, and the source flow + pipeline rows. Returning a modified array enriches the snapshot. Returning a non-array silently preserves the prior snapshot so a misbehaving filter cannot corrupt `engine_data`.
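For illustration, a downstream plugin might hook the filter roughly as follows. This is a minimal sketch, not code from DMC: the function name `myplugin_enrich_engine_snapshot`, the `workspace` key, and its placeholder values are all hypothetical. Only the filter name and the four-argument signature come from the snippet above.

```php
<?php
/**
 * Hypothetical callback: adds a 'workspace' entry to the engine snapshot.
 * The 'workspace' key and its fields are illustrative, not part of DM core.
 *
 * @param mixed $snapshot Snapshot assembled by RunFlowAbility::execute().
 * @param mixed $job_id   ID of the job being initialized (unused here).
 * @param mixed $flow     Source flow row (unused here).
 * @param mixed $pipeline Source pipeline row (unused here).
 * @return mixed Enriched snapshot, or the input unchanged if it is not an array.
 */
function myplugin_enrich_engine_snapshot( $snapshot, $job_id, $flow, $pipeline ) {
	if ( ! is_array( $snapshot ) ) {
		return $snapshot; // Leave unexpected input untouched.
	}

	$snapshot['workspace'] = array(
		'repo'   => 'example-repo',   // Placeholder value.
		'branch' => 'example-branch', // Placeholder value.
	);

	return $snapshot;
}

// Register only when WordPress is loaded.
// 10 is the default priority; 4 is the number of arguments the callback accepts.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'datamachine_engine_snapshot', 'myplugin_enrich_engine_snapshot', 10, 4 );
}
```

Passing `4` as the last argument to `add_filter()` matters: without it WordPress hands the callback only the first argument, and the job ID, flow, and pipeline would not be available.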

Why this shape

| Alternative considered | Why rejected |
| --- | --- |
| New schema / table for "runtime context" | Massive overkill for what's a one-time-per-job context attachment |
| New CLI surface | No reason: the filter is invisible to operators |
| New ability | Same job/snapshot context; abilities can't observe the in-flight assembly |
| Action hook only (no return value) | Filters compose better: multiple plugins can stack enrichments |
| Filter after persist | Too late: `engine_data` is already the source of truth by then |

The filter pattern matches existing DM conventions (`datamachine_directives`, `datamachine_memory_files`, `datamachine_agent_modes`, `datamachine_agent_mode_{slug}`). One line, fully backward-compatible — DM runs unchanged with no callbacks registered.
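The stacking behavior and the array guard can be sketched in plain PHP without WordPress. Everything below is a stand-in under stated assumptions: `demo_run_snapshot_filters()` is not `apply_filters()`, it only mimics running callbacks in sequence and then applying the same `is_array()` check as the PR, and the `workspace` and `tenant` keys are hypothetical.

```php
<?php
/**
 * Stand-in for the filter chain plus the PR's is_array() guard.
 * NOT WordPress's apply_filters(); it only mimics the sequencing.
 */
function demo_run_snapshot_filters( array $snapshot, array $callbacks ) {
	$filtered = $snapshot;
	foreach ( $callbacks as $callback ) {
		// Each callback receives the previous callback's return value.
		$filtered = $callback( $filtered );
	}
	// Same guard as the PR: a non-array result falls back to the pre-filter snapshot.
	return is_array( $filtered ) ? $filtered : $snapshot;
}

$add_workspace = function ( $snapshot ) {
	$snapshot['workspace'] = 'example-repo'; // Hypothetical key.
	return $snapshot;
};

$add_tenant = function ( $snapshot ) {
	$snapshot['tenant'] = 'example-tenant'; // Hypothetical key.
	return $snapshot;
};

$broken = function ( $snapshot ) {
	return false; // A misbehaving callback.
};

// Two well-behaved callbacks: both enrichments land in the snapshot.
$stacked = demo_run_snapshot_filters( array( 'job' => 1 ), array( $add_workspace, $add_tenant ) );

// A non-array result at the end of the chain: the original snapshot is kept.
$guarded = demo_run_snapshot_filters( array( 'job' => 1 ), array( $add_workspace, $broken ) );
```

Note that in the second call the fallback is the snapshot as it was before any callback ran, so the enrichment from the well-behaved callback is discarded along with the bad return value. The guard protects `engine_data` from corruption; it does not salvage partial results.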

Downstream consumers

Smoke testing

`php -l` clean. Pattern is structurally identical to the existing `datamachine_directives` filter loop already in DM; no new code paths.

End-to-end smoke (a filter callback that mutates `engine_data` and an AI step reading it back) is exercised by the companion DMC PR's projector against the artist-platform docs-agent pilot.

Scope

One file, one filter, 24 net lines (mostly the docblock). No tests added in this PR because the change is purely a passthrough; the actual behavior under test belongs to the downstream consumer.

Refs Extra-Chill/data-machine-code#423
Refs Extra-Chill/extrachill-docs#31

Adds a single filter hook in RunFlowAbility::execute() that fires once
per job, immediately before the engine snapshot is persisted. Filters
receive the assembled snapshot, the job ID, and the source flow + pipeline
rows. Returning a modified array enriches engine_data with context that
lives outside DM core.

Driving use case: Data Machine Code projects active workspace identity
(repo, handle, branch, path) into engine_data so AI directives, abilities,
and tool calls can answer "which repo is this job operating against" —
context DMC already collects but never surfaces into the AI runtime. See
Extra-Chill/data-machine-code#423.

Pattern is intentionally tiny — one apply_filters() call, no new classes,
no schema changes, no CLI surface. The filter is the public extension
point; downstream plugins own the semantics of what they project. DM
core stays agnostic about workspace, repo, or any consumer-specific
concept.

Defensive: filter return value is validated as array; non-array returns
silently preserve the prior snapshot so a misbehaving filter cannot
corrupt engine_data.

Refs Extra-Chill/data-machine-code#423
Refs Extra-Chill/extrachill-docs#31

homeboy-ci Bot commented May 18, 2026

Homeboy Results — data-machine

Lint

lint — passed

ℹ️ Full options: homeboy docs commands/lint
Deep dive: homeboy lint data-machine --changed-since 3f8abcb

Test

test — passed

ℹ️ No impacted tests found for --changed-since 3f8abcb
ℹ️ Run full suite if needed: homeboy test data-machine
Deep dive: homeboy test data-machine --changed-since 3f8abcb

Audit

audit — passed

  • requested_detectors — 6 finding(s)
  • Total: 6 finding(s)

Deep dive: homeboy audit data-machine --changed-since 3f8abcb

Tooling versions
  • Homeboy CLI: homeboy 0.182.0+1f8c0496
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: dd47f26a
  • Action: unknown@unknown

@chubes4 chubes4 merged commit 839f8f1 into main May 18, 2026
5 checks passed
@chubes4 chubes4 deleted the feat-engine-snapshot-filter branch May 18, 2026 03:52