
feat(tdbg): schedule dedup and force-CAN commands #19

Open
chaptersix wants to merge 21 commits into main from tools/schedule-spec-util

Conversation

@chaptersix
Owner

@chaptersix chaptersix commented Apr 9, 2026

What changed?

  • Added schedule dedup and schedule force-can subcommands to tdbg.
  • The implementation lives in tools/schedutil/ (library) and tools/tdbg/schedule_dedup_commands.go (CLI wiring). All operations go through the workflowservice gRPC API directly — no SDK client.
  • dedup targets a single schedule (--schedule-id), piped IDs from stdin, or all schedules in the namespace. Without --execute it writes before/after JSON artifacts and exits without modifying anything.
  • dedup --recreate handles schedules too degraded to process an update: it reads StartScheduleArgs from workflow history, deduplicates proto-level StructuredCalendar entries, then (with --execute) deletes and recreates the schedule via CreateSchedule. The high watermark resets to the recreation time, so actions that would have fired during the degraded period will not fire.
  • force-can sends a force-continue-as-new signal to the scheduler workflow. Supports the same targeting model and dry-run toggle as dedup.
  • Dedup uses proto.Equal after normalizing range defaults (End, Step) to match cleanSpec semantics. Entries that differ only in proto default representation are treated as equal; entries with different Comment fields are treated as distinct.
  • Added FlagExecute and FlagRecreate to tools/tdbg/flags.go.
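A minimal sketch of the equality rule described above, assuming the Range defaults documented for the Temporal API (an End below Start is read as End == Start, and a Step of 0 is read as 1). `Range`, `CalendarEntry`, `normalizeEntry` and `dedupCalendars` are hypothetical names; the PR compares real protos with `proto.Equal`, while this sketch uses plain structs and `reflect.DeepEqual` so it runs without the Temporal API module.

```go
package main

import (
	"fmt"
	"reflect"
)

// Range is a hypothetical stand-in for the proto Range message
// (start, end, step) used inside StructuredCalendarSpec.
type Range struct {
	Start, End, Step int32
}

// CalendarEntry is a hypothetical, trimmed-down StructuredCalendarSpec:
// two range fields plus the free-form Comment.
type CalendarEntry struct {
	Hour    []Range
	Minute  []Range
	Comment string
}

// normalizeRange applies the assumed defaults: an End below Start means
// "End == Start", and a zero Step means "Step == 1".
func normalizeRange(r Range) Range {
	if r.End < r.Start {
		r.End = r.Start
	}
	if r.Step == 0 {
		r.Step = 1
	}
	return r
}

// normalizeEntry returns a copy of e with every range normalized.
// Comment is copied unchanged, so it still participates in equality.
func normalizeEntry(e CalendarEntry) CalendarEntry {
	norm := func(rs []Range) []Range {
		out := make([]Range, len(rs))
		for i, r := range rs {
			out[i] = normalizeRange(r)
		}
		return out
	}
	return CalendarEntry{Hour: norm(e.Hour), Minute: norm(e.Minute), Comment: e.Comment}
}

// dedupCalendars keeps the first occurrence of each entry, comparing
// normalized copies but returning the original entries in input order.
func dedupCalendars(entries []CalendarEntry) []CalendarEntry {
	var kept, seen []CalendarEntry
	for _, e := range entries {
		n := normalizeEntry(e)
		duplicate := false
		for _, s := range seen {
			if reflect.DeepEqual(n, s) {
				duplicate = true
				break
			}
		}
		if !duplicate {
			seen = append(seen, n)
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	entries := []CalendarEntry{
		{Hour: []Range{{Start: 9}}},                     // hour 9, End/Step left at zero
		{Hour: []Range{{Start: 9, End: 9, Step: 1}}},    // same hour, defaults spelled out
		{Hour: []Range{{Start: 9}}, Comment: "standup"}, // differs only by Comment
	}
	fmt.Println(len(dedupCalendars(entries))) // 2
}
```

The scan is quadratic in the number of entries, which is presumably acceptable for specs of this size; comparing normalized copies while returning the originals keeps the output close to what the user wrote.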

How did you test it?

  • built
  • added new unit tests (tools/schedutil/schedutil_test.go)
  • added new functional tests (tests/schedutil_test.go)
```shell
go test ./tools/schedutil/... ./tools/tdbg/... -count=1
go test -tags "disable_grpc_modules,test_dep" ./tests/ -run TestSchedUtil -count=1
```

Sample commands

```shell
# Single schedule — dry run (writes before/after JSON, no changes applied)
tdbg -namespace prod schedule dedup --schedule-id my-sched

# Single schedule — apply
tdbg -namespace prod schedule dedup --schedule-id my-sched --execute

# Subset via pipe
temporal schedule list -n prod -o json | jq -r '.[].scheduleId' | \
    tdbg -namespace prod schedule dedup --execute

# Recreate a degraded schedule (dry run then apply)
tdbg -namespace prod schedule dedup --schedule-id my-sched --recreate
tdbg -namespace prod schedule dedup --schedule-id my-sched --recreate --execute

# Force continue-as-new
tdbg -namespace prod schedule force-can --schedule-id my-sched --execute
```

@chaptersix chaptersix force-pushed the tools/schedule-spec-util branch from 34829cb to 74f51c5 April 9, 2026 23:09
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
@chaptersix chaptersix force-pushed the tools/schedule-spec-util branch from ad6744a to 404b1c9 April 10, 2026 16:12
@chaptersix chaptersix marked this pull request as ready for review April 10, 2026 17:36
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
@chaptersix chaptersix marked this pull request as draft April 13, 2026 16:22
@chaptersix chaptersix marked this pull request as ready for review April 13, 2026 16:28
Comment thread go.mod Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment on lines +346 to +349
```go
attrs := resp.History.Events[0].GetWorkflowExecutionStartedEventAttributes()
if attrs == nil {
	return errors.New("first event is not WorkflowExecutionStarted")
}
```

I think we'd actually want to look at the latest memo, not the first (so basically the top event that has a full schedule in it). Otherwise, we'd potentially miss an update, aside from the inadvertent accumulation of duplicates.


The memo isn't enough, right? I would look from back to front for the last update signal payload. (And then replay any patch signals over that)
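A sketch of the reviewer's suggestion: walk history from the newest event backwards to the last event carrying a full schedule, then collect the patch signals recorded after it so they can be replayed on top. The event model (`EventKind`, `Event`, `latestFullSchedule`) is hypothetical and flattened; decoding real signal payloads is out of scope. Treating the started event as the fallback when no update signal exists is an assumption about how the search should bottom out.

```go
package main

import (
	"errors"
	"fmt"
)

// EventKind is a hypothetical classification of scheduler history events.
type EventKind int

const (
	Started      EventKind = iota // WorkflowExecutionStarted (carries StartScheduleArgs)
	UpdateSignal                  // signal carrying a full replacement schedule
	PatchSignal                   // signal carrying a partial patch (e.g. pause)
	OtherEvent                    // timers, workflow tasks, etc.
)

// Event is a hypothetical, flattened history event.
type Event struct {
	Kind    EventKind
	Payload string
}

// latestFullSchedule walks history from newest to oldest and returns the index
// of the most recent event carrying a full schedule, plus every patch signal
// recorded after it in original (oldest-first) order, ready to be replayed.
func latestFullSchedule(events []Event) (int, []Event, error) {
	for i := len(events) - 1; i >= 0; i-- {
		if events[i].Kind != UpdateSignal && events[i].Kind != Started {
			continue
		}
		var patches []Event
		for _, e := range events[i+1:] {
			if e.Kind == PatchSignal {
				patches = append(patches, e)
			}
		}
		return i, patches, nil
	}
	return -1, nil, errors.New("no event carrying a full schedule")
}

func main() {
	history := []Event{
		{Started, "initial spec"},
		{PatchSignal, "pause"},
		{UpdateSignal, "spec v2"},
		{OtherEvent, ""},
		{PatchSignal, "unpause"},
	}
	base, patches, err := latestFullSchedule(history)
	if err != nil {
		panic(err)
	}
	fmt.Println(history[base].Payload, len(patches)) // spec v2 1
}
```

The "pause" patch before the update is dropped because the update already supersedes it; only patches after the chosen base need replaying.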

Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
```go
// query size limit), deduplicates the spec, and (if execute) deletes the broken
// schedule and recreates it with the clean spec. Use when the workflow is too
// degraded to process an update signal.
func RunDedupRecreate(ctx context.Context, cl sdkclient.Client, namespace, scheduleID, outDir string, execute bool) error {
```

I'm assuming that the way we'll run this is with a small hand-evaluated file of schedules to target, versus "entire namespace", correct? One thing that might be useful as a failsafe would be to check and see if the schedule "looks" stuck (e.g., if the top events aren't "timer started" after "WFT completion"). I don't want to go overkill on a purpose-built tool like this, but I think the usefulness of the failsafe scales with the volume of input we plan to feed into it. WDYT?

Owner Author


Yes, we'd only run this on a subset of schedules. The easiest indicator will be whether it can respond to a describe schedule request. Dedup without --recreate and --execute iterates over all schedules in a namespace and describes them; it returns an error (probably not one that's easy to parse) if it cannot describe a schedule. We can use --recreate on those.

Owner Author


We can use the piped input to run this on a larger subset.

Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
chaptersix and others added 9 commits April 13, 2026 14:44
Adds schedutil, a CLI tool with two commands:

  dedup      - prints the current spec as JSON, deduplicates
               StructuredCalendar and Interval entries client-side,
               then sends an UpdateSchedule. Works without any
               server-side fix.
  force-can  - sends force-continue-as-new to the scheduler workflow.

Three targeting modes:
  --schedule-id <id>     single schedule
  --ids-file <file>      file of IDs, one per line ('-' for stdin)
  (neither)              all schedules in the namespace

Namespace-wide and file modes require --yes; without it the command
lists affected schedules and exits.

Flags and env vars mirror tdbg (TEMPORAL_CLI_ADDRESS, TLS certs,
TEMPORAL_CLI_NAMESPACE, TEMPORAL_CONTEXT_TIMEOUT).

Co-authored-by: Lina Jodoin <lina.jodoin@temporal.io>

…n mode

Without --execute the command always describes each schedule, writes
before/after JSON files to a temp directory, and exits without applying
any changes. With --execute the same files are written and changes are
applied.

- Renames --yes to --execute
- RunDedup writes <ns>_<sid>-before.json and <ns>_<sid>-after.json to
  a shared temp dir regardless of execute flag
- RunForceCAN prints what would be signalled in dry-run mode
- ForEachSchedule drops the yes/execute param — callers own that logic
- Adds dry-run integration tests for both dedup and force-can
Removes --ids-file. Without --schedule-id, reads IDs one per line
from stdin if piped, otherwise operates on all schedules in the
namespace. Standard Unix behavior, no extra flag needed.

When a schedule workflow is too degraded to respond to queries or
signals (spec too large), --recreate reads the schedule state directly
from workflow history, deduplicates the spec, deletes the broken
workflow, and recreates the schedule fresh.

Reads StartScheduleArgs from the WorkflowExecutionStarted event using
payloads.Decode (binary/protobuf encoding used by the frontend).
Deduplicates StructuredCalendar and Interval entries using proto.Equal
directly on the proto types — no normalization needed since all entries
from workflow history are in identical wire form.

Preserves the full schedule state (action, policies, paused status,
remaining actions) since StartScheduleArgs carries the current mutable
state via ContinueAsNew arguments.

Adds unit tests for the proto dedup helpers and functional tests for
both dry-run and execute modes.
@chaptersix chaptersix force-pushed the tools/schedule-spec-util branch from 2cd70e1 to 0a0533b April 13, 2026 19:44
Comment thread cmd/tools/schedutil/main.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
@chaptersix chaptersix changed the title feat(tools): schedutil — schedule spec dedup and force-CAN feat(tdbg): schedule dedup and force-CAN commands Apr 13, 2026
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated
Comment thread tools/schedutil/schedutil.go Outdated

3 participants