Skip to content

[refactor] Semantic function clustering: consolidate verified duplicates & clean up scattered helpersΒ #35822

@github-actions

Description

@github-actions

πŸ”§ Semantic Function Clustering Analysis

Automated analysis of github/gh-aw β€” clustering Go functions by name/purpose to surface duplicates, outliers, and scattered helpers.

A fan-out audit of the non-test Go sources in pkg/ (β‰ˆ800 files, ~4,500 top-level functions across pkg/workflow, pkg/cli, pkg/parser, pkg/console) was performed, followed by direct code verification of each candidate. The codebase is well-organized overall β€” engine files cluster cleanly by {engine}_{feature}.go, validation already lives in domain-specific *_validation.go files (with a documented validation_helpers.go consolidation), and config preprocessing is centralized. The findings below are the verified refactoring opportunities; low-signal and idiomatic patterns (interface implementations, build-tag stubs, New*/String()/MarshalJSON) were filtered out.

Summary

Metric Value
Verified duplicate clusters 4
Scattered-helper clusters 3
Dead / empty placeholder files 4
Overall organization βœ… Good β€” targeted cleanups only

Critical / High-Confidence Findings

1. Cross-package duplicate: resolveWorkflowEngineID (verified identical logic)

Two independent copies resolve the engine ID with the same EngineConfig.ID β†’ AI precedence, differing only in their nil/empty default:

  • pkg/workflow/observability_otlp.go:776 β€” returns "" on miss
  • pkg/cli/mcp_inspect.go:224 β€” returns "unknown" on miss

Recommendation: Export one canonical helper from pkg/workflow (e.g. ResolveEngineID) and have pkg/cli call it, applying the "unknown" fallback at the call site. Removes a silent drift risk between two packages.

2. Byte-identical YAML single-quote escapers (verified identical bodies)

Two functions are literally return strings.ReplaceAll(x, "'", "''"):

  • pkg/workflow/observability_otlp.go:24 β€” escapeYAMLSingleQuotedScalar
  • pkg/workflow/central_slash_command_workflow.go:500 β€” escapeSingleQuotedYAMLString

A third, semantically different variant uses backslash escaping: pkg/workflow/redact_secrets.go:22 escapeSingleQuote (\ then ' β†’ \').

Recommendation: Collapse the two identical YAML-doubling functions into one (e.g. escapeYAMLSingleQuoted in strings.go) and update both call sites. Keep the backslash variant separate but rename it to make the differing semantics explicit (e.g. escapeSingleQuoteBackslash).

3. Dead placeholder files (verified β€” 1 line each, only package cli)

  • pkg/cli/audit_report_render_guard.go
  • pkg/cli/audit_report_render_jobs.go
  • pkg/cli/audit_report_render_overview.go
  • pkg/cli/audit_report_render_tools.go

Recommendation: Delete them, or implement the intended split out of audit_report_render.go. As-is they imply structure that does not exist.

4. Near-duplicate JSON builders: buildCustomSafeOutput{Jobs,Actions}JSON (~90% identical, verified)

  • pkg/workflow/safe_outputs_config_helpers.go:24 β€” buildCustomSafeOutputJobsJSON
  • pkg/workflow/safe_outputs_actions.go:464 β€” buildCustomSafeOutputActionsJSON

Both normalize identifiers, sort keys, rebuild an ordered map, and json.Marshal. The only differences are the source field (SafeOutputs.Jobs vs .Actions), the map value ("" vs the normalized name), and the logger.

Recommendation: Extract a shared helper, e.g. buildNormalizedSortedJSON(names []string, valueFn func(string) string) string, and call it from both.

Code comparison (finding 4)
// safe_outputs_config_helpers.go:24
for jobName := range data.SafeOutputs.Jobs {
    normalizedName := stringutil.NormalizeSafeOutputIdentifier(jobName)
    jobMapping[normalizedName] = ""            // <- only real difference
}
// ... sort keys, rebuild ordered map, json.Marshal

// safe_outputs_actions.go:464
for actionName := range data.SafeOutputs.Actions {
    normalizedName := stringutil.NormalizeSafeOutputIdentifier(actionName)
    actionMapping[normalizedName] = normalizedName  // <- only real difference
}
// ... sort keys, rebuild ordered map, json.Marshal (identical)

Scattered Helpers (Medium Confidence)

5. Import-path resolution split across three implementations in pkg/cli
  • pkg/cli/imports.go:35 β€” resolveImportPath(importPath, workflowPath string) string (clean filepath + forward-slash normalization)
  • pkg/cli/run_push.go:375 β€” resolveImportPathLocal(importPath, baseDir string) string (adds logging, git-root handling, #Section stripping)
  • pkg/cli/dependency_graph.go:232 β€” (*DependencyGraph) resolveImportPath(importPath, baseDir string) string (adds file-existence checks + parser.ResolveIncludePath fallback)

Recommendation: Make imports.go the single core resolver; have the other two delegate to it and layer their extra behavior (logging, section stripping, existence checks) on top. Reduces behavioral drift between features.

6. marshalSorted: generic family in parser vs. ad-hoc copy in workflow
  • pkg/parser/frontmatter_hash.go:46-116 β€” generic marshalSorted / marshalSortedMap / marshalSortedSlice / marshalSortedValue (canonical sorted-key JSON for hashing)
  • pkg/workflow/action_cache.go:191 β€” (*ActionCache) marshalSorted() ([]byte, error) (hand-rolled sorted-key JSON for the same canonicalization purpose)

Recommendation: Promote the generic parser family to pkg/jsonutil (e.g. jsonutil.MarshalSorted) and have both parser and workflow/action_cache.go consume it. Single source of truth for canonical JSON.

7. String-sanitization helpers that belong in pkg/stringutil
  • pkg/cli/add_workflow_pr.go:30 β€” sanitizeBranchName(name string) string (generic, reusable)
  • pkg/cli/update_command.go:292 β€” sanitizeRepoPath(repo string) string (replaces special chars with __)

Recommendation: Move sanitizeBranchName to pkg/stringutil (it is pure and reusable); relocate sanitizeRepoPath too if it gains a second caller. Improves discoverability and avoids re-implementation.

8. Thin RenderMCPConfig wrappers across engine *_mcp.go files

Five *_mcp.go files (claude_mcp.go, gemini_mcp.go, crush_mcp.go, antigravity_mcp.go, opencode_mcp.go) each contain a ~3-line RenderMCPConfig that logs and delegates to renderDefaultJSONMCPConfig(...) with an engine-specific path. These are interface methods (so they cannot simply be deleted), but the boilerplate could be reduced with a small shared helper, e.g. renderEngineMCPConfig(log, path).

Confidence: Medium β€” borderline; only worth doing if MCP rendering grows. Listed for completeness.

Already Well-Organized (No Action Needed)

Confirmed-good clusters
  • Engine clustering β€” {engine}_{feature}.go split (copilot's _installation/_execution/_tools split is exemplary). Repeated names like GetExecutionSteps, RenderMCPConfig, GetModelEnvVarName are interface implementations, not duplicates.
  • Validation domain layer β€” domain-specific *_validation.go files with a documented validation_helpers.go that already consolidated 70+ duplicate patterns. *_validation_wasm.go stubs are idiomatic build-tag pairs, not duplicates.
  • Config preprocessing β€” config_preprocessing.go helpers correctly centralized and reused 50+ times.
  • Audit math helpers β€” audit_math_helpers.go is a clean, domain-scoped extraction.

Recommendations (Prioritized)

Priority 1 β€” High impact, low effort

  • Consolidate resolveWorkflowEngineID (finding 1)
  • Merge the two identical YAML escapers; rename the backslash variant (finding 2)
  • Delete or implement the 4 empty audit_report_render_*.go files (finding 3)
  • Extract shared helper for buildCustomSafeOutput{Jobs,Actions}JSON (finding 4)

Priority 2 β€” Medium impact

  • Unify the three import-path resolvers behind one core function (finding 5)
  • Promote marshalSorted family to pkg/jsonutil (finding 6)
  • Move sanitizeBranchName to pkg/stringutil (finding 7)

Priority 3 β€” Optional

  • Consider a shared RenderMCPConfig delegation helper (finding 8)

Each change should preserve behavior and be covered by existing tests; verify with make test / make lint before merging.

Analysis Metadata

  • Scope: non-test .go files in pkg/ (excluded *_test.go)
  • Method: 6 parallel semantic finder agents (by code area) β†’ direct source verification of each high-confidence candidate
  • Functions cataloged: ~4,500 (pkg/workflow, pkg/cli, pkg/parser, pkg/console)
  • Verified findings: 4 duplicate clusters, 3 scattered-helper clusters, 4 dead files
  • Analysis date: 2026-05-30

References: Β§26668482309

Generated by πŸ”§ Semantic Function Refactoring Β· opus48 2.3M Β· β—·

  • expires on Jun 1, 2026, 12:14 AM UTC

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions