[refactor] Semantic function clustering: duplicate, outlier & scattered-helper consolidation across pkg/ #38748

@github-actions

🔧 Semantic Function Clustering Analysis

Repository: github/gh-aw — 2026-06-12

Clusters Go functions across pkg/ by name and purpose to surface near-duplicate logic, misplaced (outlier) functions, scattered helpers, and generics opportunities. Every finding was verified by reading the implementation; only high-confidence, actionable items are included. Intentional patterns (per-codemod files, *_wasm.go build-tag variants, constructors/method sets, dual Markdown/Pretty render pairs) were excluded as not problems.

Summary

| Metric | Value |
| --- | --- |
| Non-test .go files analyzed | ~814 (workflow 400, cli 319, parser 43, console 26, agentdrain 10, utils 16) |
| Functions cataloged | ~5,300 |
| Confirmed findings | 12 (3 duplicate · 2 outlier · 4 scattered-helper · 3 generics) |
| Overall health | ✅ Already heavily refactored; these are incremental cleanups, not structural problems |

1 Duplicate / Near-Duplicate Functions

1.1 Engine capability validators (pkg/workflow/agent_validation.go)

  • (*Compiler).validateMaxTurnsSupport (:115) & validateMaxContinuationsSupport (:141) are ~90% identical: extract config → nil if unset → log → if !engine.GetCapabilities().<Flag> return "<feature> not supported...". Differ only in config field, capability flag, feature name.
  • Fix: extract validateCapabilitySupport(engine, name, isSet, enabled) error; both delegate. (Leave validateMaxToolDenialsSupport :163 — diverges with Copilot-SDK logic.)
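
The shared helper proposed in 1.1 could look roughly like the sketch below. It is a minimal illustration, not the repository's code: the parameter types, the error wording, and the engine and feature names in `main` are assumptions, and the logging step is omitted.

```go
package main

import "fmt"

// validateCapabilitySupport is a hypothetical shared helper. It returns nil
// when the feature is not configured, and an error when it is configured but
// the engine does not advertise the matching capability.
func validateCapabilitySupport(engineID, feature string, isSet, supported bool) error {
	if !isSet {
		return nil // nothing configured, nothing to validate
	}
	if !supported {
		return fmt.Errorf("%s not supported by engine %q", feature, engineID)
	}
	return nil
}

func main() {
	// Made-up engine and feature names, for illustration only.
	fmt.Println(validateCapabilitySupport("engine-a", "max-turns", false, false))
	fmt.Println(validateCapabilitySupport("engine-a", "max-turns", true, false))
}
```

Each existing validator would then shrink to extracting its config field and capability flag and passing them through.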

1.2 events.jsonl scanner loop (pkg/cli)

  • parseEventsJSONL (gateway_logs_timeline.go:396) & parseEventsJSONLFile (copilot_events_jsonl.go:173) both open the file, set maxScannerBufferSize, and loop scan → TrimSpace → skip lines not starting with { → json.Unmarshal into copilotEventsJSONLEntry, with the same malformed-line log. The second inlines the loop and then accumulates metrics.
  • Fix: parseEventsJSONLFile calls parseEventsJSONL for entries, then builds LogMetrics.

1.3 Inline-section extractor + validator triples (pkg/parser)

  • ExtractInlineSkills (inline_skill_extractor.go:77) vs ExtractInlineSubAgents (sub_agent_extractor.go:222) are structurally identical (FindAllStringSubmatchIndex → validate-unique → trim → collectH2Positions → extractInlineSection loop); only regex + result type differ. Validation trios mirror 1:1 (:18/:28/:44 vs :103/:123/:143).
  • Fix: generic extractInlineSections[T](re, func(name,content) T) + shared validateFrontmatterFields(allowed, label); files already share extractInlineSection/collectH2Positions.

2 Outlier Functions (wrong file)

2.1 summarizeCommandOutput (pkg/cli/update_extension_check.go:547) — a generic trim+truncate-to-300+ellipsis helper, nothing version-check-specific; duplicates stringutil.Truncate (pkg/stringutil/stringutil.go:20).

  • Fix: stringutil.Truncate(strings.TrimSpace(output), 300), or move trim-first variant into stringutil.

2.2 Per-engine RenderMCPConfig boilerplate (5 files): gemini_mcp.go:12, claude_mcp.go:12, antigravity_mcp.go:12 (path ${RUNNER_TEMP}/...), opencode_mcp.go:12, crush_mcp.go:12 (path /tmp/...) all just log + delegate to renderDefaultJSONMCPConfig(...). Only the log prefix and one of two literal paths vary. (Codex/Copilot legitimately differ; keep theirs.)

  • Fix: shared renderStandardJSONMCPConfig(engineName, destPath) so each file is a one-line path declaration.

3 Scattered Helpers

3.1 Three K/M number abbreviators (pkg/cli): formatTokens (health_metrics.go:220, int), formatCompactAIC (logs_format_compact.go:386, float, 5 uses), formatForecastAIC (forecast.go:1390, float, 11 uses). All three perform the same magnitude-tiered abbreviation, differing only in thresholds and precision.

  • Fix: one abbreviateNumber(value float64, opts); all delegate.
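
A minimal sketch of the single abbreviator proposed in 3.1. The option fields and tier boundaries are assumptions; the three existing functions use differing thresholds, which would need additional options not modelled here.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// abbrevOptions is a hypothetical options struct.
type abbrevOptions struct {
	Precision      int // decimals when a K or M suffix is applied
	PlainPrecision int // decimals for values below the K tier
}

// abbreviateNumber renders value with a K or M suffix by magnitude.
func abbreviateNumber(value float64, opts abbrevOptions) string {
	abs := math.Abs(value)
	switch {
	case abs >= 1e6:
		return strconv.FormatFloat(value/1e6, 'f', opts.Precision, 64) + "M"
	case abs >= 1e3:
		return strconv.FormatFloat(value/1e3, 'f', opts.Precision, 64) + "K"
	default:
		return strconv.FormatFloat(value, 'f', opts.PlainPrecision, 64)
	}
}

// formatTokens shows how an int-based caller could delegate.
func formatTokens(n int) string {
	return abbreviateNumber(float64(n), abbrevOptions{Precision: 1})
}

func main() {
	fmt.Println(formatTokens(9800))
	fmt.Println(abbreviateNumber(2500000, abbrevOptions{Precision: 2}))
	fmt.Println(abbreviateNumber(42, abbrevOptions{}))
}
```

One caveat with this simple form: values just under a tier boundary can round up to the boundary (999,999 at one decimal renders as 1000.0K), so a real helper may want to promote such values to the next tier.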

3.2 Ad-hoc truncate vs stringutil.Truncate: logs_format_tsv.go:145 (displayTitle[:47]+"...") and update_extension_check.go:547 (2.1). Canonical helper exists and is tested.

  • Fix: replace with stringutil.Truncate.

3.3 Two GitHub-URL → owner/repo extractors: parseRepoFromURL (outcome_eval.go:342, simple strings.Cut) vs parseGitHubRepoSlugFromURL (git.go:58, host-aware enterprise/SSH).

  • Fix: parseRepoFromURL delegates to the complete parseGitHubRepoSlugFromURL, or document the host-agnostic intent.

3.4 Hand-rolled dedup vs sliceutil (pkg/parser): uniqueClosestScopeSuggestions (schema_errors.go:415) hand-rolls a seen map while sibling schema_suggestions.go:316 already uses sliceutil.Deduplicate; mergeAllowedArrays (tools_merger.go:184) reimplements merge+dedup over []any strings.

  • Fix: use sliceutil.Deduplicate / sliceutil.MergeUnique.
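
The pattern being consolidated in 3.4 is the familiar order-preserving seen-map loop. The sketch below uses local stand-ins, since the actual signatures of sliceutil.Deduplicate and sliceutil.MergeUnique are not shown in this issue and may differ.

```go
package main

import "fmt"

// deduplicate keeps the first occurrence of each item, preserving order.
// Local stand-in for sliceutil.Deduplicate (signature assumed).
func deduplicate[T comparable](items []T) []T {
	seen := make(map[T]struct{}, len(items))
	out := make([]T, 0, len(items))
	for _, it := range items {
		if _, ok := seen[it]; ok {
			continue
		}
		seen[it] = struct{}{}
		out = append(out, it)
	}
	return out
}

// mergeUnique concatenates the lists and removes duplicates.
// Local stand-in for sliceutil.MergeUnique (signature assumed).
func mergeUnique[T comparable](lists ...[]T) []T {
	var all []T
	for _, l := range lists {
		all = append(all, l...)
	}
	return deduplicate(all)
}

func main() {
	fmt.Println(deduplicate([]string{"repo", "issues", "repo"}))
	fmt.Println(mergeUnique([]string{"a", "b"}, []string{"b", "c"}))
}
```

A caller holding []any values, as mergeAllowedArrays does, would first need to convert the elements to strings before using a helper of this kind.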

4 Generics Opportunities (lower priority)

4.1 (GitHubAllowedTools).ToStringSlice (tools_types.go:249) & (GitHubToolsets).ToStringSlice (:264) — identical string(elem) loops over distinct ~string slices → func toStringSlice[T ~string](s []T) []string.
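
The generic helper named in 4.1 is small enough to show in full. The two element types below are hypothetical stand-ins for the element types of GitHubAllowedTools and GitHubToolsets, and the values are made up.

```go
package main

import "fmt"

// Hypothetical ~string element types standing in for the real ones.
type allowedTool string
type toolset string

// toStringSlice converts any slice whose element type has underlying type
// string into a plain []string.
func toStringSlice[T ~string](s []T) []string {
	out := make([]string, len(s))
	for i, v := range s {
		out[i] = string(v)
	}
	return out
}

func main() {
	fmt.Println(toStringSlice([]allowedTool{"get_issue", "list_issues"}))
	fmt.Println(toStringSlice([]toolset{"repos", "issues"}))
}
```

Both ToStringSlice methods could then be one-line wrappers, which keeps the existing method-based call sites unchanged.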

4.2 percentileInt/meanStdDevInt (forecast_montecarlo.go:272/251) & medianFloat (outcome_eval.go:308) differ only by element type → generic Percentile[T]/MeanStdDev[T]/Median[T] in a numstat helper. Low payoff (int versions use milli-AIC units) — optional.
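
As an illustration of 4.2, a generic median over a small numeric constraint might look like the sketch below. The constraint, the function name, and the float64 return type are assumptions. The percentile method used by percentileInt is not described here, so only the median is sketched, and callers working in milli-AIC units would still need to convert the result.

```go
package main

import (
	"fmt"
	"sort"
)

// number is a hypothetical constraint covering the element types involved.
type number interface {
	~int | ~int64 | ~float64
}

// median returns the middle value (or the mean of the two middle values)
// as a float64. It returns 0 for an empty slice and does not modify the input.
func median[T number](values []T) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := make([]T, len(values))
	copy(sorted, values)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return float64(sorted[mid])
	}
	return (float64(sorted[mid-1]) + float64(sorted[mid])) / 2
}

func main() {
	fmt.Println(median([]int{3, 1, 2}))
	fmt.Println(median([]float64{1, 2, 3, 4}))
}
```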

4.3 FormatNumber (pkg/console/render.go:591) repeats the same 3-way precision block per k/M/B tier → factor scaleWithSuffix(f, divisor, suffix) (~40 lines → ~10).

Moderate / Optional

  • Package-validator skeleton: validateNpxPackages (npm_validation.go:48), validatePipPackages/validateUvPackages (pip_validation.go:101/130) share loop→check→accumulate→NewValidationError; divergent registry semantics limit the win.
  • TSV renderers: renderLogsTSV (logs_format_tsv.go:23) & renderLogsTSVVerbose (:107) duplicate per-row fallbacks + insights trailer (:83-88 vs :176-181); extract tsvRowDefaults + writeTSVInsights.

Recommendations (prioritized)

P1 — high impact, low risk

  • Reuse stringutil.Truncate at the 3 sites (2.1, 3.2)
  • parseEventsJSONLFile → parseEventsJSONL (1.2)
  • Single abbreviateNumber helper (3.1)
  • Dedup loops → sliceutil.Deduplicate/MergeUnique (3.4)

P2 — medium impact

  • validateCapabilitySupport helper (1.1)
  • Shared renderStandardJSONMCPConfig (2.2)
  • Generic extractInlineSections[T] + validateFrontmatterFields (1.3)
  • Consolidate GitHub-URL extractors (3.3)

P3 — optional

  • Generics 4.1 / 4.2 / 4.3 · package-validator & TSV scaffolding

Metadata

Detection: parallel semantic sweep (naming-pattern clustering + read-and-verify). Excluded as intentional: codemod_*.go, *_wasm.go, constructors/method sets, dual render pairs, per-feature families (audit_*, gateway_logs_*, compile_*, add_interactive_*). No misplaced functions found in pkg/workflow.

References: §27385605537

Generated by 🔧 Semantic Function Refactoring · 574.3 AIC · ⌖ 9.19 AIC · ⊞ 9.8K ·

  • expires on Jun 13, 2026, 4:20 PM UTC-08:00
