🔧 Semantic Function Clustering Analysis
Repository: github/gh-aw — 2026-06-12
Clusters Go functions across pkg/ by name and purpose to surface near-duplicate logic, misplaced (outlier) functions, scattered helpers, and generics opportunities. Every finding was verified by reading the implementation; only high-confidence, actionable items are included. Intentional patterns (per-codemod files, *_wasm.go build-tag variants, constructors/method sets, dual Markdown/Pretty render pairs) were excluded as not problems.
Summary
| Metric | Value |
| --- | --- |
| Non-test .go files analyzed | ~814 (workflow 400, cli 319, parser 43, console 26, agentdrain 10, utils 16) |
| Functions cataloged | ~5,300 |
| Confirmed findings | 12 (3 duplicate · 2 outlier · 4 scattered-helper · 3 generics) |
| Overall health | ✅ Already heavily refactored; these are incremental cleanups, not structural problems |
1 Duplicate / Near-Duplicate Functions
1.1 Engine capability validators — pkg/workflow/agent_validation.go
(*Compiler).validateMaxTurnsSupport (:115) & validateMaxContinuationsSupport (:141) are ~90% identical: extract config → nil if unset → log → if !engine.GetCapabilities().<Flag> return "<feature> not supported...". Differ only in config field, capability flag, feature name.
- Fix: extract validateCapabilitySupport(engine, name, isSet, enabled) error; both delegate. (Leave validateMaxToolDenialsSupport (:163) as is: it diverges with Copilot-SDK logic.)
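A minimal sketch of the proposed helper, assuming the engine can be reduced to an ID string and a capabilities struct. The type `EngineCapabilities`, its field names, the feature labels, and the error wording are hypothetical stand-ins, not the repository's actual definitions.

```go
package main

import "fmt"

// EngineCapabilities is a hypothetical stand-in for the value returned by
// engine.GetCapabilities(); the field names are assumptions.
type EngineCapabilities struct {
	MaxTurns         bool
	MaxContinuations bool
}

// validateCapabilitySupport returns nil when the feature is not configured or
// when the engine supports it, and an error otherwise.
func validateCapabilitySupport(engineID, name string, isSet, enabled bool) error {
	if !isSet {
		return nil // feature not configured: nothing to validate
	}
	if !enabled {
		return fmt.Errorf("%s not supported by engine %q", name, engineID)
	}
	return nil
}

// Each validator shrinks to one delegating call that supplies the config
// check, the capability flag, and the feature name.
func validateMaxTurnsSupport(engineID string, caps EngineCapabilities, maxTurns string) error {
	return validateCapabilitySupport(engineID, "max-turns", maxTurns != "", caps.MaxTurns)
}

func validateMaxContinuationsSupport(engineID string, caps EngineCapabilities, n int) error {
	return validateCapabilitySupport(engineID, "max-continuations", n > 0, caps.MaxContinuations)
}
```

The three things that differ between the two validators (config field, capability flag, feature name) become arguments, so the shared control flow lives in one place.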
1.2 events.jsonl scanner loop — pkg/cli
parseEventsJSONL (gateway_logs_timeline.go:396) & parseEventsJSONLFile (copilot_events_jsonl.go:173) both open file, set maxScannerBufferSize, loop scan→TrimSpace→skip non-{→json.Unmarshal into copilotEventsJSONLEntry with the same malformed-line log. The second inlines the loop then accumulates metrics.
- Fix: parseEventsJSONLFile calls parseEventsJSONL for entries, then builds LogMetrics.
1.3 Inline-section extractor + validator triples — pkg/parser
ExtractInlineSkills (inline_skill_extractor.go:77) vs ExtractInlineSubAgents (sub_agent_extractor.go:222) are structurally identical (FindAllStringSubmatchIndex → validate-unique → trim → collectH2Positions → extractInlineSection loop); only regex + result type differ. Validation trios mirror 1:1 (:18/:28/:44 vs :103/:123/:143).
- Fix: generic extractInlineSections[T](re, func(name,content) T) + shared validateFrontmatterFields(allowed, label); files already share extractInlineSection/collectH2Positions.
2 Outlier Functions (wrong file)
2.1 summarizeCommandOutput (pkg/cli/update_extension_check.go:547) — a generic trim+truncate-to-300+ellipsis helper, nothing version-check-specific; duplicates stringutil.Truncate (pkg/stringutil/stringutil.go:20).
- Fix: stringutil.Truncate(strings.TrimSpace(output), 300), or move trim-first variant into stringutil.
2.2 Per-engine RenderMCPConfig boilerplate (5 files) — gemini_mcp.go:12, claude_mcp.go:12, antigravity_mcp.go:12 (path ${RUNNER_TEMP}/...), opencode_mcp.go:12, crush_mcp.go:12 (path /tmp/...) all just log + delegate to renderDefaultJSONMCPConfig(...). Only the log prefix and one of two literal paths vary. (Codex/Copilot legitimately differ — keep theirs.)
- Fix: shared renderStandardJSONMCPConfig(engineName, destPath) so each file is a one-line path declaration.
3 Scattered Helpers
3.1 Three K/M number abbreviators — pkg/cli: formatTokens (health_metrics.go:220, int), formatCompactAIC (logs_format_compact.go:386, float, 5 uses), formatForecastAIC (forecast.go:1390, float, 11 uses) — same magnitude-tiered abbreviation, differing thresholds/precision.
- Fix: one abbreviateNumber(value float64, opts); all delegate.
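One possible shape for the shared helper is sketched below. The `abbrevOpts` struct, its single `Precision` field, and the fixed 1,000 / 1,000,000 thresholds are assumptions; the three existing functions use differing thresholds and precision, so a real version would likely need more options.

```go
package main

import "strconv"

// abbrevOpts is a hypothetical options struct.
type abbrevOpts struct {
	Precision int // decimals shown once a K or M suffix applies
}

// abbreviateNumber renders non-negative values with a K or M suffix.
// Negative inputs are not handled in this sketch.
func abbreviateNumber(value float64, opts abbrevOpts) string {
	switch {
	case value >= 1_000_000:
		return strconv.FormatFloat(value/1_000_000, 'f', opts.Precision, 64) + "M"
	case value >= 1_000:
		return strconv.FormatFloat(value/1_000, 'f', opts.Precision, 64) + "K"
	default:
		return strconv.FormatFloat(value, 'f', -1, 64)
	}
}

// formatTokens shows how an int-based caller would delegate.
func formatTokens(n int) string {
	return abbreviateNumber(float64(n), abbrevOpts{Precision: 1})
}
```

With these assumptions, `formatTokens(1500)` yields `1.5K` and `formatTokens(950)` yields `950`.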
3.2 Ad-hoc truncate vs stringutil.Truncate — logs_format_tsv.go:145 (displayTitle[:47]+"...") and update_extension_check.go:547 (2.1). Canonical helper exists and is tested.
- Fix: replace with stringutil.Truncate.
3.3 Two GitHub-URL → owner/repo extractors — parseRepoFromURL (outcome_eval.go:342, simple strings.Cut) vs parseGitHubRepoSlugFromURL (git.go:58, host-aware enterprise/SSH).
- Fix: parseRepoFromURL delegates to the complete parseGitHubRepoSlugFromURL, or document the host-agnostic intent.
3.4 Hand-rolled dedup vs sliceutil — pkg/parser: uniqueClosestScopeSuggestions (schema_errors.go:415) hand-rolls a seen map while sibling schema_suggestions.go:316 already uses sliceutil.Deduplicate; mergeAllowedArrays (tools_merger.go:184) reimplements merge+dedup over []any strings.
- Fix: use sliceutil.Deduplicate / sliceutil.MergeUnique.
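The sketch below shows what the two call sites would lean on. `deduplicate` and `mergeUnique` are local stand-ins written for this example; the actual signatures of `sliceutil.Deduplicate` and `sliceutil.MergeUnique` are not given in the report and may differ. `stringsFromAny` is a hypothetical adapter for the `[]any` input that mergeAllowedArrays handles.

```go
package main

// deduplicate keeps the first occurrence of each item, preserving order.
// Stand-in for sliceutil.Deduplicate (signature assumed).
func deduplicate[T comparable](items []T) []T {
	seen := make(map[T]struct{}, len(items))
	out := make([]T, 0, len(items))
	for _, it := range items {
		if _, ok := seen[it]; ok {
			continue
		}
		seen[it] = struct{}{}
		out = append(out, it)
	}
	return out
}

// mergeUnique concatenates the slices and deduplicates the result.
// Stand-in for sliceutil.MergeUnique (signature assumed).
func mergeUnique[T comparable](slices ...[]T) []T {
	var all []T
	for _, s := range slices {
		all = append(all, s...)
	}
	return deduplicate(all)
}

// stringsFromAny keeps only the string elements of a []any value, so that
// loosely typed config arrays can be fed to mergeUnique.
func stringsFromAny(values []any) []string {
	out := make([]string, 0, len(values))
	for _, v := range values {
		if s, ok := v.(string); ok {
			out = append(out, s)
		}
	}
	return out
}
```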
4 Generics Opportunities (lower priority)
4.1 (GitHubAllowedTools).ToStringSlice (tools_types.go:249) & (GitHubToolsets).ToStringSlice (:264) — identical string(elem) loops over distinct ~string slices → func toStringSlice[T ~string](s []T) []string.
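As a sketch, both methods can delegate to the generic helper named above. The element types `GitHubToolName` and `GitHubToolset` are assumed names; only the `~string` constraint matters.

```go
package main

// Hypothetical element and slice types; the real definitions live in
// tools_types.go and may be named differently.
type GitHubToolName string
type GitHubToolset string
type GitHubAllowedTools []GitHubToolName
type GitHubToolsets []GitHubToolset

// toStringSlice converts any slice of string-like values to []string.
func toStringSlice[T ~string](s []T) []string {
	out := make([]string, len(s))
	for i, v := range s {
		out[i] = string(v)
	}
	return out
}

// Both methods keep their public signature and delegate to the helper.
func (t GitHubAllowedTools) ToStringSlice() []string { return toStringSlice(t) }
func (t GitHubToolsets) ToStringSlice() []string     { return toStringSlice(t) }
```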
4.2 percentileInt/meanStdDevInt (forecast_montecarlo.go:272/251) & medianFloat (outcome_eval.go:308) differ only by element type → generic Percentile[T]/MeanStdDev[T]/Median[T] in a numstat helper. Low payoff (int versions use milli-AIC units) — optional.
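If this consolidation were pursued, the generic versions might look like the sketch below. The `number` constraint is an assumption, and `percentile` uses the nearest-rank method, which may not match what percentileInt does today.

```go
package main

import (
	"math"
	"sort"
)

// number is a hypothetical constraint covering the element types in use.
type number interface {
	~int | ~int64 | ~float64
}

// sortedCopy returns an ascending copy and leaves the input untouched.
func sortedCopy[T number](values []T) []T {
	sorted := append([]T(nil), values...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	return sorted
}

// median returns the middle value, or the mean of the two middle values for
// even-length input. Empty input yields 0.
func median[T number](values []T) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := sortedCopy(values)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return float64(sorted[mid])
	}
	return (float64(sorted[mid-1]) + float64(sorted[mid])) / 2
}

// percentile returns the nearest-rank percentile for p in [0, 100].
// Empty input yields the zero value of T.
func percentile[T number](values []T, p float64) T {
	var zero T
	if len(values) == 0 {
		return zero
	}
	sorted := sortedCopy(values)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}
```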
4.3 FormatNumber (pkg/console/render.go:591) repeats the same 3-way precision block per k/M/B tier → factor scaleWithSuffix(f, divisor, suffix) (~40 lines → ~10).
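A rough sketch of the factoring is below. The three precision tiers (two decimals under 10, one under 100, none otherwise) are an assumption about what the repeated block does; the real FormatNumber may use different cutoffs.

```go
package main

import "fmt"

// scaleWithSuffix divides f by divisor and picks a precision from the size
// of the scaled value. The tier boundaries are assumed for this sketch.
func scaleWithSuffix(f, divisor float64, suffix string) string {
	scaled := f / divisor
	switch {
	case scaled < 10:
		return fmt.Sprintf("%.2f%s", scaled, suffix)
	case scaled < 100:
		return fmt.Sprintf("%.1f%s", scaled, suffix)
	default:
		return fmt.Sprintf("%.0f%s", scaled, suffix)
	}
}

// formatNumber selects the tier once; the precision logic is no longer
// repeated for k, M, and B.
func formatNumber(n int) string {
	f := float64(n)
	switch {
	case f >= 1e9:
		return scaleWithSuffix(f, 1e9, "B")
	case f >= 1e6:
		return scaleWithSuffix(f, 1e6, "M")
	case f >= 1e3:
		return scaleWithSuffix(f, 1e3, "k")
	default:
		return fmt.Sprintf("%d", n)
	}
}
```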
Moderate / Optional
- Package-validator skeleton: validateNpxPackages (npm_validation.go:48), validatePipPackages/validateUvPackages (pip_validation.go:101/130) share loop→check→accumulate→NewValidationError; divergent registry semantics limit the win.
- TSV renderers: renderLogsTSV (logs_format_tsv.go:23) & renderLogsTSVVerbose (:107) duplicate per-row fallbacks + insights trailer (:83-88 vs :176-181); extract tsvRowDefaults + writeTSVInsights.
Recommendations (prioritized)
P1: high impact, low risk
- stringutil.Truncate at the 3 sites (2.1, 3.2)
- parseEventsJSONLFile → parseEventsJSONL (1.2)
- abbreviateNumber helper (3.1)
- sliceutil.Deduplicate / MergeUnique (3.4)
P2: medium impact
- validateCapabilitySupport helper (1.1)
- renderStandardJSONMCPConfig (2.2)
- extractInlineSections[T] + validateFrontmatterFields (1.3)
P3: optional
Metadata
Detection: parallel semantic sweep (naming-pattern clustering + read-and-verify). Excluded as intentional: codemod_*.go, *_wasm.go, constructors/method sets, dual render pairs, per-feature families (audit_*, gateway_logs_*, compile_*, add_interactive_*). No misplaced functions found in pkg/workflow.
References: §27385605537
Generated by 🔧 Semantic Function Refactoring · 574.3 AIC · ⌖ 9.19 AIC · ⊞ 9.8K · ◷