feat: skill command — auto-generated prompt matches hand-crafted benchmark #123
jonathanpopham wants to merge 15 commits into supermodeltools:main from
Conversation
Caution: Review failed. The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:

📒 Files selected for processing (1)
Walkthrough

Adds a new CLI subcommand.

Changes
Sequence Diagram(s)

(Skipped — changes are primarily tests, scripts, docs, and a small CLI command; no new multi-component sequential flow requiring visualization.)

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Actionable comments posted: 1
🧹 Nitpick comments (1)
cmd/skill.go (1)
45-47: Use `RunE` with `cmd.OutOrStdout()` for better testability and output handling.

Right now, `fmt.Println` just dumps text to stdout without checking whether the write succeeded. In Cobra, it's cleaner to use `RunE` (which can return errors) paired with `cmd.OutOrStdout()`. Here's why it matters:

- It's testable: tests can inject a mock writer instead of hitting real stdout
- It respects output redirection: users can pipe or redirect the output without issues
- It handles errors: if the pipe breaks or the write fails, you'll know about it instead of silently ignoring it

Think of it like the difference between throwing a letter into the void and getting a delivery confirmation: `cmd.OutOrStdout()` gives you feedback that the write actually worked.

Suggested diff
```diff
-	Run: func(cmd *cobra.Command, args []string) {
-		fmt.Println(skillPrompt)
-	},
+	RunE: func(cmd *cobra.Command, args []string) error {
+		_, err := fmt.Fprintln(cmd.OutOrStdout(), skillPrompt)
+		return err
+	},
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/skill.go` around lines 45-47: replace the anonymous Run handler with RunE so the command can return write errors and use the command's output writer; specifically, change the current Run function (the block containing fmt.Println(skillPrompt)) to RunE and write the prompt to cmd.OutOrStdout(), capturing and returning any error (e.g., call fmt.Fprintln(cmd.OutOrStdout(), skillPrompt) and return the error) so tests can inject writers and write failures are surfaced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/skill.go`:
- Around line 34-50: The skill command is missing from the noConfigCommands
exemption and is being blocked by the PersistentPreRunE preflight; add "skill"
to the noConfigCommands map in cmd/root.go so the command created in init() (the
cobra.Command with Use: "skill" that prints skillPrompt) can run without config.
Locate the noConfigCommands map in cmd/root.go and add the same string key used
for other exempt commands (e.g., "version", "help") so the PersistentPreRunE
allows the skill command to execute on fresh installs.
---
Nitpick comments:
In `@cmd/skill.go`:
- Around line 45-47: Replace the anonymous Run handler with RunE so the command
can return write errors and use the command's output writer; specifically,
change the current Run function (the block containing fmt.Println(skillPrompt))
to RunE and write the prompt to cmd.OutOrStdout(), capturing and returning any
error (e.g., call fmt.Fprintln(cmd.OutOrStdout(), skillPrompt) and return the
error) so tests can inject writers and write failures are surfaced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
```go
func init() {
	c := &cobra.Command{
		Use:   "skill",
		Short: "Print agent awareness prompt for graph files",
		Long: `Prints a prompt that teaches AI coding agents how to use Supermodel's
graph files. Pipe into your agent's instructions:

  supermodel skill >> CLAUDE.md
  supermodel skill >> AGENTS.md
  supermodel skill >> .cursorrules`,
		Args: cobra.NoArgs,
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Println(skillPrompt)
		},
	}

	rootCmd.AddCommand(c)
}
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Verify whether `skill` is exempted from config/API-key preflight checks.
rg -n -C4 '\bnoConfigCommands\b|PersistentPreRunE' cmd/root.go
rg -n -C2 '"skill"' cmd/root.go
```

Repository: supermodeltools/cli
Length of output: 874
🏁 Script executed:
cat cmd/root.go | sed -n '14,30p'Repository: supermodeltools/cli
Length of output: 846
🏁 Script executed:
sed -n '14,40p' cmd/root.goRepository: supermodeltools/cli
Length of output: 1225
Add `skill` to the `noConfigCommands` map in `cmd/root.go`.

The `skill` command is blocked by the `PersistentPreRunE` preflight check because it's not listed in `noConfigCommands` (lines 16-30). Since `skill` just prints an informational prompt with zero setup or API dependencies, it should be exempted so users can run `supermodel skill` on fresh installs without config. Add it to the map like the other no-config commands (`version`, `help`, etc.).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/skill.go` around lines 34-50: the skill command is missing from the
noConfigCommands exemption and is being blocked by the PersistentPreRunE
preflight; add "skill" to the noConfigCommands map in cmd/root.go so the command
created in init() (the cobra.Command with Use: "skill" that prints skillPrompt)
can run without config. Locate the noConfigCommands map in cmd/root.go and add
the same string key used for other exempt commands (e.g., "version", "help") so
the PersistentPreRunE allows the skill command to execute on fresh installs.
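The exemption described above can be sketched with a toy preflight check. The map contents and function shape here are assumptions for illustration; the real logic lives in cmd/root.go's PersistentPreRunE:

```go
package main

import "fmt"

// noConfigCommands mimics the exemption map in cmd/root.go; the
// actual entries in the repo may differ.
var noConfigCommands = map[string]bool{
	"version": true,
	"help":    true,
	"skill":   true, // the fix: exempt skill from the config preflight
}

// preflight stands in for a PersistentPreRunE that skips config
// loading for exempted commands.
func preflight(cmdName string) error {
	if noConfigCommands[cmdName] {
		return nil // no config required
	}
	return fmt.Errorf("%s: config required (run setup first)", cmdName)
}

func main() {
	fmt.Println(preflight("skill"))   // <nil>
	fmt.Println(preflight("analyze")) // analyze: config required (run setup first)
}
```

With `"skill"` in the map, the command runs on a fresh install with no config file present.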
…askKey

- restore/render.go: countTokens used len(text)/4 (bytes) instead of utf8.RuneCountInString(text)/4 (runes) — inflated estimate for any text containing multi-byte characters (CJK, emoji, accented chars)
- api/client.go: HTTP error snippet was truncated at byte offset 300, risking invalid UTF-8 when a multi-byte character straddles that boundary
- setup/wizard.go: maskKey sliced the key at byte positions 8 and len-4; same class of bug fixed elsewhere in the codebase

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
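The byte-vs-rune distinction behind the countTokens fix is easy to see side by side; a tiny standalone comparison (function names here are illustrative, not the repo's):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteEstimate is the buggy form: len() counts bytes, so multi-byte
// characters (CJK, emoji, accents) inflate the estimate.
func byteEstimate(s string) int { return len(s) / 4 }

// runeEstimate is the fixed form: count runes, not bytes.
func runeEstimate(s string) int { return utf8.RuneCountInString(s) / 4 }

func main() {
	s := "日本語テキスト" // 7 runes, 21 bytes
	fmt.Println(byteEstimate(s), runeEstimate(s)) // 5 1
}
```

For pure ASCII the two agree, which is why the bug only shows up on non-ASCII input.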
… estimate uses bytes

- find/handler.go: dedupSorted used ss[:1] as the output head, sharing the backing array with the input slice. Appends to out overwrote the caller's map value. Fixed by making an explicit copy before sorting. Adds TestDedupSorted_DoesNotMutateInput regression test.
- focus/handler.go: reachableImports iterated all relationships for every node in the BFS queue (O(depth × queue × rels)). Pre-index imports edges by source node to reduce the BFS to O(rels + visited).
- focus/handler.go: estimateTokens used len(s)/4 (bytes) instead of utf8.RuneCountInString(s)/4 (runes), inflating the hint for non-ASCII paths or function names.
- Adds TestCountTokens_MultiByteChars to guard the rune-count fix in restore/render.go (landed in previous commit).
- Adds wizard_test.go with full coverage for maskKey including a multi-byte rune regression test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
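The dedupSorted aliasing bug comes from building the output on the input's backing array. A sketch of the copy-first fix under that assumption (the real signature in find/handler.go may differ):

```go
package main

import (
	"fmt"
	"sort"
)

// dedupSorted returns a sorted, de-duplicated copy of ss. Using
// out := ss[:1] directly would alias the caller's backing array, so
// appends could overwrite the input; copying first avoids that.
func dedupSorted(ss []string) []string {
	if len(ss) == 0 {
		return nil
	}
	cp := make([]string, len(ss))
	copy(cp, ss)
	sort.Strings(cp)
	out := cp[:1]
	for _, s := range cp[1:] {
		if s != out[len(out)-1] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	in := []string{"b", "a", "b", "c"}
	fmt.Println(dedupSorted(in)) // [a b c]
	fmt.Println(in)              // unchanged: [b a b c]
}
```

The regression test in the commit checks exactly this: the input slice (a map value in the real code) must be untouched after the call.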
If io.Copy succeeded but src.Close() failed, the close error was silently discarded. Preserve the intent of not using defer (to avoid accumulating open handles across Walk iterations) while returning the close error when no copy error occurred. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d shards

- api: add GraphFromShardIR tests (Nodes, Rels, RepoID, NodeByID, NodesByLabel, empty)
- blastradius: new zip_test.go covering isGitRepo, isWorktreeClean, walkZip (hidden files, skip dirs, createZip round-trip)
- cache: add PutJSON/GetJSON round-trip, miss, and overwrite tests
- mcp: add boolArg and intArg tests
- render/funcs: add 23 tests for totalTime, formatDuration, seq, dict, reverseStrings, minInt/maxInt, length, entity accessors (fieldAccess, sectionAccess, getStringSlice, hasField, getInt, getFloat), jsonMarshal/toJSON, defaultVal/ternary/hasKey, parseQuantity, parseUnit, parseIngredient*, fractionDisplay, scaleQty
- shards: new graph_test.go (25 tests for Cache.Build, SourceFiles, FuncName, TransitiveDependents, computeStats, isShardPath, firstString, intProp)
- shards: extend render_test.go (CommentPrefix, ShardFilename, Header, sortedUnique, sortedBoolKeys, formatLoc, renderDepsSection, renderImpactSection, RenderGraph, WriteShard, updateGitignore)

Coverage deltas: render 11%→45%, blastradius 32%→60%, cache 61%→78%

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- setup: add tests for boolPtr, detectCursor, installHook (fresh install, idempotency, existing settings preservation, invalid JSON guard)
- schema: add NewGenerator, GenerateWebSiteSchema, GenerateBreadcrumbSchema, GenerateFAQSchema, GenerateItemListSchema, MarshalSchemas tests (21% → 51%)
- output: new output_test.go with tests for GenerateRobotsTxt (allow-all, standard bots, extra bots), GenerateManifest (JSON validity), NewSitemapEntry (basic, root path, trailing slash), chunkEntries, GenerateSitemapFiles (single file, multiple files with index, XML validity) (23% → 74%)
- taxonomy: add toStringSlice, HubPageURL, LetterPageURL, FindEntry, ComputePagination (single page, multi-page, empty) tests (21% → ~50%)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…functions

- config: add tests for ShardsEnabled (nil/true/false), applyEnv (SUPERMODEL_API_KEY, SUPERMODEL_API_BASE, SUPERMODEL_SHARDS=false), applyDefaults (from file)
- focus: add extractTypes tests (class detection, other-file exclusion) and extract with includeTypes=true integration
- build: add shareImageURL, countTaxEntries, countFieldDistribution (basic, limit, empty), toBreadcrumbItems tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- compact: nextShortName now at 100% (two-char overflow, skip existing, skip builtins)
- deadcode: new zip_test.go covers isGitRepo, isWorktreeClean, walkZip, createZip (0% → 65%)
- mcp: new zip_test.go covers same zip helpers (26% → 41%)
- graph2md: pure function tests for getStr, getNum, mermaidID, generateSlug (100% each); Run-based tests for Class/Type/Domain/Subdomain/Directory nodes (40% → 64%)
- output: GenerateRSSFeeds with disabled/main/default-path/category-feed cases (74% → 87%)
- render: BuildFuncMap smoke test; sliceHelper with all 3 slice types + passthrough; shareimage functions svgEscape, truncate, renderBarsSVG, all Generate*ShareSVG variants (45% → 70%)
- build: toTemplateHTML, writeShareSVG, maybeWriteShareSVG with enabled/disabled (6% → 7.5%)
- auth: Logout_AlreadyLoggedOut branch (17% → 19%)
- setup: detectClaude with ~/.claude present (23% → 25%)
- analyze: TestIsWorktreeClean_NonGitDir added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- schema: GenerateRecipeSchema (basic + author/times/nutrition) and GenerateCollectionPageSchema (50% → 87%)
- taxonomy: BuildAll (basic, min_entities filter, multi-value, empty), extractValues (single, missing, enrichment), getEnrichmentOverrides (simple, missing, array path, not-array cases) → 51% → 97%
- shards/render: RenderAll tests (empty, writes shards, dry-run, unknown file) → 36% → 43%
- shards/zip: new zip_test.go covering isShardFile, matchPattern, shouldInclude (basic/dir/ext/shard/minified/large), buildExclusions (no config, custom config, invalid JSON)
- find/zip: add TestIsWorktreeClean_NonGitDir to cover previously-uncovered branch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…overage

Adds 11,375 lines of new tests spanning 39 files, covering previously untested packages and edge cases throughout the codebase. Highlights:

- internal/audit: new test file from 0% → 85.5% (RenderHealth, RenderRunPrompt, RenderImprovePrompt, Analyze, EnrichWithImpact, and all health-scoring helpers)
- internal/mcp: 80.9% → 85.6% via cache-hit tests for toolDeadCode and toolBlastRadius (pre-populate JSON cache, bypass API calls)
- internal/archdocs/graph2md: 2,451-line test file covering all major node-type renderers, mermaid diagram generation, and arch-map output
- internal/shards: daemon UUID, graph traversal, WriteShard, zip helpers
- internal/restore, focus, find, compact, cache, config, auth, blastradius, deadcode, setup, status: targeted tests for all reachable pure functions and edge-case branches

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…laude Code

Includes Dockerfiles, entrypoints, task definition, run scripts, raw output logs, summary, and blog post draft showing a 40% cost reduction and 4x speed improvement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes accidentally committed macOS and Python cache files from benchmark/.
Adds .DS_Store, __pycache__/*.pyc, cover*.out, and *.graph.{go,py} to .gitignore.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Revised the generic skill prompt based on benchmark trace analysis. Three changes: teach the .graph naming convention so agents construct paths directly, bold the read-order directive, and tell agents to check graph files before grepping for structure.

Skill v2: $0.11, 31s, 7 turns (was $0.15, 42s, 11 turns)
Matches Grey's hand-crafted Django prompt: $0.12, 29s, 9 turns
Locks in the six key elements that drove benchmark results: graph extension, three section names, naming convention example, and read-order directive.
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (14)
internal/graph/handler_test.go-367-383 (1)
367-383: ⚠️ Potential issue | 🟡 Minor — Fallback-label test is too weak right now.

At line 381, checking for "node-without-name" anywhere can pass even if fallback labeling breaks, because the node ID is always emitted as the DOT node key. Please assert the label field specifically.

Suggested assertion tightening

```diff
 func TestWriteDOT_NodeWithNoNameUsesID(t *testing.T) {
 @@
 	out := buf.String()
-	if !strings.Contains(out, "node-without-name") {
-		t.Errorf("node ID should appear as fallback label:\n%s", out)
+	expected := `"node-without-name" [label="node-without-name"];`
+	if !strings.Contains(out, expected) {
+		t.Errorf("node ID should be used as fallback label, got:\n%s", out)
 	}
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/graph/handler_test.go` around lines 367-383: the test TestWriteDOT_NodeWithNoNameUsesID is too weak because it only checks for the node ID anywhere in the DOT output; update the assertion to verify the DOT node's label attribute specifically by inspecting the string emitted by writeDOT for the node key (e.g., "node-without-name") followed by a label assignment like [label="..."] or label=<...>; change the check to look for the exact label field (for example assert that the node line contains `node-without-name` followed by `label="node-without-name"` or the appropriate DOT label syntax) so the test fails if labeling falls back to the ID incorrectly.

internal/archdocs/pssg/taxonomy/taxonomy_test.go-44-49 (1)
44-49: ⚠️ Potential issue | 🟡 Minor — Avoid direct indexing before asserting taxonomy count.

Line 44 and line 61 index taxes[0] without first asserting len(taxes). If BuildAll behavior shifts, these tests will panic instead of failing with a clear message.

Suggested fix

```diff
 func TestBuildAll_MinEntitiesFilter(t *testing.T) {
 @@
 	tc := config.TaxonomyConfig{Name: "cuisine", Field: "cuisine", MinEntities: 2}
 	taxes := BuildAll(entities, []config.TaxonomyConfig{tc}, nil)
+	if len(taxes) != 1 {
+		t.Fatalf("expected 1 taxonomy, got %d", len(taxes))
+	}
 @@
 	if len(taxes[0].Entries) != 1 {
 		t.Errorf("expected 1 entry (only Japanese), got %d", len(taxes[0].Entries))
 	}
 @@
 func TestBuildAll_MultiValue(t *testing.T) {
 @@
 	tc := config.TaxonomyConfig{Name: "tags", Field: "tags", MultiValue: true, MinEntities: 1}
 	taxes := BuildAll(entities, []config.TaxonomyConfig{tc}, nil)
+	if len(taxes) != 1 {
+		t.Fatalf("expected 1 taxonomy, got %d", len(taxes))
+	}
 @@
 	if len(taxes[0].Entries) != 3 {
 		t.Errorf("multi-value: expected 3 entries, got %d: %v", len(taxes[0].Entries), taxes[0].Entries)
 	}
 }
```

Also applies to: 61-63

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/taxonomy/taxonomy_test.go` around lines 44-49: the test indexes taxes[0] before asserting the length; modify the test to first check len(taxes) (e.g., require at least 1) and fail fast (t.Fatalf) if it's not as expected before accessing taxes[0], and do the same for the second occurrence around the lines that check taxes[0].Entries — ensure you validate len(taxes) and then validate len(taxes[0].Entries) to avoid panics if BuildAll changes behavior.

internal/archdocs/pssg/enrichment/cache_test.go-3-8 (1)
3-8: ⚠️ Potential issue | 🟡 Minor — Unreadable-dir test skip condition is too narrow and can be flaky.

Line 114 only checks CI, but chmod-based permission tests are also unreliable on some platforms. Add an OS guard to reduce false failures.

Suggested patch

```diff
 import (
 	"encoding/json"
 	"os"
 	"path/filepath"
+	"runtime"
 	"testing"
 )
 @@
 func TestReadAllCaches_UnreadableDir(t *testing.T) {
-	if os.Getenv("CI") != "" {
-		t.Skip("skipping chmod-based test in CI")
+	if os.Getenv("CI") != "" || runtime.GOOS == "windows" {
+		t.Skip("skipping chmod-based test on CI/Windows")
 	}
```

Also applies to: 113-127

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/enrichment/cache_test.go` around lines 3-8: the unreadable-dir permission test currently only skips when CI is set, which is flaky on platforms that don't support chmod semantics; modify the skip condition for the unreadable-dir test to also detect unsupported OSes by importing runtime and skipping when runtime.GOOS is "windows" (and optionally other non-Unix OSes like "plan9" or "js") in addition to the existing CI check; update the test function that sets directory permissions to call t.Skipf(...) when runtime.GOOS indicates an unsupported platform and ensure the runtime import is added to the import block.

internal/archdocs/pssg/enrichment/cache_test.go-79-87 (1)
79-87: ⚠️ Potential issue | 🟡 Minor — Use a deterministic non-existent path instead of a hardcoded absolute path.

"/nonexistent-enrichment-dir-xyz" is brittle and platform-dependent. Build a guaranteed-missing path from t.TempDir() for stable behavior.

Suggested patch

```diff
 func TestReadAllCaches_NonExistentDir(t *testing.T) {
-	result, err := ReadAllCaches("/nonexistent-enrichment-dir-xyz")
+	nonExistentDir := filepath.Join(t.TempDir(), "does-not-exist")
+	result, err := ReadAllCaches(nonExistentDir)
 	if err != nil {
 		t.Fatalf("non-existent dir should return empty result, not error: %v", err)
 	}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/enrichment/cache_test.go` around lines 79-87: TestReadAllCaches_NonExistentDir uses a hardcoded absolute path, which is brittle; change it to create a deterministic missing path by using t.TempDir() and joining a guaranteed-nonexistent subdirectory (e.g. filepath.Join(t.TempDir(), "nonexistent-enrichment-dir-xyz")) before calling ReadAllCaches, and update the test to import filepath if required; keep the same assertions (expect no error and an empty result).

internal/archdocs/pssg/loader/loader_test.go-194-202 (1)
194-202: ⚠️ Potential issue | 🟡 Minor — TestDeriveSlug_FromField currently can't catch fallback-behavior regressions.

At lines 199-201, the test only checks for non-empty output, and the filename and title can normalize to the same slug. That can mask a bug where field:title is ignored.

Suggested test hardening

```diff
 func TestDeriveSlug_FromField(t *testing.T) {
 	l := &MarkdownLoader{Config: &config.Config{
 		Data: config.DataConfig{EntitySlug: config.EntitySlug{Source: "field:title"}},
 	}}
 	fields := map[string]interface{}{"title": "My Recipe!"}
-	slug := l.deriveSlug("/data/my-recipe.md", fields)
-	if slug == "" {
-		t.Error("expected non-empty slug from field")
+	slug := l.deriveSlug("/data/fallback-file.md", fields)
+	if slug == "" || slug == "fallback-file" {
+		t.Errorf("expected slug derived from field:title, got %q", slug)
 	}
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/loader/loader_test.go` around lines 194-202: TestDeriveSlug_FromField is too weak because it only asserts the slug is non-empty and the input filename could normalize to the same slug; update the test to assert the exact expected slug derived from the title field (e.g., expect "my-recipe" when fields["title"]="My Recipe!") and change the input path passed to l.deriveSlug (e.g., "/data/unrelated-file.md") so the filename would not normalize to the same slug; this ensures deriveSlug actually uses the field: compare slug == "my-recipe" and fail the test if it does not.

internal/archdocs/pssg/loader/loader_test.go-321-345 (1)
321-345: ⚠️ Potential issue | 🟡 Minor — Unreadable-file test is environment-sensitive and can flake locally.

At line 330 onward, chmod 0000 is not a reliable unreadability guarantee across OS/user contexts. The test should verify unreadability before asserting skip behavior.

Suggested stabilization

```diff
 import (
 	"os"
 	"path/filepath"
+	"runtime"
 	"testing"
 @@
 func TestLoad_UnreadableFile(t *testing.T) {
 	if os.Getenv("CI") != "" {
 		t.Skip("skipping chmod-based test in CI")
 	}
+	if runtime.GOOS == "windows" {
+		t.Skip("chmod-based unreadable-file semantics are not reliable on Windows")
+	}
 	dir := t.TempDir()
 	f := filepath.Join(dir, "locked.md")
 @@
 	if err := os.Chmod(f, 0000); err != nil {
 		t.Fatal(err)
 	}
+	if _, err := os.ReadFile(f); err == nil {
+		t.Skip("file is still readable in this environment; skipping unreadable-file assertion")
+	}
 	t.Cleanup(func() { os.Chmod(f, 0600) }) //nolint:errcheck
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/loader/loader_test.go` around lines 321-345: TestLoad_UnreadableFile sets the file mode to 0000 but doesn't verify the file is actually unreadable in the current environment; update TestLoad_UnreadableFile to assert unreadability before relying on the behavior of MarkdownLoader.Load: after os.Chmod(f, 0000), attempt to read the file and assert that it returns a permission error, and if it does not (i.e., the file is still readable) call t.Skipf to skip the test on this environment; keep the existing t.Cleanup that restores permissions and then run l.Load() expecting a warn+skip and zero entities (MarkdownLoader.Load remains unchanged).

internal/shards/graph_test.go-519-528 (1)
519-528: ⚠️ Potential issue | 🟡 Minor — Test doesn't actually verify anything meaningful.

```go
func TestComputeStats_FromCache(t *testing.T) {
	// ...
	stats.FromCache = true
	if !stats.FromCache {
		t.Error("FromCache should be settable")
	}
}
```

This test sets a field and then checks that it was set. That's testing Go's struct assignment, not your code. If you want to verify FromCache behavior, you'd need to test whatever code path actually sets this flag based on cache state.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/shards/graph_test.go` around lines 519-528: TestComputeStats_FromCache is only assigning stats.FromCache directly instead of exercising computeStats' cache-detection logic; update the test to set up a real cache hit/miss and call computeStats to assert its behavior: create an api.ShardIR with content that will be cached, use NewCache() and call c.Build(ir) or pre-populate the cache entry so computeStats(ir, c) returns a stats whose FromCache is true (and add a complementary case where the cache is empty and FromCache is false); reference computeStats, NewCache, c.Build, and stats.FromCache when implementing the assertions.

internal/shards/watcher_test.go-256-262 (1)
256-262: ⚠️ Potential issue | 🟡 Minor — Swallowed error in test setup could hide failures.

The runCmd helper ignores the error from cmd.CombinedOutput():

```go
runCmd := func(args ...string) {
	cmd := exec.Command(args[0], args[1:]...)
	cmd.Dir = dir
	cmd.CombinedOutput() //nolint:errcheck
}
```

If git add or git commit fails, the test will proceed with incorrect state and might pass or fail for the wrong reasons. Consider checking the error like the helper runWatcherGit does at lines 297-299.

Suggested fix

```diff
 runCmd := func(args ...string) {
+	t.Helper()
 	cmd := exec.Command(args[0], args[1:]...)
 	cmd.Dir = dir
-	cmd.CombinedOutput() //nolint:errcheck
+	if out, err := cmd.CombinedOutput(); err != nil {
+		t.Fatalf("%v: %v\n%s", args, err, out)
+	}
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/shards/watcher_test.go` around lines 256-262: the test helper runCmd currently swallows errors from cmd.CombinedOutput(), which can hide git failures; update runCmd to behave like runWatcherGit by calling cmd.CombinedOutput(), checking the returned error, and failing the test with t.Fatalf (including the error and output) if it is non-nil; also mark the helper with t.Helper() so failures point to the caller. Use the existing runWatcherGit approach as the pattern for error handling.

internal/shards/render_test.go-817-822 (1)
817-822: ⚠️ Potential issue | 🟡 Minor — Placeholder test with no assertions.

This test has a comment explaining what it should do but doesn't actually test anything. Either implement it or remove it to avoid giving false confidence in test coverage.

```go
func TestHook_InvalidJSONExitsCleanly(t *testing.T) {
	// Hook reads from stdin; we test via the exported function with invalid data.
	// The function must return nil (never break the agent) on bad input.
	// We can't easily inject stdin, but we test the underlying validation logic
	// directly by calling with a mock via the export test file.
}
```

If you can't easily test this now, consider adding t.Skip("TODO: implement stdin injection for Hook testing") so it's clear this is intentionally deferred.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/shards/render_test.go` around lines 817-822: TestHook_InvalidJSONExitsCleanly is a no-op placeholder; either implement it to exercise the Hook logic by feeding invalid JSON into the exported function that accepts an io.Reader (e.g., call Hook or the exported validation helper with a bytes.NewBufferString(`invalid`) and assert it returns nil, does not panic, and has no unwanted side effects), or if stdin injection is not feasible yet, replace the body with t.Skip("TODO: implement stdin injection for Hook testing") so the intent is explicit.

internal/cache/cache_test.go-358-452 (1)
358-452: ⚠️ Potential issue | 🟡 Minor — These chmod-based failure tests are still a bit flaky.

Using 0555/0000 to force write and read failures is not reliable on every runner. Elevated users and some platform permission models can still pass the I/O calls, which makes these assertions intermittently false. I'd prefer deterministic failure setups here, or at least a shared skip helper for environments where permission semantics are unreliable.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/cache/cache_test.go` around lines 358-452: replace flaky chmod-based skips with a deterministic permissions-check helper: add a helper (e.g., skipIfPermissionsUnreliable) that attempts to enforce and validate permission behavior (create a temp dir/file, chmod to the intended restrictive mode, then attempt an operation that should fail — write for the 0555 tests, read for the 0000 test); if the operation still succeeds, call t.Skip to skip the test. Update TestPutJSON_WriteFileError, TestPut_WriteFileError, and TestHashFile_ReadError to call this helper at the start instead of directly skipping on CI or relying on chmod alone, and keep the existing t.Cleanup logic for restoring perms when the helper confirms the environment enforces permissions.

internal/find/zip_test.go-120-172 (1)
120-172: ⚠️ Potential issue | 🟡 Minor — Guard test setup against a missing git binary.

The initCleanFindGitRepo helper shells out to git directly via exec.Command without checking whether git is installed first. In minimal test environments (CI runners, containers, etc.), a missing git binary will crash the test, which is confusing: the test infrastructure fails even though your code isn't broken.

Add a quick exec.LookPath("git") check at the start of the function and skip the test if git isn't found. That way, the test suite stays portable and only runs when it can actually run:

```go
if _, err := exec.LookPath("git"); err != nil {
	t.Skip("git not available")
}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/find/zip_test.go` around lines 120-172: the test helper initCleanFindGitRepo currently shells out to git without verifying git is installed; add a preflight check using exec.LookPath("git") at the start of initCleanFindGitRepo and call t.Skip("git not available") when LookPath returns an error, so tests that require git are skipped instead of failing in environments without the git binary.

benchmark/results/blog-post-draft.md-53-53 (1)
53-53: ⚠️ Potential issue | 🟡 Minor — Add language labels to fenced blocks to satisfy markdownlint.

Both code fences should declare a language (for example `text`) to fix MD040.

Suggested diff

````diff
-```
+```text
 Bash: run tests → see 8 errors
 ...
 Bash: run tests → all pass
 ```
````

Also applies to: 72-72

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@benchmark/results/blog-post-draft.md` at line 53: the fenced code blocks showing the test output (the blocks containing the lines starting "Bash: run tests → see 8 errors" and "Bash: run tests → all pass") are missing language labels and trigger MD040; update both opening fences from ``` to ```text (or another appropriate language such as ```bash), ensuring both occurrences are changed to satisfy markdownlint.

benchmark/results/blog-post-draft.md-49-70 (1)
49-70: ⚠️ Potential issue | 🟡 Minor — Narrative metrics conflict with the results table.

The table reports Naked = 20 turns / $0.30 and Crafted = 9 turns / $0.12, but the section headings say 13 turns / $0.22 and 7 turns / $0.13. Please align these numbers so readers don't get mixed signals.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@benchmark/results/blog-post-draft.md` around lines 49-70: the section headings "### Without Supermodel (13 turns, $0.22)" and "### With Supermodel (7 turns, $0.13)" conflict with the results table entries "Naked = 20 turns / $0.30" and "Crafted = 9 turns / $0.12"; pick the authoritative source (table or narrative) and update the mismatched values so they match exactly — edit the headings or the table rows to make the turn counts and dollar amounts consistent across both places. Ensure formatting/parentheses remain identical when you change the numbers.

benchmark/compare.sh-82-93 (1)
82-93: ⚠️ Potential issue | 🟡 Minor

Cost delta parsing is inconsistent and mislabels equal-cost runs.

This block only reads `costUSD` (unlike `extract_cost()`), and `delta == 0` gets reported as "more expensive".

Proposed fix

-naked_cost=$(grep -oP '"costUSD"\s*:\s*\K[0-9.]+' "$NAKED_LOG" 2>/dev/null | tail -1 || echo "")
-sm_cost=$(grep -oP '"costUSD"\s*:\s*\K[0-9.]+' "$SUPERMODEL_LOG" 2>/dev/null | tail -1 || echo "")
+naked_cost=$(grep -oP '"costUSD"\s*:\s*\K[0-9.]+' "$NAKED_LOG" 2>/dev/null | tail -1 || true)
+[[ -z "$naked_cost" ]] && naked_cost=$(grep -oP '"cost_usd"\s*:\s*\K[0-9.]+' "$NAKED_LOG" 2>/dev/null | tail -1 || true)
+[[ -z "$naked_cost" ]] && naked_cost=$(grep -oP 'Total cost[^0-9]*\K[0-9.]+' "$NAKED_LOG" 2>/dev/null | tail -1 || true)
+
+sm_cost=$(grep -oP '"costUSD"\s*:\s*\K[0-9.]+' "$SUPERMODEL_LOG" 2>/dev/null | tail -1 || true)
+[[ -z "$sm_cost" ]] && sm_cost=$(grep -oP '"cost_usd"\s*:\s*\K[0-9.]+' "$SUPERMODEL_LOG" 2>/dev/null | tail -1 || true)
+[[ -z "$sm_cost" ]] && sm_cost=$(grep -oP 'Total cost[^0-9]*\K[0-9.]+' "$SUPERMODEL_LOG" 2>/dev/null | tail -1 || true)

 if [[ -n "$naked_cost" && -n "$sm_cost" ]]; then
 python3 - <<PYEOF
 naked = float("$naked_cost")
 sm = float("$sm_cost")
 delta = naked - sm
 pct = (delta / naked * 100) if naked > 0 else 0
-sign = "cheaper" if delta > 0 else "more expensive"
-print(f"supermodel was \${abs(delta):.4f} ({abs(pct):.1f}%) {sign} than naked")
+if delta > 0:
+    print(f"supermodel was \${abs(delta):.4f} ({abs(pct):.1f}%) cheaper than naked")
+elif delta < 0:
+    print(f"supermodel was \${abs(delta):.4f} ({abs(pct):.1f}%) more expensive than naked")
+else:
+    print("supermodel had the same cost as naked")
 PYEOF
 fi

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmark/compare.sh` around lines 82 - 93, The comparison block reads costUSD with a different regex than extract_cost() and treats delta == 0 as "more expensive"; change it to reuse the same extraction logic as extract_cost() (or copy its exact grep/regex) to populate naked_cost and sm_cost, compute delta = naked - sm, set sign to "same cost" when delta == 0, "cheaper" when delta > 0, and "more expensive" when delta < 0, and ensure pct calculation uses naked only when naked > 0 to avoid division by zero; update the print message to reflect the "same cost" case.
🧹 Nitpick comments (19)
internal/archdocs/pssg/taxonomy/taxonomy_test.go (2)
305-336: Extract the repeated entity-builder helper

The same `e := func(n int) []*entity.Entity { ... }` helper is repeated in multiple tests. Pulling it into one file-level helper keeps this test file easier to maintain.

Refactor sketch

+func makeEntities(n int) []*entity.Entity {
+	s := make([]*entity.Entity, n)
+	for i := range s {
+		s[i] = &entity.Entity{}
+	}
+	return s
+}
+
 func TestComputePagination_SinglePage(t *testing.T) {
-	e := func(n int) []*entity.Entity {
-		s := make([]*entity.Entity, n)
-		for i := range s {
-			s[i] = &entity.Entity{}
-		}
-		return s
-	}
-	entry := Entry{Slug: "italian", Entities: e(5)}
+	entry := Entry{Slug: "italian", Entities: makeEntities(5)}
@@
 func TestComputePagination_MultiPage(t *testing.T) {
-	e := func(n int) []*entity.Entity {
-		s := make([]*entity.Entity, n)
-		for i := range s {
-			s[i] = &entity.Entity{}
-		}
-		return s
-	}
-	entry := Entry{Slug: "italian", Entities: e(10)}
+	entry := Entry{Slug: "italian", Entities: makeEntities(10)}
@@
 func TestTopEntries(t *testing.T) {
-	e := func(n int) []*entity.Entity {
-		s := make([]*entity.Entity, n)
-		for i := range s {
-			s[i] = &entity.Entity{}
-		}
-		return s
-	}
 	entries := []Entry{
-		{Name: "A", Slug: "a", Entities: e(3)},
-		{Name: "B", Slug: "b", Entities: e(10)},
-		{Name: "C", Slug: "c", Entities: e(1)},
+		{Name: "A", Slug: "a", Entities: makeEntities(3)},
+		{Name: "B", Slug: "b", Entities: makeEntities(10)},
+		{Name: "C", Slug: "c", Entities: makeEntities(1)},
 	}

Also applies to: 408-414
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/taxonomy/taxonomy_test.go` around lines 305 - 336, Extract the repeated inline helper e := func(n int) []*entity.Entity { ... } used in TestComputePagination_SinglePage and TestComputePagination_MultiPage (and other tests around lines 408-414) into a single file-level helper function (e.g., makeEntities or buildEntities) declared outside the test functions; replace the inline e closures in each test with calls to that new helper, keeping the same signature and returning []*entity.Entity so ComputePagination and the tests keep working unchanged.
364-366: Drop hard-coded source line references in comments

Comments like “covers L73-76” will get stale as soon as the implementation file moves. Better to describe behavior only.
Also applies to: 380-382, 396-398, 433-434
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/taxonomy/taxonomy_test.go` around lines 364 - 366, Remove the hard-coded source line references from the test comments (e.g., the comment above TestBuildAll_InvertMode that says “covers L73-76”) and replace them with a behavior-focused description; for example, for TestBuildAll_InvertMode and the other affected comment blocks (near the comments at the ranges you flagged) state that the test "verifies that when tc.Invert is true the entity loop hits continue and no entries are added" or similar behavior-only wording, ensuring you reference the test name (TestBuildAll_InvertMode) and the condition (tc.Invert true) rather than file line numbers.

internal/compact/handler_test.go (1)
725-741: Test name and comment are a bit misleading.

The comment says this tests that `_` (blank identifier) is skipped if produced, but `nextShortName` only ever generates names from `a`-`z` (and then `aa`, `ab`, etc.). So `_` would never be generated in the first place.

What this test actually does is verify that when 25 out of 26 single-char slots are occupied, the function correctly returns the one remaining slot (`z`). That's the same behavior as `TestNextShortName_SkipsExisting`, just at a larger scale.

Consider renaming to something like `TestNextShortName_FindsLastFreeSlot` or updating the comment to match what's really being tested. Not a functional bug, just a clarity thing for future readers.

✏️ Suggested comment fix

 func TestNextShortName_SkipsBuiltins(t *testing.T) {
-	// '_' is a Go builtin; if the counter somehow produces it, it must be skipped.
-	// We test indirectly by filling every single-char slot except 'z' with existing
-	// names, then verifying we get 'z' (the only remaining free single-char slot).
+	// Verify that when all single-char slots except 'z' are taken, nextShortName
+	// correctly skips occupied names and returns the only free slot.
 	existing := map[string]bool{}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/compact/handler_test.go` around lines 725 - 741, The test name and comment are misleading: TestNextShortName_SkipsBuiltins and its comment claim to validate skipping the blank identifier `_`, but nextShortName only generates `a`–`z` and the test actually verifies that when 25 of 26 single-letter names are taken the remaining letter (`z`) is returned; update the test to reflect intent by either renaming the test to TestNextShortName_FindsLastFreeSlot (or similar) or by editing the comment to describe that it fills every single-char slot except one and asserts the last free single-char slot is returned; reference the existing test name TestNextShortName_SkipsBuiltins and the function nextShortName when making the change.

internal/archdocs/pssg/enrichment/cache_test.go (1)
136-143: Tighten the mixed-validity assertion.

Nice coverage here; small add: assert that the returned key is `"good"` so this test catches wrong-key regressions too.

Suggested patch

 	if len(result) != 1 {
 		t.Errorf("expected 1 valid entry, got %d", len(result))
 	}
+	if _, ok := result["good"]; !ok {
+		t.Errorf("expected key %q in result, got %v", "good", result)
+	}
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/enrichment/cache_test.go` around lines 136 - 143, Tighten the assertion after calling ReadAllCaches: in addition to checking len(result) == 1, assert that the single returned key is "good" by inspecting result (the map/slice returned by ReadAllCaches) and failing the test if the key differs; update the test after the existing len check to compare the returned key name (from result) to "good" and call t.Fatalf/t.Errorf with a clear message on mismatch so wrong-key regressions are caught.

internal/archdocs/pssg/loader/loader_test.go (1)
388-412: `faq` branch is configured but not asserted in TestParseSections_AllTypes.

Line 396 sets up `Type: "faq"`, but no assertion checks `sections["faqs"]`. That leaves a mapping gap in this end-to-end section test.

Suggested assertion add-on

 import (
 	"os"
 	"path/filepath"
 	"testing"

 	"github.com/supermodeltools/cli/internal/archdocs/pssg/config"
+	"github.com/supermodeltools/cli/internal/archdocs/pssg/entity"
 )
@@
 	if items, ok := sections["steps"].([]string); !ok || len(items) != 2 {
 		t.Errorf("steps: expected []string with 2 items, got %v", sections["steps"])
 	}
+	if faqs, ok := sections["faqs"].([]entity.FAQ); !ok || len(faqs) != 1 || faqs[0].Question != "What temp?" {
+		t.Errorf("faqs: expected []entity.FAQ with 1 item, got %v", sections["faqs"])
+	}
 	if notes, ok := sections["notes"].(string); !ok || notes == "" {
 		t.Errorf("notes: expected non-empty string, got %v", sections["notes"])
 	}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/loader/loader_test.go` around lines 388 - 412, TestParseSections_AllTypes sets up a BodySection with Type "faq" but never asserts the parsed result; update the test (TestParseSections_AllTypes) to verify sections["faqs"] returned by l.parseSections contains the expected FAQ data: assert the key "faqs" exists in the sections map, assert its type (e.g., map[string]string or whatever parseSections returns for FAQs), and assert it contains the question "What temp?" with the answer "350°F" (or at minimum that the faqs entry is non-nil/non-empty) so the FAQ branch is covered.

internal/shards/graph_test.go (1)
457-477: Cycle test logic is a bit odd but functional.

The test spawns a goroutine and then does:

select {
case <-done:
default:
	// immediate completion is fine
	<-done
}

This is essentially just `<-done` with extra steps. The `default` case immediately falls through to `<-done` anyway. It works, but could be simplified to:

done := make(chan struct{})
go func() {
	c.TransitiveDependents("a.go")
	close(done)
}()

select {
case <-done:
	// completed without infinite loop
case <-time.After(time.Second):
	t.Fatal("TransitiveDependents did not terminate (possible infinite loop)")
}

The current version will block forever if there's an infinite loop, whereas adding a timeout makes the failure more obvious.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/shards/graph_test.go` around lines 457 - 477, TestTransitiveDependents_Cycle currently uses a convoluted select that effectively just blocks on <-done and will hang silently if TransitiveDependents enters a loop; change the test to wait for the done channel with a timeout instead: after launching the goroutine that calls c.TransitiveDependents("a.go") and closes done, replace the existing select with one that either receives from done (success) or from time.After(1 * time.Second) and fails the test (t.Fatal) on timeout to detect infinite loops reliably.

internal/shards/zip_test.go (3)
340-349: Some openZipEntries errors are ignored.

In several places you do:

r, _ := openZipEntries(t, path)

While it's unlikely to fail after `CreateZipFile` succeeds, silently ignoring the error could mask issues. Consider using:

r, err := openZipEntries(t, path)
if err != nil {
	t.Fatal(err)
}

Or at minimum, the helper could call `t.Fatal` internally on error since it already takes `*testing.T`.

Also applies to: 370-376, 394-401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/shards/zip_test.go` around lines 340 - 349, Tests currently ignore errors from openZipEntries (e.g., uses r, _ := openZipEntries(t, path)); update each call (including occurrences around the other noted blocks) to capture the error and fail the test on error (r, err := openZipEntries(t, path); if err != nil { t.Fatal(err) }), or alternatively modify the openZipEntries helper (which already receives *testing.T) to call t.Fatal on internal errors so callers can continue using r := openZipEntries(t, path) without dropping errors; reference the openZipEntries helper and all its test-call sites to apply one consistent approach.
681-692: Test comment doesn't match the assertion.

The comment says "same count (1 each), should sort a before b alphabetically", but then the assertion expects `go` before `ts`:

// b.go, a.ts — same count (1 each), should sort a before b alphabetically
files := []string{"b.go", "a.ts"}
// ...
if stats[0].Ext != "go" || stats[1].Ext != "ts" {
	// alphabetically "go" < "ts", so go comes first

The comment mentions sorting files "a before b" but you're actually sorting by extension name (`go` vs `ts`), not by filename. The inline comment clarifies this, but the test function's leading comment is a bit misleading. Minor nit, but could confuse someone reading just the comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/shards/zip_test.go` around lines 681 - 692, The test TestLanguageStats_TiesSortedAlphabetically is misleading: it says "should sort a before b alphabetically" but LanguageStats sorts by file extension (e.g., "go" vs "ts"); update the test comment to clearly state that ties are broken by extension name (e.g., "b.go, a.ts — same count (1 each), must sort by extension alphabetically so 'go' comes before 'ts'") or alternatively change the sample filenames to match the comment intent; keep assertions as-is and reference LanguageStats and stats in the updated comment.
713-725: Helper could handle errors internally.

Since `openZipEntries` already takes `*testing.T`, it could call `t.Fatal` on error instead of returning it. This would simplify all the call sites and ensure errors are never silently ignored.

func openZipEntries(t *testing.T, path string) map[string]bool {
	t.Helper()
	r, err := zip.OpenReader(path)
	if err != nil {
		t.Fatalf("failed to open zip: %v", err)
	}
	defer r.Close()
	m := make(map[string]bool, len(r.File))
	for _, f := range r.File {
		m[f.Name] = true
	}
	return m
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/shards/zip_test.go` around lines 713 - 725, The helper openZipEntries currently returns (map[string]bool, error) but takes *testing.T; change it to handle errors internally by removing the error return, calling t.Fatalf when zip.OpenReader(path) fails, and returning just map[string]bool; update its signature to func openZipEntries(t *testing.T, path string) map[string]bool, keep t.Helper(), still defer r.Close(), and update all call sites to drop error handling since openZipEntries will now fail the test itself.

internal/shards/watcher_test.go (1)
99-110: Time-based test can be flaky in CI.

Using `time.Sleep(20 * time.Millisecond)` to verify the ticker fires is a common pattern but can be flaky on slow CI runners or under load. The test will likely pass most of the time, but just be aware if you see intermittent failures here.

One alternative is to use a channel signal from within `poll()` or to mock the ticker, but that's a bigger refactor. For now this is probably fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/shards/watcher_test.go` around lines 99 - 110, TestWatcher_RunPollsOnTick uses time.Sleep which is flaky; modify the test to wait on a deterministic signal from the Watcher instead of sleeping: add a lightweight test hook (e.g., a channel or callback field) to NewWatcher/Watcher that poll() signals when it runs (e.g., OnPoll chan struct{} or OnPoll func()), update poll() to send/close/invoke that hook, and then in TestWatcher_RunPollsOnTick wait on that channel/callback with a timeout (context or select) to assert poll was called; keep existing cancellation logic for Run.

internal/archdocs/pssg/affiliate/affiliate_test.go (1)
27-36: Strengthen special-character assertion in GenerateLink test.

Right now this passes on any non-empty URL. Consider asserting encoded content (e.g., `%26` for `&`) so the test actually protects encoding behavior.

Suggested tweak

 func TestGenerateLink_SpecialChars(t *testing.T) {
@@
 	url := p.GenerateLink("bread & butter")
-	if url == "" {
-		t.Error("expected non-empty URL")
+	if !strings.Contains(url, "bread+%26+butter") {
+		t.Errorf("expected encoded term in URL, got: %s", url)
 	}
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/affiliate/affiliate_test.go` around lines 27 - 36, Update TestGenerateLink_SpecialChars to assert that Provider.GenerateLink returns an encoded term rather than merely a non-empty string: call p.GenerateLink("bread & butter") and assert the resulting URL contains the expected percent-encoded segment (e.g., "%26" or the full "bread%20%26%20butter") so the test verifies proper URL encoding behavior in Provider.GenerateLink.

internal/archdocs/pssg/output/rss_test.go (1)
22-24: Make the category assertion more precise.

The substring check works, but asserting the XML element (e.g., `<category>Italian</category>`) makes this test less prone to false positives.

Suggested tweak

-	if !strings.Contains(out, "Italian") {
+	if !strings.Contains(out, "<category>Italian</category>") {
 		t.Errorf("expected category in output:\n%s", out)
 	}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/output/rss_test.go` around lines 22 - 24, The test currently uses strings.Contains(out, "Italian") which is too broad; update the assertion in rss_test.go to check for the exact XML category element (e.g., assert that out contains "<category>Italian</category>") or parse the generated RSS and assert the category element value. Locate the assertion using the out variable in the test (the if !strings.Contains(...) block) and replace it with a precise check for the full element string or use an XML parser to verify the <category> node equals "Italian".

internal/archdocs/pssg/schema/jsonld_test.go (1)
113-127: Type assertion without safety check could cause confusing test failures.

If the implementation of `GenerateBreadcrumbSchema` ever changes and `s["itemListElement"]` isn't a `[]map[string]interface{}`, this test will panic with a cryptic message instead of a clear test failure.

Think of it like this: if you're expecting a pizza and someone hands you a burrito, you want to say "hey, this isn't pizza!" rather than just choking on it.

💡 Safer type assertion pattern

-	list := s["itemListElement"].([]map[string]interface{})
+	list, ok := s["itemListElement"].([]map[string]interface{})
+	if !ok {
+		t.Fatalf("itemListElement: want []map[string]interface{}, got %T", s["itemListElement"])
+	}

This same pattern appears in a few other places (lines 141, 172, 405); consider applying it there too for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/archdocs/pssg/schema/jsonld_test.go` around lines 113 - 127, The test directly type-asserts s["itemListElement"] to []map[string]interface{} which will panic on mismatch; change the assertion in the Test (around the use of s and "itemListElement") to a safe comma-ok form (e.g., v, ok := s["itemListElement"].([]map[string]interface{})) and call t.Fatalf with a clear message if ok is false so the failure is explicit; apply the same safe-assertion pattern to the other similar tests that access "itemListElement" (the other occurrences in this file) to avoid panics and produce readable test errors.

internal/blastradius/zip_test.go (1)
121-293: Worth pulling the shared ZIP/git harness into a test helper.

This file now repeats the same repo bootstrap, zip-reader helper, and most walk/create cases already added in internal/analyze/zip_test.go, internal/find/zip_test.go, and internal/deadcode/zip_test.go. A shared test helper would make future ZIP behavior changes a one-file update instead of a four-file update.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/blastradius/zip_test.go` around lines 121 - 293, Multiple test files duplicate the same ZIP/git harness (functions like initCleanBlastGitRepo and readBlastZipEntries and repeated walk/create test cases); extract those shared helpers into a single test helper file and update each test file to call them. Create a shared test helper (e.g., package-level test helper file) that defines reusable functions initCleanBlastGitRepo, readBlastZipEntries and any common setup/teardown utilities, make them exported or placed in the same test package so other tests can call them, remove the duplicated implementations from internal/blastradius/zip_test.go and the other zip_test.go files, and update tests that reference walkZip, createZip, gitArchive, and isWorktreeClean to use the centralized helpers. Ensure imports and test package visibility are adjusted so all test files compile against the single helper.

internal/mcp/server_test.go (1)
682-829: Make these cache-hit tests hermetic.

Right now they fingerprint whatever `os.Getwd()` points at, so the setup depends on how the test runner launches the package and on the contents of the real checkout. Building a tiny temp repo/dir here, like the ZIP suites do, would make the cache key and `server.dir` much more predictable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/mcp/server_test.go` around lines 682 - 829, Tests use repoDir which returns os.Getwd() making cache keys non-hermetic; change the tests to create an isolated temp repo directory (use t.TempDir()), populate it with a minimal deterministic file tree (small go files and a README), initialize a git repo there (git init, git add, git commit) so cache.RepoFingerprint(dir) yields a stable fingerprint, and pass that dir into the server (server{..., dir: dir}) and into cache.AnalysisKey; update or replace repoDir(t *testing.T) to construct/return this temp repo or inline the temp-repo setup at the start of TestToolDeadCode_CacheHit, TestToolDeadCode_CacheHitWithArgs, TestToolBlastRadius_CacheHit, and TestToolBlastRadius_CacheHitWithTarget so each test uses the hermetic repo.

benchmark/Dockerfile.supermodel (1)
18-21: Use --no-install-recommends in runtime apt installs too.

Same as the naked image: this trims unnecessary packages and keeps the benchmark runtime image leaner.

Suggested diff

-RUN apt-get update && apt-get install -y curl ca-certificates git && \
+RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates git && \
     curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
-    apt-get install -y nodejs && \
+    apt-get install -y --no-install-recommends nodejs && \
     rm -rf /var/lib/apt/lists/*

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmark/Dockerfile.supermodel` around lines 18 - 21, Update the Dockerfile RUN command to add --no-install-recommends to the apt installs so runtime image stays lean: modify the apt-get install invocations in the RUN line that currently installs curl ca-certificates git and nodejs (the multi-step RUN block) to use apt-get install -y --no-install-recommends for both installs while keeping the existing curl | bash node setup and rm -rf /var/lib/apt/lists/* cleanup.

benchmark/Dockerfile.naked (1)
5-8: Reduce package footprint in apt installs.

Use --no-install-recommends on both apt-get install calls to keep the benchmark image smaller and reduce unnecessary packages.

Suggested diff

-RUN apt-get update && apt-get install -y curl ca-certificates git && \
+RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates git && \
     curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
-    apt-get install -y nodejs && \
+    apt-get install -y --no-install-recommends nodejs && \
     rm -rf /var/lib/apt/lists/*

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmark/Dockerfile.naked` around lines 5 - 8, The RUN instruction that executes apt-get installs (the line starting with "RUN apt-get update && apt-get install -y curl ca-certificates git && \") should be changed to use --no-install-recommends on both apt-get install invocations (the initial "apt-get install -y curl ca-certificates git" and the subsequent "apt-get install -y nodejs") so that unnecessary recommended packages aren't pulled into the benchmark image; update those two install commands to include the --no-install-recommends flag while preserving the existing chaining, curl bootstrap for Node.js, and the cleanup step.

benchmark/run.sh (1)
28-39: Keep full Docker build logs for debugging.

Piping directly to `tail -3` hides useful failure context. Consider teeing full logs to files, then tailing for console brevity.

Suggested diff

 docker build \
     -f "$SCRIPT_DIR/Dockerfile.naked" \
     -t bench-naked \
     "$SCRIPT_DIR" \
-    2>&1 | tail -3
+    2>&1 | tee "$RESULTS_DIR/build-naked.log" | tail -3
@@
 docker build \
     -f "$SCRIPT_DIR/Dockerfile.supermodel" \
     -t bench-supermodel \
     "$REPO_ROOT" \
-    2>&1 | tail -3
+    2>&1 | tee "$RESULTS_DIR/build-supermodel.log" | tail -3

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmark/run.sh` around lines 28 - 39, The current docker build invocations for bench-naked and bench-supermodel pipe output to "tail -3", which discards full build logs; change the two docker build steps (the commands that use "$SCRIPT_DIR/Dockerfile.naked" -> -t bench-naked and "$SCRIPT_DIR/Dockerfile.supermodel" -> -t bench-supermodel with "$REPO_ROOT") to write full logs to persistent files (e.g., via tee to separate log files per build) and still print a short summary to the console by tailing the saved log files; ensure the build failure paths use those saved log files for debugging instead of losing output.

benchmark/results/naked.txt (1)
11-44: Consider committing a sanitized benchmark log instead of full raw transcript.

This includes full session/tool payloads and identifiers that aren't required for the comparison script's core metrics. Keeping only test summary + cost/token fields would make diffs smaller and reduce data retention risk.
Also applies to: 91-91
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmark/results/naked.txt` around lines 11 - 44, The benchmark log currently contains full raw session/tool payloads (session_id, tool traces, model fields, etc.); replace the file content with a sanitized summary that preserves only the test summary lines (e.g., "Ran X tests", pass/fail counts) and the aggregated cost/token metrics, removing or redacting session_id, tool payloads, and any identifiers such as "session_id", "model", "tools", and raw tool inputs/outputs; update benchmark/results/naked.txt so it contains only the concise test summary plus explicit cost/token fields (and a minimal timestamp if needed) to keep diffs small and avoid leaking identifiers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e252eb40-6c77-45a2-999a-4986183f525d
⛔ Files ignored due to path filters (1)
benchmark/results/benchmark_results.zip is excluded by !**/*.zip
📒 Files selected for processing (78)
.gitignore
benchmark/CLAUDE.md
benchmark/CLAUDE.skill.md
benchmark/Dockerfile.naked
benchmark/Dockerfile.supermodel
benchmark/change_tracking/__init__.py
benchmark/change_tracking/models.py
benchmark/change_tracking/tests.py
benchmark/compare.sh
benchmark/django_app/manage.py
benchmark/django_app/mysite/__init__.py
benchmark/django_app/mysite/settings.py
benchmark/django_app/mysite/urls.py
benchmark/django_app/mysite/wsgi.py
benchmark/django_app/requirements.txt
benchmark/django_app/todos/__init__.py
benchmark/django_app/todos/admin.py
benchmark/django_app/todos/apps.py
benchmark/django_app/todos/models.py
benchmark/django_app/todos/tests.py
benchmark/django_app/todos/urls.py
benchmark/django_app/todos/views.py
benchmark/entrypoint.naked.sh
benchmark/entrypoint.supermodel.sh
benchmark/results/blog-post-draft.md
benchmark/results/naked.txt
benchmark/results/skill-v2.txt
benchmark/results/summary.md
benchmark/results/supermodel.txt
benchmark/run.sh
benchmark/task.md
cmd/restore.go
cmd/skill.go
internal/analyze/zip_test.go
internal/api/client.go
internal/api/types_test.go
internal/archdocs/graph2md/graph2md_test.go
internal/archdocs/handler_test.go
internal/archdocs/pssg/affiliate/affiliate_test.go
internal/archdocs/pssg/build/build_test.go
internal/archdocs/pssg/config/config_test.go
internal/archdocs/pssg/enrichment/cache_test.go
internal/archdocs/pssg/entity/entity_test.go
internal/archdocs/pssg/loader/loader_test.go
internal/archdocs/pssg/output/llmstxt_test.go
internal/archdocs/pssg/output/output_test.go
internal/archdocs/pssg/output/rss_test.go
internal/archdocs/pssg/render/funcs_test.go
internal/archdocs/pssg/render/render_test.go
internal/archdocs/pssg/render/shareimage_test.go
internal/archdocs/pssg/schema/jsonld_test.go
internal/archdocs/pssg/taxonomy/taxonomy_test.go
internal/archdocs/zip_test.go
internal/audit/audit_test.go
internal/auth/handler_test.go
internal/blastradius/zip_test.go
internal/cache/cache_test.go
internal/compact/handler_test.go
internal/config/config_test.go
internal/deadcode/zip_test.go
internal/find/handler.go
internal/find/handler_test.go
internal/find/zip_test.go
internal/focus/handler.go
internal/focus/handler_test.go
internal/graph/handler_test.go
internal/mcp/server_test.go
internal/mcp/zip_test.go
internal/restore/render.go
internal/restore/restore_test.go
internal/setup/wizard.go
internal/setup/wizard_test.go
internal/shards/daemon_test.go
internal/shards/graph_test.go
internal/shards/render_test.go
internal/shards/watcher_test.go
internal/shards/zip_test.go
internal/status/handler_test.go
✅ Files skipped from review due to trivial changes (7)
- benchmark/change_tracking/models.py
- .gitignore
- benchmark/task.md
- benchmark/CLAUDE.md
- benchmark/CLAUDE.skill.md
- benchmark/results/summary.md
- internal/audit/audit_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/skill.go
class Todo(models.Model):
    title = models.CharField(max_length=200)
    completed = models.BooleanField(default=False)
    created_at = models.DateTimeField(auto_now_add=True)
priority field required by tests is missing.
Lines 4-8 define Todo without a priority field, but benchmark/django_app/todos/tests.py (lines 72-91) expects low|medium|high with a default of medium. This will break the priority feature path.
Suggested fix
class Todo(models.Model):
+ PRIORITY_LOW = 'low'
+ PRIORITY_MEDIUM = 'medium'
+ PRIORITY_HIGH = 'high'
+ PRIORITY_CHOICES = [
+ (PRIORITY_LOW, 'Low'),
+ (PRIORITY_MEDIUM, 'Medium'),
+ (PRIORITY_HIGH, 'High'),
+ ]
+
title = models.CharField(max_length=200)
completed = models.BooleanField(default=False)
+ priority = models.CharField(
+ max_length=10,
+ choices=PRIORITY_CHOICES,
+ default=PRIORITY_MEDIUM,
+ )
created_at = models.DateTimeField(auto_now_add=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmark/django_app/todos/models.py` around lines 4 - 8, The Todo model is
missing the required priority field that tests expect; add a CharField named
priority to the Todo class with choices set to ('low','medium','high') (use a
PRIORITY_CHOICES tuple) and default 'medium' (e.g., priority =
models.CharField(max_length=6, choices=PRIORITY_CHOICES, default='medium')),
then create and apply a Django migration so the DB schema matches the model and
tests pass.
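For reference, the migration file that `makemigrations` would generate for this change looks roughly like the sketch below. This is a hedged illustration, not the actual generated file: the migration name, the `0001_initial` dependency, and the `todos` app label being the registered label are all assumptions; the field definition mirrors the suggested fix above.

```python
# Hypothetical sketch of benchmark/django_app/todos/migrations/0002_todo_priority.py;
# the migration number and the 0001_initial dependency are assumptions.
from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('todos', '0001_initial'),
    ]

    operations = [
        migrations.AddField(
            model_name='todo',
            name='priority',
            field=models.CharField(
                choices=[('low', 'Low'), ('medium', 'Medium'), ('high', 'High')],
                default='medium',
                max_length=10,
            ),
        ),
    ]
```

In practice you would let Django produce this by running `python manage.py makemigrations todos` followed by `python manage.py migrate` rather than writing it by hand.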
def todo_to_dict(todo):
    return {
        'id': todo.id,
        'title': todo.title,
        'completed': todo.completed,
        'created_at': todo.created_at.isoformat(),
    }
Include priority in serialized responses.
todo_to_dict() omits priority, so create/list/detail responses miss a required field expected by benchmark/django_app/todos/tests.py (Lines 93-143).
Proposed fix
 def todo_to_dict(todo):
     return {
         'id': todo.id,
         'title': todo.title,
         'completed': todo.completed,
         'created_at': todo.created_at.isoformat(),
+        'priority': todo.priority,
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmark/django_app/todos/views.py` around lines 9 - 15, The serialized
output from todo_to_dict is missing the required "priority" field; update the
todo_to_dict function to include a 'priority' key (e.g., 'priority':
todo.priority) in the returned dict so create/list/detail responses include that
field (handle nullable values as appropriate, e.g., pass through None or cast to
the expected type) to satisfy the tests referenced in
benchmark/django_app/todos/tests.py.
    def get(self, request):
        todos = Todo.objects.all()
        return JsonResponse({'todos': [todo_to_dict(t) for t in todos]})
Add ?priority= filtering in list endpoint.
The list view currently ignores the priority query param, so GET /api/todos/?priority=high won’t behave as the tests require.
Proposed fix
 class TodoListView(View):
     def get(self, request):
         todos = Todo.objects.all()
+        priority = request.GET.get('priority')
+        if priority:
+            todos = todos.filter(priority=priority)
         return JsonResponse({'todos': [todo_to_dict(t) for t in todos]})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmark/django_app/todos/views.py` around lines 20 - 22, The GET list
handler in TodoListView (the get method) ignores the priority query param;
update the get(self, request) implementation to read request.GET.get('priority')
and, when present, filter the queryset (Todo.objects) by that priority before
serializing (e.g., Todo.objects.filter(priority=priority_value)); ensure you
still return the JsonResponse with the list using todo_to_dict(t) for the
filtered queryset.
    def post(self, request):
        body = json.loads(request.body)
        todo = Todo.objects.create(title=body['title'])
        return JsonResponse(todo_to_dict(todo), status=201)
POST should accept and persist priority.
Create currently drops priority from request JSON, so test_create_todo_with_priority will fail.
Proposed fix
 def post(self, request):
     body = json.loads(request.body)
-    todo = Todo.objects.create(title=body['title'])
+    todo = Todo.objects.create(
+        title=body['title'],
+        priority=body.get('priority', 'medium'),
+    )
     return JsonResponse(todo_to_dict(todo), status=201)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmark/django_app/todos/views.py` around lines 24 - 27, The POST handler
in the TodoView.post currently ignores the incoming priority field and only
creates the model with title, so update the Todo.objects.create call in
post(self, request) to include the priority from the request JSON (e.g., use
body.get('priority') or body['priority'] as appropriate) so the Todo instance
persists the priority; ensure any validation/defaulting logic you need matches
the Todo model and that todo_to_dict(todo) will reflect the saved priority.
    def patch(self, request, pk):
        try:
            todo = Todo.objects.get(pk=pk)
        except Todo.DoesNotExist:
            return JsonResponse({'error': 'Not found'}, status=404)
        body = json.loads(request.body)
        if 'completed' in body:
            todo.completed = body['completed']
        todo.save()
PATCH should support priority updates too.
Detail PATCH only updates completed; it should also handle priority to keep API behavior consistent with the priority feature scope.
Proposed fix
 def patch(self, request, pk):
     try:
         todo = Todo.objects.get(pk=pk)
     except Todo.DoesNotExist:
         return JsonResponse({'error': 'Not found'}, status=404)
     body = json.loads(request.body)
     if 'completed' in body:
         todo.completed = body['completed']
+    if 'priority' in body:
+        todo.priority = body['priority']
     todo.save()
     return JsonResponse(todo_to_dict(todo))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmark/django_app/todos/views.py` around lines 39 - 47, The PATCH handler
in the Todo view currently only updates completed; modify the patch(self,
request, pk) method to also check for 'priority' in the parsed body and assign
it to the Todo instance (e.g., if 'priority' in body: todo.priority =
body['priority']) before calling todo.save(), ensuring the Todo model's priority
field is updated alongside completed.
func TestLoginWithToken_SaveError(t *testing.T) {
	if os.Getenv("CI") != "" {
		t.Skip("skipping chmod-based test in CI")
	}
	tmp := t.TempDir()
	t.Setenv("HOME", tmp)
	t.Setenv("SUPERMODEL_API_KEY", "")
	cfgDir := filepath.Join(tmp, ".supermodel")
	if err := os.MkdirAll(cfgDir, 0o700); err != nil {
		t.Fatal(err)
	}
	if err := os.Chmod(cfgDir, 0o555); err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { os.Chmod(cfgDir, 0o755) }) //nolint:errcheck
	if err := LoginWithToken("smsk_live_test"); err == nil {
		t.Error("expected error when cfg.Save fails")
	}
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'func Test(LoginWithToken_SaveError|Logout_SaveError)|os.Chmod|runtime.GOOS' internal/auth/handler_test.go

Repository: supermodeltools/cli
Length of output: 406
🏁 Script executed:
#!/bin/bash
# Check imports at the top of the file
head -20 internal/auth/handler_test.go
# Get the exact skip conditions
sed -n '219,225p' internal/auth/handler_test.go
sed -n '255,261p' internal/auth/handler_test.go

Repository: supermodeltools/cli
Length of output: 784
Skip chmod-dependent tests on Windows.
These two tests (TestLoginWithToken_SaveError and TestLogout_SaveError) use Unix file permissions with os.Chmod to simulate save errors. The tests only skip in CI environments, but they'll be flaky on Windows where permissions work differently.
Add "runtime" to imports and update both skip conditions (lines 220 and 256) to also skip on Windows:
if os.Getenv("CI") != "" || runtime.GOOS == "windows" {
	t.Skip("skipping chmod-based test in CI and on Windows")
}

This way, the tests won't cause intermittent failures on Windows machines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/auth/handler_test.go` around lines 219 - 237, Update the
chmod-dependent tests TestLoginWithToken_SaveError and TestLogout_SaveError so
they also skip on Windows: add runtime to the imports and change the skip
condition from only checking CI (os.Getenv("CI") != "") to also skip when
runtime.GOOS == "windows" (e.g., if os.Getenv("CI") != "" || runtime.GOOS ==
"windows" { t.Skip(...) }), ensuring both tests use the same updated guard.
func TestSave_WriteFileError(t *testing.T) {
	if os.Getenv("CI") != "" {
		t.Skip("skipping chmod-based test in CI")
	}
	home := t.TempDir()
	t.Setenv("HOME", home)
	cfgDir := filepath.Join(home, ".supermodel")
	if err := os.MkdirAll(cfgDir, 0700); err != nil {
		t.Fatal(err)
	}
	if err := os.Chmod(cfgDir, 0555); err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { os.Chmod(cfgDir, 0755) }) //nolint:errcheck
	cfg := &Config{APIKey: "test"}
	if err := cfg.Save(); err == nil {
		t.Error("Save should fail when config file cannot be written")
	}
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'func TestSave_WriteFileError|os.Chmod|runtime.GOOS' internal/config/config_test.go

Repository: supermodeltools/cli
Length of output: 301
🏁 Script executed:
sed -n '30,50p' internal/config/config_test.go

Repository: supermodeltools/cli
Length of output: 562
🏁 Script executed:
sed -n '65,80p' internal/config/config_test.go

Repository: supermodeltools/cli
Length of output: 463
🏁 Script executed:
head -20 internal/config/config_test.go | grep -E 'import|runtime'

Repository: supermodeltools/cli
Length of output: 82
🏁 Script executed:
sed -n '200,215p' internal/config/config_test.goRepository: supermodeltools/cli
Length of output: 597
Add a Windows guard to skip chmod-based tests on Windows.
This test simulates a permission failure by calling os.Chmod(cfgDir, 0555), but on Windows, os.Chmod doesn't enforce Unix-like permission restrictions the same way. This means the test won't actually fail on Windows like it's supposed to, making it a flaky test on that platform.
The codebase already uses runtime.GOOS guards elsewhere (see lines 36 and 70), so add the same pattern here:
Suggested fix
 func TestSave_WriteFileError(t *testing.T) {
-	if os.Getenv("CI") != "" {
+	if os.Getenv("CI") != "" || runtime.GOOS == "windows" {
-		t.Skip("skipping chmod-based test in CI")
+		t.Skip("skipping chmod-based test in CI and on Windows")
 	}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/config/config_test.go` around lines 207 - 225, The test
TestSave_WriteFileError relies on Unix-style os.Chmod behavior which doesn't
hold on Windows; add a guard using runtime.GOOS at the start of
TestSave_WriteFileError to call t.Skip when runtime.GOOS == "windows" (matching
existing guards used elsewhere) so the chmod-based simulation and subsequent
os.Chmod(cfgDir, 0555) are not executed on Windows; keep the rest of the test
(setting HOME, creating cfgDir, chmod, cleanup, creating Config and calling
cfg.Save()) unchanged.
	if err := walkZip("/nonexistent-dir-xyzzy-mcp", dest); err == nil {
		t.Error("walkZip should fail when source directory does not exist")
	}
}

func TestMCPWalkZip_OpenFileError(t *testing.T) {
	if os.Getenv("CI") != "" {
		t.Skip("skipping chmod-based test in CI")
	}
	src := t.TempDir()
	secret := filepath.Join(src, "secret.go")
	if err := os.WriteFile(secret, []byte("package main"), 0600); err != nil {
		t.Fatal(err)
	}
	if err := os.Chmod(secret, 0000); err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { os.Chmod(secret, 0600) }) //nolint:errcheck
	dest := filepath.Join(t.TempDir(), "out.zip")
	if err := walkZip(src, dest); err == nil {
		t.Error("walkZip should fail when a source file cannot be opened")
	}
}

func TestMCPCreateZip_CreateTempError(t *testing.T) {
	t.Setenv("TMPDIR", filepath.Join(t.TempDir(), "nonexistent-tmp"))
	_, err := createZip(t.TempDir())
	if err == nil {
		t.Error("createZip should fail when os.CreateTemp fails")
	}
}

func TestMCPCreateZip_NonExistentDir(t *testing.T) {
	_, err := createZip("/nonexistent-dir-mcp-createzip-xyz")
	if err == nil {
🧩 Analysis chain
🏁 Script executed:
fd -t f "zip_test.go" --path "*/mcp/*"

Repository: supermodeltools/cli
Length of output: 296
🏁 Script executed:
cat -n internal/mcp/zip_test.go | sed -n '130,170p'

Repository: supermodeltools/cli
Length of output: 1603
🏁 Script executed:
# Get full context of the test file to understand the functions being tested
wc -l internal/mcp/zip_test.go

Repository: supermodeltools/cli
Length of output: 91
🏁 Script executed:
# Check how os.CreateTemp determines temp directory across platforms
# Look for any platform-specific handling in the codebase
rg -A 5 "os.CreateTemp" internal/mcp/

Repository: supermodeltools/cli
Length of output: 664
🏁 Script executed:
# Check if there are any imports or utility functions that handle cross-platform paths
head -50 internal/mcp/zip_test.go

Repository: supermodeltools/cli
Length of output: 1274
🏁 Script executed:
# Verify the behavior - check Go's os.CreateTemp documentation behavior
# by searching for how the temp directory is selected across platforms
python3 << 'EOF'
# Document standard Go behavior for os.CreateTemp
print("Go's os.CreateTemp behavior:")
print("- On Windows: checks TMP, TEMP, USERPROFILE env vars (in order)")
print("- On Unix: checks TMPDIR env var")
print("- Falls back to os.TempDir() if none are set")
print("")
print("The review's concern:")
print("- Line 158 only sets TMPDIR")
print("- On Windows in CI, this won't affect os.CreateTemp behavior")
print("- Solution: set TMP and TEMP as well for full cross-platform coverage")
EOF

Repository: supermodeltools/cli
Length of output: 415
🏁 Script executed:
# Confirm the patch would work by checking if createZip and walkZip use standard Go file ops
rg -A 10 "func createZip\|func walkZip" internal/mcp/

Repository: supermodeltools/cli
Length of output: 45
Make nonexistent-path and temp-dir failure tests OS-portable.
Lines 133 and 166 hardcode Unix absolute paths that start with /. On Windows, paths look like C:\temp, so these tests will fail there. Line 158 also only sets TMPDIR, but Windows respects the TMP and TEMP environment variables instead. Fix both issues by building missing paths under t.TempDir() (portable and creates unique dirs) and setting all three common temp env vars.
Suggested fixes
 func TestMCPWalkZip_WalkError(t *testing.T) {
 	dest := filepath.Join(t.TempDir(), "out.zip")
-	if err := walkZip("/nonexistent-dir-xyzzy-mcp", dest); err == nil {
+	missing := filepath.Join(t.TempDir(), "does-not-exist")
+	if err := walkZip(missing, dest); err == nil {
 		t.Error("walkZip should fail when source directory does not exist")
 	}
 }

 func TestMCPCreateZip_CreateTempError(t *testing.T) {
-	t.Setenv("TMPDIR", filepath.Join(t.TempDir(), "nonexistent-tmp"))
+	invalidTmp := filepath.Join(t.TempDir(), "nonexistent-tmp")
+	t.Setenv("TMPDIR", invalidTmp)
+	t.Setenv("TMP", invalidTmp)
+	t.Setenv("TEMP", invalidTmp)
 	_, err := createZip(t.TempDir())
 	if err == nil {
 		t.Error("createZip should fail when os.CreateTemp fails")
 	}
 }

 func TestMCPCreateZip_NonExistentDir(t *testing.T) {
-	_, err := createZip("/nonexistent-dir-mcp-createzip-xyz")
+	_, err := createZip(filepath.Join(t.TempDir(), "does-not-exist"))
 	if err == nil {
 		t.Error("createZip should fail when directory does not exist")
 	}
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/mcp/zip_test.go` around lines 133 - 167, Tests use hardcoded Unix
absolute paths and only set TMPDIR, which breaks on Windows; update tests to
build missing directories under t.TempDir() and set all common temp env vars.
Replace occurrences of "/nonexistent-dir-xyzzy-mcp" and
"/nonexistent-dir-mcp-createzip-xyz" with filepath.Join(t.TempDir(),
"nonexistent-...") so they are OS-portable; in TestMCPCreateZip_CreateTempError
set TMPDIR, TMP and TEMP (use t.Setenv for each) to a non-existent path under
t.TempDir() before calling createZip; keep assertions and use the same function
names walkZip and createZip to locate changes.
func TestMCPWalkZip_OpenFileError(t *testing.T) {
	if os.Getenv("CI") != "" {
		t.Skip("skipping chmod-based test in CI")
	}
	src := t.TempDir()
	secret := filepath.Join(src, "secret.go")
	if err := os.WriteFile(secret, []byte("package main"), 0600); err != nil {
		t.Fatal(err)
	}
	if err := os.Chmod(secret, 0000); err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { os.Chmod(secret, 0600) }) //nolint:errcheck
	dest := filepath.Join(t.TempDir(), "out.zip")
	if err := walkZip(src, dest); err == nil {
		t.Error("walkZip should fail when a source file cannot be opened")
	}
🧩 Analysis chain
🏁 Script executed:
fd -t f "zip_test.go" | head -20

Repository: supermodeltools/cli
Length of output: 263
🏁 Script executed:
wc -l internal/mcp/zip_test.go

Repository: supermodeltools/cli
Length of output: 91
🏁 Script executed:
cat -n internal/mcp/zip_test.go | sed -n '130,160p'

Repository: supermodeltools/cli
Length of output: 1268
🏁 Script executed:
rg -A 10 "func walkZip" internal/mcp/ | head -30

Repository: supermodeltools/cli
Length of output: 530
🏁 Script executed:
rg -B 5 -A 20 "func walkZip" internal/mcp/zip.go | head -40

Repository: supermodeltools/cli
Length of output: 618
🏁 Script executed:
cat -n internal/mcp/zip.go | sed -n '25,80p'

Repository: supermodeltools/cli
Length of output: 2108
🏁 Script executed:
cat -n internal/mcp/zip.go | sed -n '82,140p'

Repository: supermodeltools/cli
Length of output: 1419
🏁 Script executed:
rg -i "windows|root|permission" internal/mcp/zip_test.go

Repository: supermodeltools/cli
Length of output: 45
Probe actual file readability before running the permission test.
The current CI-only skip is too narrow. On Windows, when running as root, or in some containers, chmod 0000 doesn't actually prevent file reads. This means the test can flake unpredictably—the file stays readable, walkZip succeeds when it should fail, and your test fails.
Instead of relying on environment checks, try to open the file after chmod. If it still opens, skip the test gracefully. This way, the test runs when it can run, and skips when the precondition (unreadable file) can't be established. Much more reliable.
Suggested fix
 func TestMCPWalkZip_OpenFileError(t *testing.T) {
-	if os.Getenv("CI") != "" {
-		t.Skip("skipping chmod-based test in CI")
-	}
 	src := t.TempDir()
 	secret := filepath.Join(src, "secret.go")
 	if err := os.WriteFile(secret, []byte("package main"), 0600); err != nil {
 		t.Fatal(err)
 	}
 	if err := os.Chmod(secret, 0000); err != nil {
 		t.Fatal(err)
 	}
 	t.Cleanup(func() { os.Chmod(secret, 0600) }) //nolint:errcheck
+	if f, err := os.Open(secret); err == nil {
+		_ = f.Close()
+		t.Skip("platform/user can still read chmod 0000 file")
+	}
 	dest := filepath.Join(t.TempDir(), "out.zip")
 	if err := walkZip(src, dest); err == nil {
 		t.Error("walkZip should fail when a source file cannot be opened")
 	}
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/mcp/zip_test.go` around lines 138 - 154, The test
TestMCPWalkZip_OpenFileError should verify the unreadability precondition after
chmod: after calling os.Chmod(secret, 0000) attempt to open the file (e.g.,
os.Open(secret)) and if the open succeeds (meaning chmod didn't make it
unreadable) call t.Skipf with a clear reason so the test is skipped rather than
flaking; keep the existing t.Cleanup to restore permissions and then proceed to
call walkZip(dest) expecting an error only when the open actually fails.
func initCleanMCPGitRepo(t *testing.T) string {
	t.Helper()
	dir := t.TempDir()
	run := func(args ...string) {
		t.Helper()
		cmd := exec.Command(args[0], args[1:]...)
		cmd.Dir = dir
		if out, err := cmd.CombinedOutput(); err != nil {
			t.Fatalf("git setup %v: %v\n%s", args, err, out)
		}
	}
	run("git", "init")
	run("git", "config", "user.email", "ci@test.local")
	run("git", "config", "user.name", "CI")
	if err := os.WriteFile(filepath.Join(dir, "main.go"), []byte("package main"), 0600); err != nil {
		t.Fatal(err)
	}
	run("git", "add", ".")
	run("git", "commit", "-m", "init")
	return dir
🧩 Analysis chain
🏁 Script executed:
# First, locate the file and check the function
fd -t f "zip_test.go" | head -5

Repository: supermodeltools/cli
Length of output: 210
🏁 Script executed:
# Check if the file exists and read the relevant section
if [ -f "internal/mcp/zip_test.go" ]; then
wc -l internal/mcp/zip_test.go
fi

Repository: supermodeltools/cli
Length of output: 91
🏁 Script executed:
# Search for the initCleanMCPGitRepo function
rg "initCleanMCPGitRepo" -A 25 internal/mcp/

Repository: supermodeltools/cli
Length of output: 3476
🏁 Script executed:
# Check for other git-related test functions and patterns
rg "exec\.Command.*git" -B 3 -A 3 internal/

Repository: supermodeltools/cli
Length of output: 8885
🏁 Script executed:
# Look for existing patterns of skip or LookPath usage in tests
rg "t\.Skip|exec\.LookPath" -B 2 -A 2 --type go

Repository: supermodeltools/cli
Length of output: 15293
Skip tests gracefully when git is unavailable.
Right now, if someone runs tests on a system without git installed, the whole initCleanMCPGitRepo helper will crash hard. Since three different tests depend on this helper, they'll all fail for an infrastructure reason (missing git), not because of an actual bug in your code. Better to skip them cleanly instead.
Add a quick exec.LookPath("git") check at the start of the helper—this is already a pattern used elsewhere in your codebase for checking if tools exist. If git is missing, the test skips with a clear message. That way, on systems without git, tests are skipped rather than failing.
Proposed patch
 func initCleanMCPGitRepo(t *testing.T) string {
 	t.Helper()
+	if _, err := exec.LookPath("git"); err != nil {
+		t.Skip("git is not installed in this environment")
+	}
 	dir := t.TempDir()
 	run := func(args ...string) {
 		t.Helper()
 		cmd := exec.Command(args[0], args[1:]...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/mcp/zip_test.go` around lines 172 - 191, The initCleanMCPGitRepo
helper should detect missing git and skip tests instead of failing; add an
exec.LookPath("git") call at the start of initCleanMCPGitRepo and if it returns
an error call t.Skipf with a clear message (e.g. "git not available: %v") so
tests depending on initCleanMCPGitRepo are skipped when git is not installed;
keep t.Helper() as-is and ensure subsequent git commands assume git exists.
Closing — this PR accumulated 14k lines of unrelated commits from rebasing. Reopening as a clean 196-line PR on feat/skill-command.
Summary
supermodel skill prints a generic prompt that teaches AI agents how to use .graph files

Benchmark (Django, 8 failing tests)
What changed in the prompt
X.graph.ext directly instead of ls-ing

Test plan
go build ./... passes

Summary by CodeRabbit
New Features
Documentation
Tests
Chores
Bug Fixes