
feat: skill command — auto-generated prompt matches hand-crafted benchmark #123

Closed

jonathanpopham wants to merge 15 commits into supermodeltools:main from jonathanpopham:draft/skill-redesign

Conversation

jonathanpopham (Contributor) commented Apr 13, 2026

Summary

  • supermodel skill prints a generic prompt that teaches AI agents how to use .graph files
  • Prompt is repo/language agnostic — works on any codebase with shards
  • Benchmark-tuned: traced tool calls from Django benchmark, eliminated 4 wasted turns

Benchmark (Django, 8 failing tests)

| Variant | Cost | Duration | Turns |
| --- | --- | --- | --- |
| Naked (no shards) | $0.30 | 122s | 20 |
| Supermodel (Grey's hand-crafted prompt) | $0.12 | 29s | 9 |
| Skill v2 (this PR) | $0.11 | 31s | 7 |
| Skill v1 (before tuning) | $0.15 | 42s | 11 |

What changed in the prompt

  1. Naming convention — tells agents to construct X.graph.ext directly instead of ls-ing
  2. Bold read-order directive — "Read the .graph file before the source file"
  3. Check graph before grepping — agents were wasting turns grepping for structure the graph already had
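The naming convention in point 1 can be sketched in Go — a hypothetical helper for illustration only, since the actual prompt states the rule in prose for the agent to apply:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// graphPath shows the X.graph.ext sidecar convention the prompt teaches:
// an agent can construct the graph path directly from the source path
// instead of spending a turn on ls.
func graphPath(src string) string {
	ext := filepath.Ext(src)             // ".py"
	base := strings.TrimSuffix(src, ext) // "todos/models"
	return base + ".graph" + ext         // "todos/models.graph.py"
}

func main() {
	fmt.Println(graphPath("todos/models.py")) // todos/models.graph.py
}
```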

Test plan

  • go build ./... passes
  • Docker benchmark: skill v1 ($0.15, 42s, 11 turns)
  • Docker benchmark: skill v2 ($0.11, 31s, 7 turns) — matches hand-crafted
  • All 8 Django tests pass in both runs

Summary by CodeRabbit

  • New Features

    • Added a CLI "skill" command to print an agent prompt; added benchmark orchestration (Dockerfiles, entrypoints, run/compare scripts) and a runnable Django benchmark app.
  • Documentation

    • New benchmark guides, task briefs, and result writeups to explain experiments and usage.
  • Tests

    • Massive expansion of unit and integration tests across many packages to improve coverage.
  • Chores

    • Expanded .gitignore to exclude OS, Python, Go coverage, and generated graph artifacts.
  • Bug Fixes

    • Improved Unicode-safe token/counting and masking behavior and various robustness fixes.

coderabbitai bot commented Apr 13, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9f79c1de-9470-47b8-966b-b1405ba268fd

📥 Commits

Reviewing files that changed from the base of the PR and between 0c1589f and 17875e3.

📒 Files selected for processing (1)
  • cmd/skill_test.go

Walkthrough

Adds a new CLI subcommand `skill` that prints a repo-facing prompt; adds benchmark artifacts and Dockerfiles for a Django benchmark, several rune/UTF-8 and non-mutating fixes, and a very large set of new unit/integration tests across many internal packages and zip/git/shards/archdocs flows.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CLI — `cmd/skill.go`, `cmd/skill_test.go` | Adds a `skill` Cobra subcommand that prints a package-level `skillPrompt`; tests validate prompt content and minimal length. |
| Repo ignores — `.gitignore` | Adds ignores for `.DS_Store`, Python bytecode, Go coverage outputs, and generated `.graph.*` sidecars. |
| Benchmark infra & reports — `benchmark/*`, `benchmark/*.sh`, `benchmark/Dockerfile.*`, `benchmark/task.md`, `benchmark/results/*` | Adds benchmark docs, two Dockerfiles, entrypoint scripts, run/compare scripts, a task description, and example results/report drafts. |
| Django benchmark app — `benchmark/django_app/...` | Adds a minimal Django project (manage.py, settings, urls, wsgi), a todos app (models, views, urls, admin, tests), and pinned requirements. |
| Change-tracking tests & scaffolding — `benchmark/change_tracking/*` | Adds a placeholder models file and a comprehensive test suite exercising EmailChangeRecord semantics. |
| Benchmark entrypoints — `benchmark/entrypoint.*.sh`, `benchmark/run.sh`, `benchmark/compare.sh` | Adds benchmark orchestration scripts (naked & supermodel flows), comparison tooling, and run orchestration for Docker images. |
| Zip / git packaging tests — `internal/*/zip_test.go` (many packages) | Adds extensive zip/git-related tests across multiple packages (walkZip/createZip/gitArchive, large-file skipping, error paths, CI guards). |
| Archdocs / pssg / render / output tests — `internal/archdocs/...` | Large addition of tests for rendering, templates, sitemaps/RSS/manifest, SVG share images, JSON-LD schemas, taxonomy, loader, and enrichment cache. |
| MCP / server tests — `internal/mcp/*.go` | Adds comprehensive MCP server tests, helpers, and coverage for dispatch, analysis, caching, and zip paths. |
| Shards package tests & watcher — `internal/shards/*` | Adds many tests for shards rendering, watcher, graph merge, affected-file computation, zip behavior, and watcher/gitrepo integration. |
| Find/dedup behavior — `internal/find/handler.go`, `internal/find/handler_test.go` | `dedupSorted` changed to sort a copy instead of mutating its input; tests ensure no in-place mutation and add human-format output coverage. |
| Focus traversal & token estimate — `internal/focus/handler.go`, `internal/focus/handler_test.go` | Optimizes `reachableImports` by indexing import edges; `estimateTokens` is now rune-aware; extensive tests added. |
| Rune/UTF-8 fixes — `internal/api/client.go`, `internal/restore/render.go`, `internal/setup/wizard.go` | Makes HTTP snippet truncation, token counting, and API-key masking rune-aware (using runes/utf8) to preserve Unicode correctness. |
| Restore tweaks & tests — `internal/restore/*` | `CountTokens` switched to rune-based counting; adds tests for multi-byte inputs, rendering, truncation, parsing, and detection. |
| Cache / compact / audit & other tests — `internal/{cache,compact,audit,...}_test.go` | Wide set of new and expanded tests for cache behavior, source compaction, audit rendering, and many other packages. |
| Minor IO fix — `cmd/restore.go` | Improves error handling to propagate the `src.Close()` error when appropriate. |

Sequence Diagram(s)

(Skipped — changes are primarily tests, scripts, docs, and a small CLI command; no new multi-component sequential flow requiring visualization.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • greynewell

Poem

A tiny command to print the plan,
Benchmarks, Docker, scripts to span,
Rune-wise fixes so text stays true,
Tests stacked high to catch the glue,
CI hums — merge when green, then grin 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.16%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: a new `skill` CLI command with an auto-generated prompt that matches hand-crafted benchmark performance. |
| Description check | ✅ Passed | The description is comprehensive and well-structured, covering what changed, benchmark results, concrete improvements made to the prompt, and a detailed test plan with checkbox tracking. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
cmd/skill.go (1)

45-47: Use RunE with cmd.OutOrStdout() for better testability and output handling.

Right now, fmt.Println just dumps text to stdout without checking if the write succeeded. In Cobra, it's cleaner to use RunE (which can return errors) paired with cmd.OutOrStdout(). Here's why it matters:

  • It's testable: Tests can inject a mock writer instead of hitting real stdout
  • It respects output redirection: Users can pipe the output or redirect it without issues
  • It handles errors: If the pipe breaks or write fails, you'll know about it instead of silently ignoring it

Think of it like the difference between throwing a letter into the void vs. getting a delivery confirmation—cmd.OutOrStdout() gives you feedback that the write actually worked.

Suggested diff

```diff
-		Run: func(cmd *cobra.Command, args []string) {
-			fmt.Println(skillPrompt)
-		},
+		RunE: func(cmd *cobra.Command, args []string) error {
+			_, err := fmt.Fprintln(cmd.OutOrStdout(), skillPrompt)
+			return err
+		},
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/skill.go` around lines 45 - 47, Replace the anonymous Run handler with
RunE so the command can return write errors and use the command's output writer;
specifically, change the current Run function (the block containing
fmt.Println(skillPrompt)) to RunE and write the prompt to cmd.OutOrStdout(),
capturing and returning any error (e.g., call fmt.Fprintln(cmd.OutOrStdout(),
skillPrompt) and return the error) so tests can inject writers and write
failures are surfaced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/skill.go`:
- Around line 34-50: The skill command is missing from the noConfigCommands
exemption and is being blocked by the PersistentPreRunE preflight; add "skill"
to the noConfigCommands map in cmd/root.go so the command created in init() (the
cobra.Command with Use: "skill" that prints skillPrompt) can run without config.
Locate the noConfigCommands map in cmd/root.go and add the same string key used
for other exempt commands (e.g., "version", "help") so the PersistentPreRunE
allows the skill command to execute on fresh installs.

---

Nitpick comments:
In `@cmd/skill.go`:
- Around line 45-47: Replace the anonymous Run handler with RunE so the command
can return write errors and use the command's output writer; specifically,
change the current Run function (the block containing fmt.Println(skillPrompt))
to RunE and write the prompt to cmd.OutOrStdout(), capturing and returning any
error (e.g., call fmt.Fprintln(cmd.OutOrStdout(), skillPrompt) and return the
error) so tests can inject writers and write failures are surfaced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 97994584-11a4-435f-a641-2400e567083d

📥 Commits

Reviewing files that changed from the base of the PR and between 7261bed and d977e4b.

📒 Files selected for processing (1)
  • cmd/skill.go

Comment on lines +34 to +50
```go
func init() {
	c := &cobra.Command{
		Use:   "skill",
		Short: "Print agent awareness prompt for graph files",
		Long: `Prints a prompt that teaches AI coding agents how to use Supermodel's
graph files. Pipe into your agent's instructions:

supermodel skill >> CLAUDE.md
supermodel skill >> AGENTS.md
supermodel skill >> .cursorrules`,
		Args: cobra.NoArgs,
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Println(skillPrompt)
		},
	}

	rootCmd.AddCommand(c)
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify whether `skill` is exempted from config/API-key preflight checks.

rg -n -C4 '\bnoConfigCommands\b|PersistentPreRunE' cmd/root.go
rg -n -C2 '"skill"' cmd/root.go
```

Repository: supermodeltools/cli

Length of output: 874

🏁 Script executed:

```shell
cat cmd/root.go | sed -n '14,30p'
```

Repository: supermodeltools/cli

Length of output: 846

🏁 Script executed:

```shell
sed -n '14,40p' cmd/root.go
```

Repository: supermodeltools/cli

Length of output: 1225


Add skill to the noConfigCommands map in cmd/root.go.

The skill command is blocked by the PersistentPreRunE preflight check because it's not listed in noConfigCommands (lines 16-30). Since skill just prints an informational prompt with zero setup or API dependencies, it should be exempted so users can run supermodel skill on fresh installs without config. Add it to the map like the other no-config commands (version, help, etc.).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/skill.go` around lines 34 - 50, The skill command is missing from the
noConfigCommands exemption and is being blocked by the PersistentPreRunE
preflight; add "skill" to the noConfigCommands map in cmd/root.go so the command
created in init() (the cobra.Command with Use: "skill" that prints skillPrompt)
can run without config. Locate the noConfigCommands map in cmd/root.go and add
the same string key used for other exempt commands (e.g., "version", "help") so
the PersistentPreRunE allows the skill command to execute on fresh installs.

greynewell and others added 12 commits April 13, 2026 17:14
…askKey

- restore/render.go: countTokens used len(text)/4 (bytes) instead of
  utf8.RuneCountInString(text)/4 (runes) — inflated estimate for any
  text containing multi-byte characters (CJK, emoji, accented chars)
- api/client.go: HTTP error snippet was truncated at byte offset 300,
  risking invalid UTF-8 when a multi-byte character straddles that boundary
- setup/wizard.go: maskKey sliced the key at byte positions 8 and len-4;
  same class of bug fixed elsewhere in the codebase

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
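The three byte-vs-rune fixes above share one pattern: slice and count runes, not bytes. A minimal sketch under assumed names and formats (the real functions live in internal/restore, internal/api, and internal/setup; the `...` mask separator and the 8/4 split are illustrative):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countTokens estimates tokens from the rune count, so multi-byte
// characters (CJK, emoji, accented chars) no longer inflate the estimate.
func countTokens(text string) int {
	return utf8.RuneCountInString(text) / 4
}

// truncateRunes cuts at a rune boundary, never mid-character — the class
// of bug the HTTP error-snippet truncation had at byte offset 300.
func truncateRunes(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

// maskKey keeps the first 8 and last 4 runes visible; slicing []rune
// instead of the string avoids splitting a multi-byte rune.
func maskKey(key string) string {
	r := []rune(key)
	if len(r) <= 12 {
		return key
	}
	return string(r[:8]) + "..." + string(r[len(r)-4:])
}

func main() {
	fmt.Println(countTokens("héllo wörld"))        // 11 runes / 4 = 2
	fmt.Println(truncateRunes("héllo", 3))         // hél
	fmt.Println(maskKey("sk-1234567890abcdefXYZ")) // sk-12345...fXYZ
}
```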
… estimate uses bytes

- find/handler.go: dedupSorted used ss[:1] as the output head, sharing
  the backing array with the input slice. Appends to out overwrote the
  caller's map value. Fixed by making an explicit copy before sorting.
  Adds TestDedupSorted_DoesNotMutateInput regression test.
- focus/handler.go: reachableImports iterated all relationships for
  every node in the BFS queue (O(depth × queue × rels)). Pre-index
  imports edges by source node to reduce the BFS to O(rels + visited).
- focus/handler.go: estimateTokens used len(s)/4 (bytes) instead of
  utf8.RuneCountInString(s)/4 (runes), inflating the hint for non-ASCII
  paths or function names.
- Adds TestCountTokens_MultiByteChars to guard the rune-count fix in
  restore/render.go (landed in previous commit).
- Adds wizard_test.go with full coverage for maskKey including a
  multi-byte rune regression test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If io.Copy succeeded but src.Close() failed, the close error was
silently discarded. Preserve the intent of not using defer (to avoid
accumulating open handles across Walk iterations) while returning the
close error when no copy error occurred.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d shards

- api: add GraphFromShardIR tests (Nodes, Rels, RepoID, NodeByID, NodesByLabel, empty)
- blastradius: new zip_test.go covering isGitRepo, isWorktreeClean, walkZip
  (hidden files, skip dirs, createZip round-trip)
- cache: add PutJSON/GetJSON round-trip, miss, and overwrite tests
- mcp: add boolArg and intArg tests
- render/funcs: add 23 tests for totalTime, formatDuration, seq, dict,
  reverseStrings, minInt/maxInt, length, entity accessors (fieldAccess,
  sectionAccess, getStringSlice, hasField, getInt, getFloat),
  jsonMarshal/toJSON, defaultVal/ternary/hasKey, parseQuantity, parseUnit,
  parseIngredient*, fractionDisplay, scaleQty
- shards: new graph_test.go (25 tests for Cache.Build, SourceFiles, FuncName,
  TransitiveDependents, computeStats, isShardPath, firstString, intProp)
- shards: extend render_test.go (CommentPrefix, ShardFilename, Header,
  sortedUnique, sortedBoolKeys, formatLoc, renderDepsSection,
  renderImpactSection, RenderGraph, WriteShard, updateGitignore)

Coverage deltas: render 11%→45%, blastradius 32%→60%, cache 61%→78%

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- setup: add tests for boolPtr, detectCursor, installHook (fresh install,
  idempotency, existing settings preservation, invalid JSON guard)
- schema: add NewGenerator, GenerateWebSiteSchema, GenerateBreadcrumbSchema,
  GenerateFAQSchema, GenerateItemListSchema, MarshalSchemas tests
  (21% → 51%)
- output: new output_test.go with tests for GenerateRobotsTxt (allow-all,
  standard bots, extra bots), GenerateManifest (JSON validity), NewSitemapEntry
  (basic, root path, trailing slash), chunkEntries, GenerateSitemapFiles
  (single file, multiple files with index, XML validity) (23% → 74%)
- taxonomy: add toStringSlice, HubPageURL, LetterPageURL, FindEntry,
  ComputePagination (single page, multi-page, empty) tests (21% → ~50%)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…functions

- config: add tests for ShardsEnabled (nil/true/false), applyEnv (SUPERMODEL_API_KEY,
  SUPERMODEL_API_BASE, SUPERMODEL_SHARDS=false), applyDefaults (from file)
- focus: add extractTypes tests (class detection, other-file exclusion) and
  extract with includeTypes=true integration
- build: add shareImageURL, countTaxEntries, countFieldDistribution (basic,
  limit, empty), toBreadcrumbItems tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- compact: nextShortName now at 100% (two-char overflow, skip existing, skip builtins)
- deadcode: new zip_test.go covers isGitRepo, isWorktreeClean, walkZip, createZip (0% → 65%)
- mcp: new zip_test.go covers same zip helpers (26% → 41%)
- graph2md: pure function tests for getStr, getNum, mermaidID, generateSlug (100% each); Run-based tests for Class/Type/Domain/Subdomain/Directory nodes (40% → 64%)
- output: GenerateRSSFeeds with disabled/main/default-path/category-feed cases (74% → 87%)
- render: BuildFuncMap smoke test; sliceHelper with all 3 slice types + passthrough; shareimage functions svgEscape, truncate, renderBarsSVG, all Generate*ShareSVG variants (45% → 70%)
- build: toTemplateHTML, writeShareSVG, maybeWriteShareSVG with enabled/disabled (6% → 7.5%)
- auth: Logout_AlreadyLoggedOut branch (17% → 19%)
- setup: detectClaude with ~/.claude present (23% → 25%)
- analyze: TestIsWorktreeClean_NonGitDir added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- schema: GenerateRecipeSchema (basic + author/times/nutrition) and GenerateCollectionPageSchema (50% → 87%)
- taxonomy: BuildAll (basic, min_entities filter, multi-value, empty), extractValues (single, missing, enrichment), getEnrichmentOverrides (simple, missing, array path, not-array cases) → 51% → 97%
- shards/render: RenderAll tests (empty, writes shards, dry-run, unknown file) → 36% → 43%
- shards/zip: new zip_test.go covering isShardFile, matchPattern, shouldInclude (basic/dir/ext/shard/minified/large), buildExclusions (no config, custom config, invalid JSON)
- find/zip: add TestIsWorktreeClean_NonGitDir to cover previously-uncovered branch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…overage

Adds 11,375 lines of new tests spanning 39 files, covering previously
untested packages and edge cases throughout the codebase.

Highlights:
- internal/audit: new test file from 0% → 85.5% (RenderHealth,
  RenderRunPrompt, RenderImprovePrompt, Analyze, EnrichWithImpact, and
  all health-scoring helpers)
- internal/mcp: 80.9% → 85.6% via cache-hit tests for toolDeadCode and
  toolBlastRadius (pre-populate JSON cache, bypass API calls)
- internal/archdocs/graph2md: 2,451-line test file covering all major
  node-type renderers, mermaid diagram generation, and arch-map output
- internal/shards: daemon UUID, graph traversal, WriteShard, zip helpers
- internal/restore, focus, find, compact, cache, config, auth,
  blastradius, deadcode, setup, status: targeted tests for all reachable
  pure functions and edge-case branches

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…laude Code

Includes Dockerfiles, entrypoints, task definition, run scripts, raw output
logs, summary, and blog post draft showing 40% cost reduction and 4x speed
improvement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes accidentally committed macOS and Python cache files from benchmark/.
Adds .DS_Store, __pycache__/*.pyc, cover*.out, and *.graph.{go,py} to .gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Revised the generic skill prompt based on benchmark trace analysis.
Three changes: teach the .graph naming convention so agents construct
paths directly, bold the read-order directive, and tell agents to
check graph files before grepping for structure.

Skill v2: $0.11, 31s, 7 turns (was $0.15, 42s, 11 turns)
Matches Grey's hand-crafted Django prompt: $0.12, 29s, 9 turns
jonathanpopham changed the title from "draft: skill command redesign — covers both graph file formats" to "feat: skill command — auto-generated prompt matches hand-crafted benchmark" on Apr 13, 2026
jonathanpopham marked this pull request as ready for review April 13, 2026 21:15
Locks in the six key elements that drove benchmark results:
graph extension, three section names, naming convention example,
and read-order directive.
coderabbitai bot left a comment

Actionable comments posted: 13

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🟡 Minor comments (14)
internal/graph/handler_test.go-367-383 (1)

367-383: ⚠️ Potential issue | 🟡 Minor

Fallback-label test is too weak right now.

At Line 381, checking for "node-without-name" anywhere can pass even if fallback labeling breaks, because the node ID is always emitted as the DOT node key.
Please assert the label field specifically.

Suggested assertion tightening
```diff
 func TestWriteDOT_NodeWithNoNameUsesID(t *testing.T) {
 @@
 	out := buf.String()
-	if !strings.Contains(out, "node-without-name") {
-		t.Errorf("node ID should appear as fallback label:\n%s", out)
+	expected := `"node-without-name" [label="node-without-name"];`
+	if !strings.Contains(out, expected) {
+		t.Errorf("node ID should be used as fallback label, got:\n%s", out)
 	}
 }
```
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/graph/handler_test.go` around lines 367 - 383, The test
TestWriteDOT_NodeWithNoNameUsesID is too weak because it only checks for the
node ID anywhere in the DOT output; update the assertion to verify the DOT
node's label attribute specifically by inspecting the string emitted by writeDOT
for the node key (e.g., "node-without-name") followed by a label assignment like
[label="..."] or label=<...>; change the check to look for the exact label field
(for example assert that the node line contains something like
`node-without-name` followed by `label="node-without-name"` or the appropriate
DOT label syntax) so the test fails if labeling falls back to the ID
incorrectly.
internal/archdocs/pssg/taxonomy/taxonomy_test.go-44-49 (1)

44-49: ⚠️ Potential issue | 🟡 Minor

Avoid direct indexing before asserting taxonomy count

Line 44 and Line 61 index taxes[0] without first asserting len(taxes). If BuildAll behavior shifts, these tests will panic instead of failing with a clear message.

Suggested fix
```diff
 func TestBuildAll_MinEntitiesFilter(t *testing.T) {
 @@
 	tc := config.TaxonomyConfig{Name: "cuisine", Field: "cuisine", MinEntities: 2}
 	taxes := BuildAll(entities, []config.TaxonomyConfig{tc}, nil)
+	if len(taxes) != 1 {
+		t.Fatalf("expected 1 taxonomy, got %d", len(taxes))
+	}
 @@
 	if len(taxes[0].Entries) != 1 {
 		t.Errorf("expected 1 entry (only Japanese), got %d", len(taxes[0].Entries))
 	}
 @@
 func TestBuildAll_MultiValue(t *testing.T) {
 @@
 	tc := config.TaxonomyConfig{Name: "tags", Field: "tags", MultiValue: true, MinEntities: 1}
 	taxes := BuildAll(entities, []config.TaxonomyConfig{tc}, nil)
+	if len(taxes) != 1 {
+		t.Fatalf("expected 1 taxonomy, got %d", len(taxes))
+	}
 @@
 	if len(taxes[0].Entries) != 3 {
 		t.Errorf("multi-value: expected 3 entries, got %d: %v", len(taxes[0].Entries), taxes[0].Entries)
 	}
 }
```

Also applies to: 61-63

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/taxonomy/taxonomy_test.go` around lines 44 - 49, The
test indexes taxes[0] before asserting the length; modify the test in
taxonomy_test.go to first check len(taxes) (e.g., require at least 1) and fail
fast (t.Fatalf or t.Fatalf-style message) if it's not as expected before
accessing taxes[0], and do the same for the second occurrence around lines that
check taxes[0].Entries—ensure you validate len(taxes) and then validate
len(taxes[0].Entries) to avoid panics if BuildAll changes behavior.
internal/archdocs/pssg/enrichment/cache_test.go-3-8 (1)

3-8: ⚠️ Potential issue | 🟡 Minor

Unreadable-dir test skip condition is too narrow and can be flaky.

Line 114 only checks CI, but chmod-based permission tests are also unreliable on some platforms. Add an OS guard to reduce false failures.

Suggested patch
```diff
 import (
 	"encoding/json"
 	"os"
 	"path/filepath"
+	"runtime"
 	"testing"
 )
 @@
 func TestReadAllCaches_UnreadableDir(t *testing.T) {
-	if os.Getenv("CI") != "" {
-		t.Skip("skipping chmod-based test in CI")
+	if os.Getenv("CI") != "" || runtime.GOOS == "windows" {
+		t.Skip("skipping chmod-based test on CI/Windows")
 	}
```

Also applies to: 113-127

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/enrichment/cache_test.go` around lines 3 - 8, The
unreadable-dir permission test currently only skips when CI is set, which is
flaky on platforms that don't support chmod semantics; modify the test skip
condition in cache_test.go for the unreadable-dir test to also detect
unsupported OSes by importing runtime and skipping when runtime.GOOS is
"windows" (and optionally other non-Unix OSes like "plan9" or "js") in addition
to the existing CI check; update the test function that sets directory
permissions to return t.Skipf(...) when runtime.GOOS indicates an unsupported
platform and ensure the runtime import is added to the import block.
internal/archdocs/pssg/enrichment/cache_test.go-79-87 (1)

79-87: ⚠️ Potential issue | 🟡 Minor

Use a deterministic non-existent path instead of a hardcoded absolute path.

"/nonexistent-enrichment-dir-xyz" is brittle and platform-dependent. Build a guaranteed-missing path from t.TempDir() for stable behavior.

Suggested patch
```diff
 func TestReadAllCaches_NonExistentDir(t *testing.T) {
-	result, err := ReadAllCaches("/nonexistent-enrichment-dir-xyz")
+	nonExistentDir := filepath.Join(t.TempDir(), "does-not-exist")
+	result, err := ReadAllCaches(nonExistentDir)
 	if err != nil {
 		t.Fatalf("non-existent dir should return empty result, not error: %v", err)
 	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/enrichment/cache_test.go` around lines 79 - 87, The
test TestReadAllCaches_NonExistentDir uses a hardcoded absolute path which is
brittle; change it to create a deterministic missing path by using t.TempDir()
and joining a guaranteed-nonexistent subdirectory (e.g.
filepath.Join(t.TempDir(), "nonexistent-enrichment-dir-xyz")) before calling
ReadAllCaches, and update the test to import filepath if required; keep the same
assertions (expect no error and empty result).
internal/archdocs/pssg/loader/loader_test.go-194-202 (1)

194-202: ⚠️ Potential issue | 🟡 Minor

TestDeriveSlug_FromField currently can’t catch fallback behavior regressions.

At Line 199–Line 201, the test only checks non-empty output, and the filename/title can normalize to the same slug. That can mask a bug where field:title is ignored.

Suggested test hardening
```diff
 func TestDeriveSlug_FromField(t *testing.T) {
 	l := &MarkdownLoader{Config: &config.Config{
 		Data: config.DataConfig{EntitySlug: config.EntitySlug{Source: "field:title"}},
 	}}
 	fields := map[string]interface{}{"title": "My Recipe!"}
-	slug := l.deriveSlug("/data/my-recipe.md", fields)
-	if slug == "" {
-		t.Error("expected non-empty slug from field")
+	slug := l.deriveSlug("/data/fallback-file.md", fields)
+	if slug == "" || slug == "fallback-file" {
+		t.Errorf("expected slug derived from field:title, got %q", slug)
 	}
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/loader/loader_test.go` around lines 194 - 202,
TestDeriveSlug_FromField is too weak because it only asserts the slug is
non-empty and the input filename could normalize to the same slug; update the
test to assert the exact expected slug derived from the title field (e.g.,
expect "my-recipe" when fields["title"]="My Recipe!") and change the input path
passed to l.deriveSlug (e.g., "/data/unrelated-file.md") so the filename would
not normalize to the same slug; this ensures deriveSlug (method deriveSlug on
MarkdownLoader) actually uses the field: compare slug == "my-recipe" and fail
the test if it does not.
internal/archdocs/pssg/loader/loader_test.go-321-345 (1)

321-345: ⚠️ Potential issue | 🟡 Minor

Unreadable-file test is environment-sensitive and can flake locally.

At Line 330+, chmod 0000 is not a reliable unreadability guarantee across OS/user contexts. The test should verify unreadability before asserting skip behavior.

Suggested stabilization
```diff
 import (
 	"os"
 	"path/filepath"
+	"runtime"
 	"testing"
 @@
 func TestLoad_UnreadableFile(t *testing.T) {
 	if os.Getenv("CI") != "" {
 		t.Skip("skipping chmod-based test in CI")
 	}
+	if runtime.GOOS == "windows" {
+		t.Skip("chmod-based unreadable-file semantics are not reliable on Windows")
+	}
 	dir := t.TempDir()
 	f := filepath.Join(dir, "locked.md")
 @@
 	if err := os.Chmod(f, 0000); err != nil {
 		t.Fatal(err)
 	}
+	if _, err := os.ReadFile(f); err == nil {
+		t.Skip("file is still readable in this environment; skipping unreadable-file assertion")
+	}
 	t.Cleanup(func() { os.Chmod(f, 0600) }) //nolint:errcheck
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/loader/loader_test.go` around lines 321 - 345, The
TestLoad_UnreadableFile test sets file mode to 0000 but doesn't verify the file
is actually unreadable on the current environment; update
TestLoad_UnreadableFile to assert unreadability before relying on the behavior
of MarkdownLoader.Load: after os.Chmod(f, 0000) attempt to open the file (or
read it) and assert that it returns a permission error, and if it does not
(i.e., the file is still readable) call t.Skipf to skip the test on this
environment; keep the existing t.Cleanup that restores permissions and then run
l.Load() expecting a warn+skip and zero entities (MarkdownLoader.Load remains
unchanged).
internal/shards/graph_test.go-519-528 (1)

519-528: ⚠️ Potential issue | 🟡 Minor

Test doesn't actually verify anything meaningful.

```go
func TestComputeStats_FromCache(t *testing.T) {
	// ...
	stats.FromCache = true
	if !stats.FromCache {
		t.Error("FromCache should be settable")
	}
}
```

This test sets a field and then checks if it was set. That's testing Go's struct assignment, not your code. If you want to verify FromCache behavior, you'd need to test whatever code path actually sets this flag based on cache state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/graph_test.go` around lines 519 - 528, The test
TestComputeStats_FromCache is only assigning stats.FromCache directly instead of
exercising computeStats' cache-detection logic; update the test to set up a real
cache hit/miss and call computeStats to assert its behavior: create an
api.ShardIR with content that will be cached, use NewCache() and call
c.Build(ir) or pre-populate the cache entry so computeStats(ir, c) returns a
stats whose FromCache is true (and add a complementary case where cache is empty
and FromCache is false); reference the functions/values computeStats, NewCache,
c.Build, and stats.FromCache when implementing the assertions.
internal/shards/watcher_test.go-256-262 (1)

256-262: ⚠️ Potential issue | 🟡 Minor

Swallowed error in test setup could hide failures.

The runCmd helper ignores the error from cmd.CombinedOutput():

runCmd := func(args ...string) {
    cmd := exec.Command(args[0], args[1:]...)
    cmd.Dir = dir
    cmd.CombinedOutput() //nolint:errcheck
}

If git add or git commit fails, the test will proceed with incorrect state and might pass or fail for the wrong reasons. Consider checking the error, as the runWatcherGit helper does at lines 297-299.

Suggested fix
 runCmd := func(args ...string) {
+    t.Helper()
     cmd := exec.Command(args[0], args[1:]...)
     cmd.Dir = dir
-    cmd.CombinedOutput() //nolint:errcheck
+    if out, err := cmd.CombinedOutput(); err != nil {
+        t.Fatalf("%v: %v\n%s", args, err, out)
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/watcher_test.go` around lines 256 - 262, The test helper
runCmd currently swallows errors from cmd.CombinedOutput(), which can hide git
failures; update runCmd to behave like runWatcherGit by calling
cmd.CombinedOutput(), checking the returned error, and failing the test with
t.Fatalf (including the error and output) if it is non-nil; also mark the helper
with t.Helper() so failures point to the caller. Use the existing runWatcherGit
approach (lines around runWatcherGit and its CombinedOutput usage) as the
pattern for error handling.
internal/shards/render_test.go-817-822 (1)

817-822: ⚠️ Potential issue | 🟡 Minor

Placeholder test with no assertions.

This test has a comment explaining what it should do but doesn't actually test anything. Either implement it or remove it to avoid giving false confidence in test coverage.

func TestHook_InvalidJSONExitsCleanly(t *testing.T) {
	// Hook reads from stdin; we test via the exported function with invalid data.
	// The function must return nil (never break the agent) on bad input.
	// We can't easily inject stdin, but we test the underlying validation logic
	// directly by calling with a mock via the export test file.
}

If you can't easily test this now, consider adding t.Skip("TODO: implement stdin injection for Hook testing") so it's clear this is intentionally deferred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/render_test.go` around lines 817 - 822, The test
TestHook_InvalidJSONExitsCleanly is a no-op placeholder; either implement it to
exercise the Hook logic by feeding invalid JSON into the exported function that
accepts an io.Reader (e.g., call Hook or the exported validation helper such as
ValidateHookInput with a bytes.NewBufferString(`invalid`) and assert it returns
nil/no panic and no unwanted side effects), or if stdin injection is not
feasible yet, replace the body with t.Skip("TODO: implement stdin injection for
Hook testing") so the intent is explicit; locate
TestHook_InvalidJSONExitsCleanly and modify it accordingly, referencing the Hook
function or the exported validation helper used elsewhere in tests.
internal/cache/cache_test.go-358-452 (1)

358-452: ⚠️ Potential issue | 🟡 Minor

These chmod-based failure tests are still a bit flaky.

Using 0555/0000 to force write and read failures is not reliable on every runner. Elevated users (e.g., root in CI containers) and some platform permission models still allow the I/O calls to succeed, which makes these assertions intermittently wrong. I'd prefer deterministic failure setups here, or at least a shared skip helper for environments where permission semantics are unreliable.
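One shape such a helper could take — a sketch only, with illustrative names; a test-side wrapper would call t.Skip when the probe reports that modes aren't enforced:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// permissionsEnforced probes whether chmod 0000 actually blocks reads in
// this environment (root and some CI filesystems ignore file modes).
func permissionsEnforced() (bool, error) {
	dir, err := os.MkdirTemp("", "permcheck")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	probe := filepath.Join(dir, "probe")
	if err := os.WriteFile(probe, []byte("x"), 0600); err != nil {
		return false, err
	}
	if err := os.Chmod(probe, 0000); err != nil {
		return false, err
	}
	defer os.Chmod(probe, 0600) //nolint:errcheck // restore so RemoveAll works
	_, readErr := os.ReadFile(probe)
	return readErr != nil, nil // unreadable means modes are enforced
}

func main() {
	enforced, err := permissionsEnforced()
	fmt.Println("probe ran cleanly:", err == nil)
	fmt.Println("permissions enforced here:", enforced) // varies by environment
}
```

TestPutJSON_WriteFileError, TestPut_WriteFileError, and TestHashFile_ReadError would then start with this check and skip when enforcement isn't guaranteed, instead of relying on chmod alone.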

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cache/cache_test.go` around lines 358 - 452, Replace flaky
chmod-based skips with a deterministic permissions-check helper: add a helper
(e.g., skipIfPermissionsUnreliable) that attempts to enforce and validate
permission behavior (create a temp dir/file, chmod to the intended restrictive
mode, then attempt an operation that should fail—write for 0555 tests, read for
0000 test); if the operation still succeeds, call t.Skip to skip the test.
Update TestPutJSON_WriteFileError, TestPut_WriteFileError and
TestHashFile_ReadError to call this helper at the start instead of directly
skipping on CI or relying on chmod alone, and keep the existing cleanup
(t.Cleanup) logic for restoring perms when the helper confirms the environment
enforces permissions.
internal/find/zip_test.go-120-172 (1)

120-172: ⚠️ Potential issue | 🟡 Minor

Guard test setup against missing git binary.

The initCleanFindGitRepo helper shells out to git directly via exec.Command without checking whether git is installed first. In minimal test environments (CI runners, containers, etc.), a missing git binary will crash the test, which is confusing—the test infrastructure fails even though your code isn't broken.

Add a quick exec.LookPath("git") check at the start of the function and skip the test if git isn't found. That way, the test suite stays portable and only runs when it can actually run:

if _, err := exec.LookPath("git"); err != nil {
    t.Skip("git not available")
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/find/zip_test.go` around lines 120 - 172, The test helper
initCleanFindGitRepo currently shells out to git without verifying git is
installed; add a preflight check using exec.LookPath("git") at the start of
initCleanFindGitRepo and call t.Skip("git not available") when LookPath returns
an error so tests that require git are skipped instead of failing in
environments without the git binary.
benchmark/results/blog-post-draft.md-53-53 (1)

53-53: ⚠️ Potential issue | 🟡 Minor

Add language labels to fenced blocks to satisfy markdownlint.

Both code fences should declare a language (for example text) to fix MD040.

Suggested diff
-```
+```text
 Bash: run tests → see 8 errors
 ...
 Bash: run tests → all pass

-```
+```text
Bash: run tests → see 8 errors
...
Bash: run tests → all pass

Also applies to: 72-72

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/results/blog-post-draft.md` at line 53, The fenced code blocks in
benchmark/results/blog-post-draft.md that show the test output (the blocks
containing the lines starting "Bash: run tests → see 8 errors" and "Bash: run
tests → all pass") are missing language labels and trigger MD040; update both
opening fences from ``` to ```text (or another appropriate language such as
```bash) so the blocks read ```text, ensuring both occurrences are changed to
satisfy markdownlint.
benchmark/results/blog-post-draft.md-49-70 (1)

49-70: ⚠️ Potential issue | 🟡 Minor

Narrative metrics conflict with the results table.

The table reports Naked = 20 turns / $0.30 and Crafted = 9 turns / $0.12, but section headings say 13 turns / $0.22 and 7 turns / $0.13. Please align these numbers so readers don’t get mixed signals.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/results/blog-post-draft.md` around lines 49 - 70, The section
headings "### Without Supermodel (13 turns, $0.22)" and "### With Supermodel (7
turns, $0.13)" conflict with the results table entries "Naked = 20 turns /
$0.30" and "Crafted = 9 turns / $0.12"; pick the authoritative source (table or
narrative) and update the mismatched values so they match exactly—edit the
headings (the "Without Supermodel" and "With Supermodel" lines) or the table
rows ("Naked" and "Crafted") to make the turn counts and dollar amounts
consistent across both places. Ensure formatting/parentheses remain identical
when you change the numbers.
benchmark/compare.sh-82-93 (1)

82-93: ⚠️ Potential issue | 🟡 Minor

Cost delta parsing is inconsistent and mislabels equal-cost runs.

This block only reads costUSD (unlike extract_cost()), and delta == 0 gets reported as “more expensive”.

Proposed fix
-naked_cost=$(grep -oP '"costUSD"\s*:\s*\K[0-9.]+' "$NAKED_LOG" 2>/dev/null | tail -1 || echo "")
-sm_cost=$(grep -oP '"costUSD"\s*:\s*\K[0-9.]+' "$SUPERMODEL_LOG" 2>/dev/null | tail -1 || echo "")
+naked_cost=$(grep -oP '"costUSD"\s*:\s*\K[0-9.]+' "$NAKED_LOG" 2>/dev/null | tail -1 || true)
+[[ -z "$naked_cost" ]] && naked_cost=$(grep -oP '"cost_usd"\s*:\s*\K[0-9.]+' "$NAKED_LOG" 2>/dev/null | tail -1 || true)
+[[ -z "$naked_cost" ]] && naked_cost=$(grep -oP 'Total cost[^0-9]*\K[0-9.]+' "$NAKED_LOG" 2>/dev/null | tail -1 || true)
+
+sm_cost=$(grep -oP '"costUSD"\s*:\s*\K[0-9.]+' "$SUPERMODEL_LOG" 2>/dev/null | tail -1 || true)
+[[ -z "$sm_cost" ]] && sm_cost=$(grep -oP '"cost_usd"\s*:\s*\K[0-9.]+' "$SUPERMODEL_LOG" 2>/dev/null | tail -1 || true)
+[[ -z "$sm_cost" ]] && sm_cost=$(grep -oP 'Total cost[^0-9]*\K[0-9.]+' "$SUPERMODEL_LOG" 2>/dev/null | tail -1 || true)

 if [[ -n "$naked_cost" && -n "$sm_cost" ]]; then
   python3 - <<PYEOF
 naked = float("$naked_cost")
 sm    = float("$sm_cost")
 delta = naked - sm
 pct   = (delta / naked * 100) if naked > 0 else 0
-sign  = "cheaper" if delta > 0 else "more expensive"
-print(f"supermodel was \${abs(delta):.4f} ({abs(pct):.1f}%) {sign} than naked")
+if delta > 0:
+    print(f"supermodel was \${abs(delta):.4f} ({abs(pct):.1f}%) cheaper than naked")
+elif delta < 0:
+    print(f"supermodel was \${abs(delta):.4f} ({abs(pct):.1f}%) more expensive than naked")
+else:
+    print("supermodel had the same cost as naked")
 PYEOF
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/compare.sh` around lines 82 - 93, The comparison block reads
costUSD with a different regex than extract_cost() and treats delta == 0 as
"more expensive"; change it to reuse the same extraction logic as extract_cost()
(or copy its exact grep/regex) to populate naked_cost and sm_cost, compute delta
= naked - sm, set sign to "same cost" when delta == 0, "cheaper" when delta > 0,
and "more expensive" when delta < 0, and ensure pct calculation uses naked only
when naked > 0 to avoid division by zero; update the print message to reflect
the "same cost" case.
🧹 Nitpick comments (19)
internal/archdocs/pssg/taxonomy/taxonomy_test.go (2)

305-336: Extract the repeated entity-builder helper

The same e := func(n int) []*entity.Entity { ... } helper is repeated in multiple tests. Pulling it into one file-level helper keeps this test file easier to maintain.

Refactor sketch
+func makeEntities(n int) []*entity.Entity {
+	s := make([]*entity.Entity, n)
+	for i := range s {
+		s[i] = &entity.Entity{}
+	}
+	return s
+}
+
 func TestComputePagination_SinglePage(t *testing.T) {
-	e := func(n int) []*entity.Entity {
-		s := make([]*entity.Entity, n)
-		for i := range s {
-			s[i] = &entity.Entity{}
-		}
-		return s
-	}
-	entry := Entry{Slug: "italian", Entities: e(5)}
+	entry := Entry{Slug: "italian", Entities: makeEntities(5)}
@@
 func TestComputePagination_MultiPage(t *testing.T) {
-	e := func(n int) []*entity.Entity {
-		s := make([]*entity.Entity, n)
-		for i := range s {
-			s[i] = &entity.Entity{}
-		}
-		return s
-	}
-	entry := Entry{Slug: "italian", Entities: e(10)}
+	entry := Entry{Slug: "italian", Entities: makeEntities(10)}
@@
 func TestTopEntries(t *testing.T) {
-	e := func(n int) []*entity.Entity {
-		s := make([]*entity.Entity, n)
-		for i := range s {
-			s[i] = &entity.Entity{}
-		}
-		return s
-	}
 	entries := []Entry{
-		{Name: "A", Slug: "a", Entities: e(3)},
-		{Name: "B", Slug: "b", Entities: e(10)},
-		{Name: "C", Slug: "c", Entities: e(1)},
+		{Name: "A", Slug: "a", Entities: makeEntities(3)},
+		{Name: "B", Slug: "b", Entities: makeEntities(10)},
+		{Name: "C", Slug: "c", Entities: makeEntities(1)},
 	}

Also applies to: 408-414

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/taxonomy/taxonomy_test.go` around lines 305 - 336,
Extract the repeated inline helper e := func(n int) []*entity.Entity { ... }
used in TestComputePagination_SinglePage and TestComputePagination_MultiPage
(and other tests around lines 408-414) into a single file-level helper function
(e.g., makeEntities or buildEntities) declared outside the test functions;
replace the inline e closures in each test with calls to that new helper,
keeping the same signature and returning []*entity.Entity so ComputePagination
and the tests keep working unchanged.

364-366: Drop hard-coded source line references in comments

Comments like “covers L73-76” go stale as soon as lines shift in the implementation file. Better to describe the behavior only.

Also applies to: 380-382, 396-398, 433-434

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/taxonomy/taxonomy_test.go` around lines 364 - 366,
Remove the hard-coded source line references from the test comments (e.g., the
comment above TestBuildAll_InvertMode that says “covers L73-76”) and replace
them with a behavior-focused description; for example, for
TestBuildAll_InvertMode and the other affected comment blocks (near the comments
at the ranges you flagged) state that the test "verifies that when tc.Invert is
true the entity loop hits continue and no entries are added" or similar
behavior-only wording, ensuring you reference the test name
(TestBuildAll_InvertMode) and the condition (tc.Invert true) rather than file
line numbers.
internal/compact/handler_test.go (1)

725-741: Test name and comment are a bit misleading.

The comment says this tests that _ (blank identifier) is skipped if produced, but nextShortName only ever generates names from a-z (and then aa, ab, etc.). So _ would never be generated in the first place.

What this test actually does is verify that when 25 out of 26 single-char slots are occupied, the function correctly returns the one remaining slot (z). That's the same behavior as TestNextShortName_SkipsExisting—just at a larger scale.

Consider renaming to something like TestNextShortName_FindsLastFreeSlot or updating the comment to match what's really being tested. Not a functional bug, just a clarity thing for future readers.

✏️ Suggested comment fix
 func TestNextShortName_SkipsBuiltins(t *testing.T) {
-	// '_' is a Go builtin; if the counter somehow produces it, it must be skipped.
-	// We test indirectly by filling every single-char slot except 'z' with existing
-	// names, then verifying we get 'z' (the only remaining free single-char slot).
+	// Verify that when all single-char slots except 'z' are taken, nextShortName
+	// correctly skips occupied names and returns the only free slot.
 	existing := map[string]bool{}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/compact/handler_test.go` around lines 725 - 741, The test name and
comment are misleading: TestNextShortName_SkipsBuiltins and its comment claim to
validate skipping the blank identifier `_`, but nextShortName only generates
`a`–`z` and the test actually verifies that when 25 of 26 single-letter names
are taken the remaining letter (`z`) is returned; update the test to reflect
intent by either renaming the test to TestNextShortName_FindsLastFreeSlot (or
similar) or by editing the comment to describe that it fills every single-char
slot except one and asserts the last free single-char slot is returned;
reference the existing test name TestNextShortName_SkipsBuiltins and the
function nextShortName when making the change.
internal/archdocs/pssg/enrichment/cache_test.go (1)

136-143: Tighten the mixed-validity assertion.

Nice coverage here; one small addition: assert that the returned key is "good" so this test also catches wrong-key regressions.

Suggested patch
 	if len(result) != 1 {
 		t.Errorf("expected 1 valid entry, got %d", len(result))
 	}
+	if _, ok := result["good"]; !ok {
+		t.Errorf("expected key %q in result, got %v", "good", result)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/enrichment/cache_test.go` around lines 136 - 143,
Tighten the assertion after calling ReadAllCaches: in addition to checking
len(result) == 1, assert that the single returned key is "good" by inspecting
result (the map/slice returned by ReadAllCaches) and failing the test if the key
differs; update the test after the existing len check to compare the returned
key name (from result) to "good" and call t.Fatalf/t.Errorf with a clear message
on mismatch so wrong-key regressions are caught.
internal/archdocs/pssg/loader/loader_test.go (1)

388-412: faq branch is configured but not asserted in TestParseSections_AllTypes.

Line 396 sets up Type: "faq", but no assertion checks sections["faqs"]. That leaves a mapping gap in this end-to-end section test.

Suggested assertion add-on
 import (
 	"os"
 	"path/filepath"
 	"testing"
 
 	"github.com/supermodeltools/cli/internal/archdocs/pssg/config"
+	"github.com/supermodeltools/cli/internal/archdocs/pssg/entity"
 )
@@
 	if items, ok := sections["steps"].([]string); !ok || len(items) != 2 {
 		t.Errorf("steps: expected []string with 2 items, got %v", sections["steps"])
 	}
+	if faqs, ok := sections["faqs"].([]entity.FAQ); !ok || len(faqs) != 1 || faqs[0].Question != "What temp?" {
+		t.Errorf("faqs: expected []entity.FAQ with 1 item, got %v", sections["faqs"])
+	}
 	if notes, ok := sections["notes"].(string); !ok || notes == "" {
 		t.Errorf("notes: expected non-empty string, got %v", sections["notes"])
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/loader/loader_test.go` around lines 388 - 412,
TestParseSections_AllTypes sets up a BodySection with Type "faq" but never
asserts the parsed result; update the test (TestParseSections_AllTypes) to
verify sections["faqs"] returned by l.parseSections contains the expected FAQ
data: assert the key "faqs" exists in the sections map, assert its type (e.g.,
map[string]string or whatever parseSections returns for FAQs), and assert it
contains the question "What temp?" with the answer "350°F" (or at minimum that
the faqs entry is non-nil/non-empty) so the FAQ branch is covered.
internal/shards/graph_test.go (1)

457-477: Cycle test logic is a bit odd but functional.

The test spawns a goroutine and then does:

select {
case <-done:
default:
    // immediate completion is fine
    <-done
}

This is essentially just <-done with extra steps. The default case immediately falls through to <-done anyway. It works, but could be simplified to just:

done := make(chan struct{})
go func() {
    c.TransitiveDependents("a.go")
    close(done)
}()
select {
case <-done:
    // completed without infinite loop
case <-time.After(time.Second):
    t.Fatal("TransitiveDependents did not terminate (possible infinite loop)")
}

The current version will block forever if there's an infinite loop, whereas adding a timeout makes the failure more obvious.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/graph_test.go` around lines 457 - 477,
TestTransitiveDependents_Cycle currently uses a convoluted select that
effectively just blocks on <-done and will hang silently if TransitiveDependents
enters a loop; change the test to wait for the done channel with a timeout
instead: after launching the goroutine that calls c.TransitiveDependents("a.go")
and closes done, replace the existing select with one that either receives from
done (success) or from time.After(1 * time.Second) and fails the test (t.Fatal)
on timeout to detect infinite loops reliably.
internal/shards/zip_test.go (3)

340-349: Some openZipEntries errors are ignored.

In several places you do:

r, _ := openZipEntries(t, path)

While it's unlikely to fail after CreateZipFile succeeds, silently ignoring the error could mask issues. Consider using:

r, err := openZipEntries(t, path)
if err != nil {
    t.Fatal(err)
}

Or at minimum, the helper could call t.Fatal internally on error since it already takes *testing.T.

Also applies to: 370-376, 394-401

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/zip_test.go` around lines 340 - 349, Tests currently ignore
errors from openZipEntries (e.g., uses r, _ := openZipEntries(t, path)); update
each call (including occurrences around the other noted blocks) to capture the
error and fail the test on error (r, err := openZipEntries(t, path); if err !=
nil { t.Fatal(err) }), or alternatively modify the openZipEntries helper (which
already receives *testing.T) to call t.Fatal on internal errors so callers can
continue using r := openZipEntries(t, path) without dropping errors; reference
the openZipEntries helper and all its test-call sites to apply one consistent
approach.

681-692: Test comment doesn't match the assertion.

The comment says "same count (1 each), should sort a before b alphabetically", but then the assertion expects go before ts:

// b.go, a.ts — same count (1 each), should sort a before b alphabetically
files := []string{"b.go", "a.ts"}
// ...
if stats[0].Ext != "go" || stats[1].Ext != "ts" {
    // alphabetically "go" < "ts", so go comes first

The comment mentions sorting files "a before b" but you're actually sorting by extension name (go vs ts), not by filename. The inline comment clarifies this, but the test function's leading comment is a bit misleading. Minor nit, but could confuse someone reading just the comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/zip_test.go` around lines 681 - 692, The test
TestLanguageStats_TiesSortedAlphabetically is misleading: it says "should sort a
before b alphabetically" but LanguageStats sorts by file extension (e.g., "go"
vs "ts"); update the test comment to clearly state that ties are broken by
extension name (e.g., "b.go, a.ts — same count (1 each), must sort by extension
alphabetically so 'go' comes before 'ts'") or alternatively change the sample
filenames to match the comment intent; keep assertions as-is and reference
LanguageStats and stats in the updated comment.

713-725: Helper could handle errors internally.

Since openZipEntries already takes *testing.T, it could call t.Fatal on error instead of returning it. This would simplify all the call sites and ensure errors are never silently ignored.

func openZipEntries(t *testing.T, path string) map[string]bool {
    t.Helper()
    r, err := zip.OpenReader(path)
    if err != nil {
        t.Fatalf("failed to open zip: %v", err)
    }
    defer r.Close()
    m := make(map[string]bool, len(r.File))
    for _, f := range r.File {
        m[f.Name] = true
    }
    return m
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/zip_test.go` around lines 713 - 725, The helper
openZipEntries currently returns (map[string]bool, error) but takes *testing.T;
change it to handle errors internally by removing the error return, calling
t.Fatalf when zip.OpenReader(path) fails, and returning just map[string]bool;
update its signature to func openZipEntries(t *testing.T, path string)
map[string]bool, keep t.Helper(), still defer r.Close(), and update all call
sites to drop error handling since openZipEntries will now fail the test itself.
internal/shards/watcher_test.go (1)

99-110: Time-based test can be flaky in CI.

Using time.Sleep(20 * time.Millisecond) to verify the ticker fires is a common pattern but can be flaky on slow CI runners or under load. The test will likely pass most of the time, but keep this in mind if you see intermittent failures here.

One alternative is to use a channel signal from within poll() or to mock the ticker, but that's a bigger refactor. For now this is probably fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/watcher_test.go` around lines 99 - 110,
TestWatcher_RunPollsOnTick uses time.Sleep which is flaky; modify the test to
wait on a deterministic signal from the Watcher instead of sleeping: add a
lightweight test hook (e.g., a channel or callback field) to NewWatcher/Watcher
that poll() signals when it runs (e.g., OnPoll chan struct{} or OnPoll func()),
update poll() to send/close/invoke that hook, and then in
TestWatcher_RunPollsOnTick wait on that channel/callback with a timeout (context
or select) to assert poll was called; keep existing cancellation logic for Run.
internal/archdocs/pssg/affiliate/affiliate_test.go (1)

27-36: Strengthen special-character assertion in GenerateLink test.

Right now this passes on any non-empty URL. Consider asserting encoded content (e.g., %26 for &) so the test actually protects encoding behavior.

Suggested tweak
 func TestGenerateLink_SpecialChars(t *testing.T) {
@@
 	url := p.GenerateLink("bread & butter")
-	if url == "" {
-		t.Error("expected non-empty URL")
+	if !strings.Contains(url, "bread+%26+butter") {
+		t.Errorf("expected encoded term in URL, got: %s", url)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/affiliate/affiliate_test.go` around lines 27 - 36,
Update TestGenerateLink_SpecialChars to assert that Provider.GenerateLink
returns an encoded term rather than merely a non-empty string: call
p.GenerateLink("bread & butter") and assert the resulting URL contains the
expected percent-encoded segment (e.g., "%26" or the full
"bread%20%26%20butter") so the test verifies proper URL encoding behavior in
Provider.GenerateLink.
internal/archdocs/pssg/output/rss_test.go (1)

22-24: Make the category assertion more precise.

The substring check works, but asserting the XML element (e.g., <category>Italian</category>) makes this test less prone to false positives.

Suggested tweak
-	if !strings.Contains(out, "Italian") {
+	if !strings.Contains(out, "<category>Italian</category>") {
 		t.Errorf("expected category in output:\n%s", out)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/output/rss_test.go` around lines 22 - 24, The test
currently uses strings.Contains(out, "Italian") which is too broad; update the
assertion in rss_test.go to check for the exact XML category element (e.g.,
assert that out contains "<category>Italian</category>") or parse the generated
RSS and assert the category element value. Locate the assertion using the out
variable in the test (the if !strings.Contains(...) block) and replace it with a
precise check for the full element string or use an XML parser to verify the
<category> node equals "Italian".
internal/archdocs/pssg/schema/jsonld_test.go (1)

113-127: Type assertion without safety check could cause confusing test failures.

If the implementation of GenerateBreadcrumbSchema ever changes and s["itemListElement"] isn't a []map[string]interface{}, this test will panic with a cryptic message instead of a clear test failure.

Think of it like this: if you're expecting a pizza and someone hands you a burrito, you want to say "hey, this isn't pizza!" rather than just choking on it.

💡 Safer type assertion pattern
-	list := s["itemListElement"].([]map[string]interface{})
+	list, ok := s["itemListElement"].([]map[string]interface{})
+	if !ok {
+		t.Fatalf("itemListElement: want []map[string]interface{}, got %T", s["itemListElement"])
+	}

This same pattern appears in a few other places (lines 141, 172, 405) — consider applying it there too for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/schema/jsonld_test.go` around lines 113 - 127, The
test directly type-asserts s["itemListElement"] to []map[string]interface{}
which will panic on mismatch; change the assertion in the Test (around the use
of s and "itemListElement") to a safe comma-ok form (e.g., v, ok :=
s["itemListElement"].([]map[string]interface{})) and call t.Fatalf with a clear
message if ok is false so the failure is explicit; apply the same safe-assertion
pattern to the other similar tests that access "itemListElement" (the other
occurrences in this file) to avoid panics and produce readable test errors.
internal/blastradius/zip_test.go (1)

121-293: Worth pulling the shared ZIP/git harness into a test helper.

This file now repeats the same repo bootstrap, zip-reader helper, and most walk/create cases already added in internal/analyze/zip_test.go, internal/find/zip_test.go, and internal/deadcode/zip_test.go. A shared test helper would make future ZIP behavior changes a one-file update instead of a four-file update.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/blastradius/zip_test.go` around lines 121 - 293, Multiple test files
duplicate the same ZIP/git harness (functions like initCleanBlastGitRepo and
readBlastZipEntries and repeated walk/create test cases); extract those shared
helpers into a single test helper file and update each test file to call them.
Create a shared test helper (e.g., package-level test helper file) that defines
reusable functions initCleanBlastGitRepo, readBlastZipEntries and any common
setup/teardown utilities, make them exported or placed in the same test package
so other tests can call them, remove the duplicated implementations from
internal/blastradius/zip_test.go and the other zip_test.go files, and update
tests that reference walkZip, createZip, gitArchive, and isWorktreeClean to use
the centralized helpers. Ensure imports and test package visibility are adjusted
so all test files compile against the single helper.
internal/mcp/server_test.go (1)

682-829: Make these cache-hit tests hermetic.

Right now they fingerprint whatever os.Getwd() points at, so the setup depends on how the test runner launches the package and on the contents of the real checkout. Building a tiny temp repo/dir here, like the ZIP suites do, would make the cache key and server.dir much more predictable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/mcp/server_test.go` around lines 682 - 829, Tests use repoDir which
returns os.Getwd() making cache keys non-hermetic; change the tests to create an
isolated temp repo directory (use t.TempDir()), populate it with a minimal
deterministic file tree (small go files and a README), initialize a git repo
there (git init, git add, git commit) so cache.RepoFingerprint(dir) yields a
stable fingerprint, and pass that dir into the server (server{..., dir: dir})
and into cache.AnalysisKey; update or replace repoDir(t *testing.T) to
construct/return this temp repo or inline the temp-repo setup at the start of
TestToolDeadCode_CacheHit, TestToolDeadCode_CacheHitWithArgs,
TestToolBlastRadius_CacheHit, and TestToolBlastRadius_CacheHitWithTarget so each
test uses the hermetic repo.
benchmark/Dockerfile.supermodel (1)

18-21: Use --no-install-recommends in runtime apt installs too.

Same as the naked image: this trims unnecessary packages and keeps the benchmark runtime image leaner.

Suggested diff
-RUN apt-get update && apt-get install -y curl ca-certificates git && \
+RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates git && \
     curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
-    apt-get install -y nodejs && \
+    apt-get install -y --no-install-recommends nodejs && \
     rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/Dockerfile.supermodel` around lines 18 - 21, Update the Dockerfile
RUN command to add --no-install-recommends to the apt installs so runtime image
stays lean: modify the apt-get install invocations in the RUN line that
currently installs curl ca-certificates git and nodejs (the multi-step RUN
block) to use apt-get install -y --no-install-recommends for both installs while
keeping the existing curl | bash node setup and rm -rf /var/lib/apt/lists/*
cleanup.
benchmark/Dockerfile.naked (1)

5-8: Reduce package footprint in apt installs.

Use --no-install-recommends on both apt-get install calls to keep the benchmark image smaller and reduce unnecessary packages.

Suggested diff
-RUN apt-get update && apt-get install -y curl ca-certificates git && \
+RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates git && \
     curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
-    apt-get install -y nodejs && \
+    apt-get install -y --no-install-recommends nodejs && \
     rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/Dockerfile.naked` around lines 5 - 8, The RUN instruction that
executes apt-get installs (the line starting with "RUN apt-get update && apt-get
install -y curl ca-certificates git && \") should be changed to use
--no-install-recommends on both apt-get install invocations (the initial
"apt-get install -y curl ca-certificates git" and the subsequent "apt-get
install -y nodejs") so that unnecessary recommended packages aren't pulled into
the benchmark image; update those two install commands to include the
--no-install-recommends flag while preserving the existing chaining, curl
bootstrap for Node.js, and the cleanup step.
benchmark/run.sh (1)

28-39: Keep full Docker build logs for debugging.

Piping directly to tail -3 hides useful failure context. Consider teeing full logs to files, then tailing for console brevity.

Suggested diff
 docker build \
   -f "$SCRIPT_DIR/Dockerfile.naked" \
   -t bench-naked \
   "$SCRIPT_DIR" \
-  2>&1 | tail -3
+  2>&1 | tee "$RESULTS_DIR/build-naked.log" | tail -3
@@
 docker build \
   -f "$SCRIPT_DIR/Dockerfile.supermodel" \
   -t bench-supermodel \
   "$REPO_ROOT" \
-  2>&1 | tail -3
+  2>&1 | tee "$RESULTS_DIR/build-supermodel.log" | tail -3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/run.sh` around lines 28 - 39, The current docker build invocations
for bench-naked and bench-supermodel pipe output to "tail -3", which discards
full build logs; change the two docker build steps (the commands that use
"$SCRIPT_DIR/Dockerfile.naked" -> -t bench-naked and
"$SCRIPT_DIR/Dockerfile.supermodel" -> -t bench-supermodel with "$REPO_ROOT") to
write full logs to persistent files (e.g., via tee to separate log files per
build) and still print a short summary to the console by tailing the saved log
files; ensure the build failure paths use those saved log files for debugging
instead of losing output.
benchmark/results/naked.txt (1)

11-44: Consider committing a sanitized benchmark log instead of full raw transcript.

This includes full session/tool payloads and identifiers that aren’t required for the comparison script’s core metrics. Keeping only test summary + cost/token fields would make diffs smaller and reduce data retention risk.

Also applies to: 91-91

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/results/naked.txt` around lines 11 - 44, The benchmark log
currently contains full raw session/tool payloads (session_id, tool traces,
model fields, etc.); replace the file content with a sanitized summary that
preserves only the test summary lines (e.g., "Ran X tests", pass/fail counts)
and the aggregated cost/token metrics, removing or redacting session_id, tool
payloads, and any identifiers such as "session_id", "model", "tools", and raw
tool inputs/outputs; update benchmark/results/naked.txt so it contains only the
concise test summary plus explicit cost/token fields (and a minimal timestamp if
needed) to keep diffs small and avoid leaking identifiers.
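A minimal sanitizer along those lines could be a single `grep` pass that keeps only summary and cost lines. The field names below (`Ran N tests`, `Total cost`, `Total tokens`, `session_id`) are illustrative — match them to whatever the real transcript format emits:

```shell
#!/bin/sh
# Hypothetical sanitizer: keep only the test summary and cost/token lines
# from a raw benchmark transcript; drop session IDs and tool payloads.
raw=$(mktemp)
clean=$(mktemp)
cat > "$raw" <<'EOF'
session_id: 0f3a-redacted
[tool] Bash {"command": "python manage.py test"}
Ran 8 tests in 1.204s
OK
Total cost: $0.11
Total tokens: 45210
EOF
grep -E '^(Ran [0-9]+ tests|OK$|FAILED|Total (cost|tokens))' "$raw" > "$clean"
cat "$clean"
rm -f "$raw" "$clean"
```

Running this prints only the four summary lines; the `session_id` and tool-payload lines never reach the committed file.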

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e252eb40-6c77-45a2-999a-4986183f525d

📥 Commits

Reviewing files that changed from the base of the PR and between ecb068e and 0c1589f.

⛔ Files ignored due to path filters (1)
  • benchmark/results/benchmark_results.zip is excluded by !**/*.zip
📒 Files selected for processing (78)
  • .gitignore
  • benchmark/CLAUDE.md
  • benchmark/CLAUDE.skill.md
  • benchmark/Dockerfile.naked
  • benchmark/Dockerfile.supermodel
  • benchmark/change_tracking/__init__.py
  • benchmark/change_tracking/models.py
  • benchmark/change_tracking/tests.py
  • benchmark/compare.sh
  • benchmark/django_app/manage.py
  • benchmark/django_app/mysite/__init__.py
  • benchmark/django_app/mysite/settings.py
  • benchmark/django_app/mysite/urls.py
  • benchmark/django_app/mysite/wsgi.py
  • benchmark/django_app/requirements.txt
  • benchmark/django_app/todos/__init__.py
  • benchmark/django_app/todos/admin.py
  • benchmark/django_app/todos/apps.py
  • benchmark/django_app/todos/models.py
  • benchmark/django_app/todos/tests.py
  • benchmark/django_app/todos/urls.py
  • benchmark/django_app/todos/views.py
  • benchmark/entrypoint.naked.sh
  • benchmark/entrypoint.supermodel.sh
  • benchmark/results/blog-post-draft.md
  • benchmark/results/naked.txt
  • benchmark/results/skill-v2.txt
  • benchmark/results/summary.md
  • benchmark/results/supermodel.txt
  • benchmark/run.sh
  • benchmark/task.md
  • cmd/restore.go
  • cmd/skill.go
  • internal/analyze/zip_test.go
  • internal/api/client.go
  • internal/api/types_test.go
  • internal/archdocs/graph2md/graph2md_test.go
  • internal/archdocs/handler_test.go
  • internal/archdocs/pssg/affiliate/affiliate_test.go
  • internal/archdocs/pssg/build/build_test.go
  • internal/archdocs/pssg/config/config_test.go
  • internal/archdocs/pssg/enrichment/cache_test.go
  • internal/archdocs/pssg/entity/entity_test.go
  • internal/archdocs/pssg/loader/loader_test.go
  • internal/archdocs/pssg/output/llmstxt_test.go
  • internal/archdocs/pssg/output/output_test.go
  • internal/archdocs/pssg/output/rss_test.go
  • internal/archdocs/pssg/render/funcs_test.go
  • internal/archdocs/pssg/render/render_test.go
  • internal/archdocs/pssg/render/shareimage_test.go
  • internal/archdocs/pssg/schema/jsonld_test.go
  • internal/archdocs/pssg/taxonomy/taxonomy_test.go
  • internal/archdocs/zip_test.go
  • internal/audit/audit_test.go
  • internal/auth/handler_test.go
  • internal/blastradius/zip_test.go
  • internal/cache/cache_test.go
  • internal/compact/handler_test.go
  • internal/config/config_test.go
  • internal/deadcode/zip_test.go
  • internal/find/handler.go
  • internal/find/handler_test.go
  • internal/find/zip_test.go
  • internal/focus/handler.go
  • internal/focus/handler_test.go
  • internal/graph/handler_test.go
  • internal/mcp/server_test.go
  • internal/mcp/zip_test.go
  • internal/restore/render.go
  • internal/restore/restore_test.go
  • internal/setup/wizard.go
  • internal/setup/wizard_test.go
  • internal/shards/daemon_test.go
  • internal/shards/graph_test.go
  • internal/shards/render_test.go
  • internal/shards/watcher_test.go
  • internal/shards/zip_test.go
  • internal/status/handler_test.go
✅ Files skipped from review due to trivial changes (7)
  • benchmark/change_tracking/models.py
  • .gitignore
  • benchmark/task.md
  • benchmark/CLAUDE.md
  • benchmark/CLAUDE.skill.md
  • benchmark/results/summary.md
  • internal/audit/audit_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/skill.go

Comment on lines +4 to +8
class Todo(models.Model):
title = models.CharField(max_length=200)
completed = models.BooleanField(default=False)
created_at = models.DateTimeField(auto_now_add=True)


⚠️ Potential issue | 🔴 Critical

priority field required by tests is missing.

Lines 4-8 define Todo without priority, but benchmark/django_app/todos/tests.py (lines 72-91) expects low|medium|high with default medium. This will break the priority feature path.

Suggested fix
 class Todo(models.Model):
+    PRIORITY_LOW = 'low'
+    PRIORITY_MEDIUM = 'medium'
+    PRIORITY_HIGH = 'high'
+    PRIORITY_CHOICES = [
+        (PRIORITY_LOW, 'Low'),
+        (PRIORITY_MEDIUM, 'Medium'),
+        (PRIORITY_HIGH, 'High'),
+    ]
+
     title = models.CharField(max_length=200)
     completed = models.BooleanField(default=False)
+    priority = models.CharField(
+        max_length=10,
+        choices=PRIORITY_CHOICES,
+        default=PRIORITY_MEDIUM,
+    )
     created_at = models.DateTimeField(auto_now_add=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/django_app/todos/models.py` around lines 4 - 8, The Todo model is
missing the required priority field that tests expect; add a CharField named
priority to the Todo class with choices set to ('low','medium','high') (use a
PRIORITY_CHOICES tuple) and default 'medium' (e.g., priority =
models.CharField(max_length=6, choices=PRIORITY_CHOICES, default='medium')),
then create and apply a Django migration so the DB schema matches the model and
tests pass.

Comment on lines +9 to +15
def todo_to_dict(todo):
return {
'id': todo.id,
'title': todo.title,
'completed': todo.completed,
'created_at': todo.created_at.isoformat(),
}

⚠️ Potential issue | 🟠 Major

Include priority in serialized responses.

todo_to_dict() omits priority, so create/list/detail responses miss a required field expected by benchmark/django_app/todos/tests.py (Lines 93-143).

Proposed fix
 def todo_to_dict(todo):
     return {
         'id': todo.id,
         'title': todo.title,
         'completed': todo.completed,
         'created_at': todo.created_at.isoformat(),
+        'priority': todo.priority,
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/django_app/todos/views.py` around lines 9 - 15, The serialized
output from todo_to_dict is missing the required "priority" field; update the
todo_to_dict function to include a 'priority' key (e.g., 'priority':
todo.priority) in the returned dict so create/list/detail responses include that
field (handle nullable values as appropriate, e.g., pass through None or cast to
the expected type) to satisfy the tests referenced in
benchmark/django_app/todos/tests.py.

Comment on lines +20 to +22
def get(self, request):
todos = Todo.objects.all()
return JsonResponse({'todos': [todo_to_dict(t) for t in todos]})

⚠️ Potential issue | 🟠 Major

Add ?priority= filtering in list endpoint.

The list view currently ignores the priority query param, so GET /api/todos/?priority=high won’t behave as the tests require.

Proposed fix
 class TodoListView(View):
     def get(self, request):
         todos = Todo.objects.all()
+        priority = request.GET.get('priority')
+        if priority:
+            todos = todos.filter(priority=priority)
         return JsonResponse({'todos': [todo_to_dict(t) for t in todos]})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/django_app/todos/views.py` around lines 20 - 22, The GET list
handler in TodoListView (the get method) ignores the priority query param;
update the get(self, request) implementation to read request.GET.get('priority')
and, when present, filter the queryset (Todo.objects) by that priority before
serializing (e.g., Todo.objects.filter(priority=priority_value)); ensure you
still return the JsonResponse with the list using todo_to_dict(t) for the
filtered queryset.

Comment on lines +24 to +27
def post(self, request):
body = json.loads(request.body)
todo = Todo.objects.create(title=body['title'])
return JsonResponse(todo_to_dict(todo), status=201)

⚠️ Potential issue | 🟠 Major

POST should accept and persist priority.

Create currently drops priority from request JSON, so test_create_todo_with_priority will fail.

Proposed fix
     def post(self, request):
         body = json.loads(request.body)
-        todo = Todo.objects.create(title=body['title'])
+        todo = Todo.objects.create(
+            title=body['title'],
+            priority=body.get('priority', 'medium'),
+        )
         return JsonResponse(todo_to_dict(todo), status=201)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/django_app/todos/views.py` around lines 24 - 27, The POST handler
in the TodoView.post currently ignores the incoming priority field and only
creates the model with title, so update the Todo.objects.create call in
post(self, request) to include the priority from the request JSON (e.g., use
body.get('priority') or body['priority'] as appropriate) so the Todo instance
persists the priority; ensure any validation/defaulting logic you need matches
the Todo model and that todo_to_dict(todo) will reflect the saved priority.

Comment on lines +39 to +47
def patch(self, request, pk):
try:
todo = Todo.objects.get(pk=pk)
except Todo.DoesNotExist:
return JsonResponse({'error': 'Not found'}, status=404)
body = json.loads(request.body)
if 'completed' in body:
todo.completed = body['completed']
todo.save()

⚠️ Potential issue | 🟠 Major

PATCH should support priority updates too.

Detail PATCH only updates completed; it should also handle priority to keep API behavior consistent with the priority feature scope.

Proposed fix
     def patch(self, request, pk):
         try:
             todo = Todo.objects.get(pk=pk)
         except Todo.DoesNotExist:
             return JsonResponse({'error': 'Not found'}, status=404)
         body = json.loads(request.body)
         if 'completed' in body:
             todo.completed = body['completed']
+        if 'priority' in body:
+            todo.priority = body['priority']
         todo.save()
         return JsonResponse(todo_to_dict(todo))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/django_app/todos/views.py` around lines 39 - 47, The PATCH handler
in the Todo view currently only updates completed; modify the patch(self,
request, pk) method to also check for 'priority' in the parsed body and assign
it to the Todo instance (e.g., if 'priority' in body: todo.priority =
body['priority']) before calling todo.save(), ensuring the Todo model's priority
field is updated alongside completed.

Comment on lines +219 to +237
func TestLoginWithToken_SaveError(t *testing.T) {
if os.Getenv("CI") != "" {
t.Skip("skipping chmod-based test in CI")
}
tmp := t.TempDir()
t.Setenv("HOME", tmp)
t.Setenv("SUPERMODEL_API_KEY", "")
cfgDir := filepath.Join(tmp, ".supermodel")
if err := os.MkdirAll(cfgDir, 0o700); err != nil {
t.Fatal(err)
}
if err := os.Chmod(cfgDir, 0o555); err != nil {
t.Fatal(err)
}
t.Cleanup(func() { os.Chmod(cfgDir, 0o755) }) //nolint:errcheck
if err := LoginWithToken("smsk_live_test"); err == nil {
t.Error("expected error when cfg.Save fails")
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n 'func Test(LoginWithToken_SaveError|Logout_SaveError)|os.Chmod|runtime.GOOS' internal/auth/handler_test.go

Repository: supermodeltools/cli

Length of output: 406


🏁 Script executed:

#!/bin/bash
# Check imports at the top of the file
head -20 internal/auth/handler_test.go

# Get the exact skip conditions
sed -n '219,225p' internal/auth/handler_test.go
sed -n '255,261p' internal/auth/handler_test.go

Repository: supermodeltools/cli

Length of output: 784


Skip chmod-dependent tests on Windows.

These two tests (TestLoginWithToken_SaveError and TestLogout_SaveError) use Unix file permissions with os.Chmod to simulate save errors. The tests only skip in CI environments, but they'll be flaky on Windows where permissions work differently.

Add "runtime" to imports and update both skip conditions (lines 220 and 256) to also skip on Windows:

if os.Getenv("CI") != "" || runtime.GOOS == "windows" {
	t.Skip("skipping chmod-based test in CI and on Windows")
}

This way, the tests won't cause intermittent failures on Windows machines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/auth/handler_test.go` around lines 219 - 237, Update the
chmod-dependent tests TestLoginWithToken_SaveError and TestLogout_SaveError so
they also skip on Windows: add runtime to the imports and change the skip
condition from only checking CI (os.Getenv("CI") != "") to also skip when
runtime.GOOS == "windows" (e.g., if os.Getenv("CI") != "" || runtime.GOOS ==
"windows" { t.Skip(...) }), ensuring both tests use the same updated guard.

Comment on lines +207 to +225
func TestSave_WriteFileError(t *testing.T) {
if os.Getenv("CI") != "" {
t.Skip("skipping chmod-based test in CI")
}
home := t.TempDir()
t.Setenv("HOME", home)
cfgDir := filepath.Join(home, ".supermodel")
if err := os.MkdirAll(cfgDir, 0700); err != nil {
t.Fatal(err)
}
if err := os.Chmod(cfgDir, 0555); err != nil {
t.Fatal(err)
}
t.Cleanup(func() { os.Chmod(cfgDir, 0755) }) //nolint:errcheck
cfg := &Config{APIKey: "test"}
if err := cfg.Save(); err == nil {
t.Error("Save should fail when config file cannot be written")
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n 'func TestSave_WriteFileError|os.Chmod|runtime.GOOS' internal/config/config_test.go

Repository: supermodeltools/cli

Length of output: 301


🏁 Script executed:

sed -n '30,50p' internal/config/config_test.go

Repository: supermodeltools/cli

Length of output: 562


🏁 Script executed:

sed -n '65,80p' internal/config/config_test.go

Repository: supermodeltools/cli

Length of output: 463


🏁 Script executed:

head -20 internal/config/config_test.go | grep -E 'import|runtime'

Repository: supermodeltools/cli

Length of output: 82


🏁 Script executed:

sed -n '200,215p' internal/config/config_test.go

Repository: supermodeltools/cli

Length of output: 597


Add a Windows guard to skip chmod-based tests on Windows.

This test simulates a permission failure by calling os.Chmod(cfgDir, 0555), but on Windows, os.Chmod doesn't enforce Unix-like permission restrictions the same way. This means the test won't actually fail on Windows like it's supposed to, making it a flaky test on that platform.

The codebase already uses runtime.GOOS guards elsewhere (see lines 36 and 70), so add the same pattern here:

Suggested fix
 func TestSave_WriteFileError(t *testing.T) {
-	if os.Getenv("CI") != "" {
+	if os.Getenv("CI") != "" || runtime.GOOS == "windows" {
-		t.Skip("skipping chmod-based test in CI")
+		t.Skip("skipping chmod-based test in CI and on Windows")
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/config/config_test.go` around lines 207 - 225, The test
TestSave_WriteFileError relies on Unix-style os.Chmod behavior which doesn't
hold on Windows; add a guard using runtime.GOOS at the start of
TestSave_WriteFileError to call t.Skip when runtime.GOOS == "windows" (matching
existing guards used elsewhere) so the chmod-based simulation and subsequent
os.Chmod(cfgDir, 0555) are not executed on Windows; keep the rest of the test
(setting HOME, creating cfgDir, chmod, cleanup, creating Config and calling
cfg.Save()) unchanged.

Comment on lines +133 to +167
if err := walkZip("/nonexistent-dir-xyzzy-mcp", dest); err == nil {
t.Error("walkZip should fail when source directory does not exist")
}
}

func TestMCPWalkZip_OpenFileError(t *testing.T) {
if os.Getenv("CI") != "" {
t.Skip("skipping chmod-based test in CI")
}
src := t.TempDir()
secret := filepath.Join(src, "secret.go")
if err := os.WriteFile(secret, []byte("package main"), 0600); err != nil {
t.Fatal(err)
}
if err := os.Chmod(secret, 0000); err != nil {
t.Fatal(err)
}
t.Cleanup(func() { os.Chmod(secret, 0600) }) //nolint:errcheck
dest := filepath.Join(t.TempDir(), "out.zip")
if err := walkZip(src, dest); err == nil {
t.Error("walkZip should fail when a source file cannot be opened")
}
}

func TestMCPCreateZip_CreateTempError(t *testing.T) {
t.Setenv("TMPDIR", filepath.Join(t.TempDir(), "nonexistent-tmp"))
_, err := createZip(t.TempDir())
if err == nil {
t.Error("createZip should fail when os.CreateTemp fails")
}
}

func TestMCPCreateZip_NonExistentDir(t *testing.T) {
_, err := createZip("/nonexistent-dir-mcp-createzip-xyz")
if err == nil {

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -t f "zip_test.go" --path "*/mcp/*"

Repository: supermodeltools/cli

Length of output: 296


🏁 Script executed:

cat -n internal/mcp/zip_test.go | sed -n '130,170p'

Repository: supermodeltools/cli

Length of output: 1603


🏁 Script executed:

# Get full context of the test file to understand the functions being tested
wc -l internal/mcp/zip_test.go

Repository: supermodeltools/cli

Length of output: 91


🏁 Script executed:

# Check how os.CreateTemp determines temp directory across platforms
# Look for any platform-specific handling in the codebase
rg -A 5 "os.CreateTemp" internal/mcp/

Repository: supermodeltools/cli

Length of output: 664


🏁 Script executed:

# Check if there are any imports or utility functions that handle cross-platform paths
head -50 internal/mcp/zip_test.go

Repository: supermodeltools/cli

Length of output: 1274


🏁 Script executed:

# Verify the behavior - check Go's os.CreateTemp documentation behavior
# by searching for how the temp directory is selected across platforms
python3 << 'EOF'
# Document standard Go behavior for os.CreateTemp
print("Go's os.CreateTemp behavior:")
print("- On Windows: checks TMP, TEMP, USERPROFILE env vars (in order)")
print("- On Unix: checks TMPDIR env var")
print("- Falls back to os.TempDir() if none are set")
print("")
print("The review's concern:")
print("- Line 158 only sets TMPDIR")
print("- On Windows in CI, this won't affect os.CreateTemp behavior")
print("- Solution: set TMP and TEMP as well for full cross-platform coverage")
EOF

Repository: supermodeltools/cli

Length of output: 415


🏁 Script executed:

# Confirm the patch would work by checking if createZip and walkZip use standard Go file ops
rg -A 10 "func createZip\|func walkZip" internal/mcp/

Repository: supermodeltools/cli

Length of output: 45


Make nonexistent-path and temp-dir failure tests OS-portable.

Lines 133 and 166 hardcode Unix absolute paths that start with /. On Windows, paths look like C:\temp, so these tests will fail there. Line 158 also only sets TMPDIR, but Windows respects the TMP and TEMP environment variables instead. Fix both issues by building missing paths under t.TempDir() (portable and creates unique dirs) and setting all three common temp env vars.

Suggested fixes
 func TestMCPWalkZip_WalkError(t *testing.T) {
 	dest := filepath.Join(t.TempDir(), "out.zip")
-	if err := walkZip("/nonexistent-dir-xyzzy-mcp", dest); err == nil {
+	missing := filepath.Join(t.TempDir(), "does-not-exist")
+	if err := walkZip(missing, dest); err == nil {
 		t.Error("walkZip should fail when source directory does not exist")
 	}
 }

 func TestMCPCreateZip_CreateTempError(t *testing.T) {
-	t.Setenv("TMPDIR", filepath.Join(t.TempDir(), "nonexistent-tmp"))
+	invalidTmp := filepath.Join(t.TempDir(), "nonexistent-tmp")
+	t.Setenv("TMPDIR", invalidTmp)
+	t.Setenv("TMP", invalidTmp)
+	t.Setenv("TEMP", invalidTmp)
 	_, err := createZip(t.TempDir())
 	if err == nil {
 		t.Error("createZip should fail when os.CreateTemp fails")
 	}
 }

 func TestMCPCreateZip_NonExistentDir(t *testing.T) {
-	_, err := createZip("/nonexistent-dir-mcp-createzip-xyz")
+	_, err := createZip(filepath.Join(t.TempDir(), "does-not-exist"))
 	if err == nil {
 		t.Error("createZip should fail when directory does not exist")
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/mcp/zip_test.go` around lines 133 - 167, Tests use hardcoded Unix
absolute paths and only set TMPDIR, which breaks on Windows; update tests to
build missing directories under t.TempDir() and set all common temp env vars.
Replace occurrences of "/nonexistent-dir-xyzzy-mcp" and
"/nonexistent-dir-mcp-createzip-xyz" with filepath.Join(t.TempDir(),
"nonexistent-...") so they are OS-portable; in TestMCPCreateZip_CreateTempError
set TMPDIR, TMP and TEMP (use t.Setenv for each) to a non-existent path under
t.TempDir() before calling createZip; keep assertions and use the same function
names walkZip and createZip to locate changes.

Comment on lines +138 to +154
func TestMCPWalkZip_OpenFileError(t *testing.T) {
if os.Getenv("CI") != "" {
t.Skip("skipping chmod-based test in CI")
}
src := t.TempDir()
secret := filepath.Join(src, "secret.go")
if err := os.WriteFile(secret, []byte("package main"), 0600); err != nil {
t.Fatal(err)
}
if err := os.Chmod(secret, 0000); err != nil {
t.Fatal(err)
}
t.Cleanup(func() { os.Chmod(secret, 0600) }) //nolint:errcheck
dest := filepath.Join(t.TempDir(), "out.zip")
if err := walkZip(src, dest); err == nil {
t.Error("walkZip should fail when a source file cannot be opened")
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -t f "zip_test.go" | head -20

Repository: supermodeltools/cli

Length of output: 263


🏁 Script executed:

wc -l internal/mcp/zip_test.go

Repository: supermodeltools/cli

Length of output: 91


🏁 Script executed:

cat -n internal/mcp/zip_test.go | sed -n '130,160p'

Repository: supermodeltools/cli

Length of output: 1268


🏁 Script executed:

rg -A 10 "func walkZip" internal/mcp/ | head -30

Repository: supermodeltools/cli

Length of output: 530


🏁 Script executed:

rg -B 5 -A 20 "func walkZip" internal/mcp/zip.go | head -40

Repository: supermodeltools/cli

Length of output: 618


🏁 Script executed:

cat -n internal/mcp/zip.go | sed -n '25,80p'

Repository: supermodeltools/cli

Length of output: 2108


🏁 Script executed:

cat -n internal/mcp/zip.go | sed -n '82,140p'

Repository: supermodeltools/cli

Length of output: 1419


🏁 Script executed:

rg -i "windows|root|permission" internal/mcp/zip_test.go

Repository: supermodeltools/cli

Length of output: 45


Probe actual file readability before running the permission test.

The current CI-only skip is too narrow. On Windows, when running as root, or in some containers, chmod 0000 doesn't actually prevent file reads. This means the test can flake unpredictably—the file stays readable, walkZip succeeds when it should fail, and your test fails.

Instead of relying on environment checks, try to open the file after chmod. If it still opens, skip the test gracefully. This way, the test runs when it can run, and skips when the precondition (unreadable file) can't be established. Much more reliable.

Suggested fix
 func TestMCPWalkZip_OpenFileError(t *testing.T) {
-	if os.Getenv("CI") != "" {
-		t.Skip("skipping chmod-based test in CI")
-	}
 	src := t.TempDir()
 	secret := filepath.Join(src, "secret.go")
 	if err := os.WriteFile(secret, []byte("package main"), 0600); err != nil {
 		t.Fatal(err)
 	}
 	if err := os.Chmod(secret, 0000); err != nil {
 		t.Fatal(err)
 	}
 	t.Cleanup(func() { os.Chmod(secret, 0600) }) //nolint:errcheck
+	if f, err := os.Open(secret); err == nil {
+		_ = f.Close()
+		t.Skip("platform/user can still read chmod 0000 file")
+	}
 	dest := filepath.Join(t.TempDir(), "out.zip")
 	if err := walkZip(src, dest); err == nil {
 		t.Error("walkZip should fail when a source file cannot be opened")
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/mcp/zip_test.go` around lines 138 - 154, The test
TestMCPWalkZip_OpenFileError should verify the unreadability precondition after
chmod: after calling os.Chmod(secret, 0000) attempt to open the file (e.g.,
os.Open(secret)) and if the open succeeds (meaning chmod didn't make it
unreadable) call t.Skipf with a clear reason so the test is skipped rather than
flaking; keep the existing t.Cleanup to restore permissions and then proceed to
call walkZip(src, dest), expecting an error only when the open actually fails.

Comment on lines +172 to +191
func initCleanMCPGitRepo(t *testing.T) string {
t.Helper()
dir := t.TempDir()
run := func(args ...string) {
t.Helper()
cmd := exec.Command(args[0], args[1:]...)
cmd.Dir = dir
if out, err := cmd.CombinedOutput(); err != nil {
t.Fatalf("git setup %v: %v\n%s", args, err, out)
}
}
run("git", "init")
run("git", "config", "user.email", "ci@test.local")
run("git", "config", "user.name", "CI")
if err := os.WriteFile(filepath.Join(dir, "main.go"), []byte("package main"), 0600); err != nil {
t.Fatal(err)
}
run("git", "add", ".")
run("git", "commit", "-m", "init")
return dir

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate the file and check the function
fd -t f "zip_test.go" | head -5

Repository: supermodeltools/cli

Length of output: 210


🏁 Script executed:

# Check if the file exists and read the relevant section
if [ -f "internal/mcp/zip_test.go" ]; then
  wc -l internal/mcp/zip_test.go
fi

Repository: supermodeltools/cli

Length of output: 91


🏁 Script executed:

# Search for the initCleanMCPGitRepo function
rg "initCleanMCPGitRepo" -A 25 internal/mcp/

Repository: supermodeltools/cli

Length of output: 3476


🏁 Script executed:

# Check for other git-related test functions and patterns
rg "exec\.Command.*git" -B 3 -A 3 internal/

Repository: supermodeltools/cli

Length of output: 8885


🏁 Script executed:

# Look for existing patterns of skip or LookPath usage in tests
rg "t\.Skip|exec\.LookPath" -B 2 -A 2 --type go

Repository: supermodeltools/cli

Length of output: 15293


Skip tests gracefully when git is unavailable.

Right now, if someone runs tests on a system without git installed, the initCleanMCPGitRepo helper fails hard via t.Fatalf. Since three different tests depend on this helper, they will all fail for an infrastructure reason (missing git) rather than an actual bug in the code. Better to skip them cleanly instead.

Add a quick exec.LookPath("git") check at the start of the helper; this pattern is already used elsewhere in the codebase to check whether external tools exist. If git is missing, the helper skips the test with a clear message, so on systems without git the dependent tests are skipped rather than failed.

Proposed patch
 func initCleanMCPGitRepo(t *testing.T) string {
 	t.Helper()
+	if _, err := exec.LookPath("git"); err != nil {
+		t.Skip("git is not installed in this environment")
+	}
 	dir := t.TempDir()
 	run := func(args ...string) {
 		t.Helper()
 		cmd := exec.Command(args[0], args[1:]...)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func initCleanMCPGitRepo(t *testing.T) string {
	t.Helper()
	if _, err := exec.LookPath("git"); err != nil {
		t.Skip("git is not installed in this environment")
	}
	dir := t.TempDir()
	run := func(args ...string) {
		t.Helper()
		cmd := exec.Command(args[0], args[1:]...)
		cmd.Dir = dir
		if out, err := cmd.CombinedOutput(); err != nil {
			t.Fatalf("git setup %v: %v\n%s", args, err, out)
		}
	}
	run("git", "init")
	run("git", "config", "user.email", "ci@test.local")
	run("git", "config", "user.name", "CI")
	if err := os.WriteFile(filepath.Join(dir, "main.go"), []byte("package main"), 0600); err != nil {
		t.Fatal(err)
	}
	run("git", "add", ".")
	run("git", "commit", "-m", "init")
	return dir
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/mcp/zip_test.go` around lines 172 - 191, the initCleanMCPGitRepo
helper should detect a missing git binary and skip tests instead of failing:
add an exec.LookPath("git") call at the start of initCleanMCPGitRepo and, if it
returns an error, call t.Skipf with a clear message (e.g. "git not available:
%v") so tests depending on initCleanMCPGitRepo are skipped when git is not
installed; keep t.Helper() as-is, and subsequent git commands may then assume
git exists.

@jonathanpopham
Contributor Author

Closing — this PR accumulated 14k lines of unrelated commits from rebasing. Reopening as a clean 196-line PR on feat/skill-command.
