Skip to content

feat: add arch-docs command for static architecture site generation#21

Merged
greynewell merged 8 commits intomainfrom
worktree-feat-arch-docs
Apr 2, 2026
Merged

feat: add arch-docs command for static architecture site generation#21
greynewell merged 8 commits intomainfrom
worktree-feat-arch-docs

Conversation

@greynewell
Copy link
Copy Markdown
Contributor

@greynewell greynewell commented Apr 2, 2026

Summary

  • Imports the full supermodeltools/arch-docs pipeline into the CLI — same functionality, no GitHub Action required
  • New supermodel arch-docs [path] command generates a browsable static HTML site from a codebase's architecture graph
  • Bundles all 12 HTML/CSS/JS templates into the binary via go:embed so no external assets needed

What changed

New command (cmd/archdocs.go + internal/archdocs/handler.go):

supermodel arch-docs [path] [flags]

Flags:
  --site-name string       display title for the generated site
  --base-url string        canonical base URL (handles subdirectory path-prefix rewriting)
  --repo string            GitHub owner/repo slug for source links
  -o, --output string      output directory (default: ./arch-docs-output)
  --templates-dir string   override bundled templates with a custom directory
  --max-source-files int   cap source files sent to API (default 3000)
  --max-entities int       cap entity pages generated (default 12000)
  --force                  bypass cache

Ported packages (internal/archdocs/):

  • graph2md — converts the Supermodel API graph JSON → per-entity markdown files
  • pssg/{build,config,entity,loader,render,output,schema,taxonomy,...} — static site generator that turns those markdown files into a full HTML site with search, taxonomy nav, sitemaps, RSS, llms.txt, and JSON-LD

Small additions to existing packages:

  • internal/api/client.goAnalyzeRaw() returns the raw job.Result JSON (graph2md needs the full result payload, not just the parsed Graph)
  • internal/analyze/zip.go — exported CreateZip() wrapper so archdocs can reuse the archive logic

Pipeline

  1. Archive repo → analyze.CreateZip() (git archive or walk fallback)
  2. Upload + poll API → api.Client.AnalyzeRaw() → raw result JSON
  3. graph2md.Run() → markdown files per entity
  4. Extract bundled templates → temp dir, write pssg.yaml, pssg.Build()
  5. Optionally rewrite root-relative paths for subdirectory deployments

Test plan

  • go build ./... — clean
  • go vet ./... — clean
  • golangci-lint run ./... — clean
  • go test ./... — all existing tests pass
  • supermodel arch-docs --help shows expected flags
  • End-to-end: supermodel arch-docs . --output /tmp/test-site generates a valid static site

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added arch-docs CLI to generate static architecture documentation sites.
    • Full static-site generator: templates, taxonomies, pagination, search index, asset extraction, and build pipeline.
    • Interactive visualizations (force graphs, charts, architecture maps) and site search UI.
    • Social sharing: Open Graph, custom share images, and PWA manifest.
    • Feeds & SEO: RSS feeds, sitemap generation, robots.txt, JSON‑LD schema output.
    • Output extras: llms.txt generation and enrichment/affiliate link support.

Imports the full supermodeltools/arch-docs pipeline into the CLI so the
same functionality is available without a GitHub Action. The new
`supermodel arch-docs [path]` command:

- Archives the repo (reuses existing git-archive / walk logic)
- Uploads to Supermodel API via a new AnalyzeRaw() method that returns
  the raw result JSON needed by graph2md
- Converts the graph to markdown with the ported graph2md package
- Builds a browsable static HTML site via the ported pssg packages
- Bundles the 12 HTML/CSS/JS templates into the binary (go:embed)
- Rewrites root-relative paths when --base-url has a subdirectory prefix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 2, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 063f69a1-9852-4cf3-a4b8-46854ceedd20

📥 Commits

Reviewing files that changed from the base of the PR and between 21fdc5d and cad7f9b.

📒 Files selected for processing (1)
  • .github/workflows/architecture.yml

Walkthrough

Adds a new arch-docs feature: CLI command that zips a repo, requests an analysis from the API (raw JSON), converts the graph to markdown, and builds a static documentation site (templates, search, RSS, sitemap, visualizations).

Changes

Cohort / File(s) Summary
CLI & Entry
cmd/archdocs.go
New arch-docs [path] Cobra subcommand with flags; enforces API key and invokes archdocs handler.
API Client
internal/api/client.go
Added Client.AnalyzeRaw(ctx, zipPath, idempotencyKey) returning raw JSON result.
Orchestration & ZIP
internal/archdocs/handler.go, internal/archdocs/zip.go
Top-level Run() handler added; resolves paths, creates repo ZIP (git archive fallback), calls API, writes raw JSON and drives downstream conversion/build.
Graph → Markdown
internal/archdocs/graph2md/graph2md.go
New graph2md.Run: parses/merges graph JSON(s), deduplicates nodes, builds relationship indexes, generates per-entity markdown with frontmatter and neighborhood data, supports truncation limits.
PSSG Build System
internal/archdocs/pssg/...
.../build/build.go, .../config/*, .../render/*, .../output/*, .../entity/*, .../loader/*, .../taxonomy/*, .../enrichment/*, .../schema/*
Comprehensive static-site generator: YAML config loader, entity model & slugging, loaders (markdown), enrichment cache, affiliate registry, taxonomy builder, rendering engine + template func map, share-image SVGs, JSON-LD generator, sitemap/RSS/robots/manifest/llms.txt outputs, and a concurrent builder orchestration.
Templates & Frontend
internal/archdocs/templates/* (many files, e.g. _head.html, _header.html, _main.js, _styles.css, entity.html, index.html, hub.html, all_entities.html, taxonomy_index.html, letter.html, etc.)
New template set, CSS, and large client-side JS for D3/Mermaid visualizations and search UI.
Affiliate & Utilities
internal/archdocs/pssg/affiliate/affiliate.go, internal/archdocs/pssg/entity/slug.go, internal/archdocs/pssg/enrichment/cache.go
Affiliate link generation from enrichment, slug utility, enrichment cache readers and typed extractors.
Lint Config
.golangci.yaml
Adjusted linter exclusions and GoSec rules for new archdocs code and CLI-driven output paths.
CI
.github/workflows/architecture.yml
Workflow removed.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as User/CLI
    participant Handler as archdocs.Handler
    participant ZIP as Zip (git/walk)
    participant API as Supermodel API Client
    participant G2MD as Graph2MD
    participant Builder as PSSG Builder
    participant FS as File Output

    CLI->>Handler: run arch-docs [path] + options
    Handler->>ZIP: createZip(dir)
    ZIP-->>Handler: zipPath
    Handler->>API: AnalyzeRaw(ctx, zipPath, idempotencyKey)
    API-->>Handler: raw JSON graph
    Handler->>G2MD: Run(raw JSON -> markdown)
    G2MD-->>Handler: markdown files (entities)
    Handler->>Builder: NewBuilder.Build(outputDir, templates, opts)
    Builder->>Builder: load entities, enrichment, taxonomies
    Builder->>Builder: render pages (templates + funcmap)
    Builder->>FS: write HTML/CSS/JS, rss, sitemap, search index
    FS-->>CLI: output ready (summary)
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • jonathanpopham

Poem

🛠️ Zip the repo, call the API,
Nodes become pages, graphs fly by,
Taxonomies, slugs, a dark-mode gleam,
Search and charts dance in a stream,
Docs built from code — a tiny dream.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.61% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main feature addition—a new arch-docs command for generating static architecture documentation sites.
Description check ✅ Passed The description is mostly complete with clear sections covering what changed (new command, ported packages, small additions), the pipeline flow, and a test plan. Minor section headings differ from template, but all core information is present and well-organized.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-feat-arch-docs

Comment @coderabbitai help to get the list of available commands and usage tips.

greynewell and others added 5 commits April 2, 2026 15:05
- graph2md: replace sb.WriteString(fmt.Sprintf(...)) with fmt.Fprintf(sb, ...)
  for all 107 occurrences flagged by gocritic preferFprint
- handler.go: covered by extending the os.Remove errcheck exclusion
- golangci.yaml: extend best-effort cleanup exclusion from os.Remove to
  os.RemoveAll? (both are temp file/dir cleanup, errors not actionable)
- cmd/archdocs.go: fix goimports import ordering

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- golangci.yaml: fix RemoveAll errcheck exclusion regex (os\.Remove(All)?
  now matches both os.Remove and os.RemoveAll); add path-based gocritic
  exclusion for ported graph2md and pssg packages to avoid needless
  divergence from upstream supermodeltools/arch-docs
- handler.go: use new-style octal literals (0o600, 0o755)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- golangci.yaml: extend ported-package exclusion to include gocyclo and
  gosec (high-complexity functions and permission choices are intentional
  in the arch-docs pipeline); add global gosec exclusions for G301/G306
  (static site output permissions) and G703 (user-supplied output paths)
- handler.go: add nolint:gocyclo — sequential pipeline with many error
  checks; splitting would obscure the flow
- Re-run goimports with -local flag so import groups match CI expectations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- golangci.yaml: add revive and ineffassign to the ported-package
  exclusion rule (exported-type doc comments and ineffectual assignments
  are intentional in the arch-docs upstream code)
- handler.go: add nolint:gocritic for hugeParam on opts (value-semantic
  config struct; pointer would be unusual at a CLI call boundary)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
staticcheck fires on ported arch-docs code (deprecated strings.Title,
S1011 loop simplification, SA4004/SA4006 control-flow issues, QF1012
WriteString pattern) — all acceptable in vendored/ported code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (10)
internal/api/client.go-56-62 (1)

56-62: ⚠️ Potential issue | 🟡 Minor

Fail fast when completed job has an empty result

At Line 61, returning an empty job.Result pushes failure downstream with a less clear error path. Add a guard here so callers get an immediate API-level failure reason.

Suggested patch
 func (c *Client) AnalyzeRaw(ctx context.Context, zipPath, idempotencyKey string) (json.RawMessage, error) {
 	job, err := c.pollUntilComplete(ctx, zipPath, idempotencyKey)
 	if err != nil {
 		return nil, err
 	}
+	if len(job.Result) == 0 {
+		return nil, fmt.Errorf("analysis completed with empty result payload")
+	}
 	return job.Result, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/client.go` around lines 56 - 62, In Client.AnalyzeRaw, after
receiving job from pollUntilComplete, add a guard to fail fast when job.Result
is nil or empty: check job.Result (or len(job.Result) == 0) and return a clear
error (e.g., fmt.Errorf("AnalyzeRaw: completed job has empty result for zip %s,
idempotency %s", zipPath, idempotencyKey)) instead of returning nil/empty
payload; this ensures callers of AnalyzeRaw get an immediate API-level failure
rather than downstream errors.
internal/archdocs/templates/_header.html-24-28 (1)

24-28: ⚠️ Potential issue | 🟡 Minor

Add dialog semantics and explicit input labeling for search overlay.

This search UI is close, but accessibility improves a lot with role="dialog" + aria-modal on the modal and an explicit aria-label for the input.

Suggested fix
-  <div class="search-modal">
+  <div class="search-modal" role="dialog" aria-modal="true" aria-label="Search">
...
-      <input type="text" class="search-input" id="search-input" placeholder="Search entities..." autocomplete="off">
+      <input type="text" class="search-input" id="search-input" placeholder="Search entities..." aria-label="Search entities" autocomplete="off">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/templates/_header.html` around lines 24 - 28, The search
overlay lacks dialog semantics and an explicit label: add role="dialog" and
aria-modal="true" to the element with class "search-modal" (or the wrapping
container used for the overlay) and provide an accessible label for the input
with id "search-input" (e.g., add aria-label="Search entities" or reference a
visible label via aria-labelledby) so the search input and modal are announced
correctly to assistive technologies.
internal/archdocs/pssg/output/robots.go-36-37 (1)

36-37: ⚠️ Potential issue | 🟡 Minor

Normalize BaseURL before appending sitemap.xml.

On Line 36, a trailing slash in cfg.Site.BaseURL creates a double slash in the sitemap URL (...//sitemap.xml). Trim once before concatenation.

Suggested fix
-	sitemapURL := fmt.Sprintf("%s/sitemap.xml", cfg.Site.BaseURL)
+	baseURL := strings.TrimRight(cfg.Site.BaseURL, "/")
+	sitemapURL := fmt.Sprintf("%s/sitemap.xml", baseURL)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/output/robots.go` around lines 36 - 37, The code
builds sitemapURL using cfg.Site.BaseURL which may have a trailing slash causing
a double-slash; normalize BaseURL first (e.g., trim trailing '/' from
cfg.Site.BaseURL) before constructing sitemapURL so
fmt.Sprintf("%s/sitemap.xml", ...) always produces a single-slash URL — update
the logic around sitemapURL creation and the lines = append(...) call to use the
trimmed BaseURL (reference cfg.Site.BaseURL and sitemapURL).
internal/archdocs/templates/_styles.css-107-108 (1)

107-108: ⚠️ Potential issue | 🟡 Minor

Remove deprecated word-break: break-word.

Stylelint is right here: overflow-wrap: break-word is already doing the wrapping, and the extra word-break: break-word declaration is deprecated.

Also applies to: 217-218

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/templates/_styles.css` around lines 107 - 108, Remove the
deprecated declaration "word-break: break-word" and keep only "overflow-wrap:
break-word" where it appears; specifically, locate occurrences in the CSS (the
block containing overflow-wrap: break-word) and delete the "word-break:
break-word" line (also remove the duplicate at the later occurrence noted) so
the stylesheet uses only overflow-wrap for wrapping.
internal/archdocs/templates/index.html-2-2 (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Use the configured site language in the root tag.

config.Load already populates Site.Language, but this hard-codes en, so non-English sites still advertise English to browsers and assistive tech.

Possible fix
-<html lang="en">
+<html lang="{{.Site.Language}}">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/templates/index.html` at line 2, The root <html> tag is
hard-coded to lang="en" instead of using the configured site language; update
the template to use the populated Site.Language value (from config.Load) for the
lang attribute so the generated pages reflect non-English sites correctly—locate
the root html tag in internal/archdocs/templates/index.html and replace the
literal "en" with the template expression that outputs Site.Language.
internal/archdocs/handler.go-397-418 (1)

397-418: ⚠️ Potential issue | 🟡 Minor

Multiple unchecked os.RemoveAll calls in resolveTemplates.

Same issue here - the linter wants you to explicitly handle or discard these errors. Since these are cleanup-on-error paths, discarding is appropriate but should be explicit.

🔧 Fix all unchecked os.RemoveAll calls
 	entries, err := bundledTemplates.ReadDir("templates")
 	if err != nil {
-		os.RemoveAll(tmp)
+		_ = os.RemoveAll(tmp)
 		return "", nil, err
 	}
 	for _, e := range entries {
 		if e.IsDir() {
 			continue
 		}
 		data, err := bundledTemplates.ReadFile("templates/" + e.Name())
 		if err != nil {
-			os.RemoveAll(tmp)
+			_ = os.RemoveAll(tmp)
 			return "", nil, err
 		}
 		if err := os.WriteFile(filepath.Join(tmp, e.Name()), data, 0600); err != nil {
-			os.RemoveAll(tmp)
+			_ = os.RemoveAll(tmp)
 			return "", nil, err
 		}
 	}

-	return tmp, func() { os.RemoveAll(tmp) }, nil
+	return tmp, func() { _ = os.RemoveAll(tmp) }, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/handler.go` around lines 397 - 418, In resolveTemplates,
the cleanup calls call os.RemoveAll(tmp) without handling the returned error;
update each error path (after bundledTemplates.ReadDir,
bundledTemplates.ReadFile, and os.WriteFile failures) to explicitly discard the
cleanup error (e.g., _ = os.RemoveAll(tmp)) so the linter is satisfied; ensure
every occurrence of os.RemoveAll(tmp) in resolveTemplates is replaced with an
explicit ignore of the error.
internal/archdocs/pssg/config/types.go-143-150 (1)

143-150: ⚠️ Potential issue | 🟡 Minor

TemplatesConfig missing all_entities template field.

The handler template (line 235) specifies all_entities: "all_entities.html", but this struct doesn't have a corresponding field. This means the all_entities template path won't be configurable through the config struct.

Looking at how templates are used in render.go, this might be fine if it's hardcoded elsewhere, but it's inconsistent with the other template fields.

🔧 Add missing template field
 type TemplatesConfig struct {
 	Entity         string            `yaml:"entity"`
 	Homepage       string            `yaml:"homepage"`
 	Hub            string            `yaml:"hub"`
 	TaxonomyIndex  string            `yaml:"taxonomy_index"`
 	Letter         string            `yaml:"letter"`
+	AllEntities    string            `yaml:"all_entities"`
 	StaticPages    map[string]string `yaml:"static_pages"`
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/config/types.go` around lines 143 - 150,
TemplatesConfig is missing the all_entities field, so add a new string field
named AllEntities with yaml tag `all_entities` to the TemplatesConfig struct so
the handler-configured template `all_entities: "all_entities.html"` becomes
configurable; update the TemplatesConfig definition (the struct currently
containing Entity, Homepage, Hub, TaxonomyIndex, Letter, StaticPages) to include
AllEntities string `yaml:"all_entities"` so render.go and the handler will pick
up the configured template path.
internal/archdocs/handler.go-313-313 (1)

313-313: ⚠️ Potential issue | 🟡 Minor

Unchecked os.RemoveAll causing pipeline failure.

The linter is unhappy because os.RemoveAll returns an error that's being ignored. For cleanup in a defer, this is usually fine since you can't do much if cleanup fails anyway.

To make the linter happy and be explicit about ignoring the error:

🔧 Explicitly discard the error
-	defer os.RemoveAll(tmpDir)
+	defer func() { _ = os.RemoveAll(tmpDir) }()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/handler.go` at line 313, The linter flags the unchecked
error from defer os.RemoveAll(tmpDir); fix by explicitly discarding the error so
it's clear and linter-friendly: replace the defer call with an explicit discard
such as defer func() { _ = os.RemoveAll(tmpDir) }() (or defer func() { _, _ =
os.RemoveAll(tmpDir) }() ) so the error is intentionally ignored while keeping
the same cleanup behavior for tmpDir and satisfying the linter.
internal/archdocs/pssg/affiliate/affiliate.go-26-36 (1)

26-36: ⚠️ Potential issue | 🟡 Minor

Dead code: encoded variable is computed but never used.

Lines 27-28 create an encoded variable with %20 spaces, but then line 30 creates plusEncoded with + spaces, and only plusEncoded is used in the template substitution (line 33).

The comment on line 29 says "Also support + encoding like the TS version" but the %20 version is never actually used anywhere.

🧹 Remove unused variable
 func (p *Provider) GenerateLink(term string) string {
-	encoded := url.QueryEscape(term)
-	encoded = strings.ReplaceAll(encoded, "+", "%20")
-	// Also support + encoding like the TS version
 	plusEncoded := strings.ReplaceAll(url.QueryEscape(term), "%20", "+")

 	result := p.URLTemplate
 	result = strings.ReplaceAll(result, "{{term}}", plusEncoded)
 	result = strings.ReplaceAll(result, "{{tag}}", p.Tag)
 	return result
 }

Or if you actually need both encodings for different providers:

 func (p *Provider) GenerateLink(term string) string {
 	encoded := url.QueryEscape(term)
-	encoded = strings.ReplaceAll(encoded, "+", "%20")
-	// Also support + encoding like the TS version
-	plusEncoded := strings.ReplaceAll(url.QueryEscape(term), "%20", "+")
+	spaceEncoded := strings.ReplaceAll(encoded, "+", "%20")
+	plusEncoded := encoded // QueryEscape already uses + for spaces

 	result := p.URLTemplate
-	result = strings.ReplaceAll(result, "{{term}}", plusEncoded)
+	result = strings.ReplaceAll(result, "{{term}}", plusEncoded)      // + for spaces
+	result = strings.ReplaceAll(result, "{{term20}}", spaceEncoded)   // %20 for spaces
 	result = strings.ReplaceAll(result, "{{tag}}", p.Tag)
 	return result
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/affiliate/affiliate.go` around lines 26 - 36, In
Provider.GenerateLink the local variable encoded is computed but never used;
remove the unused encoded computation (the url.QueryEscape + strings.ReplaceAll
that sets encoded) so only plusEncoded is kept and used for replacing "{{term}}"
in p.URLTemplate (symbols: Provider.GenerateLink, encoded, plusEncoded,
p.URLTemplate, p.Tag); alternatively if both encodings are required, modify the
template and replacement logic to use distinct placeholders (e.g.,
"{{term_space}}" -> encoded and "{{term_plus}}" -> plusEncoded) and perform both
strings.ReplaceAll calls accordingly.
internal/archdocs/pssg/taxonomy/taxonomy.go-252-259 (1)

252-259: ⚠️ Potential issue | 🟡 Minor

Decode the first rune before calling unicode.ToUpper, not the first byte.

entry.Name[0] grabs the first byte, not the first character. For names starting with accented or non-Latin letters like Éclair or Ωmega, this causes a problem: the first byte of the UTF-8 encoding (like 0xC9 for É) isn't a valid letter on its own, so unicode.IsLetter() returns false and the entry gets bucketed under # instead of its actual letter.

Quick example from the initial check:

  • Éclair: first-byte = 'Ã' (wrong) instead of 'É' (correct)
  • Ωmega: first-byte = 'Î' (wrong) instead of 'Ω' (correct)
  • Go: first-byte = 'G' (correct, because it's ASCII)

Convert to a rune first. In Go, you can use []rune(entry.Name)[0] or utf8.DecodeRuneInString() to do this properly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/taxonomy/taxonomy.go` around lines 252 - 259, The loop
currently uses entry.Name[0] which reads the first byte; decode the first rune
instead so unicode.ToUpper and unicode.IsLetter operate on the full character
(handles accented/non-ASCII letters). Change the logic around entry.Name, e.g.
use utf8.DecodeRuneInString(entry.Name) or convert to []rune and take the first
rune, then call unicode.ToUpper on that rune and continue using letter =
string(firstRune) for bucket keys; update the branch that checks
unicode.IsLetter(first) accordingly (functions: entry.Name, unicode.ToUpper,
unicode.IsLetter).
🧹 Nitpick comments (7)
internal/archdocs/templates/_head.html (1)

4-5: Use base-URL-aware links for feed/manifest

Line 4 and Line 5 hardcode root-relative URLs, which can break when docs are hosted under a subpath (for example /docs). Prefer generating these from the configured base URL so they work without depending on a post-build rewrite step.

Suggested patch
-<link rel="alternate" type="application/rss+xml" title="{{.Site.Name}}" href="https://github.com/feed.xml">
-<link rel="manifest" href="https://github.com/manifest.json">
+<link rel="alternate" type="application/rss+xml" title="{{.Site.Name}}" href="{{.Site.BaseURL}}/feed.xml">
+<link rel="manifest" href="{{.Site.BaseURL}}/manifest.json">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/templates/_head.html` around lines 4 - 5, Replace the
hardcoded root-relative hrefs in the head template with base-URL-aware Hugo
helpers so links work under a subpath; specifically update the <link
rel="alternate" ... href="https://github.com/feed.xml"> and <link rel="manifest" ...
href="https://github.com/manifest.json"> entries to use Hugo URL helpers such as {{ "feed.xml" |
relURL }} (or {{ "feed.xml" | absURL }} if you prefer absolute) and similarly {{
"manifest.json" | relURL }} so the generated hrefs honor the configured base
URL.
internal/archdocs/templates/index.html (1)

91-92: Don't make the generated site depend on a CDN for core charts.

arch-docs is supposed to emit a portable static site, but this pulls D3 from jsDelivr at runtime. Open the output offline, behind a strict CSP, or in an air-gapped environment and the homepage visualizations stop working.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/templates/index.html` around lines 91 - 92, The HTML
currently pulls D3 from a CDN via the external script tag referencing d3.min.js;
replace this runtime CDN dependency by vendoring or bundling D3 into the
generated site: either copy a local d3.min.js into the site assets and update
the <script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script> to
point to that local asset, or update the build that produces main.js (the
bundler entry referenced as /main.js) to include/import d3 so the library is
bundled into main.js and remove the external CDN script tag; ensure the build
step that generates the static output copies the local d3 asset (if used) so the
emitted site is fully self-contained.
internal/archdocs/pssg/output/llmstxt.go (1)

28-32: Avoid strings.Title + "s" for the section heading.

This turns common values into awkward labels like Entitys or Classs, and strings.Title is deprecated in Go. A configured plural label, or a tiny helper that formats one cleanly, will keep llms.txt output sane.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/output/llmstxt.go` around lines 28 - 32, The current
code builds a heading using cfg.Data.EntityType with strings.Title and a literal
"s" which produces bad plurals and uses deprecated strings.Title; replace this
with a small helper (e.g. formatEntityLabel or pluralizeEntityLabel) that: (1)
prefers a configured plural label if available (e.g. cfg.Data.EntityPlural), (2)
otherwise applies a minimal pluralization rule (handle common endings like "s",
"x", "ch", "sh" -> add "es", words ending in "y" -> replace "y" with "ies",
otherwise add "s"), and (3) capitalizes only the first rune using
unicode.ToUpper; then use that helper when building the heading instead of
strings.Title+ "s" (update the variable entityLabel and the fmt.Sprintf call
that appends the "## %ss" line).
internal/archdocs/graph2md/graph2md.go (1)

588-594: Linter suggests using fmt.Fprintf instead of sb.WriteString(fmt.Sprintf(...)).

This is a minor style thing - fmt.Fprintf(sb, "...", args) is slightly more idiomatic than sb.WriteString(fmt.Sprintf("...", args)). Both work fine, so this is optional cleanup.

💡 Optional: Use fmt.Fprintf for cleaner code
-	sb.WriteString(fmt.Sprintf("title: %q\n", title))
-	sb.WriteString(fmt.Sprintf("description: %q\n", desc))
+	fmt.Fprintf(sb, "title: %q\n", title)
+	fmt.Fprintf(sb, "description: %q\n", desc)

This pattern applies throughout the frontmatter writers. Not critical, just cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/graph2md/graph2md.go` around lines 588 - 594, Replace the
repeated sb.WriteString(fmt.Sprintf(...)) calls in the frontmatter writer with
fmt.Fprintf to write directly to the string builder (use fmt.Fprintf(sb, "title:
%q\n", title), fmt.Fprintf(sb, "description: %q\n", desc), fmt.Fprintf(sb,
"node_type: \"File\"\n"), fmt.Fprintf(sb, "file_path: %q\n", path),
fmt.Fprintf(sb, "file_name: %q\n", name) and conditionally fmt.Fprintf(sb,
"language: %q\n", lang) when lang != "") so the code uses fmt.Fprintf against sb
instead of allocating intermediate strings via fmt.Sprintf; update all similar
frontmatter writes in the same file (e.g., in graph2md.go frontmatter writer
functions) for consistency.
internal/archdocs/pssg/render/shareimage.go (1)

28-34: Same UTF-8 truncation concern here.

Like the search index in build.go, s[:max-1] on line 33 could slice in the middle of a multi-byte UTF-8 character. For SVG text this could result in a malformed character, though browsers are usually forgiving.

💡 Optional: Use rune-aware truncation
 func truncate(s string, max int) string {
-	if len(s) <= max {
+	runes := []rune(s)
+	if len(runes) <= max {
 		return s
 	}
-	return s[:max-1] + "\u2026"
+	return string(runes[:max-1]) + "\u2026"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/render/shareimage.go` around lines 28 - 34, The
truncate function currently slices bytes with s[:max-1], which can split
multi-byte UTF-8 characters; change truncate to be rune-aware (keep the function
name truncate and the same signature) by computing rune length and slicing by
runes instead of bytes (e.g., convert to []rune or iterate runes to build the
prefix up to max runes minus one), then append the ellipsis; ensure you compare
rune count (utf8.RuneCountInString) against max and use rune-based slicing to
avoid cutting a multi-byte character in the middle.
internal/archdocs/pssg/build/build.go (1)

1337-1369: Search index generation handles truncation nicely.

The description truncation at 120 chars (line 1345-1346) is a good choice for keeping the index size reasonable. The compact JSON keys (t, d, s, etc.) are smart for reducing payload size on large codebases.

One thing to consider: if you have entities with multi-byte UTF-8 characters, desc[:120] could potentially slice in the middle of a rune, producing invalid UTF-8. For a "chill" review I won't flag this as critical since it's unlikely to cause user-visible problems, but something to keep in mind.

💡 Optional: Use rune-aware truncation
-		if len(desc) > 120 {
-			desc = desc[:120]
-		}
+		if len(desc) > 120 {
+			runes := []rune(desc)
+			if len(runes) > 120 {
+				desc = string(runes[:120])
+			}
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/build/build.go` around lines 1337 - 1369, The code in
generateSearchIndex truncates descriptions with desc = desc[:120], which can cut
a UTF-8 multi-byte rune and produce invalid UTF-8; change the truncation to be
rune-aware by converting the string to runes (or using utf8/Runes) and slicing
to at most 120 runes before converting back to string, updating the truncation
logic where desc is assigned from e.GetString("description") so the
searchEntry.D value is always valid UTF-8.
internal/archdocs/handler.go (1)

453-482: Path prefix rewriting uses simple string replacement.

The rewritePathPrefix function does straightforward string replacement for common patterns like href="https://github.com/, src="https://github.com/, etc. This works for the generated output, but be aware of edge cases:

  1. If someone has literal text like The path is href="https://github.com/foo" in markdown, it would get rewritten
  2. Single-quoted attributes (href='/') won't be rewritten
  3. Dynamically constructed paths in JS won't be caught

For generated architecture docs this is probably fine since you control the templates. Just something to keep in mind if users ever customize content.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/handler.go` around lines 453 - 482, The current
rewritePathPrefix function does blind string replacements which can incorrectly
modify literal text and misses single-quoted attributes and more complex JS
patterns; update rewritePathPrefix to parse HTML files instead of using
ReplaceAll (use Go's html tokenizer or a DOM library) and only rewrite href/src
attributes (skipping code/pre/text nodes), and for .js files use targeted
patterns (regular expressions) that handle both single and double quotes and
common JS call forms (e.g., fetch('...'), fetch("..."), window.location.href =
'/' and concatenations) so you avoid rewriting arbitrary text; locate the logic
in rewritePathPrefix and replace the simple strings.ReplaceAll calls with
HTML-attribute-aware rewriting for .html and quote-aware regex-based rewriting
for .js.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/archdocs/pssg/config/config.go`:
- Around line 106-108: The default template for entity is incorrect: in
config.go the code that sets cfg.Templates.Entity falls back to "recipe.html"
which is from the old template set; change the fallback to "entity.html" so
cfg.Templates.Entity defaults to the arch-docs expected template. Locate the
defaulting logic that sets cfg.Templates.Entity and replace the literal
"recipe.html" with "entity.html" (ensure any related tests or callers that rely
on the default are updated if necessary).
- Around line 144-160: The code in resolvePaths is resolving most PathsConfig
fields but forgot to resolve cfg.Paths.SourceDir, leaving it relative to the
CWD; update the resolvePaths function to also call resolve on
cfg.Paths.SourceDir (i.e., set cfg.Paths.SourceDir =
resolve(cfg.Paths.SourceDir)) alongside the other path assignments so SourceDir
is anchored to ConfigDir like Data/Templates/Output/Cache/Static.
- Around line 26-33: Normalize the configured site base URL by trimming any
trailing slash once during config load: after validate(&cfg) and before
resolvePaths(&cfg) modify cfg.Site.BaseURL (or the Site.BaseURL field on cfg) to
remove a trailing '/' (e.g., use strings.TrimSuffix or TrimRight) while
preserving empty or root values appropriately so downstream emitters (index.html
canonical, llmstxt, rss, sitemap) receive a consistent base URL.

In `@internal/archdocs/pssg/config/types.go`:
- Around line 87-89: The YAML tag on PaginationConfig is wrong: the struct field
EntitiesPerPage currently uses `yaml:"entities_per_page"` but the handler emits
`per_page`; update the struct tag on PaginationConfig.EntitiesPerPage to
`yaml:"per_page"` so the generated config unmarshals correctly and pagination
values (per_page) are populated.

In `@internal/archdocs/pssg/loader/loader.go`:
- Around line 15-20: The switch on cfg.Data.Format currently always returns
&MarkdownLoader (silently accepting unknown formats); change the logic to fail
fast on unsupported formats by returning a clear error instead of defaulting to
Markdown. Update the function that constructs/loaders (the switch that now
returns &MarkdownLoader) to explicitly handle supported values (e.g.,
"markdown") and return an error like fmt.Errorf("unsupported data format: %s",
cfg.Data.Format) in the default branch; if the function currently has a single
return type, adjust its signature to return (loader, error) so callers can
handle the failure. Ensure the error message includes cfg.Data.Format and that
MarkdownLoader is only returned for the "markdown" case.

In `@internal/archdocs/pssg/loader/markdown.go`:
- Around line 142-165: The extractSection function currently uses strings.Index
with a substring marker which can match inside words or code blocks; change its
logic to scan the body line-by-line: split body into lines, find a line that
exactly matches the H2 heading (line trimmed equals "## "+header or starts with
"## "+header followed only by whitespace), record the index of the next line as
the section start, and then collect subsequent lines until you hit the next line
that begins with "## " (an H2) or EOF; return the trimmed joined lines. Update
references in the function to stop using the marker/Index approach and use the
line-based search to determine start and rest.
- Around line 100-112: The deriveSlug function currently returns an empty slug
when the configured field value normalizes to an empty string, causing output
paths like `.html`; update deriveSlug (using l.Config.Data.EntitySlug.Source and
the fields map) so that after extracting the configured field (field:...), you
call entity.ToSlug on its string value and if the result is empty then fall back
to deriving the slug from the filename: compute base := filepath.Base(path),
slug := entity.ToSlug(strings.TrimSuffix(base, filepath.Ext(base))) and return
that; ensure the filename fallback also goes through entity.ToSlug for
consistency.

In `@internal/archdocs/pssg/output/rss.go`:
- Around line 95-118: The code is double-escaping XML by calling xmlEscape
before marshaling; remove xmlEscape usage and assign raw string values to
rssChannel and rssItem fields so xml.Marshal handles escaping. Specifically,
update the rssChannel construction (Title, Description) to use title and
description directly, and update rssItem assignments (Title, Description,
Category) to use e.GetString(...) and category without xmlEscape; check other
occurrences (e.g., the similar block around the item creation mentioned) and
remove xmlEscape there as well to prevent AT&amp;amp;T-style double-escaping.
- Around line 74-79: The feed generation is hardcoding recipe-specific metadata:
replace the "%s recipes" description and any uses of
"recipe_category"/recipe-specific category fields with values derived from the
configured taxonomy/entity type in cfg (e.g., use cfg.RSS.CategoryTaxonomy or an
entity type field from cfg.Site or cfg.RSS) when calling generateFeed and when
populating item categories (see generateFeed invocation and item/category
generation around generateFeed and any functions that build feed items). Update
the title/description and item category assignments to use the configurable
taxonomy/entity label (and a localized or configured plural form if available)
so arch-docs feeds reflect the configured taxonomy rather than "recipes".

In `@internal/archdocs/pssg/render/funcs.go`:
- Around line 230-257: In sliceHelper, guard against invalid slice bounds by
clamping start and end before any v[start:end] operations: for each type branch
(case []string, []*entity.Entity, []interface{}) normalize start = max(0,
start), end = min(len(v), end), then if end < start set end = start (or set
start = end) so slicing yields an empty slice instead of panicking; update the
slicing logic in sliceHelper to use these clamped values for v[start:end].
- Around line 272-276: The helper function sortStrings currently just copies and
returns the slice without sorting; update the function sortStrings to copy the
input slice into result and then call sort.Strings(result) before returning, and
add the standard "sort" package to the imports so the helper actually returns a
sorted slice for template use.

In `@internal/archdocs/pssg/schema/jsonld.go`:
- Around line 28-131: GenerateRecipeSchema is emitting a Recipe JSON-LD for all
entities (causing code/arch pages to be published as recipes); update
GenerateRecipeSchema (and any callers) to either: 1) produce an
arch-docs-appropriate schema (e.g., change "@type" from "Recipe" to
"TechArticle" or "Article" and remove recipe-specific fields like
"recipeIngredient"/"recipeInstructions"/"recipeYield"/"recipeCuisine"/"isRelatedTo"),
mapping relevant fields (name, description, author, datePublished, image,
keywords) instead, or 2) short-circuit and return nil/empty map for non-recipe
entities until the entity data model supports recipes (check entity methods like
e.GetIngredients(), e.GetInstructions(), e.GetString("title"),
e.GetStringSlice("keywords") to decide). Ensure references to helper symbols
computeTotalTime, GetIngredients, GetInstructions, Schema.ExtraKeywords and
SiteConfig.BaseURL are adjusted or removed accordingly so no recipe-specific
fields are emitted for arch docs.

In `@internal/archdocs/pssg/taxonomy/taxonomy.go`:
- Around line 72-75: The current branch silently skips when
TaxonomyConfig.Invert (tc.Invert) is true, producing empty taxonomies; instead
fail fast: detect tc.Invert and return a clear error (e.g., "Invert taxonomies
not supported") from the taxonomy-building function so callers get immediate
feedback, or if you prefer to implement the feature, replace the continue with
inverse grouping logic that for each possible value v builds a group of entities
that do NOT have v (use existing grouping helpers to collect entities missing
v). Ensure you modify the code path that checks tc.Invert (the tc.Invert branch)
to either return the error or perform the inverse grouping so the public config
surface is not a silent no-op.
- Around line 78-89: The loop that builds the groups map (iterating over values,
computing slug := entity.ToSlug(val), then appending e to groups[slug].Entities)
can append the same entity multiple times if an entity's values contain
duplicate slugs; add a per-entity "seen" set (e.g., map[string]bool) inside the
scope that processes each entity to skip appending when slug has already been
seen for that entity; update the block that creates/uses groups and
Entry.Entities to consult the seen set before calling groups[slug].Entities =
append(...), ensuring each entity is only added once per slug.

In `@internal/archdocs/templates/_main.js`:
- Around line 25-55: The code concatenates unescaped item names into an SVG
string (see variables label, items, svg and the final svgContainer.innerHTML
assignment), allowing XSS; replace string-building with DOM/SVG APIs instead of
innerHTML: create the SVG root with
document.createElementNS('http://www.w3.org/2000/svg', 'svg'), create
rect/text/a/path/polygon nodes with createElementNS, set attributes via
setAttribute, and assign text nodes via textContent (or use a proper escaping
function) so items[i].name is never injected as raw HTML; apply the same fix to
the other occurrence mentioned (lines 414-423) that also builds markup from
repo-derived labels.
- Around line 678-682: The fetch in function loadIndex should use the runtime
__basePath (not absolute "/search-index.json") and after the async load re-run
any existing query so early keystrokes aren't dropped; update loadIndex to
fetch(__basePath + "search-index.json") (or join safely), assign the JSON to
index, and then call the existing search/query function with the current input
value (or trigger the same handler) so resultsEl reflects the user's current
query after the index is loaded.

---

Minor comments:
In `@internal/api/client.go`:
- Around line 56-62: In Client.AnalyzeRaw, after receiving job from
pollUntilComplete, add a guard to fail fast when job.Result is nil or empty:
check job.Result (or len(job.Result) == 0) and return a clear error (e.g.,
fmt.Errorf("AnalyzeRaw: completed job has empty result for zip %s, idempotency
%s", zipPath, idempotencyKey)) instead of returning nil/empty payload; this
ensures callers of AnalyzeRaw get an immediate API-level failure rather than
downstream errors.

In `@internal/archdocs/handler.go`:
- Around line 397-418: In resolveTemplates, the cleanup calls call
os.RemoveAll(tmp) without handling the returned error; update each error path
(after bundledTemplates.ReadDir, bundledTemplates.ReadFile, and os.WriteFile
failures) to explicitly discard the cleanup error (e.g., _ = os.RemoveAll(tmp))
so the linter is satisfied; ensure every occurrence of os.RemoveAll(tmp) in
resolveTemplates is replaced with an explicit ignore of the error.
- Line 313: The linter flags the unchecked error from defer
os.RemoveAll(tmpDir); fix by explicitly discarding the error so it's clear and
linter-friendly: replace the defer call with an explicit discard such as defer
func() { _ = os.RemoveAll(tmpDir) }() (or defer func() { _, _ =
os.RemoveAll(tmpDir) }() ) so the error is intentionally ignored while keeping
the same cleanup behavior for tmpDir and satisfying the linter.

In `@internal/archdocs/pssg/affiliate/affiliate.go`:
- Around line 26-36: In Provider.GenerateLink the local variable encoded is
computed but never used; remove the unused encoded computation (the
url.QueryEscape + strings.ReplaceAll that sets encoded) so only plusEncoded is
kept and used for replacing "{{term}}" in p.URLTemplate (symbols:
Provider.GenerateLink, encoded, plusEncoded, p.URLTemplate, p.Tag);
alternatively if both encodings are required, modify the template and
replacement logic to use distinct placeholders (e.g., "{{term_space}}" ->
encoded and "{{term_plus}}" -> plusEncoded) and perform both strings.ReplaceAll
calls accordingly.

In `@internal/archdocs/pssg/config/types.go`:
- Around line 143-150: TemplatesConfig is missing the all_entities field, so add
a new string field named AllEntities with yaml tag `all_entities` to the
TemplatesConfig struct so the handler-configured template `all_entities:
"all_entities.html"` becomes configurable; update the TemplatesConfig definition
(the struct currently containing Entity, Homepage, Hub, TaxonomyIndex, Letter,
StaticPages) to include AllEntities string `yaml:"all_entities"` so render.go
and the handler will pick up the configured template path.

In `@internal/archdocs/pssg/output/robots.go`:
- Around line 36-37: The code builds sitemapURL using cfg.Site.BaseURL which may
have a trailing slash causing a double-slash; normalize BaseURL first (e.g.,
trim trailing '/' from cfg.Site.BaseURL) before constructing sitemapURL so
fmt.Sprintf("%s/sitemap.xml", ...) always produces a single-slash URL — update
the logic around sitemapURL creation and the lines = append(...) call to use the
trimmed BaseURL (reference cfg.Site.BaseURL and sitemapURL).

In `@internal/archdocs/pssg/taxonomy/taxonomy.go`:
- Around line 252-259: The loop currently uses entry.Name[0] which reads the
first byte; decode the first rune instead so unicode.ToUpper and
unicode.IsLetter operate on the full character (handles accented/non-ASCII
letters). Change the logic around entry.Name, e.g. use
utf8.DecodeRuneInString(entry.Name) or convert to []rune and take the first
rune, then call unicode.ToUpper on that rune and continue using letter =
string(firstRune) for bucket keys; update the branch that checks
unicode.IsLetter(first) accordingly (functions: entry.Name, unicode.ToUpper,
unicode.IsLetter).

In `@internal/archdocs/templates/_header.html`:
- Around line 24-28: The search overlay lacks dialog semantics and an explicit
label: add role="dialog" and aria-modal="true" to the element with class
"search-modal" (or the wrapping container used for the overlay) and provide an
accessible label for the input with id "search-input" (e.g., add
aria-label="Search entities" or reference a visible label via aria-labelledby)
so the search input and modal are announced correctly to assistive technologies.

In `@internal/archdocs/templates/_styles.css`:
- Around line 107-108: Remove the deprecated declaration "word-break:
break-word" and keep only "overflow-wrap: break-word" where it appears;
specifically, locate occurrences in the CSS (the block containing overflow-wrap:
break-word) and delete the "word-break: break-word" line (also remove the
duplicate at the later occurrence noted) so the stylesheet uses only
overflow-wrap for wrapping.

In `@internal/archdocs/templates/index.html`:
- Line 2: The root <html> tag is hard-coded to lang="en" instead of using the
configured site language; update the template to use the populated Site.Language
value (from config.Load) for the lang attribute so the generated pages reflect
non-English sites correctly—locate the root html tag in
internal/archdocs/templates/index.html and replace the literal "en" with the
template expression that outputs Site.Language.

---

Nitpick comments:
In `@internal/archdocs/graph2md/graph2md.go`:
- Around line 588-594: Replace the repeated sb.WriteString(fmt.Sprintf(...))
calls in the frontmatter writer with fmt.Fprintf to write directly to the string
builder (use fmt.Fprintf(sb, "title: %q\n", title), fmt.Fprintf(sb,
"description: %q\n", desc), fmt.Fprintf(sb, "node_type: \"File\"\n"),
fmt.Fprintf(sb, "file_path: %q\n", path), fmt.Fprintf(sb, "file_name: %q\n",
name) and conditionally fmt.Fprintf(sb, "language: %q\n", lang) when lang != "")
so the code uses fmt.Fprintf against sb instead of allocating intermediate
strings via fmt.Sprintf; update all similar frontmatter writes in the same file
(e.g., in graph2md.go frontmatter writer functions) for consistency.

In `@internal/archdocs/handler.go`:
- Around line 453-482: The current rewritePathPrefix function does blind string
replacements which can incorrectly modify literal text and misses single-quoted
attributes and more complex JS patterns; update rewritePathPrefix to parse HTML
files instead of using ReplaceAll (use Go's html tokenizer or a DOM library) and
only rewrite href/src attributes (skipping code/pre/text nodes), and for .js
files use targeted patterns (regular expressions) that handle both single and
double quotes and common JS call forms (e.g., fetch('...'), fetch("..."),
window.location.href = '/' and concatenations) so you avoid rewriting arbitrary
text; locate the logic in rewritePathPrefix and replace the simple
strings.ReplaceAll calls with HTML-attribute-aware rewriting for .html and
quote-aware regex-based rewriting for .js.

In `@internal/archdocs/pssg/build/build.go`:
- Around line 1337-1369: The code in generateSearchIndex truncates descriptions
with desc = desc[:120], which can cut a UTF-8 multi-byte rune and produce
invalid UTF-8; change the truncation to be rune-aware by converting the string
to runes (or using utf8/Runes) and slicing to at most 120 runes before
converting back to string, updating the truncation logic where desc is assigned
from e.GetString("description") so the searchEntry.D value is always valid
UTF-8.

In `@internal/archdocs/pssg/output/llmstxt.go`:
- Around line 28-32: The current code builds a heading using cfg.Data.EntityType
with strings.Title and a literal "s" which produces bad plurals and uses
deprecated strings.Title; replace this with a small helper (e.g.
formatEntityLabel or pluralizeEntityLabel) that: (1) prefers a configured plural
label if available (e.g. cfg.Data.EntityPlural), (2) otherwise applies a minimal
pluralization rule (handle common endings like "s", "x", "ch", "sh" -> add "es",
words ending in "y" -> replace "y" with "ies", otherwise add "s"), and (3)
capitalizes only the first rune using unicode.ToUpper; then use that helper when
building the heading instead of strings.Title+ "s" (update the variable
entityLabel and the fmt.Sprintf call that appends the "## %ss" line).

In `@internal/archdocs/pssg/render/shareimage.go`:
- Around line 28-34: The truncate function currently slices bytes with
s[:max-1], which can split multi-byte UTF-8 characters; change truncate to be
rune-aware (keep the function name truncate and the same signature) by computing
rune length and slicing by runes instead of bytes (e.g., convert to []rune or
iterate runes to build the prefix up to max runes minus one), then append the
ellipsis; ensure you compare rune count (utf8.RuneCountInString) against max and
use rune-based slicing to avoid cutting a multi-byte character in the middle.

In `@internal/archdocs/templates/_head.html`:
- Around line 4-5: Replace the hardcoded root-relative hrefs in the head
template with base-URL-aware Hugo helpers so links work under a subpath;
specifically update the <link rel="alternate" ... href="https://github.com/feed.xml"> and <link
rel="manifest" ... href="https://github.com/manifest.json"> entries to use Hugo URL helpers such
as {{ "feed.xml" | relURL }} (or {{ "feed.xml" | absURL }} if you prefer
absolute) and similarly {{ "manifest.json" | relURL }} so the generated hrefs
honor the configured base URL.

In `@internal/archdocs/templates/index.html`:
- Around line 91-92: The HTML currently pulls D3 from a CDN via the external
script tag referencing d3.min.js; replace this runtime CDN dependency by
vendoring or bundling D3 into the generated site: either copy a local d3.min.js
into the site assets and update the <script
src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script> to point to
that local asset, or update the build that produces main.js (the bundler entry
referenced as /main.js) to include/import d3 so the library is bundled into
main.js and remove the external CDN script tag; ensure the build step that
generates the static output copies the local d3 asset (if used) so the emitted
site is fully self-contained.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9cd95d5-3f2c-4bf9-b90f-ffc8497d1f50

📥 Commits

Reviewing files that changed from the base of the PR and between 6e920e2 and ee6fcf7.

📒 Files selected for processing (36)
  • cmd/archdocs.go
  • internal/analyze/zip.go
  • internal/api/client.go
  • internal/archdocs/graph2md/graph2md.go
  • internal/archdocs/handler.go
  • internal/archdocs/pssg/affiliate/affiliate.go
  • internal/archdocs/pssg/build/build.go
  • internal/archdocs/pssg/config/config.go
  • internal/archdocs/pssg/config/types.go
  • internal/archdocs/pssg/enrichment/cache.go
  • internal/archdocs/pssg/entity/entity.go
  • internal/archdocs/pssg/entity/slug.go
  • internal/archdocs/pssg/loader/loader.go
  • internal/archdocs/pssg/loader/markdown.go
  • internal/archdocs/pssg/output/llmstxt.go
  • internal/archdocs/pssg/output/manifest.go
  • internal/archdocs/pssg/output/robots.go
  • internal/archdocs/pssg/output/rss.go
  • internal/archdocs/pssg/output/sitemap.go
  • internal/archdocs/pssg/render/funcs.go
  • internal/archdocs/pssg/render/render.go
  • internal/archdocs/pssg/render/shareimage.go
  • internal/archdocs/pssg/schema/jsonld.go
  • internal/archdocs/pssg/taxonomy/taxonomy.go
  • internal/archdocs/templates/_footer.html
  • internal/archdocs/templates/_head.html
  • internal/archdocs/templates/_header.html
  • internal/archdocs/templates/_main.js
  • internal/archdocs/templates/_og.html
  • internal/archdocs/templates/_styles.css
  • internal/archdocs/templates/all_entities.html
  • internal/archdocs/templates/entity.html
  • internal/archdocs/templates/hub.html
  • internal/archdocs/templates/index.html
  • internal/archdocs/templates/letter.html
  • internal/archdocs/templates/taxonomy_index.html

Comment on lines +26 to +33
if err := validate(&cfg); err != nil {
return nil, fmt.Errorf("validating config: %w", err)
}

// Resolve relative paths against config directory
resolvePaths(&cfg)

return &cfg, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Normalize site.base_url once during load.

If a user configures https://docs.example.com/, the current builders emit https://docs.example.com//... in places like the canonical tag (internal/archdocs/templates/index.html, Line 7), llms.txt (internal/archdocs/pssg/output/llmstxt.go, Line 44), RSS (internal/archdocs/pssg/output/rss.go, Line 110), and sitemap entries (internal/archdocs/pssg/output/sitemap.go, Line 137). Trimming the trailing slash here keeps every downstream emitter consistent.

Possible fix
 import (
 	"fmt"
 	"os"
 	"path/filepath"
+	"strings"
 
 	"gopkg.in/yaml.v3"
 )
@@
 	if err := validate(&cfg); err != nil {
 		return nil, fmt.Errorf("validating config: %w", err)
 	}
+
+	cfg.Site.BaseURL = strings.TrimRight(cfg.Site.BaseURL, "/")
 
 	// Resolve relative paths against config directory
 	resolvePaths(&cfg)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err := validate(&cfg); err != nil {
return nil, fmt.Errorf("validating config: %w", err)
}
// Resolve relative paths against config directory
resolvePaths(&cfg)
return &cfg, nil
if err := validate(&cfg); err != nil {
return nil, fmt.Errorf("validating config: %w", err)
}
cfg.Site.BaseURL = strings.TrimRight(cfg.Site.BaseURL, "/")
// Resolve relative paths against config directory
resolvePaths(&cfg)
return &cfg, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/config/config.go` around lines 26 - 33, Normalize the
configured site base URL by trimming any trailing slash once during config load:
after validate(&cfg) and before resolvePaths(&cfg) modify cfg.Site.BaseURL (or
the Site.BaseURL field on cfg) to remove a trailing '/' (e.g., use
strings.TrimSuffix or TrimRight) while preserving empty or root values
appropriately so downstream emitters (index.html canonical, llmstxt, rss,
sitemap) receive a consistent base URL.

Comment on lines +106 to +108
if cfg.Templates.Entity == "" {
cfg.Templates.Entity = "recipe.html"
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Default entity template still points at the old recipe page.

When templates.entity is omitted, this falls back to recipe.html, but the arch-docs template set uses entity.html. That makes the default path brittle and can break rendering for the common case where callers rely on defaults.

Possible fix
 	// Default templates
 	if cfg.Templates.Entity == "" {
-		cfg.Templates.Entity = "recipe.html"
+		cfg.Templates.Entity = "entity.html"
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if cfg.Templates.Entity == "" {
cfg.Templates.Entity = "recipe.html"
}
if cfg.Templates.Entity == "" {
cfg.Templates.Entity = "entity.html"
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/config/config.go` around lines 106 - 108, The default
template for entity is incorrect: in config.go the code that sets
cfg.Templates.Entity falls back to "recipe.html" which is from the old template
set; change the fallback to "entity.html" so cfg.Templates.Entity defaults to
the arch-docs expected template. Locate the defaulting logic that sets
cfg.Templates.Entity and replace the literal "recipe.html" with "entity.html"
(ensure any related tests or callers that rely on the default are updated if
necessary).

Comment on lines +144 to +160
cfg.Paths.Data = resolve(cfg.Paths.Data)
cfg.Paths.Templates = resolve(cfg.Paths.Templates)
cfg.Paths.Output = resolve(cfg.Paths.Output)
cfg.Paths.Cache = resolve(cfg.Paths.Cache)
if cfg.Paths.Static != "" {
cfg.Paths.Static = resolve(cfg.Paths.Static)
}
if cfg.Enrichment.CacheDir != "" {
cfg.Enrichment.CacheDir = resolve(cfg.Enrichment.CacheDir)
}
if cfg.Extra.Favorites != "" {
cfg.Extra.Favorites = resolve(cfg.Extra.Favorites)
}
if cfg.Extra.Contributors != "" {
cfg.Extra.Contributors = resolve(cfg.Extra.Contributors)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Resolve paths.source_dir too.

PathsConfig includes SourceDir, but resolvePaths leaves it relative to the current working directory while the other path fields are anchored to ConfigDir. A config outside the repo with source_dir: ./repo will point at the wrong tree.

Possible fix
 	cfg.Paths.Data = resolve(cfg.Paths.Data)
 	cfg.Paths.Templates = resolve(cfg.Paths.Templates)
 	cfg.Paths.Output = resolve(cfg.Paths.Output)
 	cfg.Paths.Cache = resolve(cfg.Paths.Cache)
+	if cfg.Paths.SourceDir != "" {
+		cfg.Paths.SourceDir = resolve(cfg.Paths.SourceDir)
+	}
 	if cfg.Paths.Static != "" {
 		cfg.Paths.Static = resolve(cfg.Paths.Static)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cfg.Paths.Data = resolve(cfg.Paths.Data)
cfg.Paths.Templates = resolve(cfg.Paths.Templates)
cfg.Paths.Output = resolve(cfg.Paths.Output)
cfg.Paths.Cache = resolve(cfg.Paths.Cache)
if cfg.Paths.Static != "" {
cfg.Paths.Static = resolve(cfg.Paths.Static)
}
if cfg.Enrichment.CacheDir != "" {
cfg.Enrichment.CacheDir = resolve(cfg.Enrichment.CacheDir)
}
if cfg.Extra.Favorites != "" {
cfg.Extra.Favorites = resolve(cfg.Extra.Favorites)
}
if cfg.Extra.Contributors != "" {
cfg.Extra.Contributors = resolve(cfg.Extra.Contributors)
}
}
cfg.Paths.Data = resolve(cfg.Paths.Data)
cfg.Paths.Templates = resolve(cfg.Paths.Templates)
cfg.Paths.Output = resolve(cfg.Paths.Output)
cfg.Paths.Cache = resolve(cfg.Paths.Cache)
if cfg.Paths.SourceDir != "" {
cfg.Paths.SourceDir = resolve(cfg.Paths.SourceDir)
}
if cfg.Paths.Static != "" {
cfg.Paths.Static = resolve(cfg.Paths.Static)
}
if cfg.Enrichment.CacheDir != "" {
cfg.Enrichment.CacheDir = resolve(cfg.Enrichment.CacheDir)
}
if cfg.Extra.Favorites != "" {
cfg.Extra.Favorites = resolve(cfg.Extra.Favorites)
}
if cfg.Extra.Contributors != "" {
cfg.Extra.Contributors = resolve(cfg.Extra.Contributors)
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/config/config.go` around lines 144 - 160, The code in
resolvePaths is resolving most PathsConfig fields but forgot to resolve
cfg.Paths.SourceDir, leaving it relative to the CWD; update the resolvePaths
function to also call resolve on cfg.Paths.SourceDir (i.e., set
cfg.Paths.SourceDir = resolve(cfg.Paths.SourceDir)) alongside the other path
assignments so SourceDir is anchored to ConfigDir like
Data/Templates/Output/Cache/Static.

Comment on lines +87 to +89
type PaginationConfig struct {
EntitiesPerPage int `yaml:"entities_per_page"`
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

YAML tag mismatch: entities_per_page vs per_page in handler template.

Hey, heads up - there's a mismatch between what the handler writes and what this struct expects:

  • handler.go (line 192-193) writes:
    pagination:
      per_page: 48
  • types.go (line 88) expects:
    EntitiesPerPage int `yaml:"entities_per_page"`

This means pagination won't be loaded from the generated config - EntitiesPerPage will be zero, which could cause division-by-zero or unexpected behavior when calculating page counts.

🔧 Fix: Align YAML tag with handler template
 type PaginationConfig struct {
-	EntitiesPerPage int `yaml:"entities_per_page"`
+	EntitiesPerPage int `yaml:"per_page"`
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/config/types.go` around lines 87 - 89, The YAML tag on
PaginationConfig is wrong: the struct field EntitiesPerPage currently uses
`yaml:"entities_per_page"` but the handler emits `per_page`; update the struct
tag on PaginationConfig.EntitiesPerPage to `yaml:"per_page"` so the generated
config unmarshals correctly and pagination values (per_page) are populated.

Comment on lines +15 to +20
switch cfg.Data.Format {
case "markdown":
return &MarkdownLoader{Config: cfg}
default:
return &MarkdownLoader{Config: cfg}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t silently ignore unsupported data formats

Line 18-19 currently falls back to Markdown for any unknown value, which can hide config typos and produce confusing behavior. This should fail fast.

Suggested patch
 package loader
 
 import (
+	"fmt"
 	"github.com/supermodeltools/cli/internal/archdocs/pssg/config"
 	"github.com/supermodeltools/cli/internal/archdocs/pssg/entity"
 )
@@
-func New(cfg *config.Config) Loader {
+func New(cfg *config.Config) (Loader, error) {
 	switch cfg.Data.Format {
-	case "markdown":
-		return &MarkdownLoader{Config: cfg}
+	case "", "markdown":
+		return &MarkdownLoader{Config: cfg}, nil
 	default:
-		return &MarkdownLoader{Config: cfg}
+		return nil, fmt.Errorf("unsupported data format: %q", cfg.Data.Format)
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
switch cfg.Data.Format {
case "markdown":
return &MarkdownLoader{Config: cfg}
default:
return &MarkdownLoader{Config: cfg}
}
package loader
import (
"fmt"
"github.com/supermodeltools/cli/internal/archdocs/pssg/config"
"github.com/supermodeltools/cli/internal/archdocs/pssg/entity"
)
func New(cfg *config.Config) (Loader, error) {
switch cfg.Data.Format {
case "", "markdown":
return &MarkdownLoader{Config: cfg}, nil
default:
return nil, fmt.Errorf("unsupported data format: %q", cfg.Data.Format)
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/loader/loader.go` around lines 15 - 20, The switch on
cfg.Data.Format currently always returns &MarkdownLoader (silently accepting
unknown formats); change the logic to fail fast on unsupported formats by
returning a clear error instead of defaulting to Markdown. Update the function
that constructs/loaders (the switch that now returns &MarkdownLoader) to
explicitly handle supported values (e.g., "markdown") and return an error like
fmt.Errorf("unsupported data format: %s", cfg.Data.Format) in the default
branch; if the function currently has a single return type, adjust its signature
to return (loader, error) so callers can handle the failure. Ensure the error
message includes cfg.Data.Format and that MarkdownLoader is only returned for
the "markdown" case.

Comment on lines +28 to +131
// GenerateRecipeSchema generates Recipe JSON-LD for an entity.
func (g *Generator) GenerateRecipeSchema(e *entity.Entity, entityURL string) map[string]interface{} {
schema := map[string]interface{}{
"@context": "https://schema.org",
"@type": "Recipe",
"name": e.GetString("title"),
"description": e.GetString("description"),
"url": entityURL,
}

// Author
authorName := e.GetString("author")
if authorName != "" {
authorSlug := entity.ToSlug(authorName)
schema["author"] = map[string]interface{}{
"@type": "Person",
"name": authorName,
"url": fmt.Sprintf("%s/author/%s.html", g.SiteConfig.BaseURL, authorSlug),
}
}

// Date published
schema["datePublished"] = g.Schema.DatePublished

// Times
prepTime := e.GetString("prep_time")
cookTime := e.GetString("cook_time")
if prepTime != "" {
schema["prepTime"] = prepTime
}
if cookTime != "" {
schema["cookTime"] = cookTime
}
if prepTime != "" && cookTime != "" {
schema["totalTime"] = computeTotalTime(prepTime, cookTime)
}

// Servings
if servings := e.GetInt("servings"); servings > 0 {
schema["recipeYield"] = fmt.Sprintf("%d servings", servings)
}

// Category & cuisine
if cat := e.GetString("recipe_category"); cat != "" {
schema["recipeCategory"] = cat
}
if cuisine := e.GetString("cuisine"); cuisine != "" {
schema["recipeCuisine"] = cuisine
}

// Image
if img := e.GetString("image"); img != "" {
schema["image"] = []string{img}
}

// Nutrition
if cal := e.GetInt("calories"); cal > 0 {
schema["nutrition"] = map[string]interface{}{
"@type": "NutritionInformation",
"calories": fmt.Sprintf("%d calories", cal),
}
}

// Ingredients
if ingredients := e.GetIngredients(); len(ingredients) > 0 {
schema["recipeIngredient"] = ingredients
}

// Instructions as HowToSteps
if instructions := e.GetInstructions(); len(instructions) > 0 {
var steps []map[string]interface{}
for i, inst := range instructions {
steps = append(steps, map[string]interface{}{
"@type": "HowToStep",
"text": inst,
"name": stepName(inst),
"position": i + 1,
})
}
schema["recipeInstructions"] = steps
}

// Keywords
keywords := e.GetStringSlice("keywords")
extra := g.Schema.ExtraKeywords
allKeywords := append(keywords, extra...)
if len(allKeywords) > 0 {
schema["keywords"] = strings.Join(allKeywords, ", ")
}

// Pairings as isRelatedTo
if pairings := e.GetStringSlice("pairings"); len(pairings) > 0 {
var related []map[string]interface{}
for _, slug := range pairings {
related = append(related, map[string]interface{}{
"@type": "Recipe",
"name": slug, // Will be resolved to title by the builder
"url": fmt.Sprintf("%s/%s.html", g.SiteConfig.BaseURL, slug),
})
}
schema["isRelatedTo"] = related
}

return schema
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This still emits recipe structured data for architecture pages.

This PR builds architecture docs, but this function hard-codes Recipe fields like recipeIngredient, recipeInstructions, servings, and cuisine. If it is wired into entity pages, code entities get published as recipes. Please swap this to an arch-docs-specific schema or disable entity JSON-LD until the data model is adapted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/schema/jsonld.go` around lines 28 - 131,
GenerateRecipeSchema is emitting a Recipe JSON-LD for all entities (causing
code/arch pages to be published as recipes); update GenerateRecipeSchema (and
any callers) to either: 1) produce an arch-docs-appropriate schema (e.g., change
"@type" from "Recipe" to "TechArticle" or "Article" and remove recipe-specific
fields like
"recipeIngredient"/"recipeInstructions"/"recipeYield"/"recipeCuisine"/"isRelatedTo"),
mapping relevant fields (name, description, author, datePublished, image,
keywords) instead, or 2) short-circuit and return nil/empty map for non-recipe
entities until the entity data model supports recipes (check entity methods like
e.GetIngredients(), e.GetInstructions(), e.GetString("title"),
e.GetStringSlice("keywords") to decide). Ensure references to helper symbols
computeTotalTime, GetIngredients, GetInstructions, Schema.ExtraKeywords and
SiteConfig.BaseURL are adjusted or removed accordingly so no recipe-specific
fields are emitted for arch docs.

Comment on lines +72 to +75
if tc.Invert {
// Invert mode: for each possible value, add entities that DON'T have it
// Not commonly used - skip for now, handled separately if needed
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't silently no-op Invert taxonomies.

TaxonomyConfig.Invert is part of the public config surface, but this branch just continues. Any config that enables it will generate an empty taxonomy with no warning. Either implement the inverse grouping here or fail fast when tc.Invert is set.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/taxonomy/taxonomy.go` around lines 72 - 75, The
current branch silently skips when TaxonomyConfig.Invert (tc.Invert) is true,
producing empty taxonomies; instead fail fast: detect tc.Invert and return a
clear error (e.g., "Invert taxonomies not supported") from the taxonomy-building
function so callers get immediate feedback, or if you prefer to implement the
feature, replace the continue with inverse grouping logic that for each possible
value v builds a group of entities that do NOT have v (use existing grouping
helpers to collect entities missing v). Ensure you modify the code path that
checks tc.Invert (the tc.Invert branch) to either return the error or perform
the inverse grouping so the public config surface is not a silent no-op.

Comment on lines +78 to +89
for _, val := range values {
slug := entity.ToSlug(val)
if slug == "" {
continue
}
if _, ok := groups[slug]; !ok {
groups[slug] = &Entry{
Name: val,
Slug: slug,
}
}
groups[slug].Entities = append(groups[slug].Entities, e)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Deduplicate values per entity before counting them.

If one entity yields ["go", "go"], this loop appends that same entity twice to the go entry. The result is inflated counts/pagination and duplicated entries on hub pages. Keep a per-entity seen set keyed by slug before appending.

💡 Proposed fix
-		for _, val := range values {
+		seen := make(map[string]struct{}, len(values))
+		for _, val := range values {
 			slug := entity.ToSlug(val)
 			if slug == "" {
 				continue
 			}
+			if _, dup := seen[slug]; dup {
+				continue
+			}
+			seen[slug] = struct{}{}
 			if _, ok := groups[slug]; !ok {
 				groups[slug] = &Entry{
 					Name: val,
 					Slug: slug,
 				}
 			}
 			groups[slug].Entities = append(groups[slug].Entities, e)
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for _, val := range values {
slug := entity.ToSlug(val)
if slug == "" {
continue
}
if _, ok := groups[slug]; !ok {
groups[slug] = &Entry{
Name: val,
Slug: slug,
}
}
groups[slug].Entities = append(groups[slug].Entities, e)
seen := make(map[string]struct{}, len(values))
for _, val := range values {
slug := entity.ToSlug(val)
if slug == "" {
continue
}
if _, dup := seen[slug]; dup {
continue
}
seen[slug] = struct{}{}
if _, ok := groups[slug]; !ok {
groups[slug] = &Entry{
Name: val,
Slug: slug,
}
}
groups[slug].Entities = append(groups[slug].Entities, e)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/taxonomy/taxonomy.go` around lines 78 - 89, The loop
that builds the groups map (iterating over values, computing slug :=
entity.ToSlug(val), then appending e to groups[slug].Entities) can append the
same entity multiple times if an entity's values contain duplicate slugs; add a
per-entity "seen" set (e.g., map[string]bool) inside the scope that processes
each entity to skip appending when slug has already been seen for that entity;
update the block that creates/uses groups and Entry.Entities to consult the seen
set before calling groups[slug].Entities = append(...), ensuring each entity is
only added once per slug.

Comment on lines +25 to +55
var svg = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ' + totalW + ' ' + totalH + '" style="max-height:60px">';

for (var i = 0; i < items.length; i++) {
var x = pad + i * (boxW + arrowW);
var y = pad;
var isLast = i === items.length - 1;
var fill = isLast ? "#71B9BC" : "#161616";
var stroke = isLast ? "#8CC6C9" : "#202020";
var textColor = isLast ? "#fff" : "#FFFFFF";
var label = items[i].name || "";
if (label.length > 16) label = label.substring(0, 14) + "..";

if (items[i].slug && !isLast) {
svg += '<a href="' + __basePath + '/' + items[i].slug + '.html">';
}
svg += '<rect x="' + x + '" y="' + y + '" width="' + boxW + '" height="' + boxH + '" rx="6" fill="' + fill + '" stroke="' + stroke + '" stroke-width="1"/>';
svg += '<text x="' + (x + boxW / 2) + '" y="' + (y + boxH / 2 + 5) + '" text-anchor="middle" fill="' + textColor + '" font-size="12" font-family="Public Sans,system-ui,sans-serif">' + label + '</text>';
if (items[i].slug && !isLast) {
svg += '</a>';
}

if (i < items.length - 1) {
var ax = x + boxW + 4;
var ay = y + boxH / 2;
svg += '<path d="M' + ax + ' ' + ay + ' L' + (ax + arrowW - 8) + ' ' + ay + '" stroke="#202020" stroke-width="1.5" fill="none"/>';
svg += '<polygon points="' + (ax + arrowW - 8) + ',' + (ay - 4) + ' ' + (ax + arrowW - 2) + ',' + ay + ' ' + (ax + arrowW - 8) + ',' + (ay + 4) + '" fill="#202020"/>';
}
}

svg += '</svg>';
svgContainer.innerHTML = svg;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Escape repo-derived labels before writing them into the DOM.

These two spots build markup with raw name values and then push it through innerHTML. A filename or taxonomy label like foo</span><script>...</script> becomes live markup in the generated site. Please use DOM APIs (textContent, createElement, createElementNS) or escape the text first.

Also applies to: 414-423

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/templates/_main.js` around lines 25 - 55, The code
concatenates unescaped item names into an SVG string (see variables label,
items, svg and the final svgContainer.innerHTML assignment), allowing XSS;
replace string-building with DOM/SVG APIs instead of innerHTML: create the SVG
root with document.createElementNS('http://www.w3.org/2000/svg', 'svg'), create
rect/text/a/path/polygon nodes with createElementNS, set attributes via
setAttribute, and assign text nodes via textContent (or use a proper escaping
function) so items[i].name is never injected as raw HTML; apply the same fix to
the other occurrence mentioned (lines 414-423) that also builds markup from
repo-derived labels.

Comment on lines +678 to +682
function loadIndex() {
fetch("/search-index.json")
.then(function(r) { return r.json(); })
.then(function(data) { index = data; })
.catch(function() { resultsEl.innerHTML = '<div class="search-no-results">Failed to load search index.</div>'; });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use __basePath for the search index request.

This is the one runtime path here that still bypasses __basePath. If the site is served from /docs, Line 679 asks the domain root for /search-index.json, so search just looks broken. While you’re here, rerun the current query after the async load so the first keystrokes don’t get dropped.

💡 Proposed fix
   function loadIndex() {
-    fetch("/search-index.json")
+    fetch(__basePath + "/search-index.json")
       .then(function(r) { return r.json(); })
-      .then(function(data) { index = data; })
+      .then(function(data) {
+        index = data;
+        if (input.value.trim()) search(input.value.trim());
+      })
       .catch(function() { resultsEl.innerHTML = '<div class="search-no-results">Failed to load search index.</div>'; });
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function loadIndex() {
fetch("/search-index.json")
.then(function(r) { return r.json(); })
.then(function(data) { index = data; })
.catch(function() { resultsEl.innerHTML = '<div class="search-no-results">Failed to load search index.</div>'; });
function loadIndex() {
fetch(__basePath + "/search-index.json")
.then(function(r) { return r.json(); })
.then(function(data) {
index = data;
if (input.value.trim()) search(input.value.trim());
})
.catch(function() { resultsEl.innerHTML = '<div class="search-no-results">Failed to load search index.</div>'; });
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/templates/_main.js` around lines 678 - 682, The fetch in
function loadIndex should use the runtime __basePath (not absolute
"/search-index.json") and after the async load re-run any existing query so
early keystrokes aren't dropped; update loadIndex to fetch(__basePath +
"search-index.json") (or join safely), assign the JSON to index, and then call
the existing search/query function with the current input value (or trigger the
same handler) so resultsEl reflects the user's current query after the index is
loaded.

The vertical slice check enforces that command slices must not import
each other. internal/archdocs was importing internal/analyze solely for
CreateZip — matching the pattern used by focus, find, and mcp which each
have their own zip.go. Add internal/archdocs/zip.go with the same logic
and revert the CreateZip export from internal/analyze.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (2)
internal/archdocs/pssg/render/funcs.go (2)

282-285: ⚠️ Potential issue | 🟠 Major

Still present: sort helper never sorts.

sortStrings copies the slice and returns it untouched, so any template using {{sort ...}} gets the original order back. That makes the helper silently lie about what it does.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/render/funcs.go` around lines 282 - 285, The
sortStrings helper currently just copies the input slice and returns it
unsorted; update the function sortStrings to sort the copied slice before
returning (e.g., call sort.Strings(result)) and add the "sort" package import so
the function actually returns a sorted []string; keep the copy to avoid mutating
the original slice.

240-265: ⚠️ Potential issue | 🟠 Major

Still present: clamp slice bounds before slicing.

If start > len(v) or end < start, this still reaches v[start:end] and panics page rendering. Bad pagination input should fail soft and return an empty slice here instead of killing the build.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/render/funcs.go` around lines 240 - 265, In
sliceHelper, the code clamps negative start and end beyond len but doesn't
handle cases where start > len(v) or end < start, which leads to panics when
evaluating v[start:end]; update sliceHelper for each type branch ([]string,
[]*entity.Entity, []interface{}) to check and return an empty slice when start
>= len(v) or end <= start (after clamping), and otherwise clamp end to len(v)
and return the slice; ensure the logic is consistent across all three type cases
so bad pagination inputs yield an empty slice instead of panicking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/archdocs/handler.go`:
- Around line 419-435: The current writePssgConfig function builds YAML via
fmt.Sprintf using pssgConfigTemplate which breaks on quoted strings and Windows
paths; replace the manual interpolation by constructing a typed config value
(matching what pssgconfig.Load expects) and serialize it with a YAML encoder
(e.g., yaml.Marshal or yaml.NewEncoder) instead of fmt.Sprintf, then write the
encoded bytes to path (preserving the intended file mode like 0o600); locate
writePssgConfig and pssgConfigTemplate to derive the struct fields (site.name,
site.base_url, site.repo_url, site.description, paths.*, rss.*, llms_txt.*) so
values such as siteName, baseURL, repoURL, repoName, contentDir, tplDir,
outputDir, sourceDir are safely escaped when persisted.
- Around line 372-380: deriveRepoInfo currently returns filepath.Base(dir) and
empty repoURL when repoSlug is empty, so it never inspects the git remotes;
update deriveRepoInfo to, when repoSlug == "", attempt to read the repository's
remote URL (e.g., run git remote get-url origin or read .git config) from the
directory `dir`, parse that URL to extract the GitHub slug (owner/repo) and
canonical https repoURL, and return those values; if no remote or parsing fails,
fall back to filepath.Base(dir) and empty repoURL. Ensure parsing handles
git@github.com:owner/repo.git, https://github.com/owner/repo(.git) and similar
formats and reference deriveRepoInfo, repoSlug and dir in the change.
- Around line 278-300: The flags --max-source-files and --force are ineffective
because opts.MaxSourceFiles is set but never passed to createZip and opts.Force
is not reflected in the idempotency key or API call; fix by (1) modifying
createZip usage/signature (and its implementation) to accept the MaxSourceFiles
value (e.g., call createZip(absDir, opts.MaxSourceFiles) or add a
CreateZipWithLimit function) so the zip generation respects opts.MaxSourceFiles,
and (2) ensure opts.Force changes the idempotency behavior used when calling
cache.HashFile / AnalyzeRaw (for example, incorporate the force flag into the
idempotency key string passed to AnalyzeRaw like "archdocs-"+hash[:16]+"-force"
or add a force-specific query/body parameter to client.AnalyzeRaw) so a forced
run bypasses/invalidates cached analysis; update references to createZip,
cache.HashFile, and client.AnalyzeRaw accordingly.
- Around line 452-480: The rewritePathPrefix function currently performs raw
string replacements that alter displayed code samples; change it to avoid blind
ReplaceAlls: for .html files, parse the document (e.g. using
golang.org/x/net/html), walk nodes and only rewrite attribute values (href, src,
window.location assignments inside inline script nodes if you choose) when the
attribute value is a root-relative string starting with "/" and the node is not
inside a <pre>, <code>, <textarea> or <script type="text/plain"> block; do not
perform global string replacements on the entire file content. Remove or stop
rewriting .js files with naive ReplaceAll; if you must handle .js, use a real JS
AST/token approach to only rewrite string literals passed to fetch/assignment
expressions (or conservatively skip .js and let URL generation handle prefixes).
Update rewritePathPrefix to read, parse, mutate only targeted attributes/AST
nodes and write back only when changes occur.

In `@internal/archdocs/pssg/build/build.go`:
- Around line 342-343: The build is still using recipe-era artifacts (calling
GenerateRecipeSchema and populating recipeSchema, reading recipe_category and
hard-coded strings like "All Recipes"/"Browse ... recipes") which are absent on
architecture graph entities and produce bad metadata and empty chart/share-image
inputs; replace that path so that when the entity's pssg.yaml declares
SoftwareSourceCode the builder does not call GenerateRecipeSchema but instead
builds structured metadata from the SoftwareSourceCode fields (create or call a
GenerateSoftwareSourceSchema/GenerateCodeSchema helper and populate
title/description/image/chart inputs from SoftwareSourceCode properties), and
remove use of recipe_category and the "All Recipes"/"Browse ... recipes"
strings; apply the same conditional fix to the other occurrences mentioned (the
blocks around where recipeSchema is used and the ranges noted) so arch-docs
output matches pssg.yaml (SoftwareSourceCode) and chart/share-image inputs are
filled.
- Around line 463-473: The code that extracts sourceCode from the file can panic
when end_line < start_line or when either bound is outside the file length;
update the logic around e.GetString("file_path") / e.GetInt("start_line") /
e.GetInt("end_line") so you validate and clamp both sl and el to the range [1,
len(lines)], enforce el >= sl before slicing, and only perform sourceCode =
strings.Join(lines[sl-1:el], "\n") when the sanitized indices are valid;
reference the variables/fullPath/lines/sourceCode to locate and update the slice
safety checks.

In `@internal/archdocs/zip.go`:
- Around line 14-29: The walkZip implementation is only skipping dot-files and
configured names for files, so hidden directories (like .github) and the
configured output directory (arch-docs-output) still get walked and zipped;
update walkZip to check directory entries as well: when iterating entries in
walkZip (see function walkZip and the skipDirs map), skip any entry whose name
begins with "." or whose name exists in skipDirs, and also explicitly skip the
configured output directory name (arch-docs-output or the variable used for the
output dir) so previous outputs aren’t included; apply the same directory-skip
logic in the other related loop referenced around lines 82-110 to ensure
directories are excluded consistently.

---

Duplicate comments:
In `@internal/archdocs/pssg/render/funcs.go`:
- Around line 282-285: The sortStrings helper currently just copies the input
slice and returns it unsorted; update the function sortStrings to sort the
copied slice before returning (e.g., call sort.Strings(result)) and add the
"sort" package import so the function actually returns a sorted []string; keep
the copy to avoid mutating the original slice.
- Around line 240-265: In sliceHelper, the code clamps negative start and end
beyond len but doesn't handle cases where start > len(v) or end < start, which
leads to panics when evaluating v[start:end]; update sliceHelper for each type
branch ([]string, []*entity.Entity, []interface{}) to check and return an empty
slice when start >= len(v) or end <= start (after clamping), and otherwise clamp
end to len(v) and return the slice; ensure the logic is consistent across all
three type cases so bad pagination inputs yield an empty slice instead of
panicking.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 93c73550-b148-478c-8ea8-2481baeae383

📥 Commits

Reviewing files that changed from the base of the PR and between ee6fcf7 and 21fdc5d.

📒 Files selected for processing (11)
  • .golangci.yaml
  • cmd/archdocs.go
  • internal/archdocs/graph2md/graph2md.go
  • internal/archdocs/handler.go
  • internal/archdocs/pssg/build/build.go
  • internal/archdocs/pssg/config/types.go
  • internal/archdocs/pssg/output/sitemap.go
  • internal/archdocs/pssg/render/funcs.go
  • internal/archdocs/pssg/render/render.go
  • internal/archdocs/pssg/render/shareimage.go
  • internal/archdocs/zip.go
✅ Files skipped from review due to trivial changes (1)
  • internal/archdocs/pssg/config/types.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/archdocs/pssg/render/shareimage.go
  • internal/archdocs/graph2md/graph2md.go

Comment on lines +278 to +300
if opts.MaxSourceFiles == 0 {
opts.MaxSourceFiles = 3000
}
if opts.MaxEntities == 0 {
opts.MaxEntities = 12000
}

ui.Step("Creating repository archive…")
zipPath, err := createZip(absDir)
if err != nil {
return fmt.Errorf("create archive: %w", err)
}
defer os.Remove(zipPath)

// Use zip hash as idempotency key (matches existing CLI cache key style)
hash, err := cache.HashFile(zipPath)
if err != nil {
return fmt.Errorf("hash archive: %w", err)
}

client := api.New(cfg)
spin := ui.Start("Uploading and analyzing repository…")
rawResult, err := client.AnalyzeRaw(ctx, zipPath, "archdocs-"+hash[:16])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

--force and --max-source-files are no-ops right now.

opts.MaxSourceFiles is defaulted here but never reaches createZip, and opts.Force never changes the hash/idempotency key or API call. Users can flip both flags and get the exact same behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/handler.go` around lines 278 - 300, The flags
--max-source-files and --force are ineffective because opts.MaxSourceFiles is
set but never passed to createZip and opts.Force is not reflected in the
idempotency key or API call; fix by (1) modifying createZip usage/signature (and
its implementation) to accept the MaxSourceFiles value (e.g., call
createZip(absDir, opts.MaxSourceFiles) or add a CreateZipWithLimit function) so
the zip generation respects opts.MaxSourceFiles, and (2) ensure opts.Force
changes the idempotency behavior used when calling cache.HashFile / AnalyzeRaw
(for example, incorporate the force flag into the idempotency key string passed
to AnalyzeRaw like "archdocs-"+hash[:16]+"-force" or add a force-specific
query/body parameter to client.AnalyzeRaw) so a forced run bypasses/invalidates
cached analysis; update references to createZip, cache.HashFile, and
client.AnalyzeRaw accordingly.

Comment on lines +372 to +380
func deriveRepoInfo(repoSlug, dir string) (name, repoURL string) {
if repoSlug != "" {
parts := strings.SplitN(repoSlug, "/", 2)
if len(parts) == 2 {
return parts[1], "https://github.com/" + repoSlug
}
return repoSlug, ""
}
return filepath.Base(dir), ""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

--repo is not actually inferred yet.

When the flag is omitted, this falls back to filepath.Base(dir) and an empty repoURL; it never inspects the git remote. That drops GitHub/source links for the common “run inside a clone” case, even though the option is documented as inferable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/handler.go` around lines 372 - 380, deriveRepoInfo
currently returns filepath.Base(dir) and empty repoURL when repoSlug is empty,
so it never inspects the git remotes; update deriveRepoInfo to, when repoSlug ==
"", attempt to read the repository's remote URL (e.g., run git remote get-url
origin or read .git config) from the directory `dir`, parse that URL to extract
the GitHub slug (owner/repo) and canonical https repoURL, and return those
values; if no remote or parsing fails, fall back to filepath.Base(dir) and empty
repoURL. Ensure parsing handles git@github.com:owner/repo.git,
https://github.com/owner/repo(.git) and similar formats and reference
deriveRepoInfo, repoSlug and dir in the change.

Comment on lines +419 to +435
// writePssgConfig generates a pssg.yaml configuration file from the template.
func writePssgConfig(path, siteName, baseURL, repoURL, repoName, contentDir, tplDir, outputDir, sourceDir string) error {
content := fmt.Sprintf(pssgConfigTemplate,
siteName, // site.name
baseURL, // site.base_url
repoURL, // site.repo_url
repoName, // site.description
contentDir, // paths.data
tplDir, // paths.templates
outputDir, // paths.output
sourceDir, // paths.source_dir
siteName, // rss.title
repoName, // rss.description
siteName, // llms_txt.title
repoName, // llms_txt.description
)
return os.WriteFile(path, []byte(content), 0o600)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Hand-built YAML will break on quoted input and Windows paths.

writePssgConfig interpolates raw strings into YAML. A site name like Acme "Core" or a path like C:\tmp\docs will produce invalid or re-escaped values, which makes pssgconfig.Load brittle cross-platform. Please serialize a config struct with the YAML encoder instead of fmt.Sprintf.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/handler.go` around lines 419 - 435, The current
writePssgConfig function builds YAML via fmt.Sprintf using pssgConfigTemplate
which breaks on quoted strings and Windows paths; replace the manual
interpolation by constructing a typed config value (matching what
pssgconfig.Load expects) and serialize it with a YAML encoder (e.g.,
yaml.Marshal or yaml.NewEncoder) instead of fmt.Sprintf, then write the encoded
bytes to path (preserving the intended file mode like 0o600); locate
writePssgConfig and pssgConfigTemplate to derive the struct fields (site.name,
site.base_url, site.repo_url, site.description, paths.*, rss.*, llms_txt.*) so
values such as siteName, baseURL, repoURL, repoName, contentDir, tplDir,
outputDir, sourceDir are safely escaped when persisted.

Comment on lines +452 to +480
// rewritePathPrefix rewrites root-relative paths in HTML and JS files to
// include prefix. Required for subdirectory deployments (e.g. GitHub Pages).
func rewritePathPrefix(dir, prefix string) error {
return filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
if err != nil || info.IsDir() {
return err
}
ext := strings.ToLower(filepath.Ext(path))
if ext != ".html" && ext != ".js" {
return nil
}
data, err := os.ReadFile(path)
if err != nil {
return nil
}
content := string(data)
original := content
content = strings.ReplaceAll(content, `href="/`, `href="`+prefix+`/`)
content = strings.ReplaceAll(content, `src="/`, `src="`+prefix+`/`)
content = strings.ReplaceAll(content, `fetch("/`, `fetch("`+prefix+`/`)
content = strings.ReplaceAll(content, `window.location.href = "/"`, `window.location.href = "`+prefix+`/"`)
content = strings.ReplaceAll(content, `window.location.href = "/" + `, `window.location.href = "`+prefix+`/" + `)
if content != original {
if err := os.WriteFile(path, []byte(content), info.Mode()); err != nil {
return fmt.Errorf("writing %s: %w", path, err)
}
}
return nil
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Raw string rewriting will mutate rendered code samples.

This pass blindly replaces href="https://github.com/, src="https://github.com/, and fetch("/ in every .html and .js file. Because entity pages embed source snippets, displayed code like fetch("/api") gets rewritten as if it were a real asset URL. Prefix handling needs to happen when URLs are generated, not via a global text replace.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/handler.go` around lines 452 - 480, The rewritePathPrefix
function currently performs raw string replacements that alter displayed code
samples; change it to avoid blind ReplaceAlls: for .html files, parse the
document (e.g. using golang.org/x/net/html), walk nodes and only rewrite
attribute values (href, src, window.location assignments inside inline script
nodes if you choose) when the attribute value is a root-relative string starting
with "/" and the node is not inside a <pre>, <code>, <textarea> or <script
type="text/plain"> block; do not perform global string replacements on the
entire file content. Remove or stop rewriting .js files with naive ReplaceAll;
if you must handle .js, use a real JS AST/token approach to only rewrite string
literals passed to fetch/assignment expressions (or conservatively skip .js and
let URL generation handle prefixes). Update rewritePathPrefix to read, parse,
mutate only targeted attributes/AST nodes and write back only when changes
occur.

Comment on lines +342 to +343
recipeSchema := schemaGen.GenerateRecipeSchema(e, entityURL)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Recipe-era fields and copy are still wired into the arch-docs build.

This path still depends on GenerateRecipeSchema, recipe_category, and strings like "All Recipes" / "Browse ... recipes". On architecture graph data those fields are absent or semantically wrong, so the generated metadata is mislabeled and the chart/share-image inputs go empty. This also disagrees with the generated pssg.yaml, which declares SoftwareSourceCode structured data.

Also applies to: 358-380, 564-569, 681-681, 910-915, 983-1013

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/build/build.go` around lines 342 - 343, The build is
still using recipe-era artifacts (calling GenerateRecipeSchema and populating
recipeSchema, reading recipe_category and hard-coded strings like "All
Recipes"/"Browse ... recipes") which are absent on architecture graph entities
and produce bad metadata and empty chart/share-image inputs; replace that path
so that when the entity's pssg.yaml declares SoftwareSourceCode the builder does
not call GenerateRecipeSchema but instead builds structured metadata from the
SoftwareSourceCode fields (create or call a
GenerateSoftwareSourceSchema/GenerateCodeSchema helper and populate
title/description/image/chart inputs from SoftwareSourceCode properties), and
remove use of recipe_category and the "All Recipes"/"Browse ... recipes"
strings; apply the same conditional fix to the other occurrences mentioned (the
blocks around where recipeSchema is used and the ranges noted) so arch-docs
output matches pssg.yaml (SoftwareSourceCode) and chart/share-image inputs are
filled.

Comment on lines +463 to +473
if filePath := e.GetString("file_path"); filePath != "" {
if sl := e.GetInt("start_line"); sl > 0 {
if el := e.GetInt("end_line"); el > 0 {
sourceDir := b.cfg.Paths.SourceDir
if sourceDir != "" {
fullPath := filepath.Join(sourceDir, filePath)
if data, err := os.ReadFile(fullPath); err == nil {
lines := strings.Split(string(data), "\n")
if sl <= len(lines) && el <= len(lines) {
sourceCode = strings.Join(lines[sl-1:el], "\n")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate source line bounds before slicing.

If the API ever returns end_line < start_line, lines[sl-1:el] panics and takes down the build worker. Clamp both bounds and require el >= sl before slicing.

💡 Minimal fix
-						if sl <= len(lines) && el <= len(lines) {
+						if sl > 0 && el >= sl && sl <= len(lines) && el <= len(lines) {
 							sourceCode = strings.Join(lines[sl-1:el], "\n")
 						}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/pssg/build/build.go` around lines 463 - 473, The code that
extracts sourceCode from the file can panic when end_line < start_line or when
either bound is outside the file length; update the logic around
e.GetString("file_path") / e.GetInt("start_line") / e.GetInt("end_line") so you
validate and clamp both sl and el to the range [1, len(lines)], enforce el >= sl
before slicing, and only perform sourceCode = strings.Join(lines[sl-1:el], "\n")
when the sanitized indices are valid; reference the
variables/fullPath/lines/sourceCode to locate and update the slice safety
checks.

Comment on lines +14 to +29
var skipDirs = map[string]bool{
".git": true,
"node_modules": true,
"vendor": true,
"__pycache__": true,
".venv": true,
"venv": true,
"dist": true,
"build": true,
"target": true,
".next": true,
".nuxt": true,
"coverage": true,
".terraform": true,
".tox": true,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fallback zipping still walks hidden dirs and previous build output.

In the walkZip path, dot-directories are not skipped because the . check only runs for files, and there is no way to exclude the configured output directory. Re-running against a non-git repo can zip .github/ or an existing arch-docs-output/ back into the next analysis, which is both noisy and risky.

Also applies to: 82-110

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/archdocs/zip.go` around lines 14 - 29, The walkZip implementation is
only skipping dot-files and configured names for files, so hidden directories
(like .github) and the configured output directory (arch-docs-output) still get
walked and zipped; update walkZip to check directory entries as well: when
iterating entries in walkZip (see function walkZip and the skipDirs map), skip
any entry whose name begins with "." or whose name exists in skipDirs, and also
explicitly skip the configured output directory name (arch-docs-output or the
variable used for the output dir) so previous outputs aren’t included; apply the
same directory-skip logic in the other related loop referenced around lines
82-110 to ensure directories are excluded consistently.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@greynewell greynewell merged commit 2a46bf0 into main Apr 2, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant