From 6915b8cd5daa1cca3db6e5dd11fd3afab4dbb0b5 Mon Sep 17 00:00:00 2001
From: Juliusz Sompolski
Date: Tue, 28 Apr 2026 09:44:56 +0000
Subject: [PATCH 1/3] Raise unidoc -Xmaxerrs/-Xmaxwarns and add -verbose to
 surface real failures

The unidoc step in CI sometimes failed with the SPARK-56630
(https://github.com/apache/spark/pull/55548) banner reporting "Javadoc
exited but no class HTML generation was in progress", or with no
actionable diagnostic at all. Two javadoc defaults were masking the
underlying causes:

1. `-Xmaxerrs 100`: javadoc bails during source loading once the
   cumulative count of benign genjavadoc-stub errors crosses 100, before
   any HTML is generated. Every Spark unidoc run produces ~100 such
   errors (`error: cannot find symbol` on type variables,
   `error: illegal combination of modifiers: abstract and static`); the
   SPARK-56630 PR description documents that these are inert. When the
   count tips past 100, the build fails with a wall of those errors and
   no `Generating .../X.html` line for the SPARK-56630 banner to point
   at.

2. `-Xmaxwarns 100`: even when javadoc completes HTML generation, the
   doclint warnings on a full Spark unidoc run number in the tens of
   thousands (`no comment`, `empty tag`, `no @return`, `no @param ...`).
   Anything past the first 100 is silently dropped, including the
   per-link `error: reference not found` lines that share the warning
   stream.

Setting both ceilings to 999999 keeps javadoc printing diagnostics well
past the real volume, so the SPARK-56630 banner can identify the
crashing class.

`-verbose` makes javadoc emit a
`<file>.java:<line>: error: reference not found` line for every broken
{@link} during HTML generation. Without it, javadoc counts reference
errors internally and reports only the bulk total in the final
`<N> errors` / `<M> warnings` summary, without a file:line for each
one. The flag also dumps "Loading source file ..." progress lines and
grows the unidoc log by an order of magnitude; that is the price of
being able to debug reference errors from CI logs at all.

This change does not silence any failure: javadoc still exits non-zero
when there are real errors. It only removes the noise and the clipping
that masked them.
---
 project/SparkBuild.scala | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 866a535c6d951..0cd2fa5613b5a 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -1699,7 +1699,10 @@ object Unidoc {
         "-tag", "todo:X",
         "-tag", "groupname:X",
         "-tag", "inheritdoc",
-        "--ignore-source-errors", "-notree"
+        "--ignore-source-errors", "-notree",
+        "-Xmaxerrs", "999999",
+        "-Xmaxwarns", "999999",
+        "-verbose"
       )
     },
 

From 25e7d0ff9c843bd7c9c6a60c9e3ba8095ffdea9e Mon Sep 17 00:00:00 2001
From: Juliusz Sompolski
Date: Tue, 28 Apr 2026 09:45:09 +0000
Subject: [PATCH 2/3] Extend unidoc diagnostic banner to surface
 reference-not-found errors

The diagnostic banner added in SPARK-56630
(https://github.com/apache/spark/pull/55548) names the class javadoc was
rendering when it crashed mid-HTML-generation.
With `-verbose` now enabled in `JavaUnidoc / unidoc / javacOptions`,
javadoc also emits a line of the form

  <file>.java:<line>: error: reference not found

for every broken {@link} it cannot resolve during HTML generation, and
the build then fails on the non-zero error count even though all HTML
files were produced.

This commit makes the banner scan the captured log for those messages
and list them in the diagnostic output, so the developer sees the
file:line (in the generated target/java stub) of each broken {@link}.
The list is printed when no mid-generation crash was detected, in place
of the generic "no class HTML generation was in progress" message.

The banner also includes a short note on the most common cause:
[[Class.member]] in scaladoc, when Class is a regular class/trait (not a
Scala object), trips javadoc's inner-class lookup. The fix is to use
[[Class#member]] (the Javadoc-canonical member separator), which
genjavadoc passes through unchanged.
---
 docs/_plugins/build_api_docs.rb | 51 +++++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 5 deletions(-)

diff --git a/docs/_plugins/build_api_docs.rb b/docs/_plugins/build_api_docs.rb
index e6719c4bed7e3..6e68b8e98c465 100644
--- a/docs/_plugins/build_api_docs.rb
+++ b/docs/_plugins/build_api_docs.rb
@@ -164,11 +164,19 @@ def stream_and_capture(command, log_file)
 end
 
 # Scans the captured unidoc log and prints a pointer to the most likely
-# culprit source file. The heuristic: when javadoc dies mid-HTML-generation,
-# the last "Generating .../X.html" line before "javadoc exited with exit code"
-# names the class that tripped it. Prints nothing actionable if the failure
-# mode doesn't match (e.g. a scaladoc error), in which case the full log above
-# already shows what's wrong.
+# culprit source file. Two failure modes are surfaced:
+#
+# 1. javadoc dies mid-HTML-generation. The last "Generating .../X.html" line
+#    before "javadoc exited with exit code" names the class that tripped it.
+#
+# 2. javadoc completes HTML generation but reports a non-zero "<N> errors"
+#    count from doclint reference checks. With "-verbose" enabled in the
+#    javacOptions, each such error appears in the log as
+#      <file>.java:<line>: error: reference not found
+#    and we list them so the developer knows exactly which {@link} to fix.
+#
+# Prints nothing actionable if neither pattern matches (e.g. a scaladoc
+# error), in which case the full log above already shows what's wrong.
 def diagnose_unidoc_failure(log_file)
   return unless File.exist?(log_file)
   begin
@@ -187,6 +195,22 @@ def diagnose_unidoc_failure(log_file)
     end
   end
 
+  # "error: reference not found" lines come from javadoc's reference doclint
+  # check on broken {@link Class.member} or {@link Class#member} refs in the
+  # generated stubs (under target/java/...). The line number in the message
+  # is into the *generated* .java, not the original .scala source -- finding
+  # the offending scaladoc usually means opening that target/java file at
+  # that line and reading the {@link ...} on it back to the .scala doc.
+  ansi = /\e\[[0-9;]*[A-Za-z]/
+  ref_errors = []
+  lines.each do |line|
+    stripped = line.gsub(ansi, '')
+    if stripped =~ %r{^(?:\[(?:error|warn|info)\]\s+)?(\S+\.java):(\d+):\s+error: reference not found}
+      ref_errors << "#{$1}:#{$2}"
+    end
+  end
+  ref_errors.uniq!
+
   banner = "=" * 78
   $stderr.puts ""
   $stderr.puts banner
@@ -209,6 +233,23 @@ def diagnose_unidoc_failure(log_file)
     $stderr.puts "  NOTE: the '[error]' lines above on files under"
     $stderr.puts "  target/java/... are benign genjavadoc stubs -- every PR"
     $stderr.puts "  emits them and they do not cause the exit. Ignore them."
+  elsif !ref_errors.empty?
+    $stderr.puts ""
+    $stderr.puts "  Javadoc reference-resolution errors (each one is a broken"
+    $stderr.puts "  {@link} in a doc comment that genjavadoc copied verbatim"
+    $stderr.puts "  from the corresponding scaladoc; fix the [[link]] in the"
+    $stderr.puts "  Scala source):"
+    $stderr.puts ""
+    ref_errors.first(50).each { |e| $stderr.puts "    #{e}" }
+    if ref_errors.size > 50
+      $stderr.puts "    ... and #{ref_errors.size - 50} more"
+    end
+    $stderr.puts ""
+    $stderr.puts "  Common cause: [[Class.member]] in scaladoc when Class is a"
+    $stderr.puts "  regular `class`/`trait` (not a Scala `object`) and there is"
+    $stderr.puts "  no companion-object member with that name. genjavadoc emits"
+    $stderr.puts "  {@link Class.member}, javadoc reads `.` as the inner-class"
+    $stderr.puts "  separator and fails to resolve. Use [[Class#member]] instead."
   elsif javadoc_exit_idx
     $stderr.puts ""
     $stderr.puts "  Javadoc exited but no class HTML generation was in progress;"

From 819dbafff36dcdd812f5d466fd2e6248f79c6e88 Mon Sep 17 00:00:00 2001
From: Juliusz Sompolski
Date: Tue, 28 Apr 2026 09:51:07 +0000
Subject: [PATCH 3/3] Document the [[Class#method]] member-link convention in
 AGENTS.md

Linking to a method on another class from scaladoc with [[Class.method]]
is the most common cause of the `error: reference not found` doclint
failure that the previous two commits make visible: javadoc reads `.` as
the inner-class separator, and if `Class` is a regular class/trait
without a matching companion member, resolution fails.

Add a one-paragraph note in the Development Notes section pointing at
the `#` form, so that future contributors and AI agents use it from the
start instead of triggering the unidoc failure and reading the fix back
out of the diagnostic banner.
---
 AGENTS.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index 28272d19fe933..536a7faea605c 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -20,6 +20,8 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m
 
 Avoid introducing non-ASCII characters in code or comments. String literals may contain non-ASCII when the content requires it (error messages, test data, etc.). Identifiers are ASCII by convention. The common failure mode is typographic characters (em-dash, smart quotes, ellipsis, non-breaking space) sneaking into comments; scalastyle flags some of these. Spot-check before committing: `grep -rn -P "[^\x00-\x7F]" `.
 
+Scaladoc member-link convention: when linking to a method or field of another class from a `/** */` doc comment, use `[[Class#method]]`, not `[[Class.method]]`. genjavadoc passes wiki-style links through as javadoc `{@link ...}`, and javadoc reads `.` as the inner-class separator; if `Class` is a regular `class` / `trait` (not a Scala `object`) and has no companion-object member with that name, javadoc fails to resolve and the unidoc step fails with `error: reference not found`. The `#` form is the Javadoc-canonical member separator and resolves cleanly. Same-class members can still be referenced bare as `[[methodName]]`.
+
 ## Build and Test
 
 Build and tests can take a long time. Before running tests, ask the user if they have more changes to make.
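
For reference, the `ref_errors` scan from the second commit can be exercised on its own. The Ruby sketch below is a minimal standalone version, assuming Ruby 2.7+ (for `Array#filter_map`); it reuses the same ANSI-stripping pattern and `reference not found` regex as `diagnose_unidoc_failure`, but the helper name `extract_ref_errors` and the sample log lines are hypothetical, invented for illustration rather than taken from the plugin or a real CI run.

```ruby
# Hypothetical standalone version of the ref_errors scan in
# diagnose_unidoc_failure (docs/_plugins/build_api_docs.rb).

# Matches ANSI color/control sequences such as "\e[31m" and "\e[0m".
ANSI = /\e\[[0-9;]*[A-Za-z]/

# Optional sbt "[error]" / "[warn]" / "[info]" prefix, then
# "<file>.java:<line>: error: reference not found".
REF_NOT_FOUND = %r{^(?:\[(?:error|warn|info)\]\s+)?(\S+\.java):(\d+):\s+error: reference not found}

# Returns the unique "path:line" strings for every reference-not-found error.
def extract_ref_errors(lines)
  lines.filter_map do |line|
    m = REF_NOT_FOUND.match(line.gsub(ANSI, ''))
    "#{m[1]}:#{m[2]}" if m
  end.uniq
end

# Made-up log lines: a colored duplicate, a plain duplicate, an unprefixed
# error, a -verbose progress line, and a benign genjavadoc-stub error.
sample = [
  "\e[31m[error]\e[0m target/java/org/example/Foo.java:42: error: reference not found",
  "[error] target/java/org/example/Foo.java:42: error: reference not found",
  "target/java/org/example/Bar.java:7: error: reference not found",
  "[info] Loading source file target/java/org/example/Foo.java...",
  "[error] target/java/org/example/Baz.java:3: error: cannot find symbol"
]

puts extract_ref_errors(sample)
```

Running it prints `target/java/org/example/Foo.java:42` and then `target/java/org/example/Bar.java:7`. The colored copy of the Foo line collapses into a single entry once the color codes are stripped, and the `Loading source file` and `cannot find symbol` lines are skipped. As in the patch, the reported line numbers point into the generated `target/java` stubs, not the original `.scala` sources.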