[SPARK-56666][INFRA] Reduce unidoc CI log noise with -Xdoclint:-missing and -verbose post-filter #55605
cloud-fan wants to merge 2 commits
### What changes were proposed in this pull request?
Refines the unidoc javacOptions in `JavaUnidoc / unidoc / javacOptions`
and the post-process stream filter in `docs/_plugins/build_api_docs.rb`
so that the Documentation generation CI log is small enough to scan
visually while still surfacing per-file `error: reference not found`
diagnostics on broken `{@link}` references.
Builds on the `-Xmaxerrs` and `-verbose` insight from
apache#55581 (SPARK-56630 follow-up):
javadoc's default `-Xmaxerrs 100` cap was hit by the ~100 inert
genjavadoc-stub errors during source loading, so doclint never ran on
the real sources, and the per-file `error: reference not found`
diagnostics surfaced only with `-verbose`. That PR's flag set
(`-Xmaxerrs 999999`, `-Xmaxwarns 999999`, `-verbose`) achieved the
diagnostic goal but at a ~77K-line CI log per run.
This PR keeps the diagnostic visibility and brings the visible CI log
down to ~4K lines (95% reduction), with four changes:
1. **`-Xmaxerrs 0`** instead of `-Xmaxerrs 999999`. The `0` value is
treated as unlimited by javadoc (locally verified) and reads
cleaner than the magic number.
2. **`-Xdoclint:all` + `-Xdoclint:-missing`** (two separate flags,
matching the existing `Compile / doc / javacOptions` pattern in
`SparkBuild.scala`). Suppresses the `missing` doclint group at
javadoc level: the ~22K `no comment` / `no @param` / `no @return`
/ `no @throws` warnings (each rendered as a 3-line block) that
dominate the log on every Spark unidoc run. The two-flag form is
load-bearing -- bare `-Xdoclint:-missing` alone demotes other
doclint groups (notably `reference`) to warning level, making
broken `{@link}` non-fatal; the explicit `-Xdoclint:all` first
keeps reference at error level. Locally verified.
3. **Drop `-Xmaxwarns 999999`.** Warnings don't fail CI; error
visibility is governed by `-Xmaxerrs`, not `-Xmaxwarns`. javadoc's
default cap of 100 is sufficient -- shows a sample of any
remaining warnings without flooding. Saves ~4K lines beyond
`-Xdoclint:-missing` alone.
4. **Post-filter `-verbose` progress lines from the build_api_docs.rb
stream.** `-verbose` itself stays (it is load-bearing for per-file
`error: reference not found` emission per apache#55581), but its
progress noise -- `Loading source file ...`, `[parsing
started/completed]`, `[loading /path/X.class]`, `Generating
/path/X.html` -- carries no diagnostic signal. The existing stream
filter is extended with a `verbose_line` regex that drops these
single-line progress entries from stdout. Saves ~13K lines.
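
The post-filter in item 4 can be sketched roughly as follows. This is a minimal illustration, not the actual code in `build_api_docs.rb`: the constant `VERBOSE_LINE`, the helper names, and the optional sbt log-level prefix (`[info]` and similar) are all assumptions made for the example; only the four progress-line shapes come from the description above.

```ruby
require 'stringio'

# Hypothetical pattern: the real `verbose_line` regex may differ.
# It matches the four single-line progress shapes named in the PR text,
# optionally preceded by an sbt log-level tag (an assumption).
VERBOSE_LINE = /\A\s*(?:\[(?:info|warn|error)\]\s*)?(?:Loading source file |\[parsing (?:started|completed)\b|\[loading |Generating ).*\z/

# True when the line is pure -verbose progress noise.
def verbose_progress?(line)
  VERBOSE_LINE.match?(line.chomp)
end

# Copies every line from input to output except progress noise,
# so diagnostics such as "error: reference not found" pass through.
def filter_stream(input, output)
  input.each_line do |line|
    next if verbose_progress?(line)
    output.write(line)
  end
end
```

Anchoring the pattern at the start of the line is a deliberate choice in this sketch: a diagnostic whose message happens to contain the word `Generating` is not dropped, because only lines that begin with a progress marker match.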
### Why are the changes needed?
Documentation generation CI logs were ~77K lines per run after
SPARK-56630's flag set. That is large enough that scanning for
diagnostics by eye is impractical, and grep-piping is the only
reasonable workflow. Most of the volume is structural noise (genjavadoc
stub errors, `no comment` warnings, `-verbose` progress markers) with
no diagnostic signal. After this PR the log is ~4K lines on a
real-failure run; the per-file `error: reference not found`
diagnostics PR apache#55581 added are the dominant content.
Empirical breakdown of the reduction (verified via test PR apache#55605
with deliberately broken `{@link}` plants in both a real `.java`
source and a Scala source):
| State | Log lines | Vs baseline |
| ---------------------------------- | --------: | ----------: |
| PR apache#55581's flag set (baseline) | 77K | |
| Add `-Xdoclint:all,-missing` | 22K | -71% |
| Drop `-Xmaxwarns 999999` | 18K | -77% |
| Post-filter `-verbose` progress | **~4K** | **-95%** |
All four diagnostic targets remain visible in the final form: 2
broken `{@link}`s in `ColumnarMap.java` (Java source) and 2 broken
`[[Class.member]]`-style refs in a Scala source via the genjavadoc
stub.
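
As a quick arithmetic check, the percentages in the breakdown table can be recomputed from the rounded line counts. The snippet below is only a sanity check on the published figures; the helper name `reduction_pct` is made up for this example.

```ruby
# Line counts in thousands, copied from the breakdown table.
BASELINE_K = 77
STATES_K = {
  "Add -Xdoclint:all,-missing"    => 22,
  "Drop -Xmaxwarns 999999"        => 18,
  "Post-filter -verbose progress" => 4
}.freeze

# Percentage change relative to the baseline, rounded to a whole percent.
def reduction_pct(baseline, current)
  ((current - baseline) * 100.0 / baseline).round
end

STATES_K.each do |label, k|
  puts format("%-32s %3dK %4d%%", label, k, reduction_pct(BASELINE_K, k))
end
```

The rounded results agree with the table's -71%, -77%, and -95% columns.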
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested end-to-end on PR apache#55605 (testing-only fork PR) with planted
broken `{@link}` references in both code paths:
- `ColumnarMap.java` (real Java source): `{@link
org.apache.spark.deliberately.NoSuchClass}` and `{@link
ColumnVector#nonExistentMethod()}`.
- `Partition.scala` (Scala source via genjavadoc): `[[Partition.index]]`
-- the wrong `.` separator that javadoc reads as inner-class lookup
and fails to resolve. This is the case PR apache#55581's AGENTS.md note
documents as the most common scaladoc-side cause of unidoc failure.
Both surfaced as per-file `error: reference not found` diagnostics in
the CI log on the test branch, doc gen failed as expected, log size
dropped to 3,977 lines, and zero `Loading source file` /
`[parsing started]` / `[loading X.class]` / `Generating *.html` /
`no comment` lines remained visible.
`-Xmaxerrs 0` and the bare-`-Xdoclint:-missing` demotion behavior
were verified locally with standalone javadoc invocations on a
minimal test file.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic)
End-to-end test result

Validated on earlier branch commit.

Doc gen status: FAILURE ✅ (broken refs are fatal)

Per-file diagnostics (all 4 visible):

The Scala plant

Log composition:

Reduction journey:

The current CI run on this PR's clean state (no plants) is expected to pass.
These pre-dated SPARK-14790 (2016) and quieted the per-task log output when scalastyle was hooked into (Compile/compile) and (Test/compile). After SPARK-56636 decoupled scalastyle from compile, the tasks are only invoked from dev/lint-scala, so the per-task logLevel settings reference unread keys (sbt's lintUnused surfaces them as warnings).
The timeout issue is unrelated. Thanks for the review, merging to master!
…ry and CI annotations After the noise filters from apache#55605, the Documentation generation CI log is about 4K lines. The two-line per-file fatal diagnostics (`error: reference not found`) are still buried in the middle of the log and the GitHub Actions check panel only shows "Process completed with exit code 1", which leaves reviewers grepping through the raw log to find the actual problem. This change is purely additive -- it drops no existing log lines. After the unidoc pipe closes, `build_api_docs.rb` prints a trailing `Fatal javadoc errors (N):` block listing each captured diagnostic, then emits a `::error file=,line=::` GitHub Actions workflow command per diagnostic so they appear as inline annotations on the PR check panel. Diagnostics are captured strictly within the Standard Doclet phase bracketed by `Building tree for all the packages and classes...` and `Building index for all classes...`, which is where doclint emits the build-failing diagnostics that count toward javadoc's exit code. Source- loading "error:" chatter outside that window is excluded. The captured count is cross-checked against javadoc's own `N errors` summary line. If they diverge -- e.g. because a future JDK changes the Standard Doclet phase wording -- a `::warning::` workflow command is emitted so the drift is surfaced without silently masking real failures. Co-authored-by: Isaac
… fatal-error summary Mirrors PR apache#55605's testing pattern. Plants two unresolvable references on the real Java path (ColumnarMap.java) and one on the genjavadoc stub path (Partition.scala) so the fatal-error summary added in the previous commit gets exercised end-to-end in CI. To be dropped before merge. Co-authored-by: Isaac
…ry and CI annotations

### What changes were proposed in this pull request?

After the noise filters from #55605, the Documentation generation CI log is around 4K lines on a failure run. The two-line per-file `error: reference not found` diagnostics are still buried in the middle of the log, and the GitHub Actions check panel for a failed doc-gen job only surfaces `Process completed with exit code 1`. Reviewers end up scrolling the raw log to find what actually broke.

This PR is purely additive in `docs/_plugins/build_api_docs.rb` -- no existing log lines are dropped. After the unidoc pipe closes:

1. A trailing `Fatal javadoc errors (N):` block is printed, listing each captured diagnostic with file, line, and message.
2. One `::error file=<path>,line=<line>,title=javadoc::<msg>` GitHub Actions workflow command is emitted per diagnostic, so they appear as inline annotations on the PR check panel instead of as a single opaque `exit code 1`.

Diagnostics are captured strictly within the Standard Doclet phase bracketed by `Building tree for all the packages and classes...` and `Building index for all classes...`, which is where doclint emits the build-failing diagnostics that count toward javadoc's exit code. Source-loading `error:` chatter outside that window is excluded -- it's already non-fatal and matches what javadoc's own `N errors` summary line counts.

As a self-check, the captured count is compared against javadoc's own `N errors` summary line. If they diverge -- e.g. because a future JDK changes the Standard Doclet phase wording -- a `::warning::` workflow command is emitted so the drift is surfaced without silently masking real failures.

### Why are the changes needed?

PR #55605 made the doc-gen log small enough to read, but the failure path is still discoverable only via grep. The per-file diagnostics emitted by doclint are the actionable content; promoting them to the PR check panel and a clearly delimited summary block makes a doc-gen failure self-explanatory without leaving the PR.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

End-to-end on this branch with deliberately broken references planted in two code paths (mirroring the test pattern from PR #55605):

- `ColumnarMap.java` (real Java source): `{link org.apache.spark.deliberately.NoSuchClass}` and `{link ColumnVector#nonExistentMethod()}`.
- `Partition.scala` (Scala source via genjavadoc): `[[Partition.index]]` -- the `.`-separator case that javadoc treats as inner-class lookup.

The Documentation generation job will fail with the expected `Fatal javadoc errors` summary block in the log and per-file inline annotations on this PR's check panel. The plant commit will be dropped before this PR is taken out of draft.

The state machine was also exercised locally against a captured log from a prior failing doc-gen run; the captured fatal count matches javadoc's `N errors` summary line.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

Closes #55814 from cloud-fan/unidoc-fatal-summary.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
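
The capture-and-annotate flow described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code that landed in `build_api_docs.rb`: the constant names, both method names, the diagnostic regex, and the wording of the `::warning::` message are hypothetical. The phase markers, the `Fatal javadoc errors (N):` header, and the `::error file=...,line=...,title=javadoc::...` shape are taken from the text above.

```ruby
# Phase markers quoted in the commit message.
DOCLET_START = /Building tree for all the packages and classes\.\.\./
DOCLET_END   = /Building index for all classes\.\.\./
# Hypothetical shapes for a per-file diagnostic and javadoc's summary line.
DIAGNOSTIC   = /(?<file>\S+\.java):(?<line>\d+): error: (?<msg>.*)/
SUMMARY      = /\A(?:\[\w+\]\s*)?(?<count>\d+) errors?\z/

# Walks the log once; keeps only diagnostics seen inside the doclet window.
# Returns [captured diagnostics, count reported by javadoc or nil].
def capture_fatal_errors(lines)
  in_doclet = false
  captured = []
  reported = nil
  lines.each do |raw|
    line = raw.chomp
    if DOCLET_START.match?(line)
      in_doclet = true
    elsif DOCLET_END.match?(line)
      in_doclet = false
    elsif in_doclet && (m = DIAGNOSTIC.match(line))
      captured << { file: m[:file], line: m[:line].to_i, msg: m[:msg] }
    elsif (m = SUMMARY.match(line))
      reported = m[:count].to_i
    end
  end
  [captured, reported]
end

# Builds the trailing summary block plus one workflow command per diagnostic.
def render_summary(captured, reported)
  out = ["Fatal javadoc errors (#{captured.size}):"]
  captured.each { |d| out << "  #{d[:file]}:#{d[:line]}: #{d[:msg]}" }
  captured.each do |d|
    out << "::error file=#{d[:file]},line=#{d[:line]},title=javadoc::#{d[:msg]}"
  end
  if reported && reported != captured.size
    # Self-check: surface drift between our count and javadoc's own summary.
    out << "::warning::captured #{captured.size} fatal javadoc errors, javadoc reported #{reported}"
  end
  out
end
```

Keeping capture and rendering as separate steps makes the self-check easy to exercise against a saved log, which is in the spirit of the local test against a captured failing run mentioned above.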
…ry and CI annotations ### What changes were proposed in this pull request? After the noise filters from #55605, the Documentation generation CI log is around 4K lines on a failure run. The two-line per-file `error: reference not found` diagnostics are still buried in the middle of the log, and the GitHub Actions check panel for a failed doc-gen job only surfaces `Process completed with exit code 1`. Reviewers end up scrolling the raw log to find what actually broke. This PR is purely additive in `docs/_plugins/build_api_docs.rb` -- no existing log lines are dropped. After the unidoc pipe closes: 1. A trailing `Fatal javadoc errors (N):` block is printed, listing each captured diagnostic with file, line, and message. 2. One `::error file=<path>,line=<line>,title=javadoc::<msg>` GitHub Actions workflow command is emitted per diagnostic, so they appear as inline annotations on the PR check panel instead of as a single opaque `exit code 1`. Diagnostics are captured strictly within the Standard Doclet phase bracketed by `Building tree for all the packages and classes...` and `Building index for all classes...`, which is where doclint emits the build-failing diagnostics that count toward javadoc's exit code. Source-loading `error:` chatter outside that window is excluded -- it's already non-fatal and matches what javadoc's own `N errors` summary line counts. As a self-check, the captured count is compared against javadoc's own `N errors` summary line. If they diverge -- e.g. because a future JDK changes the Standard Doclet phase wording -- a `::warning::` workflow command is emitted so the drift is surfaced without silently masking real failures. ### Why are the changes needed? PR #55605 made the doc-gen log small enough to read, but the failure path is still discoverable only via grep. 
The per-file diagnostics emitted by doclint are the actionable content; promoting them to the PR check panel and a clearly delimited summary block makes a doc-gen failure self-explanatory without leaving the PR. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? End-to-end on this branch with deliberately broken references planted in two code paths (mirroring the test pattern from PR #55605): - `ColumnarMap.java` (real Java source): `{link org.apache.spark.deliberately.NoSuchClass}` and `{link ColumnVector#nonExistentMethod()}`. - `Partition.scala` (Scala source via genjavadoc): `[[Partition.index]]` -- the `.`-separator case that javadoc treats as inner-class lookup. The Documentation generation job will fail with the expected `Fatal javadoc errors` summary block in the log and per-file inline annotations on this PR's check panel. The plant commit will be dropped before this PR is taken out of draft. The state machine was also exercised locally against a captured log from a prior failing doc-gen run; the captured fatal count matches javadoc's `N errors` summary line. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude (Anthropic) Closes #55814 from cloud-fan/unidoc-fatal-summary. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 12b2595) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-56666
### What changes were proposed in this pull request?

Refines the unidoc javacOptions in `JavaUnidoc / unidoc / javacOptions` and the post-process stream filter in `docs/_plugins/build_api_docs.rb` so that the Documentation generation CI log is small enough to scan visually while still surfacing per-file `error: reference not found` diagnostics on broken `{@link}` references.

Builds on the `-Xmaxerrs` and `-verbose` insight from #55581 (SPARK-56630 follow-up): javadoc's default `-Xmaxerrs 100` cap was hit by the ~100 inert genjavadoc-stub errors during source loading, so doclint never ran on the real sources, and the per-file `error: reference not found` diagnostics surfaced only with `-verbose`. That PR's flag set (`-Xmaxerrs 999999`, `-Xmaxwarns 999999`, `-verbose`) achieved the diagnostic goal but at a ~77K-line CI log per run.

This PR keeps the diagnostic visibility and brings the visible CI log down to ~4K lines (95% reduction), with four changes:

1. **`-Xmaxerrs 0`** instead of `-Xmaxerrs 999999`. The `0` value is treated as unlimited by javadoc (locally verified) and reads cleaner than the magic number.
2. **`-Xdoclint:all` + `-Xdoclint:-missing`** (two separate flags, matching the existing `Compile / doc / javacOptions` pattern in `SparkBuild.scala`). Suppresses the `missing` doclint group at javadoc level: the ~22K `no comment` / `no @param` / `no @return` / `no @throws` warnings (each rendered as a 3-line block) that dominate the log on every Spark unidoc run. The two-flag form is load-bearing -- bare `-Xdoclint:-missing` alone demotes other doclint groups (notably `reference`) to warning level, making broken `{@link}` non-fatal; the explicit `-Xdoclint:all` first keeps reference at error level. Locally verified.
3. **Drop `-Xmaxwarns 999999`.** Warnings don't fail CI; error visibility is governed by `-Xmaxerrs`, not `-Xmaxwarns`. javadoc's default cap of 100 is sufficient -- shows a sample of any remaining warnings without flooding. Saves ~4K lines beyond `-Xdoclint:-missing` alone.
4. **Post-filter `-verbose` progress lines from the build_api_docs.rb stream.** `-verbose` itself stays (it is load-bearing for per-file `error: reference not found` emission per #55581, "[SPARK-56630][INFRA][FOLLOWUP] Make unidoc surface real javadoc failures"), but its progress noise -- `Loading source file ...`, `[parsing started/completed]`, `[loading /path/X.class]`, `Generating /path/X.html` -- carries no diagnostic signal. The existing stream filter is extended with a `verbose_line` regex that drops these single-line progress entries from stdout. Saves ~13K lines.

### Why are the changes needed?

Documentation generation CI logs were ~77K lines per run after SPARK-56630's flag set. That is large enough that scanning for diagnostics by eye is impractical, and grep-piping is the only reasonable workflow. Most of the volume is structural noise (genjavadoc stub errors, `no comment` warnings, `-verbose` progress markers) with no diagnostic signal. After this PR the log is ~4K lines on a real-failure run; the per-file `error: reference not found` diagnostics PR #55581 added are the dominant content.

Empirical breakdown of the reduction (verified end-to-end on this branch's earlier test commits with deliberately broken `{@link}` plants in both a real `.java` source and a Scala source):

| State | Log lines | Vs baseline |
| ---------------------------------- | --------: | ----------: |
| PR #55581's flag set (baseline) | 77K | |
| Add `-Xdoclint:all,-missing` | 22K | -71% |
| Drop `-Xmaxwarns 999999` | 18K | -77% |
| Post-filter `-verbose` progress | **~4K** | **-95%** |

All diagnostic targets remain visible: per-file `error: reference not found` for both Java sources and Scala sources via the genjavadoc stub.

### Does this PR introduce _any_ user-facing change?

No.
### How was this patch tested?

Validated end-to-end on earlier (now reverted) test commits of this branch with planted broken `{@link}` references in both code paths:

- `ColumnarMap.java` (real Java source): `{@link org.apache.spark.deliberately.NoSuchClass}` and `{@link ColumnVector#nonExistentMethod()}`.
- `Partition.scala` (Scala source via genjavadoc): `[[Partition.index]]` -- the wrong `.` separator that javadoc reads as inner-class lookup and fails to resolve. This is the case PR #55581's AGENTS.md note documents as the most common scaladoc-side cause of unidoc failure.

Both surfaced as per-file `error: reference not found` diagnostics in the CI log on the test commit, doc gen failed as expected, log size dropped to 3,977 lines, and zero `Loading source file` / `[parsing started]` / `[loading X.class]` / `Generating *.html` / `no comment` lines remained visible. See the test-result comment below for the full breakdown.

`-Xmaxerrs 0` and the bare-`-Xdoclint:-missing` demotion behavior were verified locally with standalone javadoc invocations on a minimal test file.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)