Closed
35 commits
8d5b4c5
[SPARK-56636][INFRA] Surface scalastyle, unidoc and compile errors as…
cloud-fan Apr 27, 2026
f9081b1
[SPARK-56636][INFRA][FOLLOW-UP] Tighten compile-error annotator: stri…
cloud-fan Apr 27, 2026
b8e856e
[SPARK-56636][INFRA][FOLLOW-UP] Match scalastyle violations that incl…
cloud-fan Apr 27, 2026
c3e5189
VALIDATION (DO NOT MERGE): planted scalastyle violation to confirm an…
cloud-fan Apr 27, 2026
a200110
[SPARK-56636][INFRA][FOLLOW-UP] Match sbt-logger scalastyle output an…
cloud-fan Apr 27, 2026
d78fa27
[SPARK-56636][INFRA] Decouple style checks from compile (SBT + Maven)
cloud-fan Apr 27, 2026
f0790ff
[SPARK-56636][INFRA] Trim PR scope: drop speculative annotators, reve…
cloud-fan Apr 27, 2026
d4df1c5
[SPARK-56636][INFRA][FOLLOW-UP] Fix dangling noLintOnCompile, drop st…
cloud-fan Apr 27, 2026
4e1f1fe
VALIDATION (DO NOT MERGE): plant scalastyle violation to verify decou…
cloud-fan Apr 28, 2026
365edf9
Revert "VALIDATION (DO NOT MERGE): plant scalastyle violation to veri…
cloud-fan Apr 28, 2026
cc62710
[SPARK-56636][INFRA][FOLLOW-UP] Surface non-stub [error] lines in uni…
cloud-fan Apr 28, 2026
fef77d9
Drop AGENTS.md doc-gen-debug subsection -- redundant with the diagnos…
cloud-fan Apr 28, 2026
11170ea
VALIDATION (DO NOT MERGE): plant doclint heading-out-of-sequence to v…
cloud-fan Apr 28, 2026
07386ea
[SPARK-56636][INFRA][FOLLOW-UP] Tighten unidoc diagnostic regex; clea…
cloud-fan Apr 28, 2026
e61b9e3
[SPARK-56636][INFRA][FOLLOW-UP] Replace unidoc diagnostic banner with…
cloud-fan Apr 28, 2026
79c1221
VALIDATION (DO NOT MERGE): plant Scala doclint heading violation to v…
cloud-fan Apr 28, 2026
c4e9d14
[SPARK-56636][INFRA][FOLLOW-UP] Fix pre-existing doclint debt surface…
cloud-fan Apr 28, 2026
74a3760
[SPARK-56636][INFRA][FOLLOW-UP] LauncherServer: wrap ASCII diagram in…
cloud-fan Apr 28, 2026
81964b6
[SPARK-56636][INFRA][FOLLOW-UP] Fix more pre-existing doclint debt (n…
cloud-fan Apr 28, 2026
855fcfe
[SPARK-56636][INFRA][FOLLOW-UP] Scope compile-time doclint to /public…
cloud-fan Apr 28, 2026
d612e84
[SPARK-56636][INFRA][FOLLOW-UP] Narrow compile-time doclint to html g…
cloud-fan Apr 28, 2026
ba7e0ff
[SPARK-56636][INFRA][FOLLOW-UP] Wrap antlr lexer comment's MAP<INT, A…
cloud-fan Apr 28, 2026
d2f02b2
[SPARK-56636][INFRA][FOLLOW-UP] XXH64.java: replace self-closing <p/>…
cloud-fan Apr 28, 2026
472b74f
[SPARK-56636][INFRA][FOLLOW-UP] Drop compile-time doclint (Move A); r…
cloud-fan Apr 28, 2026
0065b7a
[SPARK-56636][INFRA][FOLLOW-UP] Refresh stale comment on JavaUnidoc j…
cloud-fan Apr 28, 2026
750479f
[SPARK-56636][INFRA][FOLLOW-UP] Drop /public access modifier from Jav…
cloud-fan Apr 28, 2026
9705777
[SPARK-56636][INFRA][FOLLOW-UP] Filter genjavadoc-stub noise by messa…
cloud-fan Apr 28, 2026
d6dfb63
[SPARK-56636][INFRA][FOLLOW-UP] Narrow unidoc -Xdoclint to html group…
cloud-fan Apr 29, 2026
4a5d47b
VALIDATION (DO NOT MERGE): swap planted heading violation from Scala …
cloud-fan Apr 29, 2026
2fa7629
DEBUG (DO NOT MERGE): bypass Move C filter; verify whether diagnostic…
cloud-fan Apr 29, 2026
4eebc14
[SPARK-56636][INFRA][FOLLOW-UP] Add SparkUnidocDoclet to mirror javad…
cloud-fan Apr 29, 2026
2484313
[SPARK-56636][INFRA][FOLLOW-UP] SparkUnidocDoclet: drop overrides not…
cloud-fan Apr 29, 2026
2fa1926
[SPARK-56636][INFRA][FOLLOW-UP] SparkUnidocDoclet: delegate getStanda…
cloud-fan Apr 29, 2026
83c8a3c
[SPARK-56636][INFRA][FOLLOW-UP] SparkUnidocDoclet: wrap getStandardWr…
cloud-fan Apr 29, 2026
0597a68
[SPARK-56636][INFRA][FOLLOW-UP] Drop unidoc doclint experiments; resc…
cloud-fan Apr 29, 2026
4 changes: 0 additions & 4 deletions .github/workflows/build_and_test.yml
@@ -337,7 +337,6 @@ jobs:
HIVE_PROFILE: ${{ matrix.hive }}
GITHUB_PREV_SHA: ${{ github.event.before }}
SPARK_LOCAL_IP: localhost
NOLINT_ON_COMPILE: true
SKIP_UNIDOC: true
SKIP_MIMA: true
SKIP_PACKAGING: true
@@ -599,7 +598,6 @@ jobs:
HIVE_PROFILE: hive2.3
GITHUB_PREV_SHA: ${{ github.event.before }}
SPARK_LOCAL_IP: localhost
NOLINT_ON_COMPILE: true
SKIP_UNIDOC: true
SKIP_MIMA: true
SKIP_PACKAGING: true
@@ -868,7 +866,6 @@ jobs:
env:
LC_ALL: C.UTF-8
LANG: C.UTF-8
NOLINT_ON_COMPILE: false
GITHUB_PREV_SHA: ${{ github.event.before }}
BRANCH: ${{ inputs.branch }}
container:
@@ -1060,7 +1057,6 @@ jobs:
env:
LC_ALL: C.UTF-8
LANG: C.UTF-8
NOLINT_ON_COMPILE: false
PYSPARK_DRIVER_PYTHON: python3.9
PYSPARK_PYTHON: python3.9
GITHUB_PREV_SHA: ${{ github.event.before }}
1 change: 0 additions & 1 deletion dev/make-distribution.sh
@@ -169,7 +169,6 @@ fi
cd "$SPARK_HOME"

if [ "$SBT_ENABLED" == "true" ] ; then
export NOLINT_ON_COMPILE=1
# Store the command as an array because $SBT variable might have spaces in it.
# Normal quoting tricks don't work.
# See: http://mywiki.wooledge.org/BashFAQ/050
56 changes: 56 additions & 0 deletions dev/scalastyle
@@ -30,6 +30,62 @@ ERRORS=$(echo -e "q\n" \

if test ! -z "$ERRORS"; then
echo -e "Scalastyle checks failed at following occurrences:\n$ERRORS"
# When running under GitHub Actions, also emit each scalastyle violation as
# a workflow `::error` annotation so it appears inline on the PR's "Files
# changed" tab. Without this, a violation cascades into ~7 red CI checks
# (Linters, Java 17/25 Maven build, Documentation generation, sparkr,
# Docker integration, TPC-DS) -- all needing catalyst to compile -- and
# each only surfaces a generic "exit code 1" with no file/line, forcing
# the user to download a full job log to find the actual violation.
if [[ "${GITHUB_ACTIONS:-}" == "true" ]]; then
# Strip ANSI color codes from the captured output before regex
# matching. Today sbt under awk's pipe is not a TTY and skips color,
# so the input is already plain. But if sbt color is ever forced
# (`-Dsbt.color=always`, custom CI shell), `\e[31m` would silently
# break every regex below. Cheap to harden.
ERRORS_PLAIN=$(printf '%s' "$ERRORS" | sed -E $'s/\x1b\\[[0-9;]*[A-Za-z]//g')
# Helper: emit one `::error` annotation. Centralised so the two regex
# branches below stay short.
emit_annotation() {
local file="$1" lineno="$2" msg="$3"
# Strip the GitHub Actions workspace prefix so the annotation
# references the path as it appears in the repo.
local file_rel="${file#${GITHUB_WORKSPACE:-}/}"
# Escape the few characters GitHub reserves in annotation values:
# %, \r, \n. (`,` and `:` need not be escaped in the message body,
# only inside parameter values, which we don't use.)
local msg_escaped="${msg//%/%25}"
msg_escaped="${msg_escaped//$'\r'/%0D}"
msg_escaped="${msg_escaped//$'\n'/%0A}"
printf '::error file=%s,line=%s,title=Scalastyle::%s\n' \
"$file_rel" "$lineno" "$msg_escaped"
}
printf '%s\n' "$ERRORS_PLAIN" | while IFS= read -r raw; do
# Two scalastyle output formats reach us:
#
# (a) scalastyle's native console writer (`Tasks.doScalastyle` when
# invoked by the explicit `scalastyle` / `test:scalastyle`
# tasks):
# error file=<path> message=<text> line=<n> [column=<n>]
# The path has no spaces, the message can; `column=<n>` is
# appended for checkers that report a column (e.g.
# `WhitespaceEndOfLineChecker`) and absent otherwise.
#
# (b) sbt's logger format, used when `Tasks.doScalastyle` writes
# through `streams.value.log.error(...)` -- which is what the
# explicit `scalastyle` / `test:scalastyle` tasks invoked by
# this script do, and so this is the format we see in CI:
# [error] <path>:<line>: <message>
# The leading `[error] ` plus a single `:<line>:` (with no
# `:<col>:` follow-up) is what tells it apart from a regular
# Scala compile error of shape `[error] <path>:<line>:<col>: <msg>`.
if [[ "$raw" =~ ^error[[:space:]]+file=([^[:space:]]+)[[:space:]]+message=(.*)[[:space:]]+line=([0-9]+)([[:space:]]+column=[0-9]+)?$ ]]; then
emit_annotation "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[2]}"
elif [[ "$raw" =~ ^\[error\][[:space:]]+(/[^:[:space:]]+):([0-9]+):[[:space:]]+(.+)$ ]]; then
emit_annotation "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
fi
done
fi
exit 1
else
echo -e "Scalastyle checks passed."
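To make the two output shapes described in the comments above concrete, here is a minimal standalone sketch. It reuses the ANSI-stripping `sed` expression and the two regular expressions from the hunk. The function names `strip_ansi` and `annotate_line`, the explicit workspace argument, and the sample paths and messages are made up for illustration. The escaping of `%`, CR and LF is left out for brevity.

```shell
#!/usr/bin/env bash
# Sketch only: strip_ansi, annotate_line and the sample inputs are hypothetical.
# The sed expression and the two patterns are copied from the dev/scalastyle hunk.
native_re='^error[[:space:]]+file=([^[:space:]]+)[[:space:]]+message=(.*)[[:space:]]+line=([0-9]+)([[:space:]]+column=[0-9]+)?$'
sbt_re='^\[error\][[:space:]]+(/[^:[:space:]]+):([0-9]+):[[:space:]]+(.+)$'

# Remove ANSI color escapes so a colored "[error]" prefix still matches.
strip_ansi() {
  sed -E $'s/\x1b\\[[0-9;]*[A-Za-z]//g'
}

# annotate_line <raw line> <workspace prefix>
# Prints one ::error annotation if the line is a scalastyle violation,
# and prints nothing for any other line.
annotate_line() {
  local raw="$1" workspace="$2"
  local file lineno msg
  if [[ "$raw" =~ $native_re ]]; then
    file="${BASH_REMATCH[1]}"; lineno="${BASH_REMATCH[3]}"; msg="${BASH_REMATCH[2]}"
  elif [[ "$raw" =~ $sbt_re ]]; then
    file="${BASH_REMATCH[1]}"; lineno="${BASH_REMATCH[2]}"; msg="${BASH_REMATCH[3]}"
  else
    return 0
  fi
  printf '::error file=%s,line=%s,title=Scalastyle::%s\n' \
    "${file#"$workspace"/}" "$lineno" "$msg"
}

# Shape (a): native console writer, with the optional column suffix.
annotate_line 'error file=/ws/core/Foo.scala message=Whitespace at end of line line=12 column=80' /ws
# prints: ::error file=core/Foo.scala,line=12,title=Scalastyle::Whitespace at end of line

# Shape (b): sbt logger format, a single :<line>: after the path.
annotate_line '[error] /ws/sql/Bar.scala:42: File line length exceeds 100 characters' /ws
# prints: ::error file=sql/Bar.scala,line=42,title=Scalastyle::File line length exceeds 100 characters

# A compile error carries :<line>:<col>: and is deliberately not matched.
annotate_line '[error] /ws/sql/Bar.scala:42:7: not found: value x' /ws
# prints nothing
```

Keeping the patterns in variables rather than inline avoids bash's quoting pitfalls for `=~`, which is a choice of this sketch rather than something the patch does.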
125 changes: 36 additions & 89 deletions docs/_plugins/build_api_docs.rb
@@ -133,101 +133,48 @@ def build_spark_scala_and_java_docs_if_necessary

command = "build/sbt -Pkinesis-asl unidoc"
puts "Running '#{command}'..."
# Tee sbt output to a log file so we can diagnose failures. The most common
# unidoc failure is a javadoc crash mid-stream while generating HTML for a
# specific class, buried under ~100 benign errors on genjavadoc-generated
# Java stubs (e.g. target/java/org/apache/spark/ErrorInfo.java). Without the
# diagnostic below, the real culprit -- the source whose doc tripped javadoc
# -- is effectively invisible in the CI log.
log_file = File.join(SPARK_PROJECT_ROOT, "target", "unidoc-build.log")
mkdir_p(File.dirname(log_file))
success = stream_and_capture(command, log_file)
unless success
diagnose_unidoc_failure(log_file)
raise("Unidoc generation failed")
end
end

# Runs `command`, streaming every line to both stdout and `log_file`. Returns
# true iff the command exited 0. Ruby-only; no shell pipefail reliance.
def stream_and_capture(command, log_file)
File.open(log_file, 'w') do |f|
IO.popen("#{command} 2>&1", 'r') do |pipe|
pipe.each_line do |line|
# Suppress genjavadoc-stub diagnostic blocks from the visible log. javadoc
# emits ~3500 `[error]` lines per unidoc run on stubs under `target/java/`
# -- all benign because `--ignore-source-errors` is set, but they bury
# everything else. Each diagnostic is a header line followed by 3-5
# `[error|warn]`-prefixed continuation lines (snippet, caret,
# symbol/location); the state machine drops both.
#
# Match by *message text*, not just by `target/java/` path. Otherwise
# legitimate doclint diagnostics on stub paths would be hidden too --
# those messages are real signal. The patterns below are the known-benign
# genjavadoc structural errors; anything else on a `target/java/` path is
# echoed. Diagnostic mirror lines from `SparkUnidocDoclet` use the
# `[unidoc-doclet]` prefix and don't match either regex, so they always
# pass through.
ansi = /\e\[[0-9;]*[A-Za-z]/
stub_header = %r{
\[(?:error|warn)\]\s+
\S*?/target/java/\S+\.java:\d+(?::\d+)?:\s+
error:\s+
(?:cannot\s+find\s+symbol
|illegal\s+combination\s+of\s+modifiers
|non-static\s+type\s+variable\b
|.*?\s+is\s+not\s+public\s+in\s+\S+;\s+cannot\s+be\s+accessed\s+from\s+outside\s+package)
}x
stub_cont = %r{\A\s*\[(?:error|warn)\]\s+(?!/\S+\.java:\d+(?::\d+)?:\s)}
in_stub = false
IO.popen("#{command} 2>&1", 'r') do |pipe|
pipe.each_line do |line|
plain = line.gsub(ansi, '')
if plain =~ stub_header
in_stub = true
elsif in_stub && plain =~ stub_cont
# continuation of a stub block; suppress
else
in_stub = false
$stdout.write(line)
$stdout.flush
f.write(line)
end
end
end
$?.success?
end

# Scans the captured unidoc log and prints a pointer to the most likely
# culprit source file. The heuristic: when javadoc dies mid-HTML-generation,
# the last "Generating .../X.html" line before "javadoc exited with exit code"
# names the class that tripped it. Prints nothing actionable if the failure
# mode doesn't match (e.g. a scaladoc error), in which case the full log above
# already shows what's wrong.
def diagnose_unidoc_failure(log_file)
return unless File.exist?(log_file)
begin
lines = File.readlines(log_file)

javadoc_exit_idx = lines.rindex { |l| l.include?("javadoc exited with exit code") }
last_generating = nil
if javadoc_exit_idx
# Strip ANSI color codes so the regex matches sbt-coloured output too.
ansi = /\e\[[0-9;]*[A-Za-z]/
lines[0...javadoc_exit_idx].reverse_each do |line|
if line.gsub(ansi, '') =~ %r{Generating .+/javaunidoc/(\S+?\.html)\.\.\.}
last_generating = $1
break
end
end
end

banner = "=" * 78
$stderr.puts ""
$stderr.puts banner
$stderr.puts "Unidoc failed -- diagnostic summary"
$stderr.puts banner
if last_generating
class_path = last_generating.sub(/\.html$/, '')
class_name = class_path.tr('/', '.')
$stderr.puts ""
$stderr.puts " Javadoc crashed while generating: #{last_generating}"
$stderr.puts " Likely culprit: doc comment in #{class_name}"
$stderr.puts ""
$stderr.puts " Javadoc can hard-exit (not just warn) on specific scaladoc"
$stderr.puts " patterns once they have been passed through genjavadoc --"
$stderr.puts " wiki-style `[[Class]]` / `[[method]]` links or inline-backticked"
$stderr.puts " code refs in the Scala source for the class above are common"
$stderr.puts " triggers. Start by auditing any recent doc-string changes in"
$stderr.puts " that source file."
$stderr.puts ""
$stderr.puts " NOTE: the '[error]' lines above on files under"
$stderr.puts " target/java/... are benign genjavadoc stubs -- every PR"
$stderr.puts " emits them and they do not cause the exit. Ignore them."
elsif javadoc_exit_idx
$stderr.puts ""
$stderr.puts " Javadoc exited but no class HTML generation was in progress;"
$stderr.puts " the crash predates HTML output -- likely a CLI / classpath /"
$stderr.puts " setup issue. See the full sbt output above."
else
$stderr.puts ""
$stderr.puts " Could not locate a 'javadoc exited with exit code' marker in"
$stderr.puts " the log; the failure is likely outside the javaunidoc step"
$stderr.puts " (scaladoc / sbt / build env). See the full sbt output above."
end
$stderr.puts banner
$stderr.puts ""
rescue => e
# Never let the diagnostic helper itself obscure the underlying unidoc
# failure: if anything here goes wrong (e.g. encoding error reading the
# log), report it briefly and let the caller raise the real error.
$stderr.puts "(diagnostic helper failed: #{e.class}: #{e.message})"
end
raise("Unidoc generation failed") unless $?.success?
end

def build_scala_and_java_docs
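The stub-noise filter in the hunk above is a two-state machine: a known-benign header on a `target/java/` path opens a block, prefixed continuation lines are dropped, and anything else closes the block and is echoed. The following is a rough bash port for experimenting with that logic outside the Ruby build, not the code the patch ships. The function name `filter_stub_noise` is hypothetical, only two of the four benign messages are listed, ANSI stripping is omitted, and the Ruby negative lookahead is approximated by a second pattern test.

```shell
#!/usr/bin/env bash
# Sketch only: a simplified bash approximation of the Ruby stub filter.
# filter_stub_noise is a hypothetical name; the benign-message list is a subset.
sp='[[:space:]]+'
loc='[^[:space:]]+\.java:[0-9]+(:[0-9]+)?:'

# Header of a benign genjavadoc-stub diagnostic: target/java path plus known message.
stub_header_re="\[(error|warn)\]${sp}[^[:space:]]*/target/java/${loc}${sp}error:${sp}(cannot${sp}find${sp}symbol|illegal${sp}combination${sp}of${sp}modifiers)"
# Any [error]/[warn]-prefixed line.
prefixed_re="^[[:space:]]*\[(error|warn)\]${sp}"
# A prefixed line that starts a new <path>.java:<line>: diagnostic.
new_diag_re="${prefixed_re}/${loc}[[:space:]]"

filter_stub_noise() {
  local line in_stub=0
  while IFS= read -r line; do
    if [[ "$line" =~ $stub_header_re ]]; then
      in_stub=1   # benign header: suppress it and enter the block
    elif (( in_stub )) && [[ "$line" =~ $prefixed_re ]] && ! [[ "$line" =~ $new_diag_re ]]; then
      :           # continuation (snippet, caret, symbol/location): suppress
    else
      in_stub=0   # anything else ends the block and is echoed
      printf '%s\n' "$line"
    fi
  done
}

# Possible usage, with the sbt command taken from the hunk above:
#   build/sbt -Pkinesis-asl unidoc 2>&1 | filter_stub_noise
```

Because the header test checks the message text as well as the path, a doclint diagnostic that lands on a `target/java/` stub still passes through, which is the behavior the Ruby comments call out.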
38 changes: 31 additions & 7 deletions pom.xml
@@ -3235,6 +3235,11 @@
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
</plugin>
<!-- Scalastyle is intentionally NOT bound to a phase here; activate the
`scalastyle` profile (or run `mvn scalastyle:check` explicitly) to
run it. Default Maven builds skip scalastyle so that a single
violation does not cascade into every Maven-invoked CI job; the
dedicated lint job is the single source of truth for style. -->
<plugin>
<groupId>org.scalastyle</groupId>
<artifactId>scalastyle-maven-plugin</artifactId>
@@ -3251,13 +3256,6 @@
<inputEncoding>${project.build.sourceEncoding}</inputEncoding>
<outputEncoding>${project.reporting.outputEncoding}</outputEncoding>
</configuration>
<executions>
<execution>
<goals>
<goal>check</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
@@ -3395,6 +3393,32 @@

<profiles>

<!--
Opt-in profile that binds scalastyle:check to the `verify` phase. Used
by the dedicated lint job; default Maven builds intentionally skip
scalastyle to avoid cascading a style violation into every Maven CI job.
-->
<profile>
<id>scalastyle</id>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.scalastyle</groupId>
<artifactId>scalastyle-maven-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>check</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</pluginManagement>
</build>
</profile>

<!--
This profile is enabled automatically by the sbt build. It changes the scope for shaded
dependencies, since we don't shade it in the artifacts generated by the sbt build.
25 changes: 12 additions & 13 deletions project/SparkBuild.scala
@@ -242,19 +242,20 @@ object SparkBuild extends PomBuild {
Set(file)
}

// Defines the standalone `scalaStyleOnCompile` / `scalaStyleOnTest` tasks
// invoked by `dev/lint-scala`. Style is intentionally NOT attached to
// `(Compile / compile)` -- a violation in one module would otherwise abort
// compile for that module and every transitive dependent, cascading style
// failures into every job that recompiles those sources (Build modules,
// Documentation generation, Java 17/25 Maven build, sparkr, ...). Each
// cascaded job then surfaces only a generic "exit code 1" with no file/line.
// After decoupling, the dedicated lint job is the single place style
// violations surface, with file/line annotations from `dev/scalastyle`.
def enableScalaStyle: Seq[sbt.Def.Setting[_]] = Seq(
scalaStyleOnCompile := cachedScalaStyle(Compile).value,
scalaStyleOnTest := cachedScalaStyle(Test).value,
(scalaStyleOnCompile / logLevel) := Level.Warn,
(scalaStyleOnTest / logLevel) := Level.Warn,
(Compile / compile) := {
scalaStyleOnCompile.value
(Compile / compile).value
},
(Test / compile) := {
scalaStyleOnTest.value
(Test / compile).value
}
(scalaStyleOnTest / logLevel) := Level.Warn
)

lazy val compilerWarningSettings: Seq[sbt.Def.Setting[_]] = Seq(
@@ -290,12 +291,10 @@ object SparkBuild extends PomBuild {
}
)

val noLintOnCompile = sys.env.contains("NOLINT_ON_COMPILE") &&
!sys.env.get("NOLINT_ON_COMPILE").contains("false")
lazy val sharedSettings = checkJavaVersionSettings ++
sparkGenjavadocSettings ++
compilerWarningSettings ++
(if (noLintOnCompile) Nil else enableScalaStyle) ++ Seq(
enableScalaStyle ++ Seq(
(Compile / exportJars) := true,
(Test / exportJars) := false,
javaHome := sys.env.get("JAVA_HOME")
@@ -401,7 +400,7 @@ object SparkBuild extends PomBuild {
/* Enable shared settings on all projects */
(allProjects ++ optionallyEnabledProjects ++ assemblyProjects ++ copyJarsProjects ++ Seq(spark, tools))
.foreach(enable(sharedSettings ++ DependencyOverrides.settings ++
ExcludedDependencies.settings ++ (if (noLintOnCompile) Nil else Checkstyle.settings) ++
ExcludedDependencies.settings ++ Checkstyle.settings ++
ExcludeShims.settings))

/* Enable tests settings for all projects except examples, assembly and tools */