-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54583][SS] Add SQLConf to enable use of OffsetMap #53365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/checkpointing/OffsetSeq.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/checkpointing/OffsetSeq.scala
Show resolved
Hide resolved
anishshri-db
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm pending green CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more clear about the mappings between OffsetSeqMetadata and OffsetSeq/OffsetMap? Can any version of OffsetSeqMetadata be used with OffsetSeq? Can any version of metadata be used with OffsetMap? Looking at the PR description the answer is yet, but there's only 1 conf for both. How do we control the metadata version, do we need to introduce another conf?
| object OffsetSeqLog { | ||
| private[streaming] val VERSION_1 = 1 | ||
| private[streaming] val VERSION_2 = 2 | ||
| val VERSION_1 = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we scope this?
| val metadata = v1.metadataOpt.get.asInstanceOf[OffsetSeqMetadata] | ||
| v1.copy(metadataOpt = Some(metadata.copy( | ||
| conf = metadata.conf + (SQLConf.SHUFFLE_PARTITIONS.key -> numPartitions.toString)))) | ||
| case _ => throw OfflineStateRepartitionErrors.unsupportedOffsetSeqVersionError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to create a TODO + add ticket to support OffsetMap here?
| */ | ||
| case class OffsetSeqV2( | ||
| offsets: Seq[Option[OffsetV2]], | ||
| metadataOpt: Option[OffsetSeqMetadataV2] = None) extends OffsetSeqBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we require metadata for OffsetSeqV2 (don't make it an option)?
| case class OffsetMap( | ||
| offsetsMap: Map[String, Option[OffsetV2]], | ||
| metadataOpt: Option[OffsetSeqMetadata] = None) extends OffsetSeqBase { | ||
| metadataOpt: Option[OffsetSeqMetadataBase] = None) extends OffsetSeqBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
What changes were proposed in this pull request?
This PR decouples the versioning of OffsetSeqMetadata from OffsetLog, allowing them to evolve independently while maintaining backward compatibility
with existing streaming checkpoints.
Key changes:
- sourceMetadataInfo: Map of source metadata keyed by sourceId
- controlBatchInfo: Information about control batches (e.g., rewind batches)
Why are the changes needed?
Previously, OffsetSeqMetadata version was tightly coupled to the OffsetLog version. This meant that any change to metadata format would require
bumping the entire offset log version, making it difficult to evolve metadata independently.
This change enables:
Does this PR introduce any user-facing change?
No. The changes are backward compatible:
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No