
alter_table: Support adding columns to tables#30470

Merged
ParkMyCar merged 10 commits into
MaterializeInc:main from
ParkMyCar:alter_table2/support-adding-columns-to-tables
Jan 17, 2025

Conversation

@ParkMyCar
Contributor

@ParkMyCar ParkMyCar commented Nov 13, 2024

This PR implements the SQL feature ALTER TABLE ... ADD COLUMN ....

Note: There are a lot of lines changed but the majority are new tests!

Specifically it:

  1. Uses VersionedRelationDesc on Tables to track new columns
  2. Adds a CatalogCollectionEntry which adds some typing around getting the current RelationDesc for an entry.
  3. Updates the storage-controller to create new Persist WriteHandles and pass them to the TxnsTableWorker
  4. When creating a new version of a Table, adds the new version as a dependency of the old version. This proved necessary to get correct read handles for objects built on top of tables.
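
To make item 1 concrete, here is a minimal sketch of the versioning idea. It is a hypothetical model, not Materialize's VersionedRelationDesc API: RelationVersion stands in for the type of the same name, and VersionedDesc, Column, and columns_at are illustrative names. It assumes columns are only ever appended, so each older version is a projection of the latest schema.

```rust
/// Hypothetical stand-in for a table's relation version. In this sketch,
/// version 0 is the table as created and each ADD COLUMN bumps it by one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct RelationVersion(pub u64);

/// A column together with the version that introduced it.
#[derive(Debug, Clone)]
pub struct Column {
    pub name: String,
    pub added_in: RelationVersion,
}

/// Minimal sketch of a versioned relation description: columns are only
/// ever appended, so each version sees the columns added at or before it.
#[derive(Debug)]
pub struct VersionedDesc {
    columns: Vec<Column>,
    latest: RelationVersion,
}

impl VersionedDesc {
    pub fn new(initial: &[&str]) -> Self {
        let columns = initial
            .iter()
            .map(|name| Column { name: name.to_string(), added_in: RelationVersion(0) })
            .collect();
        VersionedDesc { columns, latest: RelationVersion(0) }
    }

    /// Appends a column and returns the new latest version.
    pub fn add_column(&mut self, name: &str) -> Result<RelationVersion, String> {
        if self.columns.iter().any(|c| c.name == name) {
            return Err(format!("column \"{name}\" already exists"));
        }
        self.latest = RelationVersion(self.latest.0 + 1);
        self.columns.push(Column { name: name.to_string(), added_in: self.latest });
        Ok(self.latest)
    }

    pub fn latest_version(&self) -> RelationVersion {
        self.latest
    }

    /// Names of the columns visible at `version`, in column order.
    pub fn columns_at(&self, version: RelationVersion) -> Vec<&str> {
        self.columns
            .iter()
            .filter(|c| c.added_in <= version)
            .map(|c| c.name.as_str())
            .collect()
    }
}
```

In this model a reader pinned to version 0 keeps seeing the original columns after an ADD COLUMN, because newer columns are only ever appended at the end.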

Otherwise it also adds several tests:

  1. alter-table.slt which exercises a number of different scenarios
  2. A new Check for the platform-checks test framework
  3. A new Action for the parallel-workload test framework.
    • This new action is currently disabled because of a race condition in Persist's schema registry
  4. A legacy upgrade test to make sure we have coverage for the case where MZ restarts.

Motivation

Fixes https://github.com/MaterializeInc/database-issues/issues/8233

Tips for reviewer

I split the PR into separate commits to make it easier to review; most of the changes here are new tests!

  1. Changes to the Catalog APIs and name resolution to support versions.
  2. Changes to sequencing and the storage controller.
  3. Tests
  4. Formatting and clippy

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch 6 times, most recently from b1ba847 to 6edd439 Compare November 19, 2024 18:13
@ParkMyCar ParkMyCar marked this pull request as ready for review November 19, 2024 18:47
@ParkMyCar ParkMyCar requested review from a team as code owners November 19, 2024 18:47
@ParkMyCar ParkMyCar requested review from bkirwi and jkosh44 November 19, 2024 18:47
@shepherdlybot

shepherdlybot Bot commented Nov 19, 2024

Risk Score: 80 / 100, Bug Hotspots: 9, Resilience Coverage: 66%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability 🔍 Detected
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The risk score for this pull request is high at 80, driven by predictors such as the sum of bug reports of files and the delta of executable lines. Historically, pull requests with these predictors are 114% more likely to cause a bug compared to the repository baseline. Additionally, the repository's observed and predicted bug trends are both decreasing, which is a positive sign.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File Percentile
../catalog/apply.rs 98
../src/main.rs 97
../catalog/state.rs 91
../src/coord.rs 100
../src/catalog.rs 98
../src/names.rs 92
../catalog/open.rs 99
../src/lib.rs 95
../catalog/transact.rs 93

Contributor

@def- def- left a comment

I very much appreciate all the tests!

What happens if you call ALTER TABLE ADD COLUMN on a continual task/table from source/mv/...? What if you try to add a column with the name of the table? Can you keep adding columns or do things get slower with number of columns? Could add some tests checking for correct errors in the SLT.

Comment thread misc/python/materialize/checks/all_checks/alter_table.py Outdated
Comment thread misc/python/materialize/parallel_workload/action.py
Comment thread misc/python/materialize/checks/all_checks/alter_table.py
Contributor

@jkosh44 jkosh44 left a comment

Adapter code LGTM

Comment on lines +1381 to +1452
.map(|id| self.get_entry_by_global_id(id))
.filter_map(|entry| entry.index().map(|index| index.on));
Contributor

I think I might be missing something, what was the change here?

Contributor Author

This was just some Rust lifetime shenanigans. .index() returns an Option<&Index>, but .get_entry_by_global_id(...) returns an owned type that only lives for the duration of the .map(...) call.
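
A reduced sketch of the lifetime issue, assuming a getter that returns an owned entry (as the cloned CatalogCollectionEntry does). GlobalId, Index, Entry, and Catalog here are simplified hypothetical stand-ins, not the catalog's real types.

```rust
/// Simplified, hypothetical types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GlobalId(pub u64);

#[derive(Debug, Clone)]
pub struct Index {
    pub on: GlobalId,
}

#[derive(Debug, Clone)]
pub struct Entry {
    pub index: Option<Index>,
}

impl Entry {
    pub fn index(&self) -> Option<&Index> {
        self.index.as_ref()
    }
}

pub struct Catalog {
    pub entries: Vec<Entry>,
}

impl Catalog {
    /// Returns an owned value, like a getter that wraps a clone of the entry.
    pub fn get_entry_by_global_id(&self, id: GlobalId) -> Entry {
        self.entries[id.0 as usize].clone()
    }

    /// Ids of the collections that the given index ids are built on.
    pub fn indexed_on(&self, ids: &[GlobalId]) -> Vec<GlobalId> {
        ids.iter()
            .map(|id| self.get_entry_by_global_id(*id))
            // `entry` is owned by this closure and dropped when it returns, so
            // only a copy of `index.on` (GlobalId is Copy) may escape.
            // Returning `entry.index()` itself would borrow from a dropped
            // value and would not compile.
            .filter_map(|entry| entry.index().map(|index| index.on))
            .collect()
    }
}
```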

Comment on lines -178 to -183
storage_collection_metadata: TableTransaction::new_with_uniqueness_fn(
storage_collection_metadata,
|a: &StorageCollectionMetadataValue, b| a.shard == b.shard,
)?,
Contributor

Just confirming, now we can have multiple global IDs for the same object that all point to the same shard?

Contributor Author

Exactly
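
A toy model of what dropping the uniqueness function allows. CollectionMetadata, GlobalId, and ShardId are hypothetical stand-ins, and enforce_unique_shards only imitates the removed a.shard == b.shard check; this is not the TableTransaction API.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct GlobalId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShardId(pub u64);

/// Hypothetical model of the collection-metadata table: each GlobalId maps
/// to the persist shard that backs it.
#[derive(Default)]
pub struct CollectionMetadata {
    /// Imitates the removed uniqueness check (`a.shard == b.shard`).
    pub enforce_unique_shards: bool,
    rows: BTreeMap<GlobalId, ShardId>,
}

impl CollectionMetadata {
    pub fn insert(&mut self, id: GlobalId, shard: ShardId) -> Result<(), String> {
        if self.rows.contains_key(&id) {
            return Err(format!("{id:?} is already registered"));
        }
        if self.enforce_unique_shards && self.rows.values().any(|s| *s == shard) {
            return Err(format!("{shard:?} is already in use"));
        }
        self.rows.insert(id, shard);
        Ok(())
    }

    /// All ids backed by `shard`, in id order.
    pub fn ids_for_shard(&self, shard: ShardId) -> Vec<GlobalId> {
        self.rows
            .iter()
            .filter(|(_, s)| **s == shard)
            .map(|(id, _)| *id)
            .collect()
    }
}
```

With the check enabled, registering a second version of a table against the same shard fails; without it, both versions resolve to one shard.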

.unwrap_or_else(|| panic!("catalog out of sync, missing id {id:?}"));
self.get_entry(item_id)

let entry = self.get_entry(item_id).clone();
Contributor

It feels a little bad to clone the entry in this function. This used to be pretty cheap but now involves cloning potentially large expressions and create sql statements.

Contributor Author

I totally agree. It's a bit tricky with Rust lifetimes and the CatalogItem trait, but I'll circle back and see if I can improve this. There might be a Cow<...>-like thing we can do.
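
One possible shape for the Cow<...>-like idea, sketched under assumptions: Entry, CollectionEntry, and Catalog are hypothetical, and the sketch ignores the CatalogItem trait that makes the real change tricky. The wrapper borrows by default and only clones when a caller needs to detach it from the catalog.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified catalog entry; `create_sql` stands in for the
/// potentially large fields the review worried about cloning.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub name: String,
    pub create_sql: String,
}

/// Pairs an entry with the version being read. Storing a `Cow` lets the
/// common path borrow from the catalog instead of cloning.
pub struct CollectionEntry<'a> {
    pub entry: Cow<'a, Entry>,
    pub version: u64,
}

pub struct Catalog {
    pub entries: Vec<Entry>,
}

impl Catalog {
    /// Borrows the entry; no clone happens here.
    pub fn get_collection(&self, idx: usize, version: u64) -> CollectionEntry<'_> {
        CollectionEntry { entry: Cow::Borrowed(&self.entries[idx]), version }
    }
}

impl CollectionEntry<'_> {
    /// Detaches from the catalog's lifetime, cloning only if still borrowed.
    pub fn into_owned(self) -> CollectionEntry<'static> {
        CollectionEntry { entry: Cow::Owned(self.entry.into_owned()), version: self.version }
    }
}
```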

Comment thread src/repr/src/relation.rs
Comment on lines +419 to +438
impl From<RelationVersion> for SchemaId {
fn from(value: RelationVersion) -> Self {
SchemaId(usize::cast_from(value.0))
}
}
Contributor

I don't think I understand this, what's the correlation b/w a relation version and a schema version?

Contributor Author

Right now RelationVersions are 1:1 with SchemaIds. At some point we can break this relationship and store the mapping somewhere in the Catalog, but it's not necessary at the moment.
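
A std-only sketch of the 1:1 mapping. The inner integer types are assumptions, and the reverse conversion is hypothetical; it only illustrates that a 1:1 mapping round-trips. The diff itself uses mz_ore's usize::cast_from.

```rust
/// Simplified stand-ins; the inner integer types are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RelationVersion(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SchemaId(pub usize);

impl From<RelationVersion> for SchemaId {
    fn from(value: RelationVersion) -> Self {
        // The PR uses mz_ore's `usize::cast_from`; this sketch sticks to std
        // and panics if the version cannot be represented as a usize.
        SchemaId(usize::try_from(value.0).expect("relation version fits in usize"))
    }
}

/// Hypothetical reverse direction, only to show the mapping is 1:1.
impl From<SchemaId> for RelationVersion {
    fn from(value: SchemaId) -> Self {
        RelationVersion(u64::try_from(value.0).expect("schema id fits in u64"))
    }
}
```

Breaking the 1:1 relationship later would mean replacing these From impls with a lookup stored in the Catalog.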

Comment on lines +806 to +807
fn latest_version(&self) -> Option<RelationVersion> {
self.entry.latest_version()
Contributor

I'm surprised that this returns an Option, when would an entry ever not have a version?

Contributor Author

An entry only has a version if it's version-able, i.e. only Tables will return Some here.
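
A minimal sketch of why an Option fits, assuming a trimmed-down, hypothetical set of item kinds where only tables are versionable.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RelationVersion(pub u64);

/// Hypothetical, trimmed-down set of item kinds.
pub enum CatalogItem {
    Table { latest: RelationVersion },
    View,
    Index,
}

impl CatalogItem {
    /// Only versionable items (tables, in this sketch) have a version.
    pub fn latest_version(&self) -> Option<RelationVersion> {
        match self {
            CatalogItem::Table { latest } => Some(*latest),
            CatalogItem::View | CatalogItem::Index => None,
        }
    }
}
```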

Comment thread src/catalog/src/memory/objects.rs Outdated
Comment on lines +276 to +284
let is_versioned = c
.options
.iter()
.any(|o| matches!(o.option, ColumnOption::Versioned { .. }));
!is_versioned
Contributor

Why are we filtering here?

Contributor Author

Added a comment, but it's because of how the names collection is used. I took a note for myself to refactor this entire block.
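
The predicate itself can be sketched in isolation. ColumnDef, ColumnOptionDef, and the field inside Versioned are hypothetical stand-ins for the parser types; the thread doesn't spell out how the names collection consumes the result, so this only reproduces the filter.

```rust
/// Hypothetical shapes; the field on `Versioned` is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnOption {
    NotNull,
    Versioned { version: u64 },
}

#[derive(Debug, Clone)]
pub struct ColumnOptionDef {
    pub option: ColumnOption,
}

#[derive(Debug, Clone)]
pub struct ColumnDef {
    pub name: String,
    pub options: Vec<ColumnOptionDef>,
}

/// Keeps only columns that carry no `Versioned` option, mirroring the
/// filter in the diff excerpt.
pub fn unversioned_column_names(columns: &[ColumnDef]) -> Vec<&str> {
    columns
        .iter()
        .filter(|c| {
            let is_versioned = c
                .options
                .iter()
                .any(|o| matches!(o.option, ColumnOption::Versioned { .. }));
            !is_versioned
        })
        .map(|c| c.name.as_str())
        .collect()
}
```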

@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch from 6edd439 to 02b0137 Compare December 5, 2024 15:13
@ParkMyCar ParkMyCar requested a review from petrosagg December 10, 2024 15:10
let since_handle = {
// If the collection we're opening is versioned, be sure to use a
// different CriticalId so the SinceHandles don't conflict.
let reader_id = match version {
Contributor

It's still not clear to me that we're managing the lifecycle of this handle properly... for example, I think the finalization task will only force-downgrade the handle of the controller-global handle, not these per-version handles. (And at a first pass it's not clear to me how all N critical handles get updated when the write frontier advances...)

@ParkMyCar
Contributor Author

Still iterating a bit, but the most recent commit removes the need for multiple CriticalSinceHandles and implements an approach @petrosagg and I talked about, where earlier versions of a table track later versions as dependencies. Through initial testing, things seem to work out!

At a high level the implementation is there; I pushed up the commits to run them against CI.
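
A toy model of why making the newer version a dependency of the older one helps, assuming totally ordered u64 timestamps and ignoring everything else the storage controller does. Holds, hold, add_dependency, and effective_since are hypothetical names: a read hold taken on the old version (say, by a view built on it) flows through the dependency and keeps the new version, and thus the shared shard, readable.

```rust
use std::collections::BTreeMap;

/// Hypothetical model: a collection's effective since is the minimum of its
/// own read holds and the effective since of every collection depending on it.
#[derive(Default)]
pub struct Holds {
    /// Read holds taken directly on a collection (e.g. by a view or index).
    own: BTreeMap<&'static str, Vec<u64>>,
    /// dependent -> the collections it depends on.
    depends_on: BTreeMap<&'static str, Vec<&'static str>>,
}

impl Holds {
    pub fn hold(&mut self, collection: &'static str, time: u64) {
        self.own.entry(collection).or_default().push(time);
    }

    pub fn add_dependency(&mut self, dependent: &'static str, dependency: &'static str) {
        self.depends_on.entry(dependent).or_default().push(dependency);
    }

    /// Earliest time that must stay readable for `collection`, if any.
    /// Assumes the dependency graph is acyclic.
    pub fn effective_since(&self, collection: &str) -> Option<u64> {
        let own = self.own.get(collection).and_then(|times| times.iter().copied().min());
        let from_dependents = self
            .depends_on
            .iter()
            .filter(|(_, deps)| deps.iter().any(|d| *d == collection))
            .filter_map(|(dependent, _)| self.effective_since(dependent))
            .min();
        own.into_iter().chain(from_dependents).min()
    }
}
```

Without the dependency edge, the new version would only see its own holds and could let the shard compact past a time the old version's readers still need.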

@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch 8 times, most recently from 1b30ffa to 7c3e244 Compare December 18, 2024 20:35
@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch 3 times, most recently from f18f1b3 to 8e3af36 Compare December 22, 2024 19:17
@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch 3 times, most recently from 4a9de3d to 7ecf800 Compare January 8, 2025 17:05
// capability of the existing collection. This would cause runtime panics because it
// would eventually result in negative read capabilities.
let mut changes = ChangeBatch::new();
for (time, diff) in existing_read_capabilities.updates() {
Contributor

I believe this has to be existing_read_capabilities.frontier(): the "old table" might maintain a bunch of outstanding read holds, but we only need to install the frontier of what it needs at the new primary.

When the frontier of the view table's read holds changes, it will then go and downgrade this one read hold it holds on the primary table. I have a feeling this is the source of the panic you're seeing.
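
The difference between copying updates() and installing the frontier can be seen in a small model. Capabilities is a hypothetical stand-in for a ChangeBatch with totally ordered u64 timestamps, and the dependent is assumed to hold exactly one read hold on the new collection, which it downgrades by adding +1 at the new time and -1 at the old one.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a ChangeBatch of read capabilities with
/// totally ordered u64 timestamps: time -> net number of holds.
#[derive(Debug, Default, Clone)]
pub struct Capabilities {
    counts: BTreeMap<u64, i64>,
}

impl Capabilities {
    pub fn update(&mut self, time: u64, diff: i64) {
        let count = self.counts.entry(time).or_insert(0);
        *count += diff;
        if *count == 0 {
            self.counts.remove(&time);
        }
    }

    /// Every outstanding (time, count) pair.
    pub fn updates(&self) -> impl Iterator<Item = (u64, i64)> + '_ {
        self.counts.iter().map(|(t, c)| (*t, *c))
    }

    /// Under a total order the frontier is the least time with a positive count.
    pub fn frontier(&self) -> Option<u64> {
        self.counts.iter().find(|(_, c)| **c > 0).map(|(t, _)| *t)
    }
}

/// Copies every outstanding hold of the old collection onto the new one.
pub fn install_all_updates(existing: &Capabilities, target: &mut Capabilities) {
    for (time, diff) in existing.updates() {
        target.update(time, diff);
    }
}

/// Installs a single hold at the old collection's frontier (the review suggestion).
pub fn install_frontier(existing: &Capabilities, target: &mut Capabilities) {
    if let Some(time) = existing.frontier() {
        target.update(time, 1);
    }
}
```

In this model, copying every update leaves stray holds behind once the dependent downgrades and then releases its single hold, so the counts never balance; installing only the frontier keeps the bookkeeping to one hold. The model does not reproduce the panic itself.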

@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch from 5557ec7 to 8ddd280 Compare January 14, 2025 00:38
match schema_result {
CaESchema::Ok(id) => id,
// TODO(alter_table): Better handling of these errors.
CaESchema::ExpectedMismatch { .. } => {
Contributor

How can this happen? Also, in this case shouldn't we check if the actual schema is the one we wanted and if so continue?

Contributor Author

This shouldn't ever happen because the Coordinator should be the only one evolving the schema of this shard. I added a soft assertion to see if it ever does get hit, and a TODO to follow up. Right now this would just fail the ALTER TABLE command and the user would have to re-run it on their own.

)))
}
CaESchema::Incompatible => {
return Err(StorageError::Generic(anyhow::anyhow!(
Contributor

We should add a soft_panic_or_log here since it's almost certainly a bug if we hit this case, right?

Contributor Author

Good idea, added!
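
A sketch of the error handling described in these two threads, using a hypothetical mirror of CaESchema: the variant fields and the u64 schema id are assumptions, and log_soft_failure is a stand-in that only logs, unlike the real soft assertion macros.

```rust
/// Hypothetical mirror of the compare-and-evolve outcomes discussed above.
#[derive(Debug)]
pub enum CaESchema {
    Ok(u64),
    ExpectedMismatch { current: u64 },
    Incompatible,
}

/// Stand-in for a soft assertion: logs instead of panicking.
fn log_soft_failure(msg: &str) {
    eprintln!("soft assertion failed: {msg}");
}

pub fn handle_evolve_result(result: CaESchema) -> Result<u64, String> {
    match result {
        CaESchema::Ok(id) => Ok(id),
        CaESchema::ExpectedMismatch { current } => {
            // Only the coordinator should evolve this shard's schema, so this
            // points at a bug; fail the ALTER and let the user retry.
            log_soft_failure("unexpected schema mismatch");
            Err(format!("schema changed concurrently (current schema id {current})"))
        }
        CaESchema::Incompatible => {
            log_soft_failure("incompatible schema evolution");
            Err("new schema is incompatible with the existing one".to_string())
        }
    }
}
```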

.await;

// TODO(alter_table): Do we need to advance the since of the table to match the time this
// new version was registered with txn-wal?
Contributor

Where is the txn wal involved here? Does compare and evolve schema write something there? Also, who knows the answer to this question?

Contributor Author

I left this TODO because when creating a table we advance the since to the register_ts. I don't think it's necessary here, but I'll follow up with @aljoscha or @jkosh44.

// Because this is a Table, we know it's managed by txn_wal, and thus its logical write
// frontier is possibly in advance of the write_handle's upper. So we fast forward the
// write frontier to match that of the existing collection.
let write_frontier =
Contributor

The comment seems to imply that write_handle.upper() <= existing_write_frontier is an invariant. Should we unconditionally use the existing_write_frontier then? Is there any case where that inequality doesn't hold?

Contributor Author

I think the only case where this inequality doesn't hold is if a shard that is registered with txn-wal is written to outside of txn-wal, i.e. misuse. I added a soft_assert here in case there is a separate edge case, e.g. on initialization.
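
A sketch of the fast-forward under two assumptions: timestamps are totally ordered u64 values rather than frontiers, and the fast-forward takes the later of the two frontiers. new_version_write_frontier is a hypothetical name, and the eprintln! stands in for the soft_assert.

```rust
/// Hypothetical sketch with totally ordered u64 timestamps; the real code
/// works with frontiers. Returns the write frontier for the new table version.
pub fn new_version_write_frontier(handle_upper: u64, existing_write_frontier: u64) -> u64 {
    if handle_upper > existing_write_frontier {
        // Stand-in for a soft assertion: per the discussion, this should only
        // happen if the shard was written to outside of txn-wal.
        eprintln!(
            "write handle upper {handle_upper} is ahead of the table's frontier {existing_write_frontier}"
        );
    }
    // Fast-forward to whichever frontier is further along.
    handle_upper.max(existing_write_frontier)
}
```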

if !dependency_read_holds.is_empty() {
//
// TODO(parkmycar): Include Tables (is_in_txns) in this check.
if !dependency_read_holds.is_empty() && !is_in_txns(id, &metadata) {
Contributor

Why is checking if the collection is in txn needed here?

Contributor Author

Expanded this comment!

* changes Table to use a VersionedRelationDesc
* adds CatalogCollectionEntry and move desc(...) method to it
* renamed CatalogEntry::desc to CatalogEntry::desc_latest
* implement sequencing in the adapter
* updates to the storage controller
* when adding a new version of a table, add the new version as a dependency for the old version
    * this allows us to accurately track ReadHolds for tables and the underlying Persist Shard
* add a good number of test cases to alter-table.slt
* delete duplicate table_alter.slt
* add a platform-check for ALTER TABLE
* add a (disabled) parallel-workload case for ALTER TABLE
* add a legacy upgrade test for ALTER TABLE
* acquire a read hold on the 'original' collection while the alter is running
* update tests for v0.130
* add comments explaining TODOs
* add soft assertions for conditions that we shouldn't hit
@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch from a7728e8 to 4892cc6 Compare January 17, 2025 15:48
@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch from c96f739 to 1a95918 Compare January 17, 2025 17:52
@ParkMyCar ParkMyCar force-pushed the alter_table2/support-adding-columns-to-tables branch from 1a95918 to ff8f4a3 Compare January 17, 2025 17:53
@ParkMyCar ParkMyCar merged commit fa46c3b into MaterializeInc:main Jan 17, 2025
def- added a commit to def-/materialize that referenced this pull request May 7, 2026
The `Op::AlterAddColumn` handler was missing an audit log entry, unlike
every other ALTER operation. This adds an `AlterAddColumnV1` event that
records the table ID, column name, column type, and nullability.

Introduced in MaterializeInc#30470.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
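
For illustration, the event described above might look roughly like this. The field names, the use of plain strings for the ID and type, the EventDetails enum, and alter_add_column_event are assumptions, not the actual audit-log definitions.

```rust
/// Hypothetical shape of the audit-log payload described in the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AlterAddColumnV1 {
    pub id: String,
    pub column_name: String,
    pub column_type: String,
    pub nullable: bool,
}

/// Hypothetical event wrapper holding one variant per event kind.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EventDetails {
    AlterAddColumnV1(AlterAddColumnV1),
}

/// Builds the event recorded for an `ALTER TABLE ... ADD COLUMN`.
pub fn alter_add_column_event(
    table_id: &str,
    column_name: &str,
    column_type: &str,
    nullable: bool,
) -> EventDetails {
    EventDetails::AlterAddColumnV1(AlterAddColumnV1 {
        id: table_id.to_string(),
        column_name: column_name.to_string(),
        column_type: column_type.to_string(),
        nullable,
    })
}
```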
def- added a commit to def-/materialize that referenced this pull request May 8, 2026
def- added a commit to def-/materialize that referenced this pull request May 20, 2026
def- added a commit that referenced this pull request May 21, 2026
…ALTER SOURCE ... SET (TIMESTAMP INTERVAL)` (#36449)

The `Op::AlterAddColumn` handler was missing an audit log entry, unlike
every other ALTER operation. This adds an `AlterAddColumnV1` event that
records the table ID, column name, column type, and nullability.

Introduced in #30470.

Nightly: https://buildkite.com/materialize/nightly/builds/16495

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bobbyiliev pushed a commit to bobbyiliev/materialize that referenced this pull request May 21, 2026
…ALTER SOURCE ... SET (TIMESTAMP INTERVAL)` (MaterializeInc#36449)

The `Op::AlterAddColumn` handler was missing an audit log entry, unlike
every other ALTER operation. This adds an `AlterAddColumnV1` event that
records the table ID, column name, column type, and nullability.

Introduced in MaterializeInc#30470.

Nightly: https://buildkite.com/materialize/nightly/builds/16495

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
antiguru pushed a commit to antiguru/materialize that referenced this pull request May 25, 2026
…ALTER SOURCE ... SET (TIMESTAMP INTERVAL)` (MaterializeInc#36449)

The `Op::AlterAddColumn` handler was missing an audit log entry, unlike
every other ALTER operation. This adds an `AlterAddColumnV1` event that
records the table ID, column name, column type, and nullability.

Introduced in MaterializeInc#30470.

Nightly: https://buildkite.com/materialize/nightly/builds/16495

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

6 participants