Add metric view support to OSS UC server and Spark connector#1
Open
chenwang-databricks wants to merge 26 commits into
- Add view_dependencies field to CreateTable and TableInfo in OpenAPI spec
- Add dependent parameter to GenerateTemporaryTableCredential for view-mediated authorization (definer's rights model)
- Create DependencyDAO and DependencyRepository for persisting view dependencies in uc_dependencies table
- Update TableRepository to handle METRIC_VIEW table type with dependency storage/retrieval on create, get, and delete
- Update TemporaryTableCredentialsService to support view-mediated credential vending via the dependent parameter
- Make AuthorizeKey repeatable for multiple authorization parameters
- Add BaseMetricViewCRUDTest and SdkMetricViewCRUDTest for metric view CRUD integration testing
- Add MetricViewContext thread-local for passing metric view identity during credential vending (definer's rights model)
- Handle METRIC_VIEW table type in UCSingleCatalog.createTable to route metric view creation through UCProxy with proper metadata
- Add createMetricView to UCProxy for constructing METRIC_VIEW CreateTable requests with view_definition, schema, and dependency properties
- Update build.sbt to support building against Spark 4.2.0-SNAPSHOT with unmanaged JARs from assembly directory
- Fix Guava import references: replace org.sparkproject.guava with com.google.common across connector Java files to support Spark 4.2 where Guava is no longer shaded
…ctured TableInfo
- Replace MetricViewContext thread-local with Spark's AnalysisContext.metricViewId
- Use CatalogTableType.METRIC_VIEW instead of view.viewWithMetrics property
- Handle createTable(ident, TableInfo) for metric views with structured dependencies
- Populate view dependencies in listTables for METRIC_VIEW type
Implement GET /tables/{full_name}/metadata-snapshot that returns the
metric view metadata plus resolved source table metadata in a single
call. This allows the Spark connector to resolve source table metadata
without requiring direct SELECT on source tables (definer's rights for
metadata access).
Server changes:
- Add MetadataSnapshot model and endpoint to OpenAPI spec
- Implement getMetadataSnapshot in TableRepository with dependency resolution
- Add authorized endpoint in TableService (SELECT on metric view only)
- Fix backtick typo in handleDependentCredentialRequest method name
Connector changes:
- Add metadataSnapshotCache to UCSingleCatalog companion object
- Call metadata snapshot API in loadMetricView and cache source table info
- Check cache in loadTable before calling getTable (consume-once semantics)
Tests:
- Add SdkMetricViewAccessControlTest with 8 test cases covering both
credential vending (4 tests) and metadata snapshot (4 tests) permissions
…mpatibility
Replace the custom GET /metadata-snapshot endpoint with the standard
POST /metadata-and-permissions-snapshot (MAPS) endpoint to align with
Databricks UC wire format. This enables the OSS connector to work against
both OSS UC and Databricks UC without code changes.
Key changes:
- OpenAPI: new MAPS endpoint with nested response shape (MetadataAndPermissionsSnapshotResponse wrapping MetadataSnapshotResponse), rename TableResult.missing_reason to reason, add Dependent/TableDependent schemas for structured credential dependent field
- Server: new MetadataSnapshotService handler for MAPS, remove old per-table GET endpoint from TableService, update credential handler to parse nested Dependent structure
- Connector: call getMetadataAndPermissionsSnapshot, unwrap nested response, construct structured Dependent for credential vending, convert metadataSnapshotCache to ThreadLocal for thread safety
- Tests: update all 9 integration tests in SdkMetricViewAccessControlTest
Per UC TLG decision, external/untrusted engines cannot be trusted to enforce definer's rights. OSS Spark uses invoker's rights: users must have SELECT on both the metric view and all source tables.
Removed:
- MAPS endpoint (MetadataSnapshotService, route, OpenAPI schemas)
- Structured Dependent/TableDependent types from OpenAPI spec
- dependent parameter from GenerateTemporaryTableCredential
- Definer's rights logic in TemporaryTableCredentialsService
- ThreadLocal metadataSnapshotCache in connector
- MAPS call and AnalysisContext.setMetricViewId in loadMetricView
- Dependent construction in credential vending
Kept:
- view_definition and view_dependencies on CreateTable/TableInfo
- uc_dependencies table and DependencyDAO
- DependencyList/Dependency/TableDependency/FunctionDependency schemas
- Metric view CRUD and backward compatibility
Tests:
- Rewrote SdkMetricViewAccessControlTest from 9 definer's rights tests to 4 invoker's rights tests (all passing).
The @Repeatable annotation and AuthorizeKeys container were only needed for the definer's rights implementation (multiple @AuthorizeKey on the credential vending parameter). They are no longer needed with invoker's rights.
The original file had no functional changes from definer's rights removal -- only cosmetic diffs (import reordering, whitespace, unused authorizer parameter). Revert to main version to keep the diff clean.
With invoker's rights, metric views use the same standard permission checks as regular tables. There is no metric-view-specific permission logic to test -- the existing table access control tests already cover this behavior.
Revert Guava import changes (org.sparkproject.guava -> com.google.common) and credential vending reformatting that were unrelated to metric views. The connector diff now only contains metric view additions:
- createTable(TableInfo) override with METRIC_VIEW routing
- loadMetricView method returning V1Table with YAML
- createMetricViewFromTableInfo for V2 catalog create path
- METRIC_VIEW detection in loadTable
OSS UC no longer persists view dependencies (no uc_dependencies table). Under invoker's rights, Spark resolves dependencies at query time by parsing the YAML. The view_dependencies field is accepted in CreateTable payloads for wire compatibility with Databricks UC but not persisted.
Removed:
- DependencyDAO.java and DependencyRepository.java
- DependencyRepository from Repositories.java
- DependencyDAO from HibernateConfigurator
- Dependency create/read/delete logic in TableRepository
- Dependency assertions in BaseMetricViewCRUDTest (now tests that view_dependencies is accepted without error)
Remove metric-view-specific branching in createTable(ident, tableInfo). The method now forwards all TableInfo fields (tableType, viewDefinition, viewDependencies, columns, properties) to the UC server generically, working for any table type without type-specific conditional logic.
Now that DelegatingCatalogExtension forwards createTable(Identifier, TableInfo) properly (Spark PR), the override can live on UCProxy like all other TableCatalog methods. UCSingleCatalog simply delegates to the delegate chain without casting or bypassing DeltaCatalog.
Factor out common logic between createTable(TableInfo) and createTable(StructType, ...) into two shared helpers:
- initCreateTable: sets name, schema, catalog, comment, properties
- convertColumns: converts Spark Column[] to UC ColumnInfo[]
Both createTable overloads now use the same base initialization, ensuring consistent behavior for common fields. The TableInfo overload adds tableType, viewDefinition, and viewDependencies on top; the legacy overload adds storage location, data source format, and partitions.
The old createTable(StructType, Transform[], Map) now converts its arguments into a TableInfo and calls createTable(Identifier, TableInfo). All table creation logic lives in one method that handles all fields: tableType, viewDefinition, viewDependencies, columns, partitions, storage location, data source format, and properties.
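The delegation described in this commit can be pictured roughly as follows. This is a minimal sketch, not the connector code: `TableInfoSketch` and `CreateTableDelegationSketch` are hypothetical stand-ins (the real connector is written in Scala against Spark's `TableInfo`), and columns and partitions are reduced to plain strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down stand-in for Spark's TableInfo (assumption, not the real class).
final class TableInfoSketch {
    final List<String> columns;
    final List<String> partitions;
    final Map<String, String> properties;

    TableInfoSketch(List<String> columns, List<String> partitions, Map<String, String> properties) {
        this.columns = columns;
        this.partitions = partitions;
        this.properties = properties;
    }
}

class CreateTableDelegationSketch {
    // Remember the last request so the delegation is observable in this sketch.
    String lastIdent;
    TableInfoSketch lastInfo;

    // The single code path that handles every field.
    String createTable(String ident, TableInfoSketch info) {
        this.lastIdent = ident;
        this.lastInfo = info;
        return ident;
    }

    // Legacy-style overload: packs its arguments into a TableInfoSketch and delegates.
    String createTable(String ident, List<String> columns, List<String> partitions,
                       Map<String, String> properties) {
        TableInfoSketch info = new TableInfoSketch(columns, partitions, new HashMap<>(properties));
        return createTable(ident, info);
    }
}
```

Keeping the legacy overload as a thin adapter means any field handling added later only has to be written once, in the TableInfo-based method.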
Merge loadMetricView into loadTable by branching on whether the table has storage (storageLocation != null) rather than checking table type. Tables with storage get credential vending, partition extraction, and storage format. Tables without storage (metric views, future view types) get empty storage, viewText from viewDefinition, and skip credential vending. No more type-specific method.
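As an illustration of the storage-based branching, here is a small sketch under stated assumptions: `UcTableSketch` and `LoadedTableSketch` are hypothetical, heavily simplified types, and the `credentialsVended` flag only marks where the real connector would request temporary credentials.

```java
// Hypothetical, trimmed-down view of a table as returned by the UC server (assumption).
final class UcTableSketch {
    final String name;
    final String storageLocation; // null for metric views and other storage-less types
    final String viewDefinition;  // null for regular tables

    UcTableSketch(String name, String storageLocation, String viewDefinition) {
        this.name = name;
        this.storageLocation = storageLocation;
        this.viewDefinition = viewDefinition;
    }
}

// Hypothetical result of loading a table in the connector (assumption).
final class LoadedTableSketch {
    final String location;           // empty when the table has no storage
    final String viewText;           // taken from viewDefinition for storage-less tables
    final boolean credentialsVended; // marks where credential vending would happen

    LoadedTableSketch(String location, String viewText, boolean credentialsVended) {
        this.location = location;
        this.viewText = viewText;
        this.credentialsVended = credentialsVended;
    }
}

class LoadTableSketch {
    // Branch on the presence of storage rather than on the table type.
    static LoadedTableSketch loadTable(UcTableSketch table) {
        if (table.storageLocation != null) {
            // Tables with storage: vend credentials and use the storage location.
            return new LoadedTableSketch(table.storageLocation, null, true);
        }
        // Tables without storage: empty storage, view text, no credential vending.
        return new LoadedTableSketch("", table.viewDefinition, false);
    }
}
```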
Re-introduce uc_dependencies persistence so that view_dependencies round-trips through the API: what clients send on create is persisted and returned on read. This makes the API self-consistent and enables non-Spark clients to discover metric view dependencies.
Restored:
- DependencyDAO and DependencyRepository
- DependencyRepository in Repositories and HibernateConfigurator
- Dependency create/read/delete logic in TableRepository (via shared attachDependencies helper)
- Dependency assertions in BaseMetricViewCRUDTest
Also removed unnecessary blank line in UCSingleCatalog.scala.
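The round-trip property (what is sent on create is returned on read, and removed on delete) can be sketched with an in-memory map. `DependencyStoreSketch` is a hypothetical stand-in for the uc_dependencies table and its repository; the real implementation goes through Hibernate, and dependencies are reduced to source table names here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the uc_dependencies table (assumption).
class DependencyStoreSketch {
    // metric view full name -> source table full names, in the order they were sent
    private final Map<String, List<String>> byView = new HashMap<>();

    // Called on create: persist exactly what the client sent.
    void create(String viewFullName, List<String> sourceTables) {
        byView.put(viewFullName, new ArrayList<>(sourceTables));
    }

    // Called on get: attach the stored dependencies to the response.
    List<String> read(String viewFullName) {
        List<String> stored = byView.get(viewFullName);
        if (stored == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(stored);
    }

    // Called on delete: remove dependency records along with the view.
    void delete(String viewFullName) {
        byView.remove(viewFullName);
    }
}
```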
Server now validates that view_dependencies is provided when creating a metric view, matching how view_definition is already required.
Rewrote tests:
- testMetricViewCRUD: includes dependencies in payload (mimics Spark), verifies dependency round-trip on GET, tests full CRUD lifecycle
- testMetricViewWithSqlSource: tests SQL statement as source with dependencies
- testCreateMetricViewWithoutDefinitionFails: negative test
- testCreateMetricViewWithoutDependenciesFails: negative test
- Fixed VIEW_DEFINITION to use YAML format instead of SQL
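A rough sketch of the validation rule follows. The names `CreateTableSketch` and `MetricViewValidationSketch` are hypothetical, the exception type is an assumption, and "provided" is interpreted here as "not null"; the actual server may treat empty values differently.

```java
import java.util.List;

// Hypothetical, trimmed-down CreateTable payload (assumption, not the generated model class).
final class CreateTableSketch {
    final String tableType;
    final String viewDefinition;
    final List<String> viewDependencies;

    CreateTableSketch(String tableType, String viewDefinition, List<String> viewDependencies) {
        this.tableType = tableType;
        this.viewDefinition = viewDefinition;
        this.viewDependencies = viewDependencies;
    }
}

class MetricViewValidationSketch {
    // Rejects metric view requests that omit view_definition or view_dependencies.
    static void validate(CreateTableSketch request) {
        if (!"METRIC_VIEW".equals(request.tableType)) {
            return; // other table types are not covered by this rule
        }
        if (request.viewDefinition == null) {
            throw new IllegalArgumentException("view_definition is required for METRIC_VIEW");
        }
        if (request.viewDependencies == null) {
            throw new IllegalArgumentException("view_dependencies is required for METRIC_VIEW");
        }
    }
}
```

The two negative tests listed above correspond to the two `throw` branches in this sketch.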
UDFs are not supported in metric view expressions (the OSS connector does not implement FunctionCatalog), so function dependencies are not applicable.
Removed:
- FunctionDependency schema from all.yaml
- function field from Dependency schema
- FunctionDependency case in connector dependency conversion
- FUNCTION handling in DependencyDAO.from()
This reverts commit 31c7ab4.
PR Checklist
- docs is updated
Description of changes
Summary
This PR adds end-to-end support for Spark metric views in Unity Catalog, spanning the OpenAPI spec, server persistence, and the Spark connector. It uses invoker's rights (not definer's rights) -- the querying user must have SELECT on both the metric view and all source tables.
Permission Model: Invoker's Rights
Per UC TLG decision, external/untrusted engines (including OSS Spark) cannot be trusted to enforce definer's rights. This is consistent with how Databricks UC handles non-PE (Single User) clusters. The user's own permissions are checked directly on each table -- no MAPS endpoint, no credential caching, no view-mediated authorization bypass.
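To make the invoker's rights rule concrete, here is a minimal sketch assuming a hypothetical in-memory grant table. `InvokerRightsSketch`, `grantSelect`, and `canQueryMetricView` are illustrative names only; the real server relies on its existing table authorization checks rather than anything metric-view-specific.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of SELECT grants (assumption, not the UC authorizer).
class InvokerRightsSketch {
    // user -> full names of tables and views the user may SELECT from
    private final Map<String, Set<String>> selectGrants = new HashMap<>();

    void grantSelect(String user, String fullName) {
        selectGrants.computeIfAbsent(user, u -> new HashSet<>()).add(fullName);
    }

    boolean hasSelect(String user, String fullName) {
        Set<String> granted = selectGrants.get(user);
        return granted != null && granted.contains(fullName);
    }

    // Invoker's rights: the querying user needs SELECT on the metric view
    // and on every source table; nothing is authorized on the view's behalf.
    boolean canQueryMetricView(String user, String metricView, List<String> sourceTables) {
        if (!hasSelect(user, metricView)) {
            return false;
        }
        for (String source : sourceTables) {
            if (!hasSelect(user, source)) {
                return false;
            }
        }
        return true;
    }
}
```

Under definer's rights, SELECT on the view alone would have been enough; in this sketch a user with only the view grant is rejected.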
Server Changes
OpenAPI spec (api/all.yaml)
- Added METRIC_VIEW to TableType enum
- Added view_definition (string) and view_dependencies (DependencyList) to CreateTable and TableInfo schemas
- Added DependencyList, Dependency, TableDependency, FunctionDependency schemas
Dependency storage
- DependencyDAO -- Hibernate entity for the uc_dependencies table, storing view-to-source-table dependency relationships
- DependencyRepository -- CRUD operations for dependency records
- HibernateConfigurator -- registered DependencyDAO for automatic table creation
Table repository (TableRepository.java)
- createTable() -- handles METRIC_VIEW table type: stores view definition, schema, and persists dependencies
- getTable() -- for METRIC_VIEW tables, loads and attaches dependency information to the response
- deleteTable() -- cascading deletion of associated dependency records
Tests
- BaseMetricViewCRUDTest / SdkMetricViewCRUDTest -- integration tests for metric view CRUD with dependency tracking (2 tests)
- SdkMetricViewAccessControlTest -- invoker's rights access control tests (4 tests)
Connector Changes
UCSingleCatalog.scala
- createTable() -- detects table_type=METRIC_VIEW and routes to dedicated createMetricViewFromTableInfo() method
- createMetricViewFromTableInfo() -- constructs CreateTable request with METRIC_VIEW type, view definition, schema, and dependencies
- loadTable() -- detects METRIC_VIEW type and routes to loadMetricView(), which returns a V1Table with CatalogTableType.METRIC_VIEW and the YAML as viewText
- … getTable and credentials are vended with the user's own permissions
Companion PRs
- CreateMetricViewCommand, TableInfo fields for metric views, collectTableDependencies for SQL source dependency extraction