[SPARK-53840][SQL] Add AS JSON output support for SHOW TABLES and SHOW TABLE EXTENDED #54824

Open
ayushbilala wants to merge 2 commits into apache:master from ayushbilala:master

@ayushbilala

What changes were proposed in this pull request?

Support SHOW TABLES ... [AS JSON] and SHOW TABLE EXTENDED ... [AS JSON] to optionally display table listing metadata in JSON format.
SQL syntax:

SHOW TABLES [(IN|FROM) database_name] [[LIKE] pattern] [AS JSON]
SHOW TABLE EXTENDED [(IN|FROM) database_name] LIKE pattern [AS JSON]

Output: json_metadata: String

SHOW TABLES AS JSON:
{"tables":[{"name":"t1","namespace":["db"],"isTemporary":false}]}

SHOW TABLE EXTENDED ... AS JSON additionally includes catalog and type:
{"tables":[{"catalog":"spark_catalog","namespace":["db"],"name":"t1","type":"TABLE","isTemporary":false}]}

Why are the changes needed?

The existing text-based output of SHOW TABLES requires fragile string parsing for programmatic consumption.
A structured JSON format provides a stable, machine-readable contract for tooling and automation.

Does this PR introduce any user-facing change?

Yes. Two new SQL syntax variants that return JSON output. Existing commands without AS JSON are unaffected.
SHOW TABLE EXTENDED with both PARTITION and AS JSON is explicitly rejected.

How was this patch tested?

  • Parser tests in ShowTablesParserSuite for all AS JSON variants and the PARTITION + AS JSON error case.
  • Execution tests in ShowTablesSuiteBase covering JSON schema validation, empty databases, EXTENDED output, and temp view inclusion.
  • Manual verification in spark-shell

Was this patch authored or co-authored using generative AI tooling?

No

@ayushbilala force-pushed the master branch 3 times, most recently from 91aeb50 to 625fbe2 (March 18, 2026 05:33)
@ayushbilala
Author

@asl3 Have you had the opportunity to review this pull request yet?

Contributor

@asl3 left a comment

Thank you! Overall looks good to me.

Can you also add test coverage for temp views (uses a separate listTempViews call)?

@ayushbilala
Author

@asl3 I have added tests for temp views. Can you please check?

Contributor

@cloud-fan left a comment

Summary

This PR adds AS JSON output mode to SHOW TABLES and SHOW TABLE EXTENDED, threading an asJson: Boolean flag through the parser, logical plan nodes (ShowTables, ShowTablesExtended), resolution (ResolveSessionCatalog, DataSourceV2Strategy), and execution (ShowTablesCommand for V1, ShowTablesExec/ShowTablesExtendedExec for V2). When asJson=true, a single-row result with a json_metadata column is returned. PARTITION + AS JSON is rejected at parse time.

General comments

  • The JSON generation logic is duplicated across three execution classes (ShowTablesCommand, ShowTablesExec, ShowTablesExtendedExec), and findings #1 and #2 below are direct consequences of that duplication — different listing strategies for temp views in V2, different ways to obtain the catalog name in V1. The existing DESCRIBE TABLE ... AS JSON avoids this entirely by using a separate command class (DescribeRelationJsonCommand) — a single UnaryRunnableCommand that pattern-matches on the already-resolved child and handles all catalog types in one place, with no need for separate V1 and V2 implementations. Consider following the same pattern: a single ShowTablesJsonCommand that receives the resolved namespace, gets the catalog from it, and centralizes JSON generation. This eliminates the V1/V2 divergence that caused both bugs.

  • No test checks for the absence of duplicate entries in JSON output — all tests use .find() which returns the first match. Adding an assertion on tables.length (expected count) would catch duplicates.
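
To illustrate why `.find()` alone cannot catch duplicates, here is a minimal sketch in plain Scala. `TableEntry`, `DuplicateCheck`, and the sample data are hypothetical stand-ins for the parsed entries of the `tables` array, not code from this PR.

```scala
// Hypothetical stand-in for one parsed entry of the "tables" array.
final case class TableEntry(name: String, isTemporary: Boolean)

object DuplicateCheck {
  // Names that occur more than once in the listing.
  def duplicateNames(tables: Seq[TableEntry]): Set[String] =
    tables.groupBy(_.name).collect { case (name, group) if group.size > 1 => name }.toSet

  def main(args: Array[String]): Unit = {
    // A listing in which the temp view "v1" was emitted twice.
    val tables = Seq(
      TableEntry("t1", isTemporary = false),
      TableEntry("v1", isTemporary = false),
      TableEntry("v1", isTemporary = true))

    // find() stops at the first match, so this check passes despite the duplicate.
    assert(tables.find(_.name == "v1").isDefined)

    // A count or uniqueness check exposes the problem.
    println(s"entries: ${tables.length}, duplicates: ${duplicateNames(tables)}")
  }
}
```

In a real suite the same idea reduces to asserting the expected `tables.length` next to each `.find()` lookup.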

}.toList

val jsonTempViews = if (CatalogV2Util.isSessionCatalog(catalog)) {
val sessionCatalog = session.sessionState.catalog
Contributor

catalog.listTables() via V2SessionCatalog calls SessionCatalog.listTables(db, "*", includeLocalTempViews=true) which already includes local temp views. This separate listTempViews() call adds them a second time, causing duplicate entries in the JSON output when the V2 session catalog is used (when useV1Command=false).

The non-JSON path at line 78 does not list temp views separately — it correctly relies on what catalog.listTables() returns.
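
The double listing can be reproduced with a toy model. This is only a sketch under stated assumptions: `TempViewListing` and its members are hypothetical, and `listTables` merely imitates the behavior described above, where the session catalog's listing already contains local temp views.

```scala
// Toy model of the listing behavior described in this comment; all names are hypothetical.
object TempViewListing {
  val persistentTables: Seq[String] = Seq("t1", "t2")
  val localTempViews: Seq[String] = Seq("v1")

  // Imitates catalog.listTables(): for the session catalog, temp views are already included.
  def listTables(isSessionCatalog: Boolean): Seq[String] =
    if (isSessionCatalog) persistentTables ++ localTempViews else persistentTables

  // Problematic variant: appends temp views on top of whatever listTables() returned.
  def listWithExtraTempViews(isSessionCatalog: Boolean): Seq[String] =
    listTables(isSessionCatalog) ++ localTempViews

  // Variant matching the non-JSON path: rely on listTables() alone.
  def listRelyingOnCatalog(isSessionCatalog: Boolean): Seq[String] =
    listTables(isSessionCatalog)

  def main(args: Array[String]): Unit = {
    println(listWithExtraTempViews(isSessionCatalog = true)) // "v1" appears twice
    println(listRelyingOnCatalog(isSessionCatalog = true))   // "v1" appears once
  }
}
```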

Author

Good catch! I will remove the redundant sessionCatalog.listTempViews(...) block entirely.

Author

Fixed. JSON path has been moved to the new ShowTablesJsonExec which guards the separate listTempViews() call with !CatalogV2Util.isSessionCatalog(catalog), matching the logic of the non-JSON path.

When catalog is V2SessionCatalog, temp views are already included in listTables() results and the separate call is skipped.

Comment on lines +976 to +977
"catalog" -> JString(
sparkSession.sessionState.catalogManager.currentCatalog.name()),
Contributor

currentCatalog returns the user's currently active catalog (set via USE), but ShowTablesCommand always operates on the session catalog (matched via ResolvedV1Database). If the user runs e.g. USE my_v2_catalog; SHOW TABLE EXTENDED IN spark_catalog.mydb LIKE '*' AS JSON, this reports "catalog": "my_v2_catalog" instead of "spark_catalog". The V2 exec correctly uses catalog.name() instead.

Suggested change
- "catalog" -> JString(
-   sparkSession.sessionState.catalogManager.currentCatalog.name()),
+ "catalog" -> JString(
+   sparkSession.sessionState.catalogManager.v2SessionCatalog.name()),

Author

Thanks for pointing out the scoping distinction with USE. I'll update the implementation.

Author

Fixed - now uses sparkSession.sessionState.catalogManager.v2SessionCatalog.name() as suggested. This correctly returns spark_catalog regardless of which catalog the user has set active via USE.

"DESC TABLE COLUMN for a specific partition."
]
},
"SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION" : {
Contributor

The UNSUPPORTED_FEATURE sub-classes are alphabetically ordered. SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION (S-H) is placed between DESC_TABLE_COLUMN_PARTITION (D-E) and DROP_DATABASE (D-R), but it should be placed between SET_VARIABLE_USING_SET and SQL_CURSOR to maintain alphabetical order.
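
A quick way to spot a misplaced key is to look for the first adjacent pair that breaks ascending order. The sketch below is illustrative only: `SubClassOrder` is a hypothetical helper, and it assumes plain `String` comparison is the intended ordering, which may differ from how the repository actually validates the file.

```scala
// Hypothetical helper; assumes plain String comparison is the intended ordering.
object SubClassOrder {
  // Returns the first adjacent pair (a, b) with a > b, if any.
  def firstOutOfOrder(keys: Seq[String]): Option[(String, String)] =
    keys.zip(keys.drop(1)).find { case (a, b) => a > b }

  def main(args: Array[String]): Unit = {
    // Placement criticized in this comment.
    val misplaced = Seq(
      "DESC_TABLE_COLUMN_PARTITION",
      "SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION",
      "DROP_DATABASE")
    // Placement suggested in this comment.
    val suggested = Seq(
      "SET_VARIABLE_USING_SET",
      "SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION",
      "SQL_CURSOR")
    println(firstOutOfOrder(misplaced))
    println(firstOutOfOrder(suggested))
  }
}
```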

Author

Fixed - SHOW_TABLE_EXTENDED_JSON_WITH_PARTITION is now placed between SET_VARIABLE_USING_SET and SQL_CURSOR to maintain alphabetical order.

@ayushbilala
Author

Addressed all review comments. Please take another pass when you get a chance. Thanks!

case Seq(db) => Some(db)
case _ => None
}
val tempViews = sessionCatalog.listTempViews(db.getOrElse(""), pattern)
Contributor

For V2SessionCatalog, catalog.listTables(namespace.toArray) already includes local temp views (and globals when ns is global_temp) via SessionCatalog.listTables(db, "*", includeLocalTempViews=true).

This block lists them again, so each temp view is emitted twice: once with type=TABLE, isTemporary=false, once with type=VIEW, isTemporary=true. The same duplicate path was removed from ShowTablesExec.

This should either skip the listTempViews block (matching ShowTablesExec) and use an isTempView-style helper above, or filter temp views out of tables before emission.

Author

Fixed. JSON path has been extracted to ShowTablesJsonExec. There is a guard !CatalogV2Util.isSessionCatalog(catalog) to prevent the duplicate listTempViews() fetch when catalog is V2SessionCatalog.

Remaining code in ShowTablesExtendedExec is the text-only path, which is unchanged.

val tables = catalog.listTables(namespace.toArray)
tables.foreach { tableIdent =>
if (StringUtils.filterPattern(Seq(tableIdent.name()), pattern).nonEmpty) {
// V2 persistent tables are always TABLE
Contributor

The comment "V2 persistent tables are always TABLE" and the unconditional type=TABLE, isTemporary=false are wrong when the catalog is V2SessionCatalog, because listTables() also returns temp views there. Consider a per-row isTempView check (for example, ShowTablesExec.isTempView).

Author

Fixed. In ShowTablesJsonExec, each identifier from listTables() / listTableAndViewSummaries() is now passed through isTempView(ident), which checks SessionCatalog.isTempView when catalog is V2SessionCatalog.

Temp views returned by listTables() are correctly identified and emitted with type=VIEW, isTemporary=true.

Try(catalog.getTempViewOrPermanentTableMetadata(tableIdent)) match {
case Success(meta) if meta.tableType == CatalogTableType.VIEW => "VIEW"
case Success(_) => "TABLE"
case Failure(_) => "UNKNOWN"
Contributor

Failure(_) => "UNKNOWN" swallows any exception from getTempViewOrPermanentTableMetadata. The text path lets exceptions propagate. Consider doing the same here.
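
The difference between the two behaviors can be shown with a small, self-contained sketch. `TableTypeLookup` and `lookupType` are hypothetical stand-ins for the metadata lookup; they are not Spark APIs.

```scala
import scala.util.{Failure, Success, Try}

object TableTypeLookup {
  // Hypothetical stand-in for a metadata lookup that can fail.
  def lookupType(name: String): String =
    if (name == "broken") throw new IllegalStateException(s"cannot load metadata for $name")
    else if (name.startsWith("v")) "VIEW"
    else "TABLE"

  // Swallowing variant: every failure, whatever its cause, collapses into "UNKNOWN".
  def typeOrUnknown(name: String): String =
    Try(lookupType(name)) match {
      case Success(tableType) => tableType
      case Failure(_) => "UNKNOWN"
    }

  // Propagating variant: the caller sees the original exception.
  def typeOrThrow(name: String): String = lookupType(name)

  def main(args: Array[String]): Unit = {
    println(typeOrUnknown("broken"))    // the error is hidden behind a placeholder value
    println(Try(typeOrThrow("broken"))) // the original exception is still visible
  }
}
```

With the swallowing variant, a consumer of the JSON cannot distinguish a genuinely unknown type from a failed lookup.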

Author

Fixed - Try/catch block that returned "UNKNOWN" on failure has been removed. getTempViewOrPermanentTableMetadata now propagates exceptions directly, consistent with the text path.

val json = parse(jsonStr)
val tables = (json \ "tables").asInstanceOf[JArray].arr

assert(tables.length == 2)
Contributor

This test seems to be failing in CI?

}
}

test("SHOW TABLE EXTENDED AS JSON with both local and global temp views") {
Contributor

It seems @cloud-fan's comment about .find() masking duplicates still needs to be addressed here.

Author

Fixed - added assert(tables.length == N) to the tests that used .find() without a count assertion. This will catch silent duplicates.

case class ShowTables(
namespace: LogicalPlan,
pattern: Option[String],
asJson: Boolean = false,
Contributor

The JSON emission code is currently duplicated across the Exec classes. Consider a ShowTablesJsonCommand that takes the resolved namespace/catalog and handles all variants centrally, similar to DescribeRelationJsonCommand.

Author

Addressed. We extracted a single ShowTablesJsonExec physical plan node covering both SHOW TABLES AS JSON and SHOW TABLE EXTENDED AS JSON, controlled by an isExtended: Boolean flag.

if (asJson) {
val jsonTables = filteredTables.map { table =>
JObject(
"name" -> JString(table.name()),
Contributor

Field order should be consistent between non-extended {name, namespace, isTemporary} and extended {catalog, namespace, name, type, isTemporary}.

Author

Fixed. JSON code is consolidated in ShowTablesJsonExec.toJsonEntry(). Both schemas now have name as the first field: non-extended is {name, namespace, isTemporary} and extended is {name, catalog, namespace, type, isTemporary}.

V1 path in ShowTablesCommand follows the same ordering.
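
One way to keep the two orderings aligned is to build both shapes from a single ordered field list. The following is a hedged sketch of that idea in plain Scala. `JsonEntrySketch` and its hand-rolled string rendering are hypothetical and are not the PR's ShowTablesJsonExec.toJsonEntry(), which presumably builds json4s values instead.

```scala
// Hypothetical sketch: one ordered field list serves both the plain and the extended shape.
object JsonEntrySketch {
  // Minimal JSON string quoting (escapes backslash and double quote only).
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def toJsonEntry(
      name: String,
      catalog: String,
      namespace: Seq[String],
      tableType: String,
      isTemporary: Boolean,
      isExtended: Boolean): String = {
    // Extended-only fields are spliced into fixed positions, so the shared
    // fields keep the same relative order in both variants.
    val fields: Seq[(String, String)] =
      Seq("name" -> quote(name)) ++
        (if (isExtended) Seq("catalog" -> quote(catalog)) else Nil) ++
        Seq("namespace" -> namespace.map(quote).mkString("[", ",", "]")) ++
        (if (isExtended) Seq("type" -> quote(tableType)) else Nil) ++
        Seq("isTemporary" -> isTemporary.toString)
    fields.map { case (key, value) => s"${quote(key)}:$value" }.mkString("{", ",", "}")
  }

  def main(args: Array[String]): Unit = {
    println(toJsonEntry("t1", "spark_catalog", Seq("db"), "TABLE", isTemporary = false, isExtended = false))
    println(toJsonEntry("t1", "spark_catalog", Seq("db"), "TABLE", isTemporary = false, isExtended = true))
  }
}
```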

@ayushbilala force-pushed the master branch 2 times, most recently from e6c7b33 to 23c5cbc (May 8, 2026 11:53)
@ayushbilala
Author

@cloud-fan Both addressed.

For (a): we extracted a single ShowTablesJsonExec physical plan node that handles both SHOW TABLES AS JSON (isExtended=false) and SHOW TABLE EXTENDED AS JSON (isExtended=true).

DataSourceV2Strategy now routes asJson=true to it for both variants, eliminating the duplicated JSON emission from ShowTablesExec and ShowTablesExtendedExec. V1 path in ShowTablesCommand remains separate since it operates on a different plan type (ResolvedV1Database).

For (b): added assert(tables.length == N) to the tests that used .find() without a count check.

override def child: LogicalPlan = namespace
override protected def withNewChildInternal(newChild: LogicalPlan): ShowTables =
copy(namespace = newChild)
override protected def stringArgs: Iterator[Any] = Iterator(pattern, output)
Contributor

The asJson: Boolean = false field added to ShowTables (and the parallel field on ShowTablesExtended and ShowTablesCommand) is immediately hidden from treeString by the override protected def stringArgs: Iterator[Any] = Iterator(pattern, output) further down. SHOW TABLES (3-column schema) and SHOW TABLES AS JSON (1-column json_metadata schema) produce different output schemas, so their treeString should differ. The override exists to keep golden files passing, but it hides a real plan-level distinction.

Suggestion: Drop the stringArgs overrides and regenerate the affected explain/golden files. If a specific suite needs the legacy string, scope the override narrowly to that suite's plan view rather than to the case class.

}
}

private def runAsJson(sparkSession: SparkSession): Seq[Row] = {
Contributor

ShowTablesCommand.runAsJson (V1) and ShowTablesJsonExec (V2) are two emitters that have already drifted: the extended catalog name is v2SessionCatalog.name() here vs. catalog.name() in V2, and V1 uses SessionCatalog.listTables (which includes temp views) while V2 has a guarded listTempViews branch.

Suggestion: Collapse V1 + V2 into one ShowTablesJsonCommand built in AstBuilder.visitShowTables / visitShowTableExtended after resolution, pattern-matching on the resolved child (V1 session, V2 session, non-session V2). Drop asJson from ShowTables / ShowTablesExtended / ShowTablesCommand and the corresponding stringArgs overrides. The two emit functions become one and the drift class closes.

// listTableAndViewSummaries(), fetch them separately. For V2SessionCatalog,
// listTables() already includes local temp views, so we skip this to avoid duplicates.
// For TableViewCatalog (non-extended path), views come from listTableAndViewSummaries().
if (!CatalogV2Util.isSessionCatalog(catalog) &&
Contributor

This non-session V2 catalog branch (!CatalogV2Util.isSessionCatalog(catalog) && (isExtended || !catalog.isInstanceOf[TableViewCatalog])) fetches session temp views from sparkSession.sessionState.catalog and passes the V2 catalog's namespace as the db argument. Local temp views live in the session catalog's reserved temp database; global temp views live under global_temp. There is no semantic in which a non-session V2 catalog's namespace contains session temp views. Two potential problems:

  1. The first match arm passes namespace.head as db. Passing a non-session V2 catalog's namespace as the session catalog's temp DB is a category error.
  2. The fallback arm assigns db = "" for multi-component namespaces. SessionCatalog.listTempViews interprets "" as the local temp DB, which is a silent semantic mismatch.

Suggestion: Remove the branch.

}
}

test("SHOW TABLES AS JSON returns single row with json_metadata column") {
Contributor

The tests seem to cover happy paths only. The following code paths added by this PR are not exercised. Can you add test coverage for them?

  1. Pattern-LIKE exclusion. Both V1 (tableIdentifierPattern) and V2 (StringUtils.filterPattern) run pattern filtering, but every existing filter test uses a pattern where all tables match. No test creates foo1 and bar1 and asserts that SHOW TABLES IN ns LIKE 'foo*' AS JSON emits only foo1.
  2. Current-namespace SHOW TABLES AS JSON (no IN). All tests use IN $catalog.ns. The grammar allows SHOW TABLES AS JSON resolving to CurrentNamespace (AstBuilder.scala:5798).
  3. V1 vs V2 schema parity. The PR has two emitters but no test asserts that they produce equivalent JSON for an equivalent input. Add a parity test that runs the same query under useV1Command=true and useV1Command=false, parses both, and asserts equal tables arrays modulo ordering. This catches future drift.
  4. Raw envelope assertion. All assertions parse the JSON back and check via \. Add one literal assertion on the envelope to catch an accidental rename such as "tables" -> "results".

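The pattern-exclusion case from item 1 can be sketched without Spark. The matcher below is a simplified approximation written for illustration ('*' matches any run of characters, '|' separates alternatives, matching is case-insensitive). `PatternFilterSketch` is hypothetical and is not Spark's StringUtils.filterPattern.

```scala
// Hypothetical, simplified glob matcher for illustration; not Spark's StringUtils.filterPattern.
object PatternFilterSketch {
  def filterNames(names: Seq[String], pattern: String): Seq[String] = {
    // Each '|'-separated alternative becomes a case-insensitive regex in which
    // '*' is the only wildcard and every other character is matched literally.
    val regexes = pattern.trim.split("\\|").toSeq.map { sub =>
      sub.trim
        .split("\\*", -1)
        .map(part => java.util.regex.Pattern.quote(part))
        .mkString("(?i)", ".*", "")
    }
    // String.matches requires the whole name to match, not just a prefix.
    names.filter(name => regexes.exists(regex => name.matches(regex)))
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("foo1", "bar1")
    // The exclusion case asked for above: only foo1 should survive 'foo*'.
    println(filterNames(names, "foo*"))
    println(filterNames(names, "*"))
  }
}
```

A real test would create foo1 and bar1 in a namespace, run the LIKE 'foo*' AS JSON query, and assert both the surviving name and the array length.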

3 participants