
feat: include outputFacets and inputFacets in Dataset API responses #3103

Open
psaikaushik wants to merge 2 commits into MarquezProject:main from psaikaushik:feat/1746-include-output-input-facets


@psaikaushik

Summary

Updates the DatasetDao and DatasetVersionDao SQL queries to include facets of type 'output' in API responses, in addition to the existing 'dataset', 'unknown', and 'input' types.

Closes #1746

Problem

The OpenLineage spec includes outputFacets and inputFacets on Datasets reported by a RunEvent. Integrations like Spark and Great Expectations use these fields to report facets such as OutputStatistics and DataQualityMetrics.

Marquez already correctly stores these facets in the dataset_facets table with the appropriate type (INPUT or OUTPUT) via DatasetFacetsDao. However, the read queries in DatasetDao and DatasetVersionDao filtered with:

(df.type ILIKE 'dataset' OR df.type ILIKE 'unknown' OR df.type ILIKE 'input')

This excluded 'output' type facets (e.g., OutputStatistics) from API responses.
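As a rough illustration of why OutputStatistics was dropped, the sketch below models the facet-type filter in Java. It assumes that PostgreSQL's ILIKE against a pattern containing no % or _ wildcards reduces to a case-insensitive equality check, so lower-casing plus a set lookup stands in for it. FacetTypeFilter, BEFORE, AFTER, and matches are hypothetical names for illustration, not Marquez code.

```java
import java.util.Locale;
import java.util.Set;

public class FacetTypeFilter {
    // Hypothetical model: facet types matched by the read queries before the change.
    static final Set<String> BEFORE = Set.of("dataset", "unknown", "input");
    // Hypothetical model: facet types matched after adding OR df.type ILIKE 'output'.
    static final Set<String> AFTER = Set.of("dataset", "unknown", "input", "output");

    // Assumption: ILIKE with a wildcard-free pattern acts as case-insensitive
    // equality, so lower-casing the stored type and checking set membership
    // approximates the SQL predicate.
    static boolean matches(Set<String> allowed, String facetType) {
        return allowed.contains(facetType.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Stored type values as described in the PR (e.g. INPUT, OUTPUT).
        for (String type : new String[] {"DATASET", "INPUT", "OUTPUT", "UNKNOWN"}) {
            System.out.printf("%-8s before=%b after=%b%n",
                type, matches(BEFORE, type), matches(AFTER, type));
        }
    }
}
```

Running main prints one line per stored type. Only OUTPUT changes, going from before=false to after=true, which matches the behavior the PR describes.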

Fix

Added OR df.type ILIKE 'output' to the facet type filter in:

| DAO | Query Method | Change |
|---|---|---|
| DatasetDao | findDatasetByName | Added 'output' |
| DatasetDao | findAll | Added 'output' |
| DatasetVersionDao | findByUuid | Added 'output' |
| DatasetVersionDao | findAll | Added 'output' |

Note: DatasetVersionDao.findBy already returns all facet types (it has no type filter), so no change is needed there.
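For reference, the filter that the four updated queries share can be reproduced with a small hypothetical Java helper. It only renders the clause as text; the PR edits the SQL strings directly. The helper concatenates values into SQL, which is acceptable here only because the type names are fixed constants rather than user input. FacetTypeClause and clause are illustrative names, not part of Marquez.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FacetTypeClause {
    // Hypothetical helper: joins one ILIKE comparison per facet type with OR,
    // wrapped in parentheses, mirroring the filter shape quoted in the PR.
    // Only safe for trusted constants, since values are not parameterized.
    static String clause(List<String> types) {
        return types.stream()
            .map(t -> "df.type ILIKE '" + t + "'")
            .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        // The four facet types matched after this change.
        System.out.println(clause(List.of("dataset", "unknown", "input", "output")));
    }
}
```

With the four types above, the helper yields the original filter with OR df.type ILIKE 'output' appended.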

Impact

API responses for Datasets and DatasetVersions will now include facets like:

  • OutputStatistics (row count, byte size)
  • DataQualityMetrics
  • DataQualityAssertions
  • Any other facets reported as outputFacets or inputFacets

Update the DatasetDao SQL queries (findDatasetByName and findAll) to
include facets of type 'output' in addition to the existing 'dataset',
'unknown', and 'input' types.

Previously, facets reported as outputFacets (e.g., OutputStatistics) by
OpenLineage integrations were stored in the dataset_facets table with
their correct type but were filtered out by the queries, which only
matched the 'dataset', 'unknown', and 'input' types.

Note: the 'input' type was already included in findDatasetByName, but
'output' was missing from both findDatasetByName and findAll.

Closes MarquezProject#1746

Update DatasetVersionDao SQL queries (findByUuid and findAll) to also
include facets of type 'output' alongside the existing 'dataset',
'unknown', and 'input' types.

This ensures that facets like OutputStatistics reported by OpenLineage
integrations are returned in DatasetVersion API responses.
boring-cyborg bot added the api (API layer changes) label on Apr 12, 2026

Development

Successfully merging this pull request may close these issues.

Marquez Dataset APIs don't return outputFacets or inputFacets

1 participant