
feat: Add named parameter support for DuckDB read_parquet #5563

Merged
max-sixty merged 2 commits into PRQL:main from max-sixty:5548 on Nov 17, 2025

Conversation

@max-sixty
Member

Summary

Adds support for DuckDB's read_parquet optional boolean parameters to address issue #5548.

Users can now control DuckDB-specific behavior when reading Parquet files:

std.read_parquet 'data.parquet' union_by_name:true
std.read_parquet 'data.parquet' union_by_name:true binary_as_string:true

Changes

  • Added four optional boolean parameters to the read_parquet function:

    • binary_as_string (default: false) - Load binary columns as strings
    • file_row_number (default: false) - Include a file_row_number column
    • hive_partitioning (default: null) - Interpret the path as a Hive-partitioned dataset
    • union_by_name (default: false) - Union the files' columns by name instead of by position
  • Note: The filename parameter was intentionally excluded because it has been deprecated since DuckDB v1.3.0, where filename is added automatically as a virtual column
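The defaulting behaviour of these parameters can be sketched in Python. This is a minimal illustration under stated assumptions, not PRQL's implementation (which is written in std.prql): the helper name resolve_read_parquet_args is hypothetical, None stands in for the null default of hive_partitioning, and rejecting unknown names such as filename is this sketch's own choice rather than a documented PRQL compiler error.

```python
# Hypothetical sketch: mirrors the read_parquet defaults listed above.
# None stands in for the null default of hive_partitioning.
READ_PARQUET_DEFAULTS = {
    "binary_as_string": False,
    "file_row_number": False,
    "hive_partitioning": None,
    "union_by_name": False,
}


def resolve_read_parquet_args(**overrides):
    """Fill in defaults for every named argument the caller omitted.

    Unknown names (including the deliberately excluded `filename`)
    raise TypeError in this sketch.
    """
    unknown = set(overrides) - set(READ_PARQUET_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown read_parquet argument(s): {sorted(unknown)}")
    for name, value in overrides.items():
        # hive_partitioning may be None (null); everything else must be a bool.
        if not (isinstance(value, bool) or (value is None and name == "hive_partitioning")):
            raise TypeError(f"{name} must be a boolean, got {value!r}")
    # Keep declaration order so the generated argument list is deterministic.
    return {name: overrides.get(name, default) for name, default in READ_PARQUET_DEFAULTS.items()}


# Roughly: std.read_parquet 'data.parquet' union_by_name:true binary_as_string:true
args = resolve_read_parquet_args(union_by_name=True, binary_as_string=True)
print(args)
```

Keeping the dictionary in declaration order means every call yields the arguments in the same order, which is consistent with the fixed ordering shown in the Example SQL Output section.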

Implementation Details

  • The parameters and their defaults are defined in std.prql
  • The DuckDB-specific SQL generation in std.sql.prql maps the parameters to DuckDB's SQL named arguments
  • The generic SQL implementation accepts the parameters for signature compatibility but uses only source (consistent with how PRQL handles dialects)

Test Results

  • ✅ All 608 core tests pass
  • ✅ New tests verify correct SQL generation with named arguments
  • ✅ Backward compatibility maintained (existing code continues to work)

Example SQL Output

read_parquet(
  'data.parquet',
  binary_as_string = false,
  file_row_number = false,
  hive_partitioning = NULL,
  union_by_name = true
)
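The rendering step that produces this output can be sketched in Python. The code below is a hypothetical stand-in for the std.sql.prql logic, not the actual implementation: sql_literal and render_read_parquet are invented names, the literal rules (lowercase booleans, NULL, single-quoted strings) are inferred from the output above, and the non-DuckDB branch only illustrates that the generic implementation uses source alone; its exact SQL is an assumption.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal in the style of the example output."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # Lowercase booleans, matching the example output.
        return "true" if value else "false"
    if isinstance(value, str):
        # Standard SQL string quoting: double any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal: {value!r}")


def render_read_parquet(source, args, dialect="duckdb"):
    """Render a read_parquet(...) call (hypothetical sketch).

    For DuckDB every argument is emitted as `name = value`, defaults
    included; for any other dialect only the source is used.
    """
    if dialect != "duckdb":
        return f"read_parquet({sql_literal(source)})"
    lines = [f"  {sql_literal(source)}"]
    lines += [f"  {name} = {sql_literal(value)}" for name, value in args.items()]
    return "read_parquet(\n" + ",\n".join(lines) + "\n)"


args = {
    "binary_as_string": False,
    "file_row_number": False,
    "hive_partitioning": None,
    "union_by_name": True,
}
print(render_read_parquet("data.parquet", args))
```

Emitting every argument, including defaults, trades a slightly longer SQL string for output that states DuckDB's behaviour explicitly, which is the transparency the second commit describes.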

Fixes #5548

🤖 Generated with Claude Code

max-sixty and others added 2 commits November 17, 2025 13:26
Add support for DuckDB's read_parquet optional boolean parameters:
- binary_as_string: Load binary columns as strings
- file_row_number: Include file_row_number column
- hive_partitioning: Interpret path as Hive partitioned
- union_by_name: Union columns by name instead of position

Note: The 'filename' parameter was excluded as it's deprecated since
DuckDB v1.3.0 (automatically added as virtual column).

Fixes PRQL#5548

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

The read_parquet function now explicitly outputs all parameters with
their default values in the generated SQL, making the behavior more
transparent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

max-sixty merged commit a258339 into PRQL:main on Nov 17, 2025
37 checks passed
max-sixty deleted the 5548 branch on November 17, 2025 at 21:48


Development

Linked issue: Support named arguments in DuckDB function read_parquet