feat(pkg-py): add PolarsLazySource for Polars LazyFrame support#191
Merged
bb84cdd to
438c92e
Compare
Add a new DataSource implementation that keeps Polars LazyFrames lazy
until the render boundary. Key changes:
- Add `AnyFrame` type alias (`Union[nw.DataFrame, nw.LazyFrame]`)
- Widen DataSource ABC return types to support lazy frames
- Implement `PolarsLazySource` using Polars SQLContext for lazy SQL
- Update `normalize_data_source()` to detect and route LazyFrames
- Collect LazyFrames at render boundary in `app()` method
- Update type hints throughout
Usage:
```python
import polars as pl
from querychat import QueryChat
lf = pl.scan_parquet("large_data.parquet")
qc = QueryChat(data_source=lf, table_name="data")
```
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
PolarsLazySource._polars_dtype_to_sql was mapping pl.Time to "TIMESTAMP" but it should map to "TIME". Time-only values are not timestamps. Also added noqa comment for PLR0911 (too many return statements) since the function now has 7 return statements after the fix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
438c92e to
e5555b3
Compare
Previously, test_query only validated schema structure via collect_schema() without executing the query. This meant runtime errors (e.g., invalid casts) wouldn't surface until actual collection. Now test_query collects one row to catch runtime errors, matching the behavior of DataFrameSource.test_query. The return type changes from LazyFrame to DataFrame since we've already done the work. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The noqa: A005 comment was accidentally removed from types/__init__.py. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This was referenced Jan 14, 2026
- Remove lazy_frame_demo.py example script
- Fix empty LazyFrame handling in get_schema to prevent .row() failure
- Add .head() limit when collecting unique values to reduce memory usage
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The abstract test_query() method was declared to return AnyFrame but all concrete implementations return nw.DataFrame. This is intentional since test_query collects data to catch runtime errors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…LazyFrame support
- Batch categorical value collection into single scan using implode() instead of N separate scans (one per categorical column)
- Extract _get_categorical_values() helper method for clarity
- Rename AnyFrame to LazyOrDataFrame for better readability
- Store native Polars LazyFrame internally instead of narwhals wrapper
- Simplify df_to_html() implementation
- Improve error messages for unsupported LazyFrame backends
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…eta dataclass
Introduce a ColumnMeta dataclass to consolidate column metadata into a single data structure, replacing multiple parallel lists and dicts.
Changes:
- Add ColumnMeta dataclass with name, sql_type, kind, min_val, max_val, categories
- Refactor get_schema() into three clear steps: classify, add stats, format
- Extract static helper methods: _make_column_meta, _add_column_stats, _format_schema
- Use .row(0, named=True) consistently for extracting aggregate results
- Fix test to check native LazyFrame identity instead of wrapper identity
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
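A dataclass with the fields listed above might be shaped like the following sketch. The field names come from the commit message; the field types, the `kind` labels, and the `format_column` output format are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ColumnMeta:
    """Per-column metadata gathered in one place (field types are assumed)."""

    name: str
    sql_type: str
    kind: str  # e.g. "numeric", "text", "other"; these labels are hypothetical
    min_val: Any = None
    max_val: Any = None
    categories: list[str] = field(default_factory=list)


def format_column(col: ColumnMeta) -> str:
    """Render one column as schema text (hypothetical output format)."""
    lines = [f"- {col.name} ({col.sql_type})"]
    if col.kind == "numeric":
        # The range line is emitted even when min/max are None.
        lines.append(f"  Range: {col.min_val} to {col.max_val}")
    if col.categories:
        quoted = ", ".join(repr(c) for c in col.categories)
        lines.append(f"  Categorical values: {quoted}")
    return "\n".join(lines)


age = ColumnMeta(name="age", sql_type="INTEGER", kind="numeric", min_val=1, max_val=9)
text = format_column(age)
```

Keeping everything about a column on one object avoids the index bookkeeping that parallel lists and dicts require when a column is skipped or reordered.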
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update DataFrameSource to require narwhals DataFrame as input, removing implicit conversion from raw pandas/polars DataFrames. Update all tests to wrap DataFrames with nw.from_native(). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace specific benchmark numbers with qualitative explanation of lazy evaluation benefits (deferred loading, query optimization, reduced memory). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…Source Remove conditional that skipped range output when both min/max were None, matching DataFrameSource behavior of always showing range info. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolve conflicts and integrate LazyFrame support with new multi-framework architecture:
- Update _querychat_base.py with PolarsLazySource support in normalize_data_source
- Add LazyFrame handling in _shiny.py (collect before render)
- Update _shiny_module.py with LazyOrDataFrame type
- Keep GT-based df_to_html in _utils.py
- Combine dev dependencies (polars) with new docs dependencies
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update polars tests to wrap DataFrames with nw.from_native()
- Fix df_to_html test to match actual truncation message format
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update StateDictAccessorMixin.df() to return LazyOrDataFrame
- Update AppState.get_current_data() to return LazyOrDataFrame
- Update StreamlitQueryChat.df() to return LazyOrDataFrame
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
cpsievert
commented
Jan 15, 2026
  @abstractmethod
- def execute_query(self, query: str) -> nw.DataFrame:
+ def execute_query(self, query: str) -> DataOrLazyFrame:
This will be an annoying thing to work around as a user with type checking enabled.
I have some ideas on how to improve this, but I think it's better to address that in a follow-up PR.
LazyFrame doesn't have .shape or .to_pandas() attributes directly - these require the frame to be collected first. Added isinstance checks and collection calls in _dash.py, _gradio.py, and _streamlit.py. Also fixed test assertion in test_df_to_html.py to match actual implementation output format. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
- `PolarsLazySource` DataSource implementation for Polars LazyFrames
- `SQLContext` for native lazy SQL execution
- `AnyFrame` type alias to support both DataFrame and LazyFrame return types
Usage
If using the Quick start `.app()`, just drop in a `LazyFrame`, same as you would a `DataFrame`.
If using `.df()` to build a custom app, note that it returns the filtered `LazyFrame`, so you may need to `.collect()` it.
Changes
- `_datasource.py`: Add `AnyFrame` type alias, widen ABC return types, implement `PolarsLazySource`
- `_querychat.py`: Update `normalize_data_source()` to detect LazyFrames, collect at render boundary
- `_querychat_module.py`: Update `ServerValues.df` type hint
- `tools.py`: Handle LazyFrames in query tool
- `data-sources.qmd`: Add documentation for LazyFrame support
- `PolarsLazySource` tests, 1 integration test
Performance
With a 100K row test dataset:
Demo script for manual verification
Run with:
Sample output (100K rows):
Test plan
- `PolarsLazySource` tests cover init, execute_query, get_data, get_schema, test_query
🤖 Generated with Claude Code