Merged
25 commits
d988214 feat(pkg-py): add PolarsLazySource for Polars LazyFrame support (cpsievert, Jan 14, 2026)
36e4d61 chore: add polars to dev dependencies, remove skip logic (cpsievert, Jan 14, 2026)
e5555b3 fix(pkg-py): correct Time type mapping in PolarsLazySource schema (cpsievert, Jan 14, 2026)
83abc5e fix(pkg-py): make PolarsLazySource.test_query actually execute the query (cpsievert, Jan 14, 2026)
0577842 fix(pkg-py): restore noqa comment for A005 lint rule (cpsievert, Jan 14, 2026)
582f26d docs: add LazyFrame section and demo script (cpsievert, Jan 14, 2026)
76409f7 Merge main into feat/py-lazy-frames (cpsievert, Jan 15, 2026)
1f62cd9 fix(pkg-py): address PR feedback for LazyFrame support (cpsievert, Jan 15, 2026)
791975f fix(pkg-py): correct test_query return type in abstract DataSource (cpsievert, Jan 15, 2026)
f737755 undo unnecessary gitignore (cpsievert, Jan 15, 2026)
45e1ca5 Cleanup changelog (cpsievert, Jan 15, 2026)
315764c refactor(pkg-py): optimize PolarsLazySource.get_schema() and cleanup … (cpsievert, Jan 15, 2026)
1aa5865 refactor(pkg-py): simplify PolarsLazySource.get_schema() with ColumnM… (cpsievert, Jan 15, 2026)
5bfd083 docs(pkg-py): add docstrings for ColumnMeta attributes (cpsievert, Jan 15, 2026)
0f37105 fix(pkg-py): require narwhals DataFrame for DataFrameSource (cpsievert, Jan 15, 2026)
1175d98 docs(pkg-py): improve LazyFrame performance explanation (cpsievert, Jan 15, 2026)
c2ae48d fix(pkg-py): always show range for numeric/date columns in PolarsLazy… (cpsievert, Jan 15, 2026)
82b52db Merge main into feat/py-lazy-frames (cpsievert, Jan 15, 2026)
5dc37b0 fix(pkg-py): update tests for narwhals DataFrame requirement (cpsievert, Jan 15, 2026)
761154c fix(pkg-py): update return types for LazyOrDataFrame support (cpsievert, Jan 15, 2026)
f05dc81 Simplify (cpsievert, Jan 15, 2026)
8856fcc Revert unnecessary change (cpsievert, Jan 15, 2026)
1d37c06 Better name on union type (cpsievert, Jan 15, 2026)
2f727a5 Apply suggestions from code review (cpsievert, Jan 15, 2026)
659b60b fix(pkg-py): collect LazyFrame before accessing shape/to_pandas (cpsievert, Jan 15, 2026)
2 changes: 2 additions & 0 deletions pkg-py/CHANGELOG.md
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### New features
 
+* Added `PolarsLazySource` to support Polars LazyFrames as data sources. Data stays lazy until the render boundary, enabling efficient handling of large datasets. Pass a `polars.LazyFrame` directly to `QueryChat()` and queries will be executed lazily via Polars' SQLContext.
+
 * Added support for Gradio, Dash, and Streamlit web frameworks in addition to Shiny. Import from the new submodules:
   * `from querychat.gradio import QueryChat`
   * `from querychat.dash import QueryChat`
70 changes: 66 additions & 4 deletions pkg-py/docs/data-sources.qmd
@@ -3,11 +3,12 @@ title: Data Sources
 lightbox: true
 ---
 
-`querychat` supports several different data sources, including:
+`querychat` supports several different data sources, including:
 
 1. Any [narwhals-compatible](https://narwhals-dev.github.io/narwhals/) data frame.
-2. Any [SQLAlchemy](https://www.sqlalchemy.org/) database.
-3. A custom [DataSource](reference/types.DataSource.qmd) interface/protocol.
+2. Polars LazyFrames for efficient handling of large datasets.
+3. Any [SQLAlchemy](https://www.sqlalchemy.org/) database.
+4. A custom [DataSource](reference/types.DataSource.qmd) interface/protocol.
 
 The sections below describe how to use each type of data source with `querychat`.
 
@@ -63,7 +64,68 @@ app = qc.app()
 
 :::
 
-If you're [building an app](build.qmd), note you can read the queried data frame reactively using the `df()` method, which returns a `narwhals.DataFrame`. Call `.to_native()` on the result to get the underlying pandas or polars DataFrame.
+If you're [building an app](build.qmd), note you can read the queried data frame reactively using the `df()` method, which returns a `narwhals.DataFrame` (or `narwhals.LazyFrame` for lazy sources). Call `.to_native()` on the result to get the underlying pandas or polars DataFrame.
 
+## Polars LazyFrames {#lazy-frames}
+
+For large datasets, you can use [Polars LazyFrames](https://docs.pola.rs/user-guide/lazy/using/) to keep data on disk until it's actually needed. This is particularly useful when:
+
+- Your dataset is too large to fit comfortably in memory
+- You only need filtered or aggregated subsets of the data
+- You want faster startup times for your application
+
+With lazy evaluation, data stays on disk and queries are optimized by Polars before execution. Only the final results are loaded into memory.
+
+```{.python filename="lazy-app.py"}
+import polars as pl
+from querychat import QueryChat
+
+# Scan a large parquet file (doesn't load data yet!)
+lf = pl.scan_parquet("large_dataset.parquet")
+
+# Pass the LazyFrame directly to QueryChat
+qc = QueryChat(lf, "sales")
+app = qc.app()
+```
+
+::: {.callout-tip}
+### Why use lazy evaluation?
+
+The lazy approach can be significantly faster for large datasets because:
+
+- **Deferred loading**: Data stays on disk until actually needed, so startup is nearly instant
+- **Query optimization**: Polars optimizes the query plan before execution, potentially skipping unnecessary columns and rows
+- **Reduced memory**: Only the filtered/aggregated results are loaded into memory, not the entire dataset
+
+This is especially beneficial when users typically query small subsets of a large dataset.
+:::
+
+You can create LazyFrames from various sources:
+
+```python
+# From parquet (most efficient)
+lf = pl.scan_parquet("data.parquet")
+
+# From CSV
+lf = pl.scan_csv("data.csv")
+
+# From multiple files
+lf = pl.scan_parquet("data/*.parquet")
+
+# From an existing DataFrame
+df = pl.read_csv("data.csv")
+lf = df.lazy()
+```
+
+When using a LazyFrame source, the `df()` method returns a `narwhals.LazyFrame`. Call `.collect()` to materialize the results when needed:
+
+```python
+# Get the lazy result
+result_lazy = qc.df()
+
+# Materialize when ready
+result_df = result_lazy.collect()
+```
+
 ## Databases
 
11 changes: 9 additions & 2 deletions pkg-py/src/querychat/_dash.py
@@ -4,6 +4,7 @@
 
 from typing import TYPE_CHECKING, Literal, Optional, cast
 
+import narwhals.stable.v1 as nw
 from chatlas import Turn
 
 from ._dash_ui import IDs, card_ui, chat_container_ui, chat_messages_ui
@@ -374,6 +375,9 @@ def update_display(state_data: AppStateDict, reset_clicks):
     sql_code = f"```sql\n{state.get_display_sql()}\n```"
 
     df = state.get_current_data()
+    # Collect if lazy before accessing .to_pandas() or .shape
+    if isinstance(df, nw.LazyFrame):
+        df = df.collect()
 
     display_df = df.to_pandas()
     table_data = display_df.to_dict("records")
@@ -404,8 +408,11 @@ def update_display(state_data: AppStateDict, reset_clicks):
 )
 def export_csv(n_clicks: int, state_data: AppStateDict):
     state = deserialize_state(state_data)
-    df = state.get_current_data().to_pandas()
-    return send_data_frame(df.to_csv, "querychat_data.csv", index=False)
+    df = state.get_current_data()
+    # Collect if lazy before converting to pandas
+    if isinstance(df, nw.LazyFrame):
+        df = df.collect()
+    return send_data_frame(df.to_pandas().to_csv, "querychat_data.csv", index=False)
 
 
 def register_chat_callbacks(