posit-dev · cpsievert · Jan 15, 2026 · Jan 14, 2026
diff --git a/pkg-py/CHANGELOG.md b/pkg-py/CHANGELOG.md
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### New features
 
+* Added `PolarsLazySource` to support Polars LazyFrames as data sources. Pass a `polars.LazyFrame` directly to `QueryChat()`: queries are executed lazily via Polars' `SQLContext`, and results are only collected when they are rendered, which keeps large datasets efficient to work with.
+
 * Added support for Gradio, Dash, and Streamlit web frameworks in addition to Shiny. Import from the new submodules:
   * `from querychat.gradio import QueryChat`
   * `from querychat.dash import QueryChat`

diff --git a/pkg-py/docs/data-sources.qmd b/pkg-py/docs/data-sources.qmd
@@ -3,11 +3,12 @@ title: Data Sources
 lightbox: true
 ---
 
-`querychat` supports several different data sources, including: 
+`querychat` supports several different data sources, including:
 
 1. Any [narwhals-compatible](https://narwhals-dev.github.io/narwhals/) data frame.
-2. Any [SQLAlchemy](https://www.sqlalchemy.org/) database.
-3. A custom [DataSource](reference/types.DataSource.qmd) interface/protocol.
+2. Polars LazyFrames for efficient handling of large datasets.
+3. Any [SQLAlchemy](https://www.sqlalchemy.org/) database.
+4. A custom [DataSource](reference/types.DataSource.qmd) interface/protocol.
 
 The sections below describe how to use each type of data source with `querychat`.
 
@@ -63,7 +64,68 @@ app = qc.app()
 
 :::
 
-If you're [building an app](build.qmd), note you can read the queried data frame reactively using the `df()` method, which returns a `narwhals.DataFrame`. Call `.to_native()` on the result to get the underlying pandas or polars DataFrame. 
+If you're [building an app](build.qmd), you can read the queried data frame reactively using the `df()` method, which returns a `narwhals.DataFrame` (or a `narwhals.LazyFrame` for lazy sources). Call `.to_native()` on the result to get the underlying pandas or polars object (a polars `LazyFrame` for lazy sources).
+
+## Polars LazyFrames {#lazy-frames}
+
+For large datasets, you can use [Polars LazyFrames](https://docs.pola.rs/user-guide/lazy/using/) to keep data on disk until it's actually needed. This is particularly useful when:
+
+- Your dataset is too large to fit comfortably in memory
+- You only need filtered or aggregated subsets of the data
+- You want faster startup times for your application
+
+With lazy evaluation, data stays on disk and queries are optimized by Polars before execution. Only the final results are loaded into memory.
+
+```{.python filename="lazy-app.py"}
+import polars as pl
+from querychat import QueryChat
+
+# Scan a large parquet file (doesn't load data yet!)
+lf = pl.scan_parquet("large_dataset.parquet")
+
+# Pass the LazyFrame directly to QueryChat
+qc = QueryChat(lf, "sales")
+app = qc.app()
+```
+
+::: {.callout-tip}
+### Why use lazy evaluation?
+
+The lazy approach can be significantly faster for large datasets because:
+
+- **Deferred loading**: Data stays on disk until actually needed, so startup is nearly instant
+- **Query optimization**: Polars optimizes the query plan before execution, potentially skipping unnecessary columns and rows
+- **Reduced memory**: Only the filtered/aggregated results are loaded into memory, not the entire dataset
+
+This is especially beneficial when users typically query small subsets of a large dataset.
+:::
+
+You can create LazyFrames from various sources:
+
+```python
+# From parquet (most efficient)
+lf = pl.scan_parquet("data.parquet")
+
+# From CSV
+lf = pl.scan_csv("data.csv")
+
+# From multiple files
+lf = pl.scan_parquet("data/*.parquet")
+
+# From an existing DataFrame (already in memory, so this only gains query optimization)
+df = pl.read_csv("data.csv")
+lf = df.lazy()
+```
+
+When using a LazyFrame source, the `df()` method returns a `narwhals.LazyFrame`. Call `.collect()` to materialize the results when needed:
+
+```python
+# Get the lazy result
+result_lazy = qc.df()
+
+# Materialize when ready
+result_df = result_lazy.collect()
+```
 
 ## Databases
 

diff --git a/pkg-py/src/querychat/_dash.py b/pkg-py/src/querychat/_dash.py
@@ -4,6 +4,7 @@
 
 from typing import TYPE_CHECKING, Literal, Optional, cast
 
+import narwhals.stable.v1 as nw
 from chatlas import Turn
 
 from ._dash_ui import IDs, card_ui, chat_container_ui, chat_messages_ui
@@ -374,6 +375,9 @@ def update_display(state_data: AppStateDict, reset_clicks):
         sql_code = f"```sql\n{state.get_display_sql()}\n```"
 
         df = state.get_current_data()
+        # Collect if lazy before accessing .to_pandas() or .shape
+        if isinstance(df, nw.LazyFrame):
+            df = df.collect()
 
         display_df = df.to_pandas()
         table_data = display_df.to_dict("records")
@@ -404,8 +408,11 @@ def update_display(state_data: AppStateDict, reset_clicks):
     )
     def export_csv(n_clicks: int, state_data: AppStateDict):
         state = deserialize_state(state_data)
-        df = state.get_current_data().to_pandas()
-        return send_data_frame(df.to_csv, "querychat_data.csv", index=False)
+        df = state.get_current_data()
+        # Collect if lazy before converting to pandas
+        if isinstance(df, nw.LazyFrame):
+            df = df.collect()
+        return send_data_frame(df.to_pandas().to_csv, "querychat_data.csv", index=False)
 
 
 def register_chat_callbacks(