Update Python README to match R #11
Merged
Changes from all commits (8 commits):

- b599e3d Update Python README to match R (jcheng5)
- 06ebb42 add a few more clarifications in sections (chendaniely)
- ba8f3d0 add link to sidebot template (chendaniely)
- 8787a5d Fix typo (jcheng5)
- f1eca6f sync r readme with python readme (chendaniely)
- a98ee58 update r readme (chendaniely)
- f43464b Merge pull request #15 from posit-dev/r-readme (chendaniely)
- 0496f9a Merge pull request #12 from chendaniely/python-readme-clarify (chendaniely)
@@ -1,12 +1,14 @@

# querychat: Chat with Shiny apps (Python)

Imagine typing questions like these directly into your Shiny dashboard, and seeing the results in realtime:

* "Show only penguins that are not species Gentoo and have a bill length greater than 50mm."
* "Show only blue states with an incidence rate greater than 100 per 100,000 people."
* "What is the average mpg of cars with 6 cylinders?"

querychat is a drop-in component for Shiny that allows users to query a data frame using natural language. The results are available as a reactive data frame, so they can be easily used from Shiny outputs, reactive expressions, downloads, etc.

**This is not as terrible an idea as you might think!** We need to be very careful when bringing LLMs into data analysis, as we all know that they are prone to hallucinations and other classes of errors. querychat is designed to excel in reliability, transparency, and reproducibility by using this one technique: denying it raw access to the data, and forcing it to write SQL queries instead. See the section below on ["How it works"](#how-it-works) for more.

## Installation

@@ -18,7 +20,7 @@ pip install "querychat @ git+https://github.com/posit-dev/querychat#subdirectory

First, you'll need access to an LLM that supports tools/function calling. querychat uses [chatlas](https://github.com/posit-dev/chatlas) to interface with various providers.

Here's a very minimal example that shows the three function calls you need to make:
```python
from pathlib import Path

from seaborn import load_dataset  # provides the titanic dataset
from shiny import App, render, ui

import querychat

titanic = load_dataset("titanic")

# 1. Configure querychat
querychat_config = querychat.init(titanic, "titanic")

# Create UI
app_ui = ui.page_sidebar(
    # 2. Use querychat.sidebar(id) in a ui.page_sidebar.
    #    Alternatively, use querychat.ui(id) elsewhere if you don't want your
    #    chat interface to live in a sidebar.
    querychat.sidebar("chat"),
    # Main panel with data viewer
    ui.output_data_frame("data_table"),
    title="querychat with Python",
    fillable=True,
)


# Define server logic
def server(input, output, session):
    # 3. Create a querychat object using the config from step 1.
    chat = querychat.server("chat", querychat_config)

    # 4. Use the filtered/sorted data frame anywhere you wish, via the
    #    chat["df"]() reactive.
    @render.data_frame
    def data_table():
        return chat["df"]()


# Create Shiny app
app = App(app_ui, server)
```
## How it works

### Powered by LLMs

querychat's natural language chat experience is powered by LLMs. You may use any model that [chatlas](https://github.com/posit-dev/chatlas) supports that has the ability to do tool calls, but we currently recommend (as of March 2025):

* GPT-4o
* Claude 3.5 Sonnet
* Claude 3.7 Sonnet

In our testing, we've found that those models strike a good balance between accuracy and latency. Smaller models like GPT-4o-mini are fine for simple queries but make surprising mistakes with moderately complex ones, and reasoning models like o3-mini slow down responses without providing meaningfully better results.

The small open-source models (8B and below) we've tested have fared extremely poorly. Sorry. 🤷

### Powered by SQL

querychat does not have direct access to the raw data; it can _only_ read or filter the data by writing SQL `SELECT` statements. This is crucial for ensuring reliability, transparency, and reproducibility:

- **Reliability:** Today's LLMs are excellent at writing SQL, but bad at direct calculation.
- **Transparency:** querychat always displays the SQL to the user, so it can be vetted instead of blindly trusted.
- **Reproducibility:** The SQL query can be easily copied and reused.

Currently, querychat uses DuckDB for its SQL engine. It's extremely fast and has a surprising number of [statistical functions](https://duckdb.org/docs/stable/sql/functions/aggregates.html#statistical-aggregates).
## Customizing querychat

### Provide a greeting (recommended)

When the querychat UI first appears, you will usually want it to greet the user with some basic instructions. By default, these instructions are auto-generated every time a user arrives; this is slow, wasteful, and unpredictable. Instead, you should create a file called `greeting.md` and, when calling `querychat.init`, pass `greeting=Path("greeting.md").read_text()`.

You can offer suggestions to the user by wrapping them in `<span class="suggestion"> </span>` tags.

For example:

```markdown
* **Filter and sort the data:**
  * <span class="suggestion">Show only survivors</span>
  * <span class="suggestion">Filter to first class passengers under 30</span>
  * <span class="suggestion">Sort by fare from highest to lowest</span>

* **Answer questions about the data:**
  * <span class="suggestion">What was the survival rate by gender?</span>
  * <span class="suggestion">What's the average age of children who survived?</span>
  * <span class="suggestion">How many passengers were traveling alone?</span>
```

These suggestions appear in the greeting and automatically populate the chat text box when clicked, giving the user a few ideas to explore on their own. You can see this behavior in our [querychat template](https://shiny.posit.co/py/templates/querychat/).

If you need help coming up with a greeting, your own app can help you! Just launch it and paste this into the chat interface:

> Help me create a greeting for your future users. Include some example questions. Format your suggested greeting as Markdown, in a code block.

Then keep giving it feedback until you're happy with the result, which will be ready to paste into `greeting.md`.

Alternatively, you can suppress the greeting entirely by passing `greeting=""`.
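For instance, a `greeting.md` containing a clickable suggestion might be written and loaded like this. This is only a sketch: the file name and contents are examples, and the `querychat.init` call is commented out because it needs a configured app and LLM provider:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    # In a real app, greeting.md would be a file checked into your project.
    greeting_path = Path(d) / "greeting.md"
    greeting_path.write_text(
        "Hi! Try one of these:\n\n"
        '* <span class="suggestion">Show only survivors</span>\n'
    )
    greeting = greeting_path.read_text()

# querychat_config = querychat.init(titanic, "titanic", greeting=greeting)
print(greeting)
```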
### Augment the system prompt (recommended)

In LLM parlance, the _system prompt_ is the set of instructions and specific knowledge you want the model to use during a conversation. querychat automatically creates a system prompt comprising:

1. The basic set of behaviors the LLM must follow in order for querychat to work properly. (See `querychat/prompt/prompt.md` if you're curious what this looks like.)
2. The SQL schema of the data frame you provided.
3. (Optional) Any additional description of the data you choose to provide.
4. (Optional) Any additional instructions you want to use to guide querychat's behavior.

#### Data description

If you give querychat your dataset and nothing else, it will provide the LLM with the basic schema of your data:

- Column names
- DuckDB data type (integer, float, boolean, datetime, text)
- For text columns with fewer than 10 unique values, the list of values (we assume these are categorical variables; the threshold is configurable)
- For integer and float columns, the range of values

And that's all the LLM will know about your data. The actual data is never passed to the LLM; these summary values are calculated before the schema information is sent.
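The kind of summary described above can be approximated in pandas. This sketch only illustrates the idea, not querychat's internal code, and `10` stands in for the configurable threshold:

```python
import pandas as pd

df = pd.DataFrame({
    "who": ["man", "woman", "child", "man", "woman"],
    "age": [22, 38, 4, 35, 58],
})

threshold = 10  # stand-in for the configurable categorical threshold

# Text columns with fewer unique values than the threshold are treated
# as categorical; numeric columns are summarized by their range.
categorical = {
    col: sorted(df[col].unique())
    for col in df.select_dtypes(include="object")
    if df[col].nunique() < threshold
}
ranges = {
    col: (df[col].min(), df[col].max())
    for col in df.select_dtypes(include="number")
}
print(categorical)
print(ranges)
```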
If the column names are usefully descriptive, the model may be able to make a surprising amount of sense of the data. But if your data frame's columns are `x`, `V1`, `value`, etc., then the model will need to be given more background info, just like a human would.

To provide this information, use the `data_description` argument. For example, if you're using the `titanic` dataset, you might create a `data_description.md` like this:
```markdown
This dataset contains information about Titanic passengers, collected for predicting survival.

- survived: Survival (0 = No, 1 = Yes)
- pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
- sex: Sex of passenger
- age: Age in years
- sibsp: Number of siblings/spouses aboard
- parch: Number of parents/children aboard
- fare: Passenger fare
- embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
- class: Same as pclass but as text
- who: Man, woman, or child
- adult_male: Boolean for adult males
- deck: Deck of the ship
- embark_town: Town of embarkation
- alive: Survival status as text
- alone: Whether the passenger was alone
```

which you can then pass via:

```python
querychat_config = querychat.init(
    df=titanic,
    table_name="titanic",
    data_description=Path("data_description.md").read_text(),
)
```

querychat doesn't need this information in any particular format; just put whatever information, in whatever format, you think a human would find helpful.
#### Additional instructions

You can add additional instructions of your own to the end of the system prompt by passing `extra_instructions` into `querychat.init`:

```python
querychat_config = querychat.init(
    df=titanic,
    table_name="titanic",
    extra_instructions=[
        "You're speaking to a British audience--please use appropriate spelling conventions.",
        "Use lots of emojis! 😃 Emojis everywhere, 🌍 emojis forever. ♾️",
        "Stay on topic, only talk about the data dashboard and refuse to answer other questions.",
    ],
)
```

You can also put these instructions in a separate file and use `Path("instructions.md").read_text()` to load them, as we did for `data_description` above.

**Warning:** It is not guaranteed that the LLM will always (or, in many cases, ever) obey your instructions, and it can be difficult to predict which instructions will be a problem. Be sure to test extensively each time you change your instructions, and especially if you change the model you use.
### Use a different LLM provider

By default, querychat uses GPT-4o via the OpenAI API. To use a different provider or model, pass a `create_chat_callback` function that takes a `system_prompt` argument and returns a `chatlas.Chat` object. (The parameter must be named `system_prompt`, as it is passed by keyword.)

```python
import chatlas
from functools import partial

# Option 1: Define a function
def my_chat_func(system_prompt: str) -> chatlas.Chat:
    return chatlas.ChatAnthropic(
        model="claude-3-7-sonnet-latest",
        system_prompt=system_prompt,
    )

# Option 2: Use partial
my_chat_func = partial(chatlas.ChatAnthropic, model="claude-3-7-sonnet-latest")

querychat_config = querychat.init(
    df=titanic,
    table_name="titanic",
    create_chat_callback=my_chat_func,
)
```
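Both options produce a callable that querychat can invoke with `system_prompt` as a keyword argument. A small stand-in sketch of why the `partial` form works; `FakeChat` is purely illustrative, not the chatlas API:

```python
from functools import partial

# Stand-in for a chat constructor, just to show how partial plus a
# keyword system_prompt composes. Not the real chatlas API surface.
class FakeChat:
    def __init__(self, model: str, system_prompt: str = ""):
        self.model = model
        self.system_prompt = system_prompt

create_chat = partial(FakeChat, model="claude-3-7-sonnet-latest")

# querychat would call the callback like this:
chat = create_chat(system_prompt="You are a SQL-writing assistant.")
print(chat.model, "|", chat.system_prompt)
```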
> **Review comment on lines 202 to 225** (Contributor): It's really good you have these examples. Shiny prides itself in not having to write a callback, so seeing that term can be a bit scary. I suspect most people will use Option 1.
This would use Claude 3.7 Sonnet instead, which would require you to provide an API key. See the [chatlas documentation](https://github.com/posit-dev/chatlas) for more information on how to authenticate with different providers.

## Complete example

For a complete working example, see the [examples/app.py](examples/app.py) file in the repository. This example includes:

- Loading a dataset
- Reading the greeting and data description from files
- Setting up the querychat configuration
- Creating a Shiny UI with the chat sidebar
- Displaying the filtered data in the main panel

If you have Shiny installed and want to get started right away, you can use our [querychat template](https://shiny.posit.co/py/templates/querychat/) or [sidebot template](https://shiny.posit.co/py/templates/sidebot/).
> **Review comment:** You can provide suggestions to the user by using the `<span class="suggestion"> </span>` tag. For example: this will provide a clickable message in the greeting that will auto-populate the chat text box. This gives the user a few ideas to explore on their own. You can see this behavior in our querychat template.

> **Review comment:** The `<span class="suggestion">` is a good example to point out for the greeting. It really makes the greeting flow better, and we should showcase how to do it. It really polishes up the application and chat. This was a suggestion of the block of text to put in this part of the README.