Merged
pkg-r/NEWS.md (4 changes: 2 additions, 2 deletions)
@@ -1,7 +1,5 @@
# querychat (development version)

* Initial CRAN submission.

* Added `prompt_template` support for `querychat_system_prompt()`. (Thank you, @oacar! #37, #45)

* `querychat_init()` now accepts a `client`, replacing the previous `create_chat_func` argument. (#60)
@@ -25,3 +23,5 @@
* New `querychat_app()` function lets you quickly launch a Shiny app with a querychat chat interface. (#66)

* `querychat_ui()` now adds a `.querychat` class to the chat container and `querychat_sidebar()` adds a `.querychat-sidebar` class to the sidebar, allowing for easier customization via CSS. (#68)

* querychat now uses a separate tool to reset the dashboard. (#80)
pkg-r/R/data_source.R (15 changes: 11 additions, 4 deletions)
@@ -159,13 +159,19 @@ get_db_type.data_frame_source <- function(source, ...) {
#' @export
get_db_type.dbi_source <- function(source, ...) {
conn <- source$conn
conn_info <- DBI::dbGetInfo(conn)
# default to 'POSIX' if dbms name not found
dbms_name <- purrr::pluck(conn_info, "dbms.name", .default = "POSIX")

# Special handling for known database types
if (inherits(conn, "duckdb_connection")) {
return("DuckDB")
}
if (inherits(conn, "SQLiteConnection")) {
return("SQLite")
}

# default to 'POSIX' if dbms name not found
conn_info <- DBI::dbGetInfo(conn)
dbms_name <- purrr::pluck(conn_info, "dbms.name", .default = "POSIX")

# remove ' SQL', if exists (SQL is already in the prompt)
return(gsub(" SQL", "", dbms_name))
}
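The revised `get_db_type.dbi_source()` checks the connection class before it calls `DBI::dbGetInfo()`. The dependency-free sketch below illustrates that order under stated assumptions: `db_type_sketch()`, `conn_class`, and `conn_info` are hypothetical stand-ins for the method, `class(conn)`, and the `dbGetInfo()` result, and `%in%` only approximates `inherits()` for plain S3 classes.

```r
# Hypothetical stand-in for get_db_type.dbi_source(); not the package code.
db_type_sketch <- function(conn_class, conn_info = list()) {
  # Known connection classes are checked first, before any metadata lookup
  if ("duckdb_connection" %in% conn_class) return("DuckDB")
  if ("SQLiteConnection" %in% conn_class) return("SQLite")

  # Otherwise use the reported DBMS name, defaulting to "POSIX" when absent
  dbms_name <- conn_info[["dbms.name"]]
  if (is.null(dbms_name)) dbms_name <- "POSIX"

  # Drop " SQL" because the prompt template already writes "{{db_type}} SQL"
  gsub(" SQL", "", dbms_name)
}

db_type_sketch(c("duckdb_connection", "DBIConnection"))                    # "DuckDB"
db_type_sketch("SomeConnection", list(dbms.name = "Microsoft SQL Server"))  # "Microsoft Server"
db_type_sketch("SomeConnection")                                            # "POSIX"
```

Because `gsub(" SQL", "", ...)` matches anywhere in the string, a name such as the hypothetical `"Microsoft SQL Server"` loses its middle word; an anchored pattern like `" SQL$"` would restrict the removal to a trailing suffix.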
@@ -219,7 +225,8 @@ create_system_prompt.querychat_data_source <- function(
schema = schema,
data_description = data_description,
extra_instructions = extra_instructions,
db_type = db_type
db_type = db_type,
is_duck_db = identical(db_type, "DuckDB")
)
)
}
pkg-r/R/querychat.R (7 changes: 7 additions, 0 deletions)
@@ -216,6 +216,12 @@ querychat_server <- function(id, querychat_config) {
)
}

reset_query <- function() {
current_query("")
current_title(NULL)
querychat_tool_result(action = "reset")
}

# Preload the conversation with the system prompt. These are instructions for
# the chat model, and must not be shown to the end user.
chat <- client$clone()
@@ -225,6 +231,7 @@ querychat_server <- function(id, querychat_config) {
tool_update_dashboard(data_source, current_query, current_title)
)
chat$register_tool(tool_query(data_source))
chat$register_tool(tool_reset_dashboard(reset_query))

# Prepopulate the chat UI with a welcome message that appears to be from the
# chat model (but is actually hard-coded). This is just for the user, not for
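`reset_query()` clears the two reactive values and then returns a `"reset"` tool result, which is what `tool_reset_dashboard(reset_query)` hands to the model. The sketch below mimics only the state change: `make_value()` is a hypothetical closure that follows the call convention of `shiny::reactiveVal()` (call with an argument to set, without one to get) but has no reactivity, and the string `"reset"` stands in for the real `querychat_tool_result(action = "reset")` call.

```r
# Hypothetical, non-reactive stand-in for shiny::reactiveVal()
make_value <- function(value = NULL) {
  function(new_value) {
    if (missing(new_value)) return(value)  # getter
    value <<- new_value                    # setter (NULL is allowed)
    invisible(value)
  }
}

current_query <- make_value("SELECT * FROM t WHERE x > 1")
current_title <- make_value("Rows with x > 1")

# Mirrors reset_query() from the diff; "reset" replaces the tool-result call
reset_query <- function() {
  current_query("")
  current_title(NULL)
  "reset"
}

reset_query()
current_query()  # ""
current_title()  # NULL
```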
pkg-r/R/querychat_tools.R (47 changes: 38 additions, 9 deletions)
@@ -56,21 +56,38 @@ tool_update_dashboard_impl <- function(
}
}


tool_reset_dashboard <- function(reset_fn) {
ellmer::tool(
reset_fn,
name = "querychat_reset_dashboard",
description = "Resets the data dashboard to show all data.",
arguments = list(),
annotations = ellmer::tool_annotations(
title = "Reset Dashboard",
icon = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" class="bi bi-arrow-counterclockwise " style="height:1em;width:1em;fill:currentColor;vertical-align:-0.125em;" aria-hidden="true" role="img" ><path fill-rule="evenodd" d="M8 3a5 5 0 1 1-4.546 2.914.5.5 0 0 0-.908-.417A6 6 0 1 0 8 2v1z"></path><path d="M8 4.466V.534a.25.25 0 0 0-.41-.192L5.23 2.308a.25.25 0 0 0 0 .384l2.36 1.966A.25.25 0 0 0 8 4.466z"></path></svg>'
)
)
}

# Perform a SQL query on the data, and return the results as JSON.
# @param query A SQL query; must be a SELECT statement.
# @return The results of the query as a data frame.
tool_query <- function(data_source) {
force(data_source)

ellmer::tool(
function(query) {
function(query, `_intent` = "") {
querychat_tool_result(data_source, query, action = "query")
},
name = "querychat_query",
description = "Perform a SQL query on the data, and return the results.",
arguments = list(
query = ellmer::type_string(
"A SQL query; must be a SELECT statement."
),
`_intent` = ellmer::type_string(
"The intent of the query, in brief natural language for user context."
)
),
annotations = ellmer::tool_annotations(
@@ -86,7 +103,12 @@ querychat_tool_result <- function(
title = NULL,
action = "update"
) {
action <- rlang::arg_match(action, c("update", "query"))
action <- rlang::arg_match(action, c("update", "query", "reset"))

if (action == "reset") {
query <- ""
title <- NULL
}

res <- tryCatch(
switch(
@@ -95,7 +117,8 @@ querychat_tool_result <- function(
test_query(data_source, query)
NULL
},
query = execute_query(data_source, query)
query = execute_query(data_source, query),
reset = "The dashboard has been reset to show all data."
),
error = function(err) err
)
@@ -115,13 +138,13 @@ querychat_tool_result <- function(
)
}

if (!is_error && action == "update") {
if (!is_error && action %in% c("update", "reset")) {
output <- format(
shiny::tags$button(
class = "btn btn-outline-primary btn-sm float-end mt-3 querychat-update-dashboard-btn",
"data-query" = query,
"data-title" = title,
"Apply Filter"
switch(action, update = "Apply Filter", reset = "Reset Filter")
)
)
output <- paste0("\n\n", output)
@@ -130,19 +153,25 @@ querychat_tool_result <- function(
value <-
switch(
action,
query = res,
update = "Dashboard updated. Use `querychat_query` tool to review results, if needed."
update = "Dashboard updated. Use `querychat_query` tool to review results, if needed.",
res
)

display_md <- switch(
action,
reset = output,
sprintf("```sql\n%s\n```%s", query, output)
)

ellmer::ContentToolResult(
value = if (!is_error) value,
error = if (is_error) res,
extra = list(
display = list(
title = if (action == "update" && !is.null(title)) title,
show_request = is_error,
markdown = sprintf("```sql\n%s\n```%s", query, output),
open = !is_error
markdown = display_md,
open = !is_error && action != "reset"
)
)
)
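Taken together, the `querychat_tool_result()` hunks route the three actions to different model-facing values, display markdown, and default open state. The following is a simplified sketch, not the package code: `tool_result_sketch()` is hypothetical, `match.arg()` replaces `rlang::arg_match()`, the Shiny button is reduced to a plain HTML string, and `output` is assumed to start as `""` (its initialization falls in a collapsed part of the diff).

```r
# Hypothetical, dependency-free sketch of the action routing in querychat_tool_result()
tool_result_sketch <- function(action = c("update", "query", "reset"),
                               query = "", res = NULL, is_error = FALSE) {
  # Base-R analogue of rlang::arg_match(): reject unknown actions
  action <- match.arg(action)

  # "reset" ignores any incoming query and reports a fixed message
  if (action == "reset") {
    query <- ""
    res <- "The dashboard has been reset to show all data."
  }

  # Only "update" and "reset" get a dashboard button (assumed "" otherwise)
  output <- ""
  if (!is_error && action %in% c("update", "reset")) {
    label <- switch(action, update = "Apply Filter", reset = "Reset Filter")
    output <- paste0("\n\n<button>", label, "</button>")
  }

  # The unnamed final argument to switch() is the default ("query" and "reset")
  value <- switch(action,
    update = "Dashboard updated. Use `querychat_query` tool to review results, if needed.",
    res
  )

  # "reset" shows only the button; other actions echo the SQL first
  markdown <- switch(action,
    reset = output,
    sprintf("```sql\n%s\n```%s", query, output)
  )

  list(value = value, markdown = markdown, open = !is_error && action != "reset")
}

str(tool_result_sketch("reset"))
str(tool_result_sketch("update", query = "SELECT * FROM t"))
```

The trailing unnamed argument is what lets the diff's `value` switch fall back to `res` for both `"query"` and `"reset"` without listing them.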
pkg-r/inst/prompt/prompt.md (66 changes: 34 additions, 32 deletions)
@@ -22,58 +22,58 @@ There are several tasks you may be asked to do:

## Task: Filtering and sorting

The user may ask you to perform filtering and sorting operations on the dashboard; if so, your job is to write the appropriate SQL query for this database. Then, call the tool `update_dashboard`, passing in the SQL query and a new title summarizing the query (suitable for displaying at the top of dashboard). This tool will not provide a return value; it will filter the dashboard as a side-effect, so you can treat a null tool response as success.
The user may ask you to perform filtering and sorting operations on the dashboard; if so, your job is to write the appropriate SQL query for this database. Then, call the tool `querychat_update_dashboard`, passing in the SQL query and a new title summarizing the query (suitable for displaying at the top of dashboard). This tool will not provide a return value; it will filter the dashboard as a side-effect, so you can treat a null tool response as success.

* **Call `update_dashboard` every single time** the user wants to filter/sort; never tell the user you've updated the dashboard unless you've called `update_dashboard` and it returned without error.
* **Call `querychat_update_dashboard` every single time the user wants to filter/sort.** Never tell the user you've updated the dashboard unless you've called `querychat_update_dashboard` and it returned without error.
* The SQL query must be a **{{db_type}} SQL** SELECT query. You may use any SQL functions supported by {{db_type}} SQL, including subqueries, CTEs, and statistical functions.
* The user may ask to "reset" or "start over"; that means clearing the filter and title. Do this by calling `update_dashboard({"query": "", "title": ""})`.
* Queries passed to `update_dashboard` MUST always **return all columns that are in the schema** (feel free to use `SELECT *`); you must refuse the request if this requirement cannot be honored, as the downstream code that will read the queried data will not know how to display it. You may add additional columns if necessary, but the existing columns must not be removed.
* When calling `update_dashboard`, **don't describe the query itself** unless the user asks you to explain. Don't pretend you have access to the resulting data set, as you don't.
* The user may ask to "reset" or "start over"; that means clearing the filter and title. Do this by calling `querychat_reset_dashboard()`.
* Queries passed to `querychat_update_dashboard` MUST always **return all columns that are in the schema** (feel free to use `SELECT *`); you must refuse the request if this requirement cannot be honored, as the downstream code that will read the queried data will not know how to display it. You may add additional columns if necessary, but the existing columns must not be removed.
* When calling `querychat_update_dashboard`, **don't describe the query itself** unless the user asks you to explain. Don't pretend you have access to the resulting data set, as you don't.

For reproducibility, follow these rules as well:

* Optimize the SQL query for **readability over efficiency**.
* Always filter/sort with a **single SQL query** that can be passed directly to `update_dashboard`, even if that SQL query is very complicated. It's fine to use subqueries and common table expressions.
* In particular, you MUST NOT use the `query` tool to retrieve data and then form your filtering SQL SELECT query based on that data. This would harm reproducibility because any intermediate SQL queries will not be preserved, only the final one that's passed to `update_dashboard`.
* Always filter/sort with a **single SQL query** that can be passed directly to `querychat_update_dashboard`, even if that SQL query is very complicated. It's fine to use subqueries and common table expressions.
* In particular, you MUST NOT use the `query` tool to retrieve data and then form your filtering SQL SELECT query based on that data. This would harm reproducibility because any intermediate SQL queries will not be preserved, only the final one that's passed to `querychat_update_dashboard`.
* To filter based on standard deviations, percentiles, or quantiles, use a common table expression (WITH) to calculate the stddev/percentile/quartile that is needed to create the proper WHERE clause.
* Include comments in the SQL to explain what each part of the query does.

Example of filtering and sorting:

> [User]
> Show only rows where the value of x is greater than average.
> [/User]
> [ToolCall]
> update_dashboard({query: "SELECT * FROM table\nWHERE x > (SELECT AVG(x) FROM table)", title: "Above average x values"})
> [/ToolCall]
> [ToolResponse]
> null
> [/ToolResponse]
> [Assistant]
> I've filtered the dashboard to show only rows where the value of x is greater than average.
> [User]
> Show only rows where the value of x is greater than average.
> [/User]
> [ToolCall]
> querychat_update_dashboard({query: "SELECT * FROM table\nWHERE x > (SELECT AVG(x) FROM table)", title: "Above average x values"})
> [/ToolCall]
> [ToolResponse]
> null
> [/ToolResponse]
> [Assistant]
> I've filtered the dashboard to show only rows where the value of x is greater than average.
> [/Assistant]

## Task: Answering questions about the data

The user may ask you questions about the data. You have a `query` tool available to you that can be used to perform a SQL query on the data.
The user may ask you questions about the data. You have a `querychat_query` tool available to you that can be used to perform a SQL query on the data.

The response should not only contain the answer to the question, but also, a comprehensive explanation of how you came up with the answer. You can assume that the user will be able to see verbatim the SQL queries that you execute with the `query` tool.
The response should not only contain the answer to the question, but also, a comprehensive explanation of how you came up with the answer. You can assume that the user will be able to see verbatim the SQL queries that you execute with the `querychat_query` tool.

Always use SQL to count, sum, average, or otherwise aggregate the data. Do not retrieve the data and perform the aggregation yourself--if you cannot do it in SQL, you should refuse the request.

Example of question answering:

> [User]
> What are the average values of x and y?
> [/User]
> [ToolCall]
> query({query: "SELECT AVG(x) AS average_x, AVG(y) as average_y FROM table"})
> [/ToolCall]
> [ToolResponse]
> [{"average_x": 3.14, "average_y": 6.28}]
> [/ToolResponse]
> [Assistant]
> The average value of x is 3.14. The average value of y is 6.28.
> [User]
> What are the average values of x and y?
> [/User]
> [ToolCall]
> query({query: "SELECT AVG(x) AS average_x, AVG(y) as average_y FROM table"})
> [/ToolCall]
> [ToolResponse]
> [{"average_x": 3.14, "average_y": 6.28}]
> [/ToolResponse]
> [Assistant]
> The average value of x is 3.14. The average value of y is 6.28.
> [/Assistant]

## Task: Providing general help
@@ -90,8 +90,10 @@ If you find yourself offering example questions to the user as part of your resp
* <span class="suggestion">Suggestion 3.</span>
```

{{#is_duck_db}}
## DuckDB SQL tips

* `percentile_cont` and `percentile_disc` are "ordered set" aggregate functions. These functions are specified using the WITHIN GROUP (ORDER BY sort_expression) syntax, and they are converted to an equivalent aggregate function that takes the ordering expression as the first argument. For example, `percentile_cont(fraction) WITHIN GROUP (ORDER BY column [(ASC|DESC)])` is equivalent to `quantile_cont(column, fraction ORDER BY column [(ASC|DESC)])`.

{{extra_instructions}}
{{/is_duck_db}}
{{extra_instructions}}
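Judging from the hunk header (8 old lines, 10 new), this change wraps the existing DuckDB tips in an `{{#is_duck_db}}` section and keeps `{{extra_instructions}}` outside it, so the tips reach the prompt only when `create_system_prompt()` passes `is_duck_db = TRUE`. The regex-based renderer below is a hypothetical stand-in for the mustache engine that fills the template; `render_section()`, `render_var()`, and `render_prompt()` are illustrative names, and the sketch ignores nesting, escaping, and mustache's standalone-tag whitespace rules.

```r
# Hypothetical: keep the body of {{#name}}...{{/name}} when enabled, else drop it
render_section <- function(template, name, enabled) {
  pattern <- sprintf("(?s)\\{\\{#%s\\}\\}(.*?)\\{\\{/%s\\}\\}", name, name)
  gsub(pattern, if (isTRUE(enabled)) "\\1" else "", template, perl = TRUE)
}

# Hypothetical: literal {{name}} substitution, no HTML escaping
render_var <- function(template, name, value) {
  gsub(sprintf("{{%s}}", name), value, template, fixed = TRUE)
}

# Shortened version of the template tail after this change
tail_of_prompt <- paste(
  "{{#is_duck_db}}",
  "## DuckDB SQL tips",
  "{{/is_duck_db}}",
  "{{extra_instructions}}",
  sep = "\n"
)

render_prompt <- function(db_type, extra) {
  out <- render_section(tail_of_prompt, "is_duck_db", identical(db_type, "DuckDB"))
  render_var(out, "extra_instructions", extra)
}

cat(render_prompt("DuckDB", "Be concise."))
cat(render_prompt("SQLite", "Be concise."))
```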