Add sqlc templates for arbitrary text substitution (minimal) #794

brandur · 2025-03-07T03:58:28Z

This one aims to give us a workable resolution to one of our most common
problems with sqlc. Namely, that although it allows substitution for
parameters that work with a prepared query, it can't replace arbitrary
parts of a SQL query, leading to operations that aren't possible so that
we either don't do them or end up degrading to raw SQL that's only
checked at runtime.

Here, we add a sqlctemplate package that's designed to be run from
inside custom implementations of sqlc's DBTX interface so that it it
runs after sqlc's generated code but before the query goes to Postgres.

In sqlc code, templates look like this:

-- name: JobCountByState :one
SELECT count(*)
FROM /* TEMPLATE: schema */river_job
WHERE state = @state;

The template replacement is modeled as a comment so that it doesn't
interfere with with sqlc's parsing of SQL syntax. The above is valid SQL
with or without the template, but with it, sqlctemplate can add an
arbitrary schema name to the queried table.

It also supports a form of syntax where a value is required for SQL to
be valid. For example, WHERE and ORDER BY clauses both require a
value for them to be valid. Here, a stand in value is provide between
template tags. It's processed by sqlc's parser, but then replace by the
template engine before the SQL is executed:

-- name: JobList :many
SELECT *
FROM river_job
WHERE /* TEMPLATE_BEGIN: where_clause */ 1 /* TEMPLATE_END */
ORDER BY /* TEMPLATE_BEGIN: order_by_clause */ id /* TEMPLATE_END */
LIMIT @max::int;

Template values are injected via context (don't love this, but there's
no other way in getting information down to a layer below DBTX):

ctx = sqlctemplate.WithReplacements(ctx, map[string]sqlctemplate.Replacement{
    "order_by_clause": {Value: params.OrderByClause},
    "where_clause":    {Value: params.WhereClause},
}, params.NamedArgs)

jobs, err := dbsqlc.New().JobList(ctx, e.dbtx, params.Max)

The template engine is written to be root out as many error as possible
by noticing if a template replacement is passed that doesn't have an
equivalent template in SQL, or if a template in SQL is present for which
there's no replacement.

Named args are support in templates similar to how sqlc supports them.
This allows pgx's prepared statement cache to continue to operate as it
did before, thereby keeping everything fast.

Lastly, I should note that templates are meant as a utility of last
resort. All effort should be made to resolve problems via mainstream
sqlc, and only bring in templates when there's no other option.

brandur · 2025-03-07T04:39:55Z

@bgentry Alright, so this one's the same as #792 except that I'e reverted the changes to job listing + use of injected schema, and added a test suite for the templating package.

If you're okay with it I'd propose:

Reviewing and getting this merged.
Cut release of River for other packages to use an updated rivershared.
In parallel:
1. Start updating the concurrency query.
2. I can go back and update JobList based on Add sqlc templates for arbitrary text substitution #792. An explicit schema will be a little more work (all queries would need to be updated), but we could look into that from there.

@max

This one aims to give us a workable resolution to one of our most common problems with sqlc. Namely, that although it allows substitution for parameters that work with a prepared query, it can't replace arbitrary parts of a SQL query, leading to operations that aren't possible so that we either don't do them or end up degrading to raw SQL that's only checked at runtime. Here, we add a `sqlctemplate` package that's designed to be run from inside custom implementations of sqlc's `DBTX` interface so that it it runs after sqlc's generated code but before the query goes to Postgres. In sqlc code, templates look like this: -- name: JobCountByState :one SELECT count(*) FROM /* TEMPLATE: schema */river_job WHERE state = @State; The template replacement is modeled as a comment so that it doesn't interfere with with sqlc's parsing of SQL syntax. The above is valid SQL with or without the template, but with it, `sqlctemplate` can add an arbitrary schema name to the queried table. It also supports a form of syntax where a value is required for SQL to be valid. For example, `WHERE` and `ORDER BY` clauses both require a value for them to be valid. Here, a stand in value is provide between template tags. It's processed by sqlc's parser, but then replace by the template engine before the SQL is executed: -- name: JobList :many SELECT * FROM river_job WHERE /* TEMPLATE_BEGIN: where_clause */ 1 /* TEMPLATE_END */ ORDER BY /* TEMPLATE_BEGIN: order_by_clause */ id /* TEMPLATE_END */ LIMIT @max::int; Template values are injected via context (don't love this, but there's no other way in getting information down to a layer below `DBTX`): ctx = sqlctemplate.WithReplacements(ctx, map[string]sqlctemplate.Replacement{ "order_by_clause": {Value: params.OrderByClause}, "where_clause": {Value: params.WhereClause}, }, params.NamedArgs) jobs, err := dbsqlc.New().JobList(ctx, e.dbtx, params.Max) The template engine is written to be root out as many error as possible by noticing if a template replacement is passed that doesn't have an equivalent template in SQL, or if a template in SQL is present for which there's no replacement. Named args are support in templates similar to how sqlc supports them. This allows pgx's prepared statement cache to continue to operate as it did before, thereby keeping everything fast. Lastly, I should note that templates are meant as a utility of last resort. All effort should be made to resolve problems via mainstream sqlc, and only bring in templates when there's no other option.

bgentry

Looks great!

brandur · 2025-03-08T04:34:39Z

Excellent, thanks. Let me get this in and start work on formally moving JobList over.

Here, reimplement the job list API's internals using sqlc using the new sqlc template system introduced in #794. Admittedly this one isn't quite as big of a win as we expect the templating to be in some cases, but it gets rid of a case of SQL code in a raw string, and lets us drop the `JobListFields` driver interface function. We don't add tests because the APIs are relatively well tested already, but make some small modifications to the driver `JobList` test cases so they're a little more thorough.

Here, follow up #794 to proof out the injection of an explicit schema into SQL queries so that putting River in a custom schema doesn't require the use of `search_path`. We use the newly introduced sqlc template package to prefix table names like so: -- name: JobCountByState :one SELECT count(*) FROM /* TEMPLATE: schema */river_job WHERE state = @State; The main advantage of this approach compared to `search_path` is that connections no longer require that their `search_path` be set with an explicit value to work. `search_path` can be set or misset, but because table names are prefixed with the right schema name already, queries still go through. This is especially useful in the context of PgBouncer, where a valid `search_path` setting can't be guaranteed.

Follows up the addition of `sqlctemplate` in #794. I noticed while adding functionality in with templates that it was quite easy to (1) add a `/* TEMPLATE` tag to `river_job.sql`, (2) put in context parameters to a driver, but then (3) forget to run `make generate`. The context parameters are injected, but `sqlctemplate` no ops with a fast short circuit because there's no `/* TEMPLATE` tag present in the generated Go code that the driver is executing. This leads to confusion. Here, add a few more error conditions: * If a context container is present without any `/* TEMPLATE` tags, error. * If any `/* TEMPLATE` tags are present without a context container, error. This makes dumb bugs easier to catch because we get an explicit error instead of them failing silently. Tests are updated to check for them.

bgentry · 2025-03-09T00:08:56Z

rivershared/sqlctemplate/sqlc_template.go

+// Be careful not to place a template on a line by itself because sqlc will
+// strip any lines that start with a comment. For example, this does NOT work:
+//
+//	-- name: JobList :many
+//	SELECT *
+//	FROM river_job
+//	/* TEMPLATE_BEGIN: where_clause */
+//	LIMIT @max::int;


Is this actually true @brandur? I have some comments in another branch that I would actually love to be stripped out of the final query, but they're not, regardless of whether they're on a -- or /* */ comment line all by themselves.

Ah, it seems to only happen when the line doesn't have any leading whitespace, i.e. solely the comment and nothing else (similar to comment lines that denote a query). But if you've indented even a single space, the comment is maintained.

Ah yeah, sounds about right. TBH, I find the behavior kind of confusing, and especially given use of prepared statements with pgx, I doubt the comment stripping is really doing that much perf wise and it probably would've been better if they kept everything in.

Here's the sqlc issue on the subject FWIW. No activity in a while:

sqlc-dev/sqlc#2207

Follows up the addition of `sqlctemplate` in #794. I noticed while adding functionality in with templates that it was quite easy to (1) add a `/* TEMPLATE` tag to `river_job.sql`, (2) put in context parameters to a driver, but then (3) forget to run `make generate`. The context parameters are injected, but `sqlctemplate` no ops with a fast short circuit because there's no `/* TEMPLATE` tag present in the generated Go code that the driver is executing. This leads to confusion. Here, add a few more error conditions: * If a context container is present without any `/* TEMPLATE` tags, error. * If any `/* TEMPLATE` tags are present without a context container, error. This makes dumb bugs easier to catch because we get an explicit error instead of them failing silently. Tests are updated to check for them.

…799) Follows up the addition of `sqlctemplate` in #794. I noticed while adding functionality in with templates that it was quite easy to (1) add a `/* TEMPLATE` tag to `river_job.sql`, (2) put in context parameters to a driver, but then (3) forget to run `make generate`. The context parameters are injected, but `sqlctemplate` no ops with a fast short circuit because there's no `/* TEMPLATE` tag present in the generated Go code that the driver is executing. This leads to confusion. Here, add a few more error conditions: * If a context container is present without any `/* TEMPLATE` tags, error. * If any `/* TEMPLATE` tags are present without a context container, error. This makes dumb bugs easier to catch because we get an explicit error instead of them failing silently. Tests are updated to check for them.

Here, reimplement the job list API's internals using sqlc using the new sqlc template system introduced in #794. Admittedly this one isn't quite as big of a win as we expect the templating to be in some cases, but it gets rid of a case of SQL code in a raw string, and lets us drop the `JobListFields` driver interface function. We don't add tests because the APIs are relatively well tested already, but make some small modifications to the driver `JobList` test cases so they're a little more thorough.

A few more small follow ups for #794. I started renaming things late in the process while refactoring it, and a few of the renames didn't end up getting bubbled all the way through. Try to maintain roughly this vernacular: * **Template:** A template itself like `/* TEMPLATE: ...` along with its assigned template name. * **Replacement:** A replacement value for a template, possibly including some metadata like `Stable`. Get rid of the use of `tmpl` in favor of `replacement` for locals.

This one follows up #794 again. That change included a template cache, with the idea being that we could reuse a rendered sqlc query when all input values were expected to be stable. For example, when replacing something like a schema name, we'd presumably always be replacing the template value with the same schema over and over again, so it'd be better to save on the work. But that caching system hadn't adequately been thought through, and wouldn't really work. I realized when I was putting #798 (explicit schemas) together that if you injected two schema values from two different clients then you'd end up using the same cache value, which is wrong. Here, we tool out the cache layer a little more so that it considers input values and named args, which all must be consistent for a cached value to be returned. We also add tests to make sure of it. Building a cache key unfortunately has to rely on concatenating some strings together and presorting map keys for stability, but even with the extra work involved, a cache hit still comes out to be quite a bit faster than a miss (~2.5x faster), so it seems worth doing: $ go test ./rivershared/sqlctemplate -bench=. goos: darwin goarch: arm64 pkg: github.com/riverqueue/river/rivershared/sqlctemplate cpu: Apple M4 BenchmarkReplacer/WithCache-10 1626988 735.7 ns/op BenchmarkReplacer/WithoutCache-10 695517 1710 ns/op PASS ok github.com/riverqueue/river/rivershared/sqlctemplate 3.419s

Here, follow up #794 to proof out the injection of an explicit schema into SQL queries so that putting River in a custom schema doesn't require the use of `search_path`. We use the newly introduced sqlc template package to prefix table names like so: -- name: JobCountByState :one SELECT count(*) FROM /* TEMPLATE: schema */river_job WHERE state = @State; The main advantage of this approach compared to `search_path` is that connections no longer require that their `search_path` be set with an explicit value to work. `search_path` can be set or misset, but because table names are prefixed with the right schema name already, queries still go through. This is especially useful in the context of PgBouncer, where a valid `search_path` setting can't be guaranteed.

Here, follow up #794 to proof out the injection of an explicit schema into SQL queries so that putting River in a custom schema doesn't require the use of `search_path`. We use the newly introduced sqlc template package to prefix table names like so: -- name: JobCountByState :one SELECT count(*) FROM /* TEMPLATE: schema */river_job WHERE state = @State; The main advantage of this approach compared to `search_path` is that connections no longer require that their `search_path` be set with an explicit value to work. `search_path` can be set or misset, but because table names are prefixed with the right schema name already, queries still go through. This is especially useful in the context of PgBouncer, where a valid `search_path` setting can't be guaranteed. Notably, this change doesn't bring in enough new testing to prove that River really works with explicit schema injection, so for the time being this setting remains completely internal. What I'd like to try and do is get some basic infrastructure like this in place first, then prove it out by starting to move the test suite over to schema-specific tests. By virtue of doing that we'd be putting the entire load of the River test suite on the new paths, which should give us high confidence that it's all working as intended.

brandur marked this pull request as draft March 7, 2025 04:05

brandur force-pushed the brandur-sqlc-templates-basic branch 3 times, most recently from 6325450 to ae13b6c Compare March 7, 2025 04:34

brandur requested a review from bgentry March 7, 2025 04:39

brandur marked this pull request as ready for review March 7, 2025 04:40

brandur mentioned this pull request Mar 7, 2025

Add sqlc templates for arbitrary text substitution #792

Closed

brandur force-pushed the brandur-sqlc-templates-basic branch from ae13b6c to f335397 Compare March 7, 2025 04:44

brandur force-pushed the brandur-sqlc-templates-basic branch from f335397 to 433cd72 Compare March 7, 2025 04:46

bgentry approved these changes Mar 7, 2025

View reviewed changes

brandur merged commit ab892c3 into master Mar 8, 2025
10 checks passed

brandur deleted the brandur-sqlc-templates-basic branch March 8, 2025 04:34

brandur mentioned this pull request Mar 8, 2025

Implement job list with sqlc using templates #795

Merged

brandur mentioned this pull request Mar 8, 2025

Proof out explicit schema injection without search_path #798

Merged

brandur mentioned this pull request Mar 8, 2025

Safer template usage with more aggressive errors on likely problems #799

Merged

bgentry reviewed Mar 9, 2025

View reviewed changes

brandur mentioned this pull request Mar 9, 2025

More consistent naming for sqlctemplate + docs #800

Merged

brandur mentioned this pull request Mar 9, 2025

Better thought out caching mechanism that actually works #802

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sqlc templates for arbitrary text substitution (minimal) #794

Add sqlc templates for arbitrary text substitution (minimal) #794

Uh oh!

brandur commented Mar 7, 2025 •

edited

Loading

Uh oh!

brandur commented Mar 7, 2025

Uh oh!

bgentry left a comment

Uh oh!

brandur commented Mar 8, 2025

Uh oh!

Uh oh!

bgentry Mar 9, 2025

Uh oh!

bgentry Mar 9, 2025

Uh oh!

brandur Mar 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add sqlc templates for arbitrary text substitution (minimal) #794

Add sqlc templates for arbitrary text substitution (minimal) #794

Uh oh!

Conversation

brandur commented Mar 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brandur commented Mar 7, 2025

Uh oh!

bgentry left a comment

Choose a reason for hiding this comment

Uh oh!

brandur commented Mar 8, 2025

Uh oh!

Uh oh!

bgentry Mar 9, 2025

Choose a reason for hiding this comment

Uh oh!

bgentry Mar 9, 2025

Choose a reason for hiding this comment

Uh oh!

brandur Mar 9, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

brandur commented Mar 7, 2025 •

edited

Loading