
Implement JOIN and subquery column elimination in dynamic select #284

Open
jongleb wants to merge 1 commit into ygrek:master from jongleb:dyn-sel-joi


@jongleb jongleb commented Jun 11, 2026

Collaborator

Description

This PR introduces LEFT JOIN elimination for dynamic selects. It also works for subqueries used as sources: any columns not used in the final select are removed automatically.

There are two cases where this is convenient for the user.

For example, given this DDL:

CREATE TABLE users (
  id INT PRIMARY KEY,
  org_id INT NOT NULL,
  name TEXT NOT NULL,
  email TEXT NOT NULL,
  created_at TIMESTAMP NOT NULL,
  deleted BOOLEAN NOT NULL
);
CREATE TABLE profiles (
  user_id INT PRIMARY KEY,
  bio TEXT,
  avatar_url TEXT,
  location TEXT,
  website TEXT
);
CREATE TABLE billing (
  user_id INT PRIMARY KEY,
  plan TEXT NOT NULL,
  paid_until DATETIME,
  balance INT NOT NULL
);
Before:
-- @user_brief
SELECT u.name, u.email
FROM users u
WHERE u.org_id = @org AND u.deleted = FALSE;

-- @user_card
SELECT u.name, p.bio, p.avatar_url
FROM users u
LEFT JOIN profiles p ON p.user_id = u.id
WHERE u.org_id = @org AND u.deleted = FALSE;

-- @user_admin
SELECT u.id, u.name, u.email, u.created_at,
       p.bio, p.avatar_url, p.location, p.website,
       b.plan, b.paid_until, b.balance
FROM users u
LEFT JOIN profiles p ON p.user_id = u.id
LEFT JOIN billing  b ON b.user_id = u.id
WHERE u.org_id = @org AND u.deleted = FALSE;

Before, we had to write three separate queries. Not only does the old approach force us to duplicate fields and parts of the query, it also requires repeating the exact same WHERE conditions every time.

With this PR:
-- [sqlgg] dynamic_select=true
-- @user_info
SELECT u.id, u.name, u.email, u.created_at,
       p.bio, p.avatar_url, p.location, p.website,
       b.plan, b.paid_until, b.balance
FROM users u
LEFT JOIN profiles p ON p.user_id = u.id
LEFT JOIN billing  b ON b.user_id = u.id
WHERE u.org_id = @org AND u.deleted = FALSE;

It can then be used this way:

let open Db.User_info_col in

(* 1. single column  *)
List.select db name ~org (fun n -> ...)
(* SELECT u.name FROM users u
   WHERE u.org_id = ? AND u.deleted = FALSE *)

(* 2. a pair of columns *)
List.select db (let+ n = name and+ b = bio in (n, b)) ~org (fun (n, b) -> ...)
(* SELECT u.name, p.bio
   FROM users u LEFT JOIN profiles p ON p.user_id = u.id
   WHERE u.org_id = ? AND u.deleted = FALSE *)

(* 3. any number of columns across tables; the required joins are
   pulled in automatically *)
List.select db
  (let+ n = name and+ b = bio and+ pl = plan and+ bal = balance in (n, b, pl, bal))
  ~org (fun (n, b, pl, bal) -> ...)
(* SELECT u.name, p.bio, b.plan, b.balance
   FROM users u LEFT JOIN profiles p ON p.user_id = u.id LEFT JOIN billing  b ON b.user_id = u.id
   WHERE u.org_id = ? AND u.deleted = FALSE *)
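
The `let+ ... and+ ...` picks above can be read as an applicative over column selectors. The following is a minimal sketch of that idea, not the code sqlgg generates: the `selector` record, the `column` helper, and the string-only `row` type are all assumptions made for illustration.

```ocaml
(* Hypothetical model: a selector carries the columns it needs plus a
   decoder for one result row. Rows are simplified to string pairs. *)
type row = (string * string) list

type 'a selector = { columns : string list; decode : row -> 'a }

(* A single column: request it, then look it up in the row. *)
let column name =
  { columns = [ name ]; decode = (fun row -> List.assoc name row) }

(* let+ transforms the decoded value; and+ merges two selectors. *)
let ( let+ ) s f =
  { columns = s.columns; decode = (fun row -> f (s.decode row)) }

let ( and+ ) a b =
  { columns = a.columns @ b.columns;
    decode = (fun row -> (a.decode row, b.decode row)) }

let name = column "u.name"
let bio = column "p.bio"

(* The set of requested columns is known before the query is built. *)
let pick = let+ n = name and+ b = bio in (n, b)
```

Because `pick.columns` is available up front, a query builder in this style can decide which columns and joins to emit before running anything.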
What kind of joins can be eliminated

Not every join can be eliminated. A LEFT JOIN can be safely removed only if the right table is joined on a unique key and its columns are not used anywhere else in the query. In this case the join affects neither the row count nor the filtering. Extra AND conditions on top of the key match are fine, so ON a.id = b.id AND b.type = 'x' is still droppable. But if the ON clause doesn't guarantee a unique match (for example ON a.id = b.id OR b.type = 'x', an inequality, or only part of a composite key), the join must be kept. Such a join can multiply rows, so dropping it would change the final result.
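
The rule can be sketched as a small check. This is a simplified model under stated assumptions, not the implementation in this PR: the `cond` type, the `join` record, and the `droppable` function are hypothetical, and `Eq_key c` stands for an equality between right-table column `c` and an expression that does not reference the right table.

```ocaml
(* Hypothetical, simplified shape of an ON clause. *)
type cond =
  | Eq_key of string     (* right.col = <expression over other tables> *)
  | Other                (* any other predicate, e.g. b.type = 'x' or a.id < b.id *)
  | And of cond * cond
  | Or of cond * cond

type join = {
  alias : string;              (* alias of the right table *)
  unique_key : string list;    (* columns of one unique key of the right table *)
  on : cond;
}

(* Right-table columns fixed by equality at the top-level conjunction.
   Anything under OR gives no guarantee, so it contributes nothing. *)
let rec pinned = function
  | Eq_key c -> [ c ]
  | And (a, b) -> pinned a @ pinned b
  | Or _ | Other -> []

(* The join matches at most one row when the whole key is pinned. *)
let matches_at_most_one j =
  j.unique_key <> []
  && List.for_all (fun k -> List.mem k (pinned j.on)) j.unique_key

(* Droppable: unique match and no reference to the alias elsewhere. *)
let droppable ~used_aliases j =
  matches_at_most_one j && not (List.mem j.alias used_aliases)

let plain = { alias = "p"; unique_key = [ "user_id" ]; on = Eq_key "user_id" }
let with_and = { plain with on = And (Eq_key "user_id", Other) }
let with_or = { plain with on = Or (Eq_key "user_id", Other) }
let partial_key = { plain with unique_key = [ "user_id"; "kind" ] }
```

In this model `plain` and `with_and` are droppable when `p` is unused, while `with_or` and `partial_key` are always kept.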

Subqueries

Before this PR, the pick only affected the outer SELECT. The subquery always ran as written, with all columns and all joins.

-- [sqlgg] dynamic_select=true
-- @user_info
SELECT * FROM (
  SELECT u.name, u.email, p.bio, b.plan
  FROM users u
  LEFT JOIN profiles p ON p.user_id = u.id
  LEFT JOIN billing b ON b.user_id = u.id
  WHERE u.org_id = @org AND u.deleted = FALSE
) AS sub;

Before: pick name, pay for everything anyway. The resulting query:

SELECT sub.name
FROM (SELECT u.name, u.email, p.bio, b.plan 
      FROM users u
      LEFT JOIN profiles p ON p.user_id = u.id     
      LEFT JOIN billing  b ON b.user_id = u.id    
      WHERE u.org_id = ? AND u.deleted = FALSE) AS sub

After: the subquery narrows together with the pick:

(* pick: name *)
List.select db name ~org (fun n -> ...)
(* SELECT * FROM (SELECT u.name
                  FROM users u
                  WHERE u.org_id = ? AND u.deleted = FALSE) AS sub *)

(* pick: name + bio *)
List.select db (let+ n = name and+ b = bio in (n, b)) ~org (fun (n, b) -> ...)
(* SELECT * FROM (SELECT u.name, p.bio
                  FROM users u LEFT JOIN profiles p ON p.user_id = u.id
                  WHERE u.org_id = ? AND u.deleted = FALSE) AS sub *)

(* pick: everything *)
List.select db
  (let+ n = name and+ e = email and+ b = bio and+ pl = plan in (n, e, b, pl))
  ~org (fun (n, e, b, pl) -> ...)
(* SELECT * FROM (SELECT u.name, u.email, p.bio, b.plan
                  FROM users u
                  LEFT JOIN profiles p ON p.user_id = u.id
                  LEFT JOIN billing b ON b.user_id = u.id
                  WHERE u.org_id = ? AND u.deleted = FALSE) AS sub *)
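
The narrowing shown above can be modeled as two filters: keep only the picked columns, then keep only the joins that a surviving column still references. The sketch below is illustrative, not the PR's code: the `col` and `select` types and the `prune` function are hypothetical, and it assumes every listed join is a droppable LEFT JOIN and that WHERE only touches the base table.

```ocaml
(* Hypothetical model of the inner SELECT. *)
type col = { table : string; name : string }

type select = {
  cols : col list;
  from : string;           (* alias of the base table *)
  joins : string list;     (* aliases of droppable LEFT JOINs *)
}

(* Keep picked columns, then drop joins nobody references any more. *)
let prune picked s =
  let cols = List.filter (fun c -> List.mem c.name picked) s.cols in
  let joins =
    List.filter (fun a -> List.exists (fun c -> c.table = a) cols) s.joins
  in
  { s with cols; joins }

(* The subquery from the example: users u, profiles p, billing b. *)
let inner = {
  cols = [
    { table = "u"; name = "name" };
    { table = "u"; name = "email" };
    { table = "p"; name = "bio" };
    { table = "b"; name = "plan" };
  ];
  from = "u";
  joins = [ "p"; "b" ];
}
```

Here `prune ["name"] inner` keeps no joins, and `prune ["name"; "bio"] inner` keeps only `p`, mirroring the first two picks above.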

@jongleb jongleb changed the title Add eliminate JOINs to Dynamic select Add eliminate JOINs and consider subqueries to Dynamic select Jun 11, 2026
@jongleb jongleb force-pushed the dyn-sel-joi branch 4 times, most recently from 88b5f44 to bfb9529 June 12, 2026 11:40
Comment thread lib/sql.ml
Comment on lines +753 to +773
let sub_exprs = function
  | Value _ | Param _ | Inparam _ | Column _ | Of_values _ | SelectExpr _ -> []
  | Choices (_, l) -> List.filter_map snd l
  | InChoice (_, _, e) -> [e]
  | OptionActions { choice; _ } -> [choice]
  | Fun { kind = Agg (With_order { order; _ }); parameters; _ } -> parameters @ List.map fst order
  | Fun { parameters; _ } -> parameters
  | InTupleList { value = { exprs; _ }; _ } -> exprs
  | Case { case; branches; else_ } ->
    option_list case
    @ List.concat_map (fun (b : case_branch) -> [b.when_; b.then_]) branches
    @ option_list else_

let map_sub_exprs f = function
  | Value _ | Param _ | Inparam _ | Column _ | Of_values _ | SelectExpr _ as e -> e
  | Choices (n, l) -> Choices (n, List.map (fun (n, e) -> n, Option.map f e) l)
  | InChoice (n, k, e) -> InChoice (n, k, f e)
  | OptionActions ({ choice; _ } as o) -> OptionActions { o with choice = f choice }
  | Fun ({ kind = Agg (With_order ({ order; _ } as wo)); parameters; _ } as fn) ->
    Fun { fn with
      kind = Agg (With_order { wo with order = List.map (fun (e, dir) -> f e, dir) order });

Collaborator Author


I was writing a tree traversal again and realized it is time to move it to a separate place.
Besides, adding a new expression forces us to go through every place and add a new match case.

By the way, it would be even easier to get all these maps and traversals by adding this ppx:
https://gitlab.inria.fr/fpottier/visitors
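
To illustrate the point about keeping the traversal in one place, here is a toy sketch. The three-constructor `expr` type is a stand-in for the real one in lib/sql.ml, and `fold`, `rewrite`, and `columns` are hypothetical helpers: once `sub_exprs` and `map_sub_exprs` exist, generic walks can be written without another exhaustive match.

```ocaml
(* Toy AST, much smaller than the real expr type. *)
type expr =
  | Column of string
  | Value of int
  | Fun of string * expr list

(* The only two functions that need a case per constructor. *)
let sub_exprs = function
  | Column _ | Value _ -> []
  | Fun (_, args) -> args

let map_sub_exprs f = function
  | (Column _ | Value _) as e -> e
  | Fun (n, args) -> Fun (n, List.map f args)

(* Pre-order fold over every node, built on sub_exprs alone. *)
let rec fold f acc e = List.fold_left (fold f) (f acc e) (sub_exprs e)

(* Bottom-up rewrite, built on map_sub_exprs alone. *)
let rec rewrite f e = f (map_sub_exprs (rewrite f) e)

(* Collect all column names in source order. *)
let columns e =
  List.rev (fold (fun acc -> function Column c -> c :: acc | _ -> acc) [] e)

let sample = Fun ("concat", [ Column "a"; Fun ("f", [ Column "b"; Value 1 ]) ])
```

Adding a constructor to this toy type would only require touching `sub_exprs` and `map_sub_exprs`; `fold`, `rewrite`, and `columns` stay unchanged.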

Comment thread lib/sql.ml
Comment on lines +604 to +627
let sub_vars = function
  | Single _ | SingleIn _ | TupleList _ | DynamicSelectJoin _ -> []
  | ChoiceIn { vars; _ } -> vars
  | OptionActionChoice (_, vars, _, _) -> vars
  | SharedVarsGroup (vars, _) -> vars
  | Choice (_, ctors) | DynamicSelect (_, ctors) -> List.concat_map ctor_vars ctors

let map_sub_vars f =
  let map_ctor = function
    | Simple (n, vars) -> Simple (n, Option.map f vars)
    | Verbatim _ as c -> c
  in
  function
  | Single _ | SingleIn _ | TupleList _ | DynamicSelectJoin _ as v -> v
  | ChoiceIn t -> ChoiceIn { t with vars = f t.vars }
  | OptionActionChoice (p, vars, pos, kind) -> OptionActionChoice (p, f vars, pos, kind)
  | SharedVarsGroup (vars, id) -> SharedVarsGroup (f vars, id)
  | Choice (p, ctors) -> Choice (p, List.map map_ctor ctors)
  | DynamicSelect (p, ctors) -> DynamicSelect (p, List.map map_ctor ctors)

let var_pos = function
  | Single (p, _) | SingleIn (p, _) -> fst p.id.pos
  | Choice (id, _) | DynamicSelect (id, _) | TupleList (id, _)
  | OptionActionChoice (id, _, _, _) -> fst id.pos

Collaborator Author


Taken out separately for the same reason as in #284 (comment).

@jongleb jongleb requested a review from ygrek June 12, 2026 19:04
@jongleb jongleb self-assigned this Jun 12, 2026
@jongleb jongleb changed the title Add eliminate JOINs and consider subqueries to Dynamic select Implement JOIN and subquery column elimination in dynamic select Jun 12, 2026
@jongleb jongleb marked this pull request as ready for review June 12, 2026 19:06