Skip to content

Replace node-sql-parser with sqlparser-ts#61111

Merged
pierrejeambrun merged 1 commit intoapache:mainfrom
guan404ming:replace-with-sqlparser-rs
Feb 16, 2026
Merged

Replace node-sql-parser with sqlparser-ts#61111
pierrejeambrun merged 1 commit intoapache:mainfrom
guan404ming:replace-with-sqlparser-rs

Conversation

@guan404ming
Copy link
Copy Markdown
Member

@guan404ming guan404ming commented Jan 27, 2026

TL;DR

Replaced node-sql-parser with sqlparser-ts
This new one is 14x smaller, 5x faster and is a library I actively maintain.

Why

  • node-sql-parser adds a massive ~88 MB footprint to the package just to handle SQL string detection.
  • The existing implementation lacked the broad SQL dialect coverage needed for diverse user queries.

How

  • Switched to sqlparser-ts, a library I develop and maintain, which brings the power of Rust’s sqlparser from Apache DataFusion to JS/TS via WebAssembly. It offers superior performance with zero JS/TS dependencies.
  • Refactored detectLanguage.ts to use sqlparser-ts's validate() function instead of node-sql-parser's parse() for SQL detection
  • Added comprehensive unit tests for detectLanguage covering JSON, SQL, Bash, YAML and plain text detection

Benchmark

This optimization slashed the package size by 93% and boosted validation speed by 5.4x, while delivering superior SQL compatibility and maintaining stable load times compare to pure js implementation.

Metric Before After Change
Package size 87 MB 6.4 MB -93%
JS bundle 7,846 KB 5,248 KB -33%
Validate speed 70.6 μs/call 12.86 μs/call 5.48x faster
Build time ~44s ~27s -39%
Load time ~459.6ms ~463.9ms +0.94%
Compatibility fail in complex SQL in SQLite near fully support and update actively -

More detailed benchmark between sqlparser-ts and node-sql-parser --> https://github.com/guan404ming/sqlparser-ts/tree/main/benchmark

Bundle Analyzer Treemap

before

image

after

image
Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Code with Opus 4.5 following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added the area:UI Related to UI/UX. For Frontend Developers. label Jan 27, 2026
@guan404ming guan404ming force-pushed the replace-with-sqlparser-rs branch 3 times, most recently from fd75111 to 63e11de Compare January 27, 2026 11:01
@guan404ming
Copy link
Copy Markdown
Member Author

guan404ming commented Jan 27, 2026

Load Time Benchmark

Result

Metric main (node-sql-parser) this pr (sqlparser-ts)
Avg Page Load 459.6ms 463.9ms
Median Page Load 446.3ms 450.2ms
Min 382.0ms 359.1ms
Max 549.0ms 643.2ms
Avg DOM Content Loaded 456.4ms 460.3ms
Individual Values 549, 439, 382, 390, 433, 438, 453, 519, 469, 524 443, 452, 368, 359, 406, 449, 480, 574, 643, 467

No measurable performance difference. Both branches are within noise margin (~4ms difference in median)


Config

  • Browser: Playwright Chromium (Default)
  • Runs: 10 (with 1 warmup run excluded)
  • Wait Condition: networkidle
  • test page: TI render template tab with sql inside
image

Playwright Script

async (page) => {
  const url = 'http://localhost:28080/dags/long_log_dag/runs/manual__2026-01-27T09:58:47.311305+00:00/tasks/execute_query/rendered_templates';
  const runs = 10;
  const results = [];

  // Warm up
  await page.goto(url, { waitUntil: 'networkidle' });

  for (let i = 0; i < runs; i++) {
    await page.goto(url, { waitUntil: 'networkidle' });

    const timing = await page.evaluate(() => {
      const nav = performance.getEntriesByType('navigation')[0];
      return {
        pageLoad: nav.loadEventEnd - nav.startTime,
        domContentLoaded: nav.domContentLoadedEventEnd - nav.startTime,
      };
    });

    results.push(timing);
  }

  const avg = (arr) => (arr.reduce((a, b) => a + b, 0) / arr.length).toFixed(1);
  const median = (arr) => {
    const s = [...arr].sort((a, b) => a - b);
    const m = Math.floor(s.length / 2);
    return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
  };

  const p = results.map(r => r.pageLoad);
  const d = results.map(r => r.domContentLoaded);

  return {
    runs,
    pageLoad: { 
      avg: avg(p), 
      median: median(p).toFixed(1), 
      min: Math.min(...p).toFixed(1), 
      max: Math.max(...p).toFixed(1), 
      values: p.map(v => v.toFixed(1)) 
    },
    domContentLoaded: { 
      avg: avg(d), 
      median: median(d).toFixed(1), 
      min: Math.min(...d).toFixed(1), 
      max: Math.max(...d).toFixed(1) 
    },
  };
}

Copy link
Copy Markdown
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @guan404ming for this proposal, indeed such optimization would be a good addition I believe.

I don't feel super confident adding a third party dependency freshly created and that hasn't a track record of 'working' in production. (and being popular amongst the community).

Do you know if there are more popular alternatives? If there is not what would it take to directly embed datafusion-sqlparser-rs within airflow without relying on an external package?

Just exploring our options here.

@guan404ming
Copy link
Copy Markdown
Member Author

guan404ming commented Jan 30, 2026

Thanks for the feedback! Totally valid concern regarding the new dependency.

Do you know if there are more popular alternatives?

Here is some alternatives but most of them are either support specific dialects or lack of maintenance. After this investigation, I then decide to start to build by myself.

  • node-sql-parser --> current pkg, large and have some compat issue for different dialects
  • pgsql-parser --> only support pg
  • dt-sql-parser --> build for flink / spark
  • js-sql-parser --> outdated, last release is 2 years ago
  • antlr4ts --> only provide engine. we need to built from scratch, which is high maintenance needed

If there is not what would it take to directly embed datafusion-sqlparser-rs within airflow without relying on an external package

I've also consider this option before. But since the core logic of datafusion-sqlparser-rs is written on Rust, embedding it directly would force Airflow to introduce a Rust toolchain and WASM build steps (like wasm-pack) into the CI/CD pipeline, which increases the barrier and lots code. So packaging it separately to abstracts this complexity away, keeping the Airflow focus on Python/JS is kind of better solution if we want to take speed and size advantage of the datafusion-sqlparser-rs

I don't feel super confident adding a third party dependency freshly created and that hasn't a track record of 'working' in production. (and being popular amongst the community).

Technically, I think the package is robust. I’ve ported the test suite directly from the upstream Rust repo to typescript to ensure the behavior matches the source of truth 1:1. On top of that, I’ve included comprehensive unit tests in this PR to guarantee it integrates with the Airflow UI without causing regressions.

I completely understand the hesitation with relying on a fresh personal repo. I will continue promoting this package to other projects to build a stronger track record. My ideal target is try to donate and transfer this package to the Apache Datafusion upstream. This would make it into an community-governed artifact, removing the dependency on a single individual. However I hope we can still keep this conversation open. I’d really appreciate it if you could give the pkg a spin to see the value. Thanks!

@guan404ming guan404ming force-pushed the replace-with-sqlparser-rs branch 5 times, most recently from 206a3fb to 49d1485 Compare February 7, 2026 12:25
@pierrejeambrun
Copy link
Copy Markdown
Member

pierrejeambrun commented Feb 10, 2026

I checked with Jarek and we think it's a great opportunity to make this solution more known and used across the community.

We already trusted similar packages in the past such as https://github.com/kaxil/boring-cyborg from kaxil

Copy link
Copy Markdown
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit, looking good to me.

Do you mind rebasing the PR, this will make it easier for me to test locally.

@pierrejeambrun pierrejeambrun added this to the Airflow 3.2.0 milestone Feb 10, 2026
@guan404ming guan404ming force-pushed the replace-with-sqlparser-rs branch 2 times, most recently from ac0bb70 to 578e1fa Compare February 11, 2026 02:25
@guan404ming
Copy link
Copy Markdown
Member Author

guan404ming commented Feb 11, 2026

I'm glad to hear that! Thank you and Jarek for the trust. I will keep maintaining this pkg and ensuring it continues to provide value to the community and also be reliable.

Do you mind rebasing the PR, this will make it easier for me to test locally.

Sure, just rebased.

@guan404ming guan404ming force-pushed the replace-with-sqlparser-rs branch 2 times, most recently from 73c062f to 57a036f Compare February 12, 2026 13:57
@pierrejeambrun pierrejeambrun merged commit c5206f5 into apache:main Feb 16, 2026
78 checks passed
OscarLigthart pushed a commit to OscarLigthart/airflow that referenced this pull request Feb 17, 2026
choo121600 pushed a commit to choo121600/airflow that referenced this pull request Feb 22, 2026
Subham-KRLX pushed a commit to Subham-KRLX/airflow that referenced this pull request Mar 4, 2026
dominikhei pushed a commit to dominikhei/airflow that referenced this pull request Mar 11, 2026
Ankurdeewan pushed a commit to Ankurdeewan/airflow that referenced this pull request Mar 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:UI Related to UI/UX. For Frontend Developers.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants