feat: add DSL output format for graph tools #536
Open
indrajeet0510 wants to merge 2 commits into
Opt-in format="dsl" parameter on get_impact_radius_tool, query_graph_tool, and get_review_context_tool returns nodes and edges as one-line strings instead of JSON dicts. Per-row: ~4x nodes, ~2.5x edges. Full-response on a 71-node blast radius: ~8,000 tokens saved per call. Default stays "dict" — fully backwards compatible.
Summary
Adds a format="dsl" parameter to get_impact_radius_tool, query_graph_tool, and get_review_context_tool. When set, the graph payload comes back as one-line strings instead of JSON dicts. Default stays "dict" so nothing existing breaks.
Why
I was reading through graph.py and noticed that node_to_dict and edge_to_dict (around line 1349) get called from roughly half of the 30 tools. Every node carries 9 JSON keys, every edge carries 8. On a get_impact_radius that hits the default 500-node cap, that's roughly 30k tokens of structural metadata before any actual code shows up.
Two of those fields don't seem to do anything for the LLM consumer: the internal database id, and confidence_tier (which is fully derivable from the confidence float, using the same thresholds you already encode elsewhere). And in the edge encoding, source almost always starts with the same path as file_path, so that prefix gets duplicated on every edge in the response.
Felt like there was real headroom to claw back without changing any
semantics:
Same information, about 4× fewer characters per node. Edges compress roughly 2.5× once you strip the redundant file_path:: prefix from source. Each response also carries a legend field documenting the codes so the LLM doesn't need external docs to decode the lines.
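As a rough illustration of the edge compression described above, here is a minimal sketch. The dict keys, the kind codes, the `>` separator, and the helper names (strip_file_prefix, encode_edge_dsl, build_legend) are all assumptions made up for this example; the PR's actual line format may look quite different.

```python
import json

# Hypothetical kind codes; the legend shipped by the PR may differ.
EDGE_KIND_CODES = {"CALLS": "C", "IMPORTS": "I"}


def strip_file_prefix(qualified: str, file_path: str) -> str:
    """Drop a leading 'file_path::' from a qualified name, if present."""
    prefix = file_path + "::"
    return qualified[len(prefix):] if qualified.startswith(prefix) else qualified


def encode_edge_dsl(edge: dict) -> str:
    """Encode one edge as a single line (assumed layout, for illustration only).

    The database id and confidence_tier are omitted; the confidence float is kept.
    """
    code = EDGE_KIND_CODES.get(edge["kind"], edge["kind"])
    source = strip_file_prefix(edge["source"], edge["file_path"])
    return (
        f"{code} {edge['file_path']} {source} > {edge['target']} "
        f"{edge['confidence']:.2f}"
    )


def build_legend() -> dict:
    """Map each short code back to its full kind so a reader can decode lines."""
    return {code: kind for kind, code in EDGE_KIND_CODES.items()}


# A made-up edge in the dict shape, used only to compare sizes.
edge = {
    "id": 17,
    "kind": "CALLS",
    "source": "app/api.py::handle_request",
    "target": "app/db.py::fetch_user",
    "file_path": "app/api.py",
    "confidence": 0.9,
    "confidence_tier": "high",
}

line = encode_edge_dsl(edge)
print(line)
print(len(json.dumps(edge)), "chars as JSON ->", len(line), "chars as one line")
```

The prefix is stripped only when source actually starts with file_path::, so edges whose source lives elsewhere keep their full qualified name and nothing is lost.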
Numbers
I wrote a reproducible benchmark in scripts/bench_dsl_output.py. It builds a synthetic 50-file backend service with cross-module fan-out and runs get_impact_radius against it. On a typical 71-node blast radius, the full response saves about 8,000 tokens per call.
Worth flagging that per-row compression is more like 3-4×, but the whole-response ratio comes out closer to 2.2× because the envelope (summary, file lists, the legend itself) is identical in both modes. On larger payloads the per-row encoding dominates and the ratio creeps up toward the per-row number. Run the script to see for yourself.
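The gap between the per-row and whole-response ratios follows from simple arithmetic: if the envelope costs E tokens in both modes and the rows cost R tokens in dict mode, compressing the rows by a factor k gives an overall ratio of (E + R) / (E + R/k). A minimal sketch, with token counts invented purely for illustration (they are not the benchmark's measurements):

```python
def whole_response_ratio(envelope: float, rows: float, per_row_ratio: float) -> float:
    """Ratio of dict-mode size to DSL-mode size when only the rows shrink."""
    dict_total = envelope + rows
    dsl_total = envelope + rows / per_row_ratio
    return dict_total / dsl_total


# Illustrative numbers only: a fixed 1,000-token envelope, rows compressing 3.5x.
small = whole_response_ratio(envelope=1000, rows=4000, per_row_ratio=3.5)
large = whole_response_ratio(envelope=1000, rows=40000, per_row_ratio=3.5)
print(f"small payload: {small:.2f}x, large payload: {large:.2f}x")
```

As the row share grows, the fixed envelope matters less and the overall ratio approaches, but never reaches, the per-row factor.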
Tests
26 new tests in tests/test_dsl_output.py covering the encoders, the dispatch helpers, control-char sanitization (security parity with the dict encoder: _sanitize_name is called from the DSL encoders too, so the existing TestSanitizeName protections carry over), and the tool integration. All pass. ruff check and mypy --ignore-missing-imports --no-strict-optional are both clean on the modified files.
Heads up: there are 7 unrelated failures in tests/test_main.py (TestApplyToolFilter and one in TestLongRunningToolsAreAsync). They fail on clean main too, so they don't look related to this PR.
Notes
The default "dict" means every existing caller is byte-identical. Nothing changes unless someone explicitly opts in.
I left detect_changes_tool and traverse_graph_tool alone even though they share the same encoder bottleneck; it felt cleaner to handle them as separate follow-up PRs once we agree on the format here.
Updated README.md (added a short paragraph below the MCP tools table mentioning the new parameter). The four translated READMEs (Hindi, Japanese, Korean, Chinese Simplified) aren't updated; happy to take a follow-up for the Hindi sync if you'd like, but I'd defer to native speakers for the others.
character / etc. The current shape was a first stab at something
compact-but-still-readable; happy to iterate if you have preferences.