Heterogeneous multi-graph view: inheritance / type-use / import / field-rw graphs with joins, polyglot edges across languages

## Summary

Promote CLDK from "the call graph + a few accessors" to a **first-class heterogeneous multi-graph substrate**: every relation in a codebase modelled as a typed graph, all sharing a node namespace, joinable against each other, and **followable across language boundaries** through explicit bridge edges (HTTP routes, RPC service definitions, message-bus topics, FFI declarations, ORM ↔ SQL, config-file references).

This is the **substrate** that the composable chain API in #155 queries against. #155 proposed the surface; this issue proposes the data model underneath that makes the chains meaningful for any non-trivial relation.

## The graphs

Today CLDK exposes essentially one graph (call) plus inventory accessors from which other relations can be derived ad hoc. Each of the following should be a first-class graph with its own `pa.<graph_name>()` accessor:

| Graph | Nodes | Edges | Today |
|---|---|---|---|
| **Call** | callables | "A calls B" | exists (`get_call_graph`) |
| **Inheritance** | classes/interfaces | "A extends/implements B" | derivable, not first-class |
| **Type-use** | types, callables, fields | "callable C declares/returns/accepts type T"; "field F has type T" | not exposed |
| **Module-import** | modules/packages | "module M imports module N" (with symbol granularity where the language has it) | partial via `get_imports` |
| **Field read/write** | fields/globals × callables | "callable C reads field F" / "writes field F" | not exposed |
| **Decorator / annotation** | callables/classes × decorators | "C is decorated with D, passing kwargs K" | partial via `.decorators` attribute |
| **Exception-flow** | callables × exception types | "C raises E"; "C catches E" | not exposed |
| **Configuration-reference** | code symbols × config keys | "callable C reads config key K"; "key K defined in file F" | not exposed |
| **Test-link** | tests × prod callables | "test T exercises callable C" (mined from imports + coverage where available) | not exposed |
| **Resource graph** | I/O sites × resources | "callable C opens file/socket/db connection X" | not exposed |

Every graph shares the **same node namespace** wherever possible, so a callable referenced in the call graph IS the same node referenced in the type-use graph, the field-rw graph, the decorator graph, etc. This is the whole point — it makes joins natural.
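A minimal sketch of what a shared node namespace could look like. Everything here is hypothetical (the `Edge` and `MultiGraph` classes, the `py:` node-id prefix, the graph names as strings); it is not CLDK's data model, only an illustration of one callable id appearing in several typed graphs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    graph: str       # which typed graph the edge belongs to, e.g. "call"
    src: str         # node id, shared across every graph
    dst: str
    label: str = ""  # e.g. "calls", "returns", "writes"


@dataclass
class MultiGraph:
    """Hypothetical container: one edge set, many typed graphs."""
    edges: set = field(default_factory=set)

    def add(self, graph, src, dst, label=""):
        self.edges.add(Edge(graph, src, dst, label))

    def out(self, graph, src):
        # Successors of `src` within a single typed graph.
        return {e.dst for e in self.edges if e.graph == graph and e.src == src}

    def graphs_touching(self, node):
        # Every typed graph in which this node id appears.
        return {e.graph for e in self.edges if node in (e.src, e.dst)}


mg = MultiGraph()
login = "py:app.auth.login"  # assumed node-id scheme
mg.add("call", login, "py:app.db.save_user", "calls")
mg.add("type_use", login, "py:app.models.User", "returns")
mg.add("field_rw", login, "py:app.models.User.password", "writes")

print(sorted(mg.graphs_touching(login)))
```

Because the id for `login` is the same string in all three graphs, a query can move from one relation to another without any translation step.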

## Joins across graphs (homogeneous, within one language)

The killer queries live in the joins. Examples that should be one expression each:

- "Every public-API method (decorator graph) that calls a deprecated method (call graph) whose return type is a database model class (type-use graph)" — refactoring impact
- "Every method that writes field `password` (field-rw graph) but does not catch `EncryptionError` (exception-flow graph)" — security gap
- "Every controller class (inheritance graph) whose method reads config key `enable_legacy_*` (config-ref graph) but has no test coverage (test-link graph)" — risk-prioritized review
- "Every module (import graph) that depends on package `foo` AND exports a class implementing interface `Bar` (inheritance graph)" — license / supply-chain audit

Joins like this are how you turn "I have eight graphs" into "I have a code-analysis substrate." The chain API in #155 is the surface that makes them ergonomic; without the multi-graph substrate the chain is stuck on calls.
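To make the "security gap" example concrete, here is a rough sketch of that join as plain set operations over shared node ids. The edge data, the node ids, and the helper names `writers_of` and `catching` are all invented for illustration; a real implementation would pull these edges from per-graph accessors.

```python
# Hypothetical field-rw edges: (callable id, field id written).
FIELD_WRITES = {
    ("py:app.auth.set_password", "py:app.models.User.password"),
    ("py:app.admin.reset_password", "py:app.models.User.password"),
    ("py:app.profile.rename", "py:app.models.User.name"),
}

# Hypothetical exception-flow edges: (callable id, exception type caught).
CATCHES = {
    ("py:app.auth.set_password", "EncryptionError"),
    ("py:app.profile.rename", "ValueError"),
}


def writers_of(field_id):
    """Callables that write the given field."""
    return {caller for caller, fld in FIELD_WRITES if fld == field_id}


def catching(exc_name):
    """Callables that catch the given exception type."""
    return {caller for caller, exc in CATCHES if exc == exc_name}


# The join: writers of `password` minus those that catch EncryptionError.
gap = writers_of("py:app.models.User.password") - catching("EncryptionError")
print(gap)
```

The join is only a set difference because both graphs key their edges on the same callable ids; with separate namespaces it would need a fuzzy name-matching step first.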

## Cross-language: the bridge edges (the real prize)

Real codebases are polyglot. The Odoo audit I just did had Python controllers, XML view definitions, JS frontend hooks, SQL access lists, and YAML config — and the most interesting *trust boundaries* were between languages, not within them. The current state of the art is the analyst stitching these together in their head.

A cross-language multi-graph substrate models the bridging edges explicitly:

| Bridge | Source side | Target side | Linking signal |
|---|---|---|---|
| **HTTP route** | server handler decorated/declared with URL path | client `fetch`/`axios`/`requests` call with URL string | URL string match (or schema like OpenAPI/Swagger when present) |
| **RPC service** | server-side service method (gRPC/Thrift/SOAP) | client stub call | service+method name from `.proto` / IDL |
| **Message bus** | publisher (Kafka/SQS/RabbitMQ/NATS) | subscriber/handler | topic/queue name |
| **ORM ↔ SQL** | ORM model class + field declarations | SQL DDL / migration / hand-written query | table+column names |
| **FFI / shared library** | native declaration | bindings on the other side | symbol name + ABI |
| **Config file ↔ code** | code site reading a key | YAML/JSON/TOML/INI/ENV file defining the key | key path |
| **Template ↔ code** | template variable | view/handler passing it | variable name in scope |
| **Build artifact** | source files of one component | declared input/output of build step | filename / target name in Makefile/Bazel/etc. |
| **Container ↔ binary** | Dockerfile `CMD`/`ENTRYPOINT` | program entry point | path / image layer |
| **Schema ↔ deserialiser** | JSON Schema / Protobuf / Avro / OpenAPI | parse call site | schema reference / mime type |

The substrate should:
1. Detect these bridges automatically using framework recipes (Flask routes, FastAPI, Express, Spring, gRPC `.proto`, OpenAPI specs, common ORM patterns, common message-bus client libs).
2. Allow user-declared bridges for project-specific conventions.
3. Expose them as **typed edges in the multi-graph** so existing chain queries (`reachable_to`, `callers`, `callees`) traverse them transparently when the user opts in (`via=["call", "http", "rpc"]`).
4. Carry a **confidence label per bridge edge** (see next section).
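One way the opt-in traversal in point 3 might behave, sketched as a breadth-first search that only follows edge types listed in `via`. The edge list, the node ids, and the function name `reachable_from` are assumptions made for this example, not an existing CLDK API.

```python
from collections import deque

# Hypothetical typed edges: (source id, edge type, target id).
EDGES = [
    ("js:ui.saveProfile", "http", "py:api.update_profile"),
    ("py:api.update_profile", "call", "py:models.User.save"),
    ("py:models.User.save", "orm", "sql:users.email"),
]


def reachable_from(start, via):
    """Nodes reachable from `start` using only the edge types in `via`."""
    allowed = set(via)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, kind, dst in EDGES:
            if src == node and kind in allowed and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    seen.discard(start)  # report only the nodes reached, not the origin
    return seen


# Call edges alone never leave the JS side.
print(reachable_from("js:ui.saveProfile", via=["call"]))
# Opting in to the HTTP bridge crosses into Python.
print(reachable_from("js:ui.saveProfile", via=["call", "http"]))
```

Keeping bridge types out of the default `via` set means a plain call-graph query never silently picks up heuristic cross-language edges.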

## Honest visibility: heuristic linking must be visible

Cross-language linking is *mostly heuristic* — a URL string in a JS file is matched to a Python route by string equality and parameter-shape compatibility, not by static proof. The substrate must not pretend otherwise.

Every bridge edge carries:

- `bridge_type`: `http_route` / `rpc_service` / `message_topic` / `orm_table` / `ffi_symbol` / `config_key` / ...
- `confidence`: `static_proof` (e.g. resolved via a typed schema like OpenAPI/protobuf), `string_match` (URL or topic literal matched), `heuristic` (name similarity), `manual` (user-declared)
- `evidence`: the literal/schema reference that produced the link
- `direction`: who sends, who receives

Same visibility model as the within-language graphs (resolved / structural / unresolved from #155), extended across the language boundary. The analyst's confidence tier follows mechanically from the weakest link in the chain.
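The "weakest link" rule can be sketched as a minimum over an ordered list of tiers. The issue names the four confidence values but does not rank them, so the ordering in `TIER_ORDER` below (in particular where `manual` sits) is an assumption, as are the `BridgeEdge` class and the sample edges. Direction is encoded here simply as sender `src` to receiver `dst`.

```python
from dataclasses import dataclass

# ASSUMED ranking, weakest first; the proposal does not fix this order.
TIER_ORDER = ["heuristic", "string_match", "manual", "static_proof"]


@dataclass(frozen=True)
class BridgeEdge:
    bridge_type: str  # e.g. "http_route", "orm_table"
    confidence: str   # one of TIER_ORDER
    evidence: str     # literal or schema reference that produced the link
    src: str          # sender
    dst: str          # receiver


def chain_confidence(edges):
    """Confidence of a whole chain: the weakest tier among its edges."""
    return min((e.confidence for e in edges), key=TIER_ORDER.index)


chain = [
    BridgeEdge("http_route", "string_match", "/api/profile",
               "js:ui.saveProfile", "py:api.update_profile"),
    BridgeEdge("orm_table", "static_proof", "users.email",
               "py:models.User.email", "sql:users.email"),
]

print(chain_confidence(chain))
```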

## Why this matters more for security than for refactoring

For **refactoring**, within-language graphs already cover ~80% of the value because most refactors are within one component. The polyglot story is nice-to-have.

For **security audits**, polyglot is the prize. The real attack surface in a modern app is the seams: untrusted JSON crossing from JS to Python, a topic name shared between two services, a config file flag that disables auth on a route. Today the analyst stitches these seams from grep + memory; a substrate that exposes them as typed graph edges, joinable with the call/type/field graphs on either side, would be transformative.

Concretely, the Odoo audit I just published would extend to:

- Cross-reference each Python controller route against the **XML view definitions** that declare which actions/clients invoke them
- Cross-reference each `@http.route` against the **JS frontend** to see which routes are actually called from the SPA and which are orphans
- Cross-reference each model field against the **`ir.model.access.csv`** ACL and **`ir.rule`** record rules to compute the actual effective permission on the field for each user role

None of that is doable today. Each would be a single query on top of a multi-graph substrate.

## Relationship to #155

- **#155 is the surface**: composable chains over CLDK's graphs.
- **This issue is the substrate**: which graphs exist, that they share a node namespace, that joins are first-class, and that cross-language bridge edges are typed and visible.

They are complementary and roughly independent. Either can ship first; both together are what makes CLDK a general code-analysis substrate (the "pandas of code analysis" framing).

## Suggested incremental rollout

The full vision is big; the incremental path is small:

1. **Promote within-language graphs to first-class.** Inheritance, type-use, module-import, field-rw, decorator, exception-flow each get a `pa.<graph_name>()` accessor returning a graph with a stable node-id scheme shared with the call graph. (Most of these are derivable from the existing analysis; the work is exposing them, not recomputing.)
2. **Add cross-graph joins on shared node ids.** No new analysis; just the surface so users can ask `pa.field_rw().writes_of("password").intersect(pa.exception_flow().not_catching("EncryptionError"))`.
3. **Ship two cross-language bridge types as the proof of concept**: HTTP route (server↔client) and ORM↔SQL. Each driven by a framework recipe (Flask/FastAPI/Django/Express + SQLAlchemy/Django-ORM/Sequelize). Honest confidence labels from day one.
4. **Open a recipe registry** so the community can contribute additional bridge detectors (gRPC, Kafka, NATS, Spring, Rails, etc.). Recipes ship as data, not code.

Step 1 alone is a big win and unlocks #155's chain API to operate on more than the call graph. Step 3 is what makes CLDK uniquely valuable for polyglot security audits.
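As a rough illustration of steps 3 and 4 together, the sketch below treats each recipe as a small data record (a regex plus the side of the bridge it describes) and links a Flask-style route to a JS `fetch` call by URL string equality. The recipe format, the `RECIPES` table, and the `link_http` helper are invented for this example; real detection would need to handle path parameters, URL prefixes, and non-literal URLs.

```python
import re

# Hypothetical recipes expressed as data: a regex capturing the URL literal.
RECIPES = {
    "flask_route": {
        "side": "server",
        "pattern": r"@\w+\.route\(\s*['\"]([^'\"]+)['\"]",
    },
    "js_fetch": {
        "side": "client",
        "pattern": r"fetch\(\s*['\"]([^'\"]+)['\"]",
    },
}


def extract(recipe_name, source):
    """All URL literals the named recipe finds in `source`."""
    return re.findall(RECIPES[recipe_name]["pattern"], source)


def link_http(server_src, client_src):
    """Bridge edges for client calls whose URL equals a served route."""
    served = set(extract("flask_route", server_src))
    return [
        {"bridge_type": "http_route", "confidence": "string_match", "evidence": url}
        for url in extract("js_fetch", client_src)
        if url in served
    ]


py_src = '@app.route("/api/profile")\ndef profile():\n    return {}\n'
js_src = 'fetch("/api/profile"); fetch("/api/missing");'

edges = link_http(py_src, js_src)
print(edges)
```

Every edge produced this way is labelled `string_match`, never `static_proof`, which keeps the confidence label honest about how the link was found.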

## Out of scope (separate issues)

- The chain query API itself (#155)
- A framework-recipe registry as its own subsystem (worth its own issue)
- Coverage / dynamic-instrumentation bridges that produce graph edges from runtime observation
- Differential graphs between commits

## The framing

If CLDK is going to be the pandas of code analysis (the broader thesis from the discussion these issues came out of), this is the equivalent of pandas moving from "Series + DataFrame" to "DataFrame + MultiIndex + merge + groupby + cross-table joins." A single graph is a Series. The multi-graph substrate with shared node ids and cross-language bridges is the DataFrame join — which is where pandas went from useful to indispensable.

---

Context: same Odoo audit and `poe-with-cldk` skill methodology that motivated #155. Within one Python file, the call graph carried me most of the way; the next step up — auditing the seams between the Python controllers, the XML actions, the JS frontend, the ACL CSVs, and the SQL access patterns — runs into a wall today because there is no substrate that represents those relations as joinable typed graphs.
