Why Hybrid (AST + Joern)?
Technically, Joern's CPG subsumes AST. However, the current AST parser is highly optimized and stable for CodeWiki's specific documentation needs.
To achieve a Quick Win, we use an "Enhancement" strategy:
AST: Provides the stable skeleton (Repository structure, files, methods).
Joern: Infuses "Superpowers" (Data flow, cross-module tagging) that AST alone cannot provide.
This prevents a "high-risk replacement" and allows for incremental migration.
Joern Integration - Quick Win Plan (v5)
This plan adopts a Python-native approach using pyjoern. By leveraging the pyjoern wrapper, we gain a direct API for CPG manipulation and automated backend management, while still keeping AST as the stable structural substrate.
User Review Required
Tip
Why pyjoern?
Native Python API: No more complex subprocess.run parsing; we get function and CFG objects directly.
Auto-Setup: pyjoern --install manages the Joern binary download and updates.
Deep Analysis: Specifically optimized for CFG, PDG, and Data Dependency extraction.
Important
System Requirements:
Java 19+ (Required by Joern backend)
Graphviz (For visualization/export)
Python 3.8+
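A preflight check for these requirements could be sketched as follows. This is a minimal sketch, not part of pyjoern: `check_requirements` is a hypothetical helper, it only tests that `java` and Graphviz's `dot` are on the PATH, and it does not verify that the Java version is 19 or newer.

```python
import shutil
import sys


def check_requirements(which=shutil.which, version_info=None):
    """Return a dict mapping each requirement to True (met) or False (missing).

    `which` and `version_info` are injectable so the check can be tested
    without touching the real environment.
    """
    if version_info is None:
        version_info = sys.version_info
    return {
        # Presence only; the Java version itself is NOT checked here.
        "java_on_path": which("java") is not None,
        # `dot` is the Graphviz command-line renderer.
        "graphviz_dot_on_path": which("dot") is not None,
        "python_3_8_plus": tuple(version_info[:2]) >= (3, 8),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Running the script before Phase 1 gives an early signal when a machine would fail later for environmental reasons rather than analysis reasons.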
🚀 Execution Phases
Phase 1: pyjoern PoC & Baseline 🎯
Goal: Verify pyjoern environment and establish performance/accuracy baselines.
Environment: pip install pyjoern followed by pyjoern --install.
Baseline: Record AST analysis time for 10/100/500 files.
PoC: Use from pyjoern import parse_source on a test project and print function CFGs.
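The baseline measurement for 10/100/500 files could be sketched as below. This is an illustration under stated assumptions: Python's standard-library `ast` module stands in for CodeWiki's own AST parser, only `*.py` files are sampled, and the names `time_ast_parse` and `baseline` are hypothetical.

```python
import ast
import time
from pathlib import Path


def time_ast_parse(files):
    """Parse each file with the stdlib ast module.

    Returns (number_of_files_parsed, elapsed_seconds).
    """
    start = time.perf_counter()
    parsed = 0
    for path in files:
        try:
            source = Path(path).read_text(encoding="utf-8")
            ast.parse(source, filename=str(path))
            parsed += 1
        except (SyntaxError, UnicodeDecodeError, ValueError):
            # Skip files the stand-in parser cannot handle.
            continue
    return parsed, time.perf_counter() - start


def baseline(root, sizes=(10, 100, 500)):
    """Time AST parsing for the first n files under `root`, for each n in sizes.

    Each row is (requested_n, files_available, files_parsed, seconds).
    """
    files = sorted(Path(root).rglob("*.py"))
    rows = []
    for n in sizes:
        sample = files[:n]
        parsed, seconds = time_ast_parse(sample)
        rows.append((n, len(sample), parsed, seconds))
    return rows


if __name__ == "__main__":
    for n, available, parsed, seconds in baseline("."):
        print(f"n={n} available={available} parsed={parsed} time={seconds:.3f}s")
```

Recording both the requested and the available file count keeps the numbers honest when a repository has fewer than 500 files.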
[…] pyjoern_check.log, performance_baseline.md.
Phase 2: Hybrid Data Flow Visualization 📊
Goal: Introduce Data Flow analysis as a "Plugin" enrichment.
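One way to picture the "Plugin" enrichment is a record type for data flow edges that is attached to an existing AST-derived result. The sketch below is an assumption-laden illustration: `DataFlowRelationship(source, target, flow_type)` follows the plan's naming, while `AnalysisResult`, `enrich_with_data_flows`, and the `raw_edges` input (a stand-in for whatever pyjoern traversals return) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DataFlowRelationship:
    source: str      # name of the producing method or node
    target: str      # name of the consuming method or node
    flow_type: str   # free-form label, e.g. "argument" or "return"


@dataclass
class AnalysisResult:
    # Hypothetical container for the AST skeleton plus optional enrichment.
    methods: List[str]
    data_flows: List[DataFlowRelationship] = field(default_factory=list)


def enrich_with_data_flows(
    result: AnalysisResult,
    raw_edges: Iterable[Tuple[str, str, str]],
) -> AnalysisResult:
    """Attach only those edges whose endpoints both exist in the AST skeleton."""
    known = set(result.methods)
    for source, target, flow_type in raw_edges:
        if source in known and target in known:
            result.data_flows.append(
                DataFlowRelationship(source, target, flow_type)
            )
    return result


if __name__ == "__main__":
    skeleton = AnalysisResult(methods=["load", "save"])
    edges = [("load", "save", "argument"), ("load", "unknown", "return")]
    print(enrich_with_data_flows(skeleton, edges).data_flows)
```

Filtering edges against the AST skeleton keeps the AST as the source of truth: data flow information decorates known methods but never introduces new ones.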
[…] HybridAnalysisService using pyjoern's traversals to extract data dependencies.
[…] DataFlowRelationship(source, target, flow_type) to the analysis result.
[…] hybrid_analysis_service.py, Sample documentation with data flow context.
Phase 3: Production Integration & Efficiency 🏭
Goal: Robust integration with caching and hybrid fallback.
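The fallback half of this goal could look roughly like the sketch below, built around the plan's `--use-joern` flag (default off). It is a minimal sketch under assumptions: `build_arg_parser` and `analyze` are hypothetical names, the two analyzers are passed in as plain callables instead of real `ASTParser` or pyjoern calls, and caching is left out.

```python
import argparse
import logging

logger = logging.getLogger("hybrid_analysis")


def build_arg_parser():
    parser = argparse.ArgumentParser(description="Hybrid AST + Joern analysis")
    parser.add_argument(
        "--use-joern",
        action="store_true",
        default=False,
        help="Enrich AST results with Joern data flow analysis (off by default).",
    )
    return parser


def analyze(path, use_joern, ast_analyzer, joern_analyzer):
    """Always run the AST analyzer; add Joern data only if requested and working.

    `ast_analyzer(path)` must return a dict; `joern_analyzer(path)` returns
    the data flow payload or raises on failure (e.g. missing Java).
    """
    result = ast_analyzer(path)
    if not use_joern:
        return result
    try:
        result["data_flows"] = joern_analyzer(path)
    except Exception as exc:  # broad on purpose: any Joern failure falls back
        logger.warning(
            "Joern analysis failed for %s (%s); using AST-only result", path, exc
        )
    return result


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    demo = analyze(
        "example.py",
        args.use_joern,
        ast_analyzer=lambda p: {"methods": ["main"]},
        joern_analyzer=lambda p: [],
    )
    print(demo)
```

Because the AST pass runs unconditionally, a Joern failure costs only the enrichment and never the base documentation output.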
[…] --use-joern flag (default=False).
[…] pyjoern objects (as they are native Python dicts/objects).
[…] pyjoern fails (missing Java or incompatible file), revert to ASTParser.
[…] JOERN_USER_GUIDE.md.
Verification Plan
Success Metrics
[…] pyjoern logs.
[…] fast_cfgs_from_source where possible).