PaperBanana 🍌

AI-powered academic diagrams that actually communicate ideas — not just labeled boxes.

For researchers, students, and anyone who needs publication-quality figures from text descriptions.

10x faster | 20x cheaper | 95.8/100 avg quality (self-evaluated via vision critic)

Option A: Claude Code Skill (Recommended — no install needed)

mkdir -p ~/.claude/skills/paperbanana
curl -sL https://github.com/stuinfla/paperbanana/main/docs/SKILL.md > ~/.claude/skills/paperbanana/SKILL.md

Type /paperbanana in Claude Code and start describing diagrams. That's it.

Option B: MCP Server (for Claude Code / AI assistants)

Quick setup: curl -sL https://github.com/stuinfla/paperbanana/main/install.sh | bash

Then add to your project's .mcp.json:

{
  "mcpServers": {
    "paperbanana": {
      "command": "python3",
      "args": ["-m", "mcp_server.server"],
      "cwd": "~/paperbanana",
      "env": { "GOOGLE_API_KEY": "your-key-here" }
    }
  }
}

Get a free API key at aistudio.google.com/apikey, then ask Claude: "Generate a diagram showing how neural networks learn"

Option C: Command Line

git clone https://github.com/stuinfla/paperbanana && cd paperbanana
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
export GOOGLE_API_KEY="your-key"  # Free at https://aistudio.google.com/apikey
.venv/bin/python cli_generate.py --content "Your concept" --caption "Figure 1" --output diagram.png

Note: These are this fork's Skill and MCP (SVG pipeline + visual storytelling). Different from the community pip install paperbanana package — see Community Supports.

View all 10 in the showcase

Here are 3 examples generated by this fork's SVG pipeline (95+ each):

Pi: Data Flow Pipeline	RuVector: GNN Learning Loop	Ruflo: Swarm Topologies

What's Different About This Fork

Original research by Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon (Peking University + Google Cloud AI Research). Originally open-sourced as PaperVizAgent.

Enhanced by Stuart Kerr — SVG pipeline, visual storytelling, vision critic, 10x speed, 20x cost reduction.

	Original	This Fork
Rendering	Raster image generation	SVG code + Cairo render (100% text fidelity)
Speed	2-5 min per diagram	~30s in SVG mode
Cost	$0.50-2.00 per diagram	~$0.05 in SVG mode
Quality	~62-72/100 avg	95.8/100 avg (self-evaluated)
Design	Labeled boxes and arrows	Visual-first: icons, shapes, spatial layout
Self-correction	Text-based critic	Vision critic sees the rendered PNG, sends spatial fixes
Output	Raster PNG only	Editable SVG + PNG
Integration	Streamlit only	Skill, MCP, CLI, Streamlit, Python API

The core innovation: instead of drawing labeled boxes, the Planner first asks "What is this concept LIKE?" — finding a visual metaphor that makes the diagram click in seconds.

Full Install

For CLI, Streamlit, Python API, or MCP server:

git clone https://github.com/stuinfla/paperbanana && cd paperbanana
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
export GOOGLE_API_KEY="your-key-here"  # Free at https://aistudio.google.com/apikey

Command Line

# Generate a diagram
.venv/bin/python cli_generate.py \
  --content "Description of your concept" \
  --caption "Figure 1: What this shows" \
  --output diagram.png

# From a file
.venv/bin/python cli_generate.py \
  --content-file method_section.md \
  --caption "Figure 2: System architecture." \
  --output diagram.png

# Multiple candidates (picks the best)
.venv/bin/python cli_generate.py \
  --content-file method_section.md \
  --caption "Figure 1" \
  --output diagram.png \
  --candidates 5

Web UI (Streamlit)

.venv/bin/streamlit run demo.py

Python API

import asyncio
from utils.paperviz_processor import PaperVizProcessor
from utils import config
from agents.planner_agent import PlannerAgent
from agents.visualizer_agent import VisualizerAgent
from agents.stylist_agent import StylistAgent
from agents.critic_agent import CriticAgent
from agents.retriever_agent import RetrieverAgent
from agents.vanilla_agent import VanillaAgent
from agents.polish_agent import PolishAgent

exp_config = config.ExpConfig(
    dataset_name="Demo", task_name="diagram", split_name="demo",
    exp_mode="demo_full", retrieval_setting="auto", max_critic_rounds=3,
)

processor = PaperVizProcessor(
    exp_config=exp_config,
    planner_agent=PlannerAgent(exp_config=exp_config),
    visualizer_agent=VisualizerAgent(exp_config=exp_config),
    stylist_agent=StylistAgent(exp_config=exp_config),
    critic_agent=CriticAgent(exp_config=exp_config),
    retriever_agent=RetrieverAgent(exp_config=exp_config),
    vanilla_agent=VanillaAgent(exp_config=exp_config),
    polish_agent=PolishAgent(exp_config=exp_config),
)

input_data = {
    "filename": "my_diagram",
    "caption": "Figure 1: System architecture.",
    "content": "Your methodology text here...",
    "visual_intent": "Figure 1: System architecture.",
}

async def generate():
    async for result in processor.process_queries_batch(
        [input_data], max_concurrent=1, do_eval=False
    ):
        print("Generated:", result.keys())

asyncio.run(generate())

CLI Reference

Flag	Values	Default	Description
`--content`	text	required*	Inline content to visualize
`--content-file`	path	required*	File containing content
`--caption`	text	required	Figure caption / visual intent
`--output`	path	output.png	Output image path
`--task`	diagram, plot	diagram	Type of visualization
`--mode`	demo_full, demo_planner_critic, vanilla	demo_full	Pipeline mode
`--retrieval`	auto, manual, random, none	none	Reference retrieval strategy
`--critic-rounds`	1-5	3	Max refinement iterations
`--candidates`	1-20	1	Parallel candidates to generate
`--aspect-ratio`	16:9, 21:9, 3:2	16:9	Output aspect ratio
`--quiet`	flag	false	Suppress progress output
`--model`	model name	config	Override reasoning model
`--image-model`	model name	config	Override image generation model

*One of --content or --content-file is required.

Models and Cost

Model	Role	Quality	Speed	Cost
`gemini-3.1-pro-preview`	Reasoning	Best	~15s/call	Higher
`gemini-3-pro-image-preview`	Image Gen	Best	~60s/call	Higher
`gemini-2.5-flash`	Reasoning	Good	~5s/call	Lower
`gemini-2.5-flash-image`	Image Gen	Good	~30s/call	Lower

Switch models: --model gemini-2.5-flash --image-model gemini-2.5-flash-image

Or set permanently in configs/model_config.yaml:

defaults:
  model_name: "gemini-2.5-flash"
  image_model_name: "gemini-2.5-flash-image"

Quality Tier	Cost/image	Time
SVG pipeline (best)	~$0.05	~30s
Full raster pipeline	~$0.10	~2 min
Draft (vanilla)	~$0.02	~90s

Optional: download PaperBananaBench into data/PaperBananaBench/ for +15 quality points via reference retrieval.

How It Works

Visual Storytelling Pipeline

Before drawing anything, the Planner asks three questions:

Standard vs Storytelling approach comparison

A container format becomes a shipping crate with compartments. A self-learning database becomes a living library. The reviewer "gets it" in seconds.

SVG Pipeline: Vision Critic Loop

The highest-quality mode. LLM writes SVG code directly, Cairo renders to PNG, multimodal Gemini evaluates the rendered image, and spatial fixes are applied automatically.

SVG Generation -- LLM writes SVG with labels and descriptions on every element
Cairo Rendering -- 100% text fidelity (text placed by renderer, not predicted by a neural net)
Vision Critique -- evaluates for overlap, clipping, layout, missing information
Self-Correction -- fixes applied, re-rendered, re-evaluated until 95+/100

Side-by-Side: Before and After

Technical Architecture (+30 points)

With Visual Metaphor (95/100)	Standard Pipeline (65/100)

WiFi sensing is invisible. The metaphor -- waves passing through a person with a pose overlay -- makes the invisible visible.

Abstract Concepts (+16 points)

With Visual Metaphor (92/100)	Standard Pipeline (76/100)

The "Knowledge City" metaphor turns abstract ideas into something tangible.

More examples

Application Ecosystem (+25)

With Visual Metaphor (93/100)	Standard Pipeline (68/100)

Product Overview (+16)

With Visual Metaphor (94/100)	Standard Pipeline (78/100)

Quality Results

Scenario	Enhanced	Baseline	Gain
Application ecosystem	93	68	+25
Product overview	94	78	+16
Technical architecture	95	65	+30
Abstract concepts	92	76	+16
Average	93.5	71.75	+21.75

Architecture

The original 5-agent pipeline (all enhanced in this fork) plus the new SVG Visualizer:

Agent	Enhancement
Planner	Mandatory visual metaphor discovery before describing elements
Stylist	Preserves metaphors (never flattens into generic boxes)
Visualizer	Multi-candidate parallel generation + tag stripping
Critic	7 mandatory visual excellence checks, strict pass threshold
SVG Visualizer (new)	LLM writes SVG + Cairo render + multimodal vision critic loop

Quality journey across iterations

Each iteration built on the one before. The storytelling step produced the largest single improvement because it changes the strategy rather than just the execution.

Advanced: Batch Evaluation and Visualization

# Batch evaluation
python main.py \
  --dataset_name "PaperBananaBench" \
  --task_name "diagram" \
  --split_name "test" \
  --exp_mode "dev_full" \
  --retrieval_setting "auto"

# Pipeline evolution viewer
streamlit run visualize/show_pipeline_evolution.py

# Evaluation results
streamlit run visualize/show_referenced_eval.py

Modes: vanilla, dev_planner, dev_planner_stylist, dev_planner_critic, dev_full, demo_planner_critic, demo_full

Project Structure

ASCII Version (for AI/accessibility)

agents/
  planner_agent.py        # Visual metaphor discovery + description
  stylist_agent.py         # Metaphor-preserving style refinement
  visualizer_agent.py      # Multi-candidate image generation
  svg_visualizer_agent.py  # SVG code gen + Cairo render + vision critic
  critic_agent.py          # 7-check visual excellence scoring
  retriever_agent.py       # Reference example retrieval
  vanilla_agent.py         # Direct generation (baseline)
  polish_agent.py          # Post-processing refinement
mcp_server/
  server.py                # MCP server (4 tools) for AI assistants
cli_generate.py            # Headless CLI for single diagram generation
demo.py                    # Streamlit web UI
main.py                    # Batch evaluation runner
configs/                   # Model config templates
data/PaperBananaBench/     # Reference dataset (download separately)
docs/                      # Comparison images, showcase gallery
style_guides/              # NeurIPS aesthetic guidelines
utils/                     # Pipeline orchestration, config
visualize/                 # Pipeline visualization tools

Community Supports

Note: The community projects below are independent implementations of the original PaperBanana paper -- not related to the Skill or MCP in this fork. If you installed from Quick Start above, you're using this fork's SVG pipeline.

https://github.com/llmsresearch/paperbanana -- pip-installable package with its own MCP server (different from this fork's MCP)
https://github.com/efradeca/freepaperbanana

Related work in automated academic illustration:

We warmly welcome community contributions to make PaperBanana even better!

License

Apache-2.0

Citation

@article{zhu2026paperbanana,
  title={PaperBanana: Automating Academic Illustration for AI Scientists},
  author={Zhu, Dawei and Meng, Rui and Song, Yale and Wei, Xiyu and Li, Sujian and Pfister, Tomas and Yoon, Jinsung},
  journal={arXiv preprint arXiv:2601.23265},
  year={2026}
}

Disclaimer

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

Our goal is simply to benefit the community, so currently we have no plans to use it for commercial purposes. The core methodology was developed during my internship at Google, and patents have been filed for these specific workflows by Google. While this doesn't impact open-source research efforts, it restricts third-party commercial applications using similar logic.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
agents		agents
assets		assets
configs		configs
docs		docs
mcp_server		mcp_server
notebooks		notebooks
prompts		prompts
scripts		scripts
static		static
style_guides		style_guides
utils		utils
visualize		visualize
.ascii-to-svg-manifest.json		.ascii-to-svg-manifest.json
.gitattributes		.gitattributes
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
ENHANCED_PIPELINE.md		ENHANCED_PIPELINE.md
LICENSE		LICENSE
README.md		README.md
cli_generate.py		cli_generate.py
code-of-conduct.md		code-of-conduct.md
demo.py		demo.py
index.html		index.html
install.sh		install.sh
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
test_svg_pipeline.py		test_svg_pipeline.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PaperBanana 🍌

Option A: Claude Code Skill (Recommended — no install needed)

Option B: MCP Server (for Claude Code / AI assistants)

Option C: Command Line

What's Different About This Fork

Full Install

Command Line

Web UI (Streamlit)

How It Works

Visual Storytelling Pipeline

SVG Pipeline: Vision Critic Loop

Side-by-Side: Before and After

Technical Architecture (+30 points)

Abstract Concepts (+16 points)

Application Ecosystem (+25)

Product Overview (+16)

Quality Results

Architecture

Project Structure

Community Supports

License

Citation

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PaperBanana 🍌

Option A: Claude Code Skill (Recommended — no install needed)

Option B: MCP Server (for Claude Code / AI assistants)

Option C: Command Line

What's Different About This Fork

Full Install

Command Line

Web UI (Streamlit)

How It Works

Visual Storytelling Pipeline

SVG Pipeline: Vision Critic Loop

Side-by-Side: Before and After

Technical Architecture (+30 points)

Abstract Concepts (+16 points)

Application Ecosystem (+25)

Product Overview (+16)

Quality Results

Architecture

Project Structure

Community Supports

License

Citation

Disclaimer

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages