
Vector Robotics

Vector OS Nano

Zero-shot, natural language generalized grasping on a $150 robot arm.
No training. No fine-tuning. Just say what you want.

Python MuJoCo Claude ROS2 Moondream2 EdgeTAM RealSense LeRobot

Vector OS: a cross-embodiment robot operating system with industrial-grade SLAM, navigation, generalized grasping, semantic mapping, long-horizon task orchestration, and explainable task execution.
Under development at the CMU Robotics Institute. Nano is the low-cost, low-barrier grasping proof of value; the full stack will be open-sourced in stages.


Demo

Click to watch full demo video


What is Vector OS Nano?

A Python SDK that gives any robot arm a natural language brain. pip install and go. No ROS2 required.

from vector_os_nano import Agent, SO101

arm = SO101(port="/dev/ttyACM0")
agent = Agent(arm=arm, llm_api_key="sk-...")
agent.execute("pick up the red cup and put it on the left")

No hardware? Full simulation:

from vector_os_nano import Agent, MuJoCoArm

arm = MuJoCoArm(gui=True)
arm.connect()
agent = Agent(arm=arm, llm_api_key="sk-...")
agent.execute("把鸭子放到前面")        # "put the duck at the front": one-shot plan + execute
agent.run_goal("clean the table")     # agent loop: observe-decide-act-verify

SkillFlow — Declarative Skill Routing

All command routing is declarative via the @skill decorator. No hard-coded if/else chains. Skills describe themselves; the system routes automatically.

from vector_os_nano.core.skill import skill, SkillContext
from vector_os_nano.core.types import SkillResult

@skill(
    aliases=["grab", "grasp", "抓", "拿", "抓起"],       # trigger words
    direct=False,                                          # needs planning
    auto_steps=["scan", "detect", "pick"],                 # default chain
)
class PickSkill:
    name = "pick"
    description = "Pick up an object from the workspace"
    parameters = {"object_label": {"type": "string"}}
    preconditions = ["gripper_empty"]
    postconditions = ["gripper_holding_any"]
    effects = {"gripper_state": "holding"}

    def execute(self, params, context):
        # ... pick implementation ...
        return SkillResult(success=True)

Routing logic:

"home"                    → @skill alias match → direct=True → execute (zero LLM)
"close grip"              → @skill alias match → direct=True → execute (zero LLM)
"抓杯子" (grab the cup)    → @skill alias match → auto_steps → scan→detect→pick (zero LLM)
"把鸭子放到左边" (put the duck on the left)
                          → no simple match → LLM plans pick(hold) + place(left) + home
"你好" (hello)             → no match → LLM classify → chat response

See docs/skill-protocol.md for full SkillFlow specification.
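The routing table above boils down to alias matching with three outcomes: direct execution, a default step chain, or an LLM fallback. An illustrative sketch of that shape (`SkillSpec` and `route` are hypothetical names; the real dispatcher lives in vector_os_nano.core):

```python
# Illustrative SkillFlow-style routing sketch (hypothetical names).
from dataclasses import dataclass, field

@dataclass
class SkillSpec:
    name: str
    aliases: list
    direct: bool = False                              # True: execute immediately
    auto_steps: list = field(default_factory=list)    # default chain on alias hit

REGISTRY = [
    SkillSpec("home", ["home"], direct=True),
    SkillSpec("gripper_close", ["close grip", "close"], direct=True),
    SkillSpec("pick", ["grab", "grasp", "抓", "拿", "抓起"],
              auto_steps=["scan", "detect", "pick"]),
]

def route(command):
    """Map a command to an execution plan, falling back to the LLM."""
    for spec in REGISTRY:
        if any(alias in command for alias in spec.aliases):
            if spec.direct:
                return [spec.name]                    # zero-LLM direct execution
            return spec.auto_steps or [spec.name]     # zero-LLM default chain
    return ["<LLM plan>"]                             # no match: LLM plans the task
```

With this shape, "home" and "close grip" never touch the LLM, "抓杯子" expands to the default scan→detect→pick chain, and anything unmatched is handed to the planner.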

Agent Pipeline — Two Execution Modes

One-shot (simple commands) — plan once, execute all steps:

User Input → MATCH → CLASSIFY → PLAN → EXECUTE → SUMMARIZE
"抓杯子" (grab the cup)          alias     (skip)    (skip)    scan→detect→pick   "Done"
"把X放左边" (put X on the left)  no match  task      LLM plan  pick(hold)→place   "Done"

Agent Loop (iterative goals) — observe, decide one step, act, verify, repeat:

User Input → run_goal("clean the table")
                ↓
        ┌→ OBSERVE  (camera + world model)
        │  DECIDE   (LLM picks ONE action)
        │  ACT      (execute single skill)
        │  VERIFY   (re-detect: did it actually work?)
        └─ LOOP     (until goal achieved or max iterations)

The Agent Loop solves two problems:

  1. Iterative goals: "clean the table" picks objects one-by-one until table is empty
  2. Hardware drift: doesn't trust success=True — uses perception to verify actual outcomes
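The loop reduces to a small control structure. A sketch with hypothetical observe/decide/act/verify callables (the real AgentLoop lives in core/ and differs in detail):

```python
# Shape of the agent loop (illustrative; not the repo's AgentLoop).
def run_goal(goal, observe, decide, act, verify, max_iterations=10):
    """Observe-decide-act-verify until decide() reports the goal achieved."""
    for _ in range(max_iterations):
        world = observe()               # camera + world model snapshot
        action = decide(goal, world)    # LLM picks exactly ONE next action
        if action is None:              # decide() returns None: goal achieved
            return True
        act(action)                     # execute a single skill
        if not verify(action):          # perception re-check, not return codes
            act(action)                 # one retry before re-observing
    return False
```

For "clean the table", decide() would keep naming one pick per iteration until observe() shows an empty table, at which point it returns None and the loop exits.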

AI assistant V speaks before acting, shows live progress, and summarizes after:

vector> 把所有东西都随便乱放   ("scatter everything into random spots")

╭─ V ──────────────────────────────────────────────────╮
│ OK, I'll scatter everything into random spots.       │
╰──────────────────────────────────────────────────────╯
  Plan: 13 steps

  [1/13] pick ... OK 14.6s
  [2/13] place ... OK 11.5s
  ...
  [13/13] home ... OK 3.8s

  Done 13/13 steps, 180.0s

╭─ V ──────────────────────────────────────────────────╮
│ All 6 objects placed in different spots; the arm is  │
│ back at standby.                                     │
╰──────────────────────────────────────────────────────╯

MuJoCo Simulation

No robot? No camera? No problem.

python run.py --sim              # MuJoCo viewer + interactive CLI
python run.py --sim-headless     # headless (no viewer)
  • SO-101 arm with real STL meshes (13 parts from CAD)
  • 6 graspable objects: banana, mug, bottle, screwdriver, duck, lego
  • Weld-constraint grasping, smooth real-time motion
  • Pick-and-place with named locations (front, left, right, center, etc.)
  • Simulated perception with Chinese/English NL queries


Hardware (~$420 total)

Component    Model                                    Cost
Robot Arm    LeRobot SO-ARM100 (6-DOF, 3D-printed)    ~$150
Camera       Intel RealSense D405                     ~$270
GPU          Any NVIDIA with 8+ GB VRAM               (existing)

Hardware Setup

Quick Start

# Setup
cd vector_os_nano
python3 -m venv .venv --prompt vector_os_nano
source .venv/bin/activate
pip install -e ".[all]"

# Launch (simulation — no hardware needed)
python run.py --sim

# Launch (real hardware)
python run.py

LLM config — create config/user.yaml:

llm:
  api_key: "your-openrouter-api-key"
  model: "anthropic/claude-haiku-4-5"
  api_base: "https://openrouter.ai/api/v1"

All Launch Modes

# Classic pipeline (classify → plan → execute)
python run.py                  # Real hardware + CLI
python run.py --sim            # MuJoCo sim + viewer + CLI
python run.py --sim-headless   # MuJoCo sim headless
python run.py -v               # Verbose: show agent thinking (MATCH/CLASSIFY/ROUTE)
python run.py --dashboard      # Textual TUI dashboard
python run.py --web --sim      # Web dashboard at localhost:8000

# Agent mode (LLM tool-calling — smarter, context-aware)
python run.py --agent                                     # Default: gpt-4o-mini
python run.py --agent --agent-model openai/gpt-4o         # Stronger reasoning
python run.py --agent --agent-model anthropic/claude-sonnet-4-6  # Claude Sonnet
python run.py --agent -v                                  # Agent + debug output
python run.py --sim --agent                               # Sim + agent mode

Classic vs Agent mode: Classic mode uses a rigid classify→plan→execute pipeline — fast for simple commands ("抓杯子") but can't handle multi-turn context. Agent mode uses LLM-native function calling — the LLM sees full conversation history, decides when to chat vs call tools, and understands context ("我饿了" → picks food proactively).
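Under the hood, agent mode amounts to exposing each skill as a function-calling tool. A sketch of how a skill's declared parameters could map onto an OpenAI-style tool schema (illustrative; `skill_to_tool` is a hypothetical name, not the repo's API):

```python
# Sketch: a skill's metadata as an OpenAI-style function-calling tool.
def skill_to_tool(name, description, parameters):
    """Wrap a skill's declared parameters as a tool schema the LLM can call."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": parameters,
                # parameters without a default are required
                "required": [k for k, v in parameters.items()
                             if "default" not in v],
            },
        },
    }

pick_tool = skill_to_tool(
    "pick", "Pick up an object from the workspace",
    {"object_label": {"type": "string"}},
)
```

Because skills already declare name, description, and parameters via @skill, this mapping needs no per-skill glue code.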

REST API

When running with --web, these endpoints are available:

GET  /api/status              # Robot state (joints, gripper, objects)
GET  /api/skills              # Available skills with JSON schemas
GET  /api/world               # World model (objects + robot state)
GET  /api/camera              # Camera frame (JPEG)
POST /api/execute             # {"instruction": "pick the banana"}
POST /api/skill/{name}        # {"object_label": "banana", "mode": "drop"}
POST /api/run_goal            # {"goal": "clean the table", "max_iterations": 10}

Any agent framework (LangGraph, custom scripts, etc.) can control the robot via HTTP.
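Any HTTP client works. A minimal stdlib sketch (assumes the localhost:8000 address and the JSON bodies shown above; `build_request` and `post` are illustrative helpers, not part of the SDK):

```python
# Minimal stdlib client for the --web REST API (illustrative helpers).
import json
import urllib.request

BASE = "http://localhost:8000"

def build_request(path, payload):
    """Build a POST request with a JSON body for one of the endpoints above."""
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post(path, payload):
    """Send the request and decode the JSON response (needs --web running)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

# post("/api/execute", {"instruction": "pick the banana"})
# post("/api/run_goal", {"goal": "clean the table", "max_iterations": 10})
```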

Custom Skills

from vector_os_nano.core.skill import skill, SkillContext
from vector_os_nano.core.types import SkillResult

@skill(aliases=["wave", "挥手", "打招呼"], direct=False, auto_steps=["wave"])
class WaveSkill:
    name = "wave"
    description = "Wave the arm as a greeting"
    parameters = {"times": {"type": "integer", "default": 3}}
    preconditions = []
    postconditions = []
    effects = {}

    def execute(self, params, context):
        for _ in range(params.get("times", 3)):
            joints = context.arm.get_joint_positions()
            joints[0] = 0.5
            context.arm.move_joints(joints, duration=0.5)
            joints[0] = -0.5
            context.arm.move_joints(joints, duration=0.5)
        return SkillResult(success=True)

agent.register_skill(WaveSkill())
agent.execute("wave")       # alias match → direct execute
agent.execute("挥手三次")    # alias match → LLM fills params

Project Structure

vector_os_nano/
├── core/          SkillFlow, Agent, AgentLoop, Executor, WorldModel, PlanValidator
├── llm/           LLM-agnostic providers (Claude, OpenAI, Ollama), ModelRouter
├── perception/    RealSense + Moondream VLM + EdgeTAM tracker
├── hardware/
│   ├── so101/     SO-101 arm driver (Feetech serial, Pinocchio IK)
│   └── sim/       MuJoCo simulation (arm, gripper, perception, 6 objects)
├── skills/        @skill classes with diagnostics + enum constraints + failure_modes
├── cli/           Rich CLI with debug mode (-v shows agent thinking)
├── mcp/           MCP server (tools + resources, stdio + SSE transport)
├── web/           FastAPI + WebSocket dashboard
└── ros2/          Optional ROS2 integration (5 nodes)

MCP Server -- Claude Code Controls the Robot

Vector OS Nano exposes all skills via the Model Context Protocol (MCP). Claude Code connects directly and controls the robot -- sim or real hardware -- through natural language.

# Auto-connects when Claude Code starts (configured in .mcp.json)
# Or manual:
python -m vector_os_nano.mcp --sim --stdio        # sim + stdio (for .mcp.json)
python -m vector_os_nano.mcp --hardware --stdio    # real hardware + stdio
python -m vector_os_nano.mcp --sim                 # sim + SSE on :8100
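For reference, a plausible .mcp.json entry for the auto-connect setup (assumed shape based on Claude Code's standard MCP server config; check the repo's actual .mcp.json):

```json
{
  "mcpServers": {
    "vector-os-nano": {
      "command": "python",
      "args": ["-m", "vector_os_nano.mcp", "--sim", "--stdio"]
    }
  }
}
```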

12 MCP tools: pick, place, home, scan, detect, gripper_open, gripper_close, wave, natural_language, run_goal, diagnostics, debug_perception
7 MCP resources: world://state, world://objects, world://robot, camera://overhead, camera://front, camera://side, camera://live

Claude Code controlling robot via MCP
Claude Code operating the robot arm through MCP -- scan, detect, pick, place via natural conversation.

Autonomous Skill Generation

Claude Agent can autonomously design, implement, and test new skills with full reasoning -- then register and execute them immediately.

Claude Agent generating skills autonomously
Claude Agent designing a custom wave skill with full reasoning, code generation, and live execution.

What's Coming

The full Vector OS stack is under development at the CMU Robotics Institute:

  • SLAM + Navigation -- LiDAR/visual SLAM, Nav2, multi-floor mapping
  • Semantic Mapping -- 3D scene graphs, object permanence, spatial reasoning
  • Multi-Robot Coordination -- fleet management, task allocation, shared world model
  • Mobile Manipulation -- wheeled, legged, and humanoid platforms

Star this repo and stay tuned.



License

MIT License


Built by Vector Robotics at CMU Robotics Institute with Claude Code
