Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton
University of Pennsylvania
[Project Website] [Paper] [Twitter threads]
Articulate Anything is a powerful VLM system for articulating 3D objects using various input modalities.
articulate_anything_tiktokified_2_V3.mp4
Articulate 3D objects from text descriptions
Articulate 3D objects from images
Articulate 3D objects from videos
We use Hydra for configuration management. You can easily customize the system by modifying the configuration files in configs/ or by overriding parameters from the command line. You can automatically articulate a variety of input modalities with a single command:
python articulate.py modality={partnet, text, image, video} prompt={prompt} out_dir={output_dir}
Articulate-Anything uses an actor-critic system, allowing for self-correction and self-improvement over iterations.
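Any value in conf/config.yaml can also be overridden directly on the command line using Hydra's key=value syntax. The examples below are illustrative; modality, prompt, out_dir, and model_name are the keys documented in this README, but check your checkout for the exact names:
python articulate.py modality=text prompt="suitcase with a retractable handle" out_dir=results/text/suitcase
python articulate.py modality=video prompt="datasets/in-the-wild-dataset/videos/suitcase.mp4" out_dir=results/video/suitcase model_name=gpt-4o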
- Download the preprocessed PartNet-Mobility dataset from the 🤗 Articulate-Anything Dataset on Hugging Face.
- To use the interactive demo, run
python gradio_app.py
articulate_anything_gradio_demo.mp4
See below for more detailed guides.
Note
Skip the raw-dataset download step if you have already downloaded our preprocessed dataset from the 🤗 Articulate-Anything Dataset on Hugging Face.
- Clone the repository:
git clone https://github.com/vlongle/articulate-anything.git
cd articulate-anything
- Set up the Python environment:
conda create -n articulate-anything python=3.9
conda activate articulate-anything
pip install -e .
- Download and extract the PartNet-Mobility dataset:
# Download from https://sapien.ucsd.edu/downloads
mkdir datasets
mv partnet-mobility-v0.zip datasets/partnet-mobility-v0.zip
cd datasets
mkdir partnet-mobility-v0
unzip partnet-mobility-v0 -d partnet-mobility-v0
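A quick sanity check after extraction can catch a misplaced dataset early. This is a minimal sketch, assuming the standard PartNet-Mobility layout of one directory per object ID containing a mobility.urdf (the file referenced later in this README); adjust the path if your extraction is nested differently.

# Sanity-check sketch: assumed layout datasets/partnet-mobility-v0/<object_id>/mobility.urdf
from pathlib import Path

dataset_dir = Path("datasets/partnet-mobility-v0")
object_dirs = [d for d in dataset_dir.iterdir() if d.is_dir()]
with_urdf = [d for d in object_dirs if (d / "mobility.urdf").exists()]
print(f"Found {len(object_dirs)} object directories; "
      f"{len(with_urdf)} contain a mobility.urdf")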
Our system supports Google Gemini, OpenAI GPT, and Anthropic Claude. You can set the model_name in the config file conf/config.yaml to gemini-1.5-flash-latest, gpt-4o, or claude-3-5-sonnet-20241022. Get your API key from the respective website and set it as an environment variable:
export API_KEY=YOUR_API_KEY
We support reconstruction from in-the-wild text, images, or videos, as well as masked reconstruction from the PartNet-Mobility dataset.
Note
Skip all the processing steps if you have downloaded our preprocessed dataset from the 🤗 Articulate-Anything Dataset on Hugging Face.
- First, preprocess the PartNet dataset by running
python preprocess_partnet.py parallel={int} modality={}
- Run the interactive demo
python gradio_app.py
It's articulation time! For a step-by-step guide on articulating a PartNet-Mobility object, see the notebook:
or run
python articulate.py modality=partnet prompt=45384 out_dir=results additional_prompt=joint_0
to articulate a specific object and joint (here, object 45384 and joint_0).
- Preprocess the dataset:
python articulate_anything/preprocess/preprocess_partnet.py parallel={int} modality=text
Our precomputed CLIP embeddings are available in our repo as partnet_mobility_embeddings.csv. If you prefer to generate your own embeddings, follow these steps:
- Run the preprocessing with render_part_views=true to render part views for later part annotation:
python articulate_anything/preprocess/preprocess_partnet.py parallel={int} modality=text render_part_views=true
- Annotate mesh parts using a VLM (skip if using our precomputed embeddings):
python articulate_anything/preprocess/annotate_partnet_parts.py parallel={int}
- Extract CLIP embeddings (skip if using our precomputed embeddings):
python articulate_anything/preprocess/create_partnet_embeddings.py
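If you are curious what this step amounts to, the sketch below shows one way to compute and store CLIP text embeddings for part annotations using Hugging Face transformers. It is a sketch only, not the repo's create_partnet_embeddings.py: the annotation texts, the CLIP variant, and the schema of partnet_mobility_embeddings.csv are all assumptions.

# Illustrative sketch of CLIP text-embedding extraction (NOT the repo's script).
import torch
import pandas as pd
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

annotations = {
    # hypothetical object_id -> part-annotation text; replace with your own annotations
    "0001": "a suitcase body with a retractable handle",
    "0002": "a cabinet frame with two hinged doors",
}

rows = []
with torch.no_grad():
    for obj_id, text in annotations.items():
        inputs = tokenizer([text], padding=True, return_tensors="pt")
        emb = model.get_text_features(**inputs)        # (1, 512)
        emb = emb / emb.norm(dim=-1, keepdim=True)     # L2-normalize for cosine similarity
        rows.append({"object_id": obj_id, "embedding": emb.squeeze(0).tolist()})

pd.DataFrame(rows).to_csv("my_partnet_embeddings.csv", index=False)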
- It's articulation time! For a detailed guide, see:
or run
python articulate.py modality=text prompt="suitcase with a retractable handle" out_dir=results/text/suitcase joint_actor.targetted_affordance=false
- Render images for each object:
python articulate_anything/preprocess/preprocess_partnet.py parallel={int} modality=image
This renders a front-view image for each object in the PartNet-Mobility dataset. It is necessary for our mesh retrieval: we compare the visual similarity of the input image or video against each rendered template object (see the retrieval sketch after these steps).
- It's articulation time! For a detailed guide, see:
or run
python articulate.py modality=video prompt="datasets/in-the-wild-dataset/videos/suitcase.mp4" out_dir=results/video/suitcase
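For intuition, here is a minimal sketch of the visual-similarity retrieval described in the rendering step above: embed the input image and each rendered template with CLIP, then pick the most similar template. The per-object file name (front_view.png) and the CLIP variant are assumptions; the repo's actual retrieval code and paths may differ.

# Illustrative sketch of visual-similarity mesh retrieval (NOT the repo's code).
from pathlib import Path
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path):
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)  # normalize for cosine similarity

query = embed_image("my_input_image.png")  # hypothetical input image
templates = {d.name: embed_image(d / "front_view.png")
             for d in Path("datasets/partnet-mobility-v0").iterdir()
             if (d / "front_view.png").exists()}

# Pick the template object whose rendering is most similar to the query.
best_id = max(templates, key=lambda k: (query @ templates[k].T).item())
print("Closest template object:", best_id)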
Note: Please download a CoTracker checkpoint for video articulation to visualize the motion traces.
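If you want to inspect the point tracks yourself, the sketch below follows CoTracker's published torch.hub quick-start. This loads the hub checkpoint rather than the local checkpoint mentioned above, and the model variant (cotracker2) is an assumption; articulate-anything wires CoTracker in through its own config.

# CoTracker sketch following the facebookresearch/co-tracker quick-start.
# Assumption: the "cotracker2" torch.hub entry point; not how articulate-anything
# loads its own checkpoint internally.
import torch
import imageio.v3 as iio

frames = iio.imread("datasets/in-the-wild-dataset/videos/suitcase.mp4", plugin="FFMPEG")  # (T, H, W, 3)
video = torch.tensor(frames).permute(0, 3, 1, 2)[None].float()  # (B, T, C, H, W)

cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2")
pred_tracks, pred_visibility = cotracker(video, grid_size=10)  # tracks: (B, T, N, 2)
print(pred_tracks.shape, pred_visibility.shape)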
Some implementation peculiarities of the PartNet-Mobility dataset:
- Raise above ground: The meshes are centered at the origin (0,0,0). We use pybullet to raise the links above the ground. This is done automatically in sapien_simulate.
- Rotate meshes: All the meshes start out lying on the ground, so we have to bring them into an upright orientation. Specifically, we need to add a fixed joint
<origin rpy="1.570796326794897 0 1.570796326794897" xyz="0 0 0"/>
between the first link and the base link. The original PartNet-Mobility dataset almost provides this already: render_partnet_obj, which calls rotate_urdf, saves the original URDF under mobility.urdf.backup and writes the correctly rotated version to mobility.urdf. Our generated Python program must include this joint as well; this is done automatically by the compiler odio_urdf.py using the align_robot_orientation function.
Feel free to reach me at [email protected] if you'd like to collaborate or have any questions. You can also open a GitHub issue if you encounter any problems.
If you find this work useful, please consider citing our paper:
@article{le2024articulate,
title={Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model},
author={Le, Long and Xie, Jason and Liang, William and Wang, Hung-Ju and Yang, Yue and Ma, Yecheng Jason and Vedder, Kyle and Krishna, Arjun and Jayaraman, Dinesh and Eaton, Eric},
journal={arXiv preprint arXiv:2410.13882},
year={2024}
}
For more information, visit our project website.