
feat: Python MLX subprocess engines for image edit, Cosmos, and Wan2.2 GGUF #14

Open
hzeng412 wants to merge 5 commits into main from feat/pymlx-inference-engines

@hzeng412 hzeng412 commented May 1, 2026

Summary

  • Add 3 new Python MLX subprocess engines: image edit (Qwen-Image-Edit-2511), Cosmos Predict2 (T2I + V2W), and Wan 2.2 GGUF video generation
  • Shared pymlx.rs infrastructure for Python/venv discovery, script resolution, and subprocess execution
  • Extended ImageModelType routing in inference thread for automatic engine selection
  • Scripts deployed to ~/.OminiX/inference/scripts/ for subprocess invocation
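The venv discovery and script resolution described above might look roughly like the sketch below. It only assumes the directory layout given in this PR (`~/.OminiX/inference/venv` and `~/.OminiX/inference/scripts/`). The function names, the `bin/python` path inside the venv, and the fallback to `python3` on `PATH` are hypothetical and are not taken from `pymlx.rs`.

```rust
use std::path::{Path, PathBuf};

/// Root of the inference directory, i.e. <home>/.OminiX/inference
/// (layout taken from the PR description).
fn inference_root(home: &Path) -> PathBuf {
    home.join(".OminiX").join("inference")
}

/// Prefer the venv interpreter when it exists on disk.
/// Assumption: fall back to whatever `python3` resolves to on PATH.
fn discover_python(home: &Path) -> PathBuf {
    let venv_python = inference_root(home).join("venv").join("bin").join("python");
    if venv_python.is_file() {
        venv_python
    } else {
        PathBuf::from("python3")
    }
}

/// Resolve a script name under scripts/; None if the file is not deployed.
fn resolve_script(home: &Path, name: &str) -> Option<PathBuf> {
    let path = inference_root(home).join("scripts").join(name);
    path.is_file().then_some(path)
}
```

Returning `Option` from `resolve_script` lets the caller report a missing script as a configuration error before any subprocess is spawned.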

Test plan

  • Set up Python venv: python3 -m venv ~/.OminiX/inference/venv && ~/.OminiX/inference/venv/bin/pip install -r ~/.OminiX/inference/requirements.txt
  • Test image edit: curl -X POST http://localhost:8080/v1/images/generations -d '{"model":"qwen-image-edit","prompt":"make sky purple","image":"<base64>","size":"1024x1024"}'
  • Test Cosmos T2I: curl -X POST http://localhost:8080/v1/images/generations -d '{"model":"cosmos-predict2-14b","prompt":"a sunset","size":"1360x768"}'
  • Test Wan2.2 GGUF: curl -X POST http://localhost:8080/v1/videos/generations -d '{"model":"wan2.2-gguf","prompt":"a cat walking"}'
  • Verify existing FLUX/ZImage/mflux routes are unaffected

🤖 Generated with Claude Code

hzeng412 and others added 5 commits May 1, 2026 14:04
…2 GGUF

Add three new inference engines that wrap Python MLX scripts as subprocesses,
following the existing mflux.rs pattern:

- pymlx_image_edit: Qwen-Image-Edit-2511 (reference image + text → edited image)
- pymlx_cosmos: Cosmos Predict2 (text-to-image + video-to-world)
- pymlx_wan22: Wan 2.2 GGUF video generation (multiple sampling methods)

Shared infrastructure in pymlx.rs handles Python/venv discovery, script path
resolution, subprocess execution with timeout, and temp file management.

Extended ImageModelType with QwenImageEdit and CosmosT2I variants, with
model-id detection routing in the inference thread for both image and video
request paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
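
As an illustration of "subprocess execution with timeout" from the commit message above, here is a minimal standard-library sketch. `run_with_timeout` is a hypothetical name, and the 50 ms polling interval is an arbitrary choice; the actual code in `pymlx.rs` may capture output and report errors differently.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Spawn `cmd` and poll until it exits or `timeout` elapses.
/// Returns Ok(Some(status)) if the child exited on its own,
/// Ok(None) if it was killed because the deadline passed.
fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let mut child = cmd.stdin(Stdio::null()).spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // try_wait never blocks: Some(status) means the child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // Ignore a kill error: the child may have exited in the meantime.
            let _ = child.kill();
            // Reap the process so it does not linger as a zombie.
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(50));
    }
}
```

The standard library has no blocking wait with a deadline, hence the polling loop. A real engine would probably also pipe stdout and stderr so that script errors can be returned to the API caller.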
Add POST /v1/benchmark endpoint that runs side-by-side comparisons of
Python MLX (GGUF) and Rust native (safetensors) inference backends.

Supports: qwen-image, flux, qwen3 LLM. Reports mean/std/min/max timing
per backend with speedup ratio and winner determination.

Model paths configured via env vars (OMINIX_FLUX_DIFFUSION_MODEL, etc.)
for the Python scripts. Rust side is tested through the normal inference
channel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
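
The per-backend statistics and winner determination mentioned above could be computed along these lines. This is a sketch under assumptions: the type and function names are hypothetical, the standard deviation is the population form, and "speedup" is taken to mean slower mean divided by faster mean. The PR text does not specify any of these.

```rust
#[derive(Debug, Clone, PartialEq)]
struct TimingStats {
    mean: f64,
    std: f64,
    min: f64,
    max: f64,
}

/// Summarize per-run timings in seconds; None for an empty sample set.
fn summarize(samples: &[f64]) -> Option<TimingStats> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Population variance (divide by n); an assumption, see lead-in.
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some(TimingStats { mean, std: var.sqrt(), min, max })
}

/// Pick the backend with the lower mean time and report
/// speedup = slower mean / faster mean (always >= 1.0).
fn compare(python: &TimingStats, rust: &TimingStats) -> (&'static str, f64) {
    if rust.mean <= python.mean {
        ("rust", python.mean / rust.mean)
    } else {
        ("python", rust.mean / python.mean)
    }
}
```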
Ensures the Studio model registry ID routes correctly to the pymlx
image edit engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New pymlx_flux.rs subprocess engine wraps infer_flux2.py for ultra-fast
4-step GGUF-based image generation (~8s for 512x512).

Adds FluxKleinGguf variant to ImageModelType with detection via "gguf"
or "q4" in the model ID, distinct from the Rust safetensors FluxKlein.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
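
A minimal sketch of model-ID detection in the spirit of the commits above. Only the `QwenImageEdit`, `CosmosT2I`, `FluxKleinGguf`, and `FluxKlein` variant names and the "gguf"/"q4" rule come from this PR. The `Other` variant, the other substrings, the case-insensitive match, and gating the GGUF check on "flux" are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ImageModelType {
    QwenImageEdit,
    CosmosT2I,
    FluxKleinGguf,
    FluxKlein,
    Other, // placeholder for the existing variants not shown in this PR text
}

/// Classify a model ID by substring.
fn detect_image_model(model_id: &str) -> ImageModelType {
    let id = model_id.to_ascii_lowercase();
    if id.contains("qwen-image-edit") {
        ImageModelType::QwenImageEdit
    } else if id.contains("cosmos") {
        ImageModelType::CosmosT2I
    } else if id.contains("flux") {
        // "gguf" or "q4" selects the Python GGUF engine instead of the
        // Rust safetensors one.
        if id.contains("gguf") || id.contains("q4") {
            ImageModelType::FluxKleinGguf
        } else {
            ImageModelType::FluxKlein
        }
    } else {
        ImageModelType::Other
    }
}
```

Substring checks are order-sensitive, so the more specific IDs are tested before the generic ones.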
- pymlx_wan22 now supports both MLX safetensors (infer_wan22_mlx.py) and
  GGUF (infer_wan22_gguf.py) model formats, auto-detected from model dir
- Prefers MLX model at ~/.OminiX/models/wan2.2-5b/mlx_model_4bit/ if present
- Broadened video request routing to match any model containing "wan2" in name
- Fixed GGUF script argument names (--cfg-scale, --sampling-steps)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
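
The format auto-detection and the broadened "wan2" routing could be sketched as follows. The two script names come from the commit message. The enum, the function names, and the rule of inspecting file extensions in the model directory are assumptions; the PR does not say how the format is actually detected.

```rust
use std::fs;
use std::path::Path;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Wan22Format {
    MlxSafetensors,
    Gguf,
}

impl Wan22Format {
    /// Script that handles this format (names from the commit message).
    fn script_name(self) -> &'static str {
        match self {
            Wan22Format::MlxSafetensors => "infer_wan22_mlx.py",
            Wan22Format::Gguf => "infer_wan22_gguf.py",
        }
    }
}

/// Assumed rule: look at file extensions in the model directory and
/// prefer MLX safetensors when both formats are present.
fn detect_format(model_dir: &Path) -> Option<Wan22Format> {
    let mut has_safetensors = false;
    let mut has_gguf = false;
    // A missing or unreadable directory yields None.
    for entry in fs::read_dir(model_dir).ok()?.flatten() {
        match entry.path().extension().and_then(|e| e.to_str()) {
            Some("safetensors") => has_safetensors = true,
            Some("gguf") => has_gguf = true,
            _ => {}
        }
    }
    if has_safetensors {
        Some(Wan22Format::MlxSafetensors)
    } else if has_gguf {
        Some(Wan22Format::Gguf)
    } else {
        None
    }
}

/// Video routing: any model ID containing "wan2" (case-sensitive here).
fn is_wan_video_model(model_id: &str) -> bool {
    model_id.contains("wan2")
}
```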