
feat(dataset): add CreateML format support to DetectionDataset#2284

Open
madhavcodez wants to merge 9 commits into roboflow:develop from madhavcodez:feat/dataset-createml

Conversation

@madhavcodez

Contributor

Description

Adds CreateML object-detection format support to DetectionDataset, with
from_createml() for loading and as_createml() for exporting. supervision
already supports COCO, YOLO, and Pascal VOC; this adds another common format,
with a symmetric loader/exporter pair that mirrors those implementations.

Type of Change

  • ✨ New feature (non-breaking change which adds functionality)

Motivation and Context

CreateML is a widely used object-detection annotation format (the format read by
Apple's Create ML, and one of Roboflow's dataset export options). DetectionDataset can already
round-trip COCO, YOLO, and Pascal VOC, but not CreateML, so users exporting in
that format have to convert manually before loading into supervision. This adds
first-class support following the existing from_<format> / as_<format>
convention.
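For readers unfamiliar with the layout: a CreateML file is a JSON array with one object per image, and each annotation pairs a `label` with pixel-space `coordinates` whose `x`/`y` give the box centre. The snippet below is only an illustrative sketch that builds and serialises such a payload with the standard library; the file names, labels, and numbers are made up, and the key names are taken from the keys this PR's loader checks ('image', 'annotations', 'label', 'coordinates') plus the x/y/width/height sub-keys of Apple's format.

```python
import json

# Illustrative CreateML payload: one entry per image. Within "coordinates",
# (x, y) is the box centre and width/height the box size, all in pixels.
createml_payload = [
    {
        "image": "cat_001.jpg",
        "annotations": [
            {
                "label": "cat",
                "coordinates": {"x": 150, "y": 120, "width": 100, "height": 80},
            },
            {
                "label": "dog",
                "coordinates": {"x": 320.5, "y": 240.0, "width": 60.0, "height": 40.0},
            },
        ],
    },
]

# The file root is a JSON array, not an object.
text = json.dumps(createml_payload, indent=2)
print(text)
```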

No existing tracking issue — opening as a feature addition; happy to file one if
the maintainers prefer.

Changes Made

  • src/supervision/dataset/formats/createml.py — new module:
    load_createml_annotations, save_createml_annotations, and the helpers
    createml_annotations_to_detections / detections_to_createml_annotations.
    Boxes use CreateML's pixel-space centre + width/height and are converted
    to/from xyxy. Class names are inferred from the labels present in the file
    and assigned sorted, zero-based ids. Image paths are validated against the
    images directory (rejecting .. traversal, absolute paths, the directory
    itself, and directory targets), matching the COCO loader's protection.
  • src/supervision/dataset/core.py: DetectionDataset.from_createml() and
    DetectionDataset.as_createml(), plus the format import. Method docstrings
    render automatically in the API docs.
  • tests/dataset/formats/test_createml.py — unit tests for the conversion
    helpers, loader, exporter, save→load round-trip (integer and float
    coordinates), global class-id consistency across images, and the path-safety
    guards.
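The centre-to-corner conversion described for createml.py is plain arithmetic: x_min = x - width/2, x_max = x + width/2, and likewise for y. Below is a minimal NumPy sketch of both directions; the function names are hypothetical rather than the helpers added in this PR, and the (N, 4) xyxy layout is assumed to match `Detections.xyxy`.

```python
import numpy as np


def createml_coords_to_xyxy(coords: list[dict[str, float]]) -> np.ndarray:
    """Convert CreateML centre-based boxes into an (N, 4) xyxy array."""
    if not coords:
        return np.empty((0, 4), dtype=float)
    cxcywh = np.array(
        [[c["x"], c["y"], c["width"], c["height"]] for c in coords], dtype=float
    )
    centre, half_size = cxcywh[:, :2], cxcywh[:, 2:] / 2
    # (x_min, y_min) = centre - half size, (x_max, y_max) = centre + half size
    return np.hstack([centre - half_size, centre + half_size])


def xyxy_to_createml_coords(xyxy: np.ndarray) -> list[dict[str, float]]:
    """Convert an (N, 4) xyxy array back into CreateML centre-based dicts."""
    size = xyxy[:, 2:] - xyxy[:, :2]
    centre = xyxy[:, :2] + size / 2
    return [
        {"x": float(cx), "y": float(cy), "width": float(w), "height": float(h)}
        for (cx, cy), (w, h) in zip(centre, size)
    ]


boxes = createml_coords_to_xyxy([{"x": 150, "y": 120, "width": 100, "height": 80}])
print(boxes.tolist())  # [[100.0, 80.0, 200.0, 160.0]]
```

This sketch always emits floats on export; whether the PR's exporter keeps integer coordinates as integers (its tests cover both integer and float round-trips) is not something it tries to reproduce.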

Testing

  • I have tested this code locally
  • I have added unit tests that prove my feature works
  • All new and existing tests pass

Local run: pytest tests/dataset/ passes (including the new test_createml.py),
and ruff check / ruff format --check are clean on the changed files.

madhavcodez requested a review from SkalskiP as a code owner May 31, 2026 18:22
@codecov

codecov Bot commented May 31, 2026


Codecov Report

❌ Patch coverage is 86.90476% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 82%. Comparing base (14f6f24) to head (7ecb1ca).

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2284   +/-   ##
=======================================
  Coverage       82%     82%           
=======================================
  Files           66      67    +1     
  Lines         9082    9166   +84     
=======================================
+ Hits          7412    7485   +73     
- Misses        1670    1681   +11     
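The Codecov figures are consistent with one another. Assuming the patch only adds lines (so the change in the project totals equals the patch's own lines), patch coverage is new hits divided by new lines, and both project ratios round to the reported 82%. A quick arithmetic check, using only the numbers from the diff above:

```python
# Totals copied from the Codecov coverage diff above.
lines_before, lines_after = 9082, 9166
hits_before, hits_after = 7412, 7485

new_lines = lines_after - lines_before  # 84 lines added by the patch
new_hits = hits_after - hits_before     # 73 of them exercised by tests
new_misses = new_lines - new_hits       # 11 uncovered lines

patch_coverage = 100 * new_hits / new_lines
project_before = 100 * hits_before / lines_before
project_after = 100 * hits_after / lines_after

print(f"patch: {new_hits}/{new_lines} = {patch_coverage:.5f}%")
print(f"project: {project_before:.2f}% -> {project_after:.2f}%")
```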

Add DetectionDataset.from_createml and as_createml plus a new
formats/createml.py module (load/save helpers), mirroring the existing
COCO, YOLO, and Pascal VOC format support. Boxes use CreateML's
pixel-space centre + width/height and are converted to/from xyxy; class
names are inferred from the labels present in the file. Image paths are
validated against the images directory, matching the COCO loader's
path-traversal protection. Adds unit tests for the helpers, loader,
exporter, integer/float round-trip, global class-id consistency, and the
path-safety guards.
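The class-name inference mentioned above (sorted labels, zero-based ids, one id space shared by every image) can be sketched as follows. `infer_class_names` is a hypothetical name, not the function in createml.py; the sketch also folds in two later review fixes from this thread: treating `"annotations": null` as an empty list, and raising ValueError instead of a bare KeyError/TypeError when a `label` is missing.

```python
from typing import Any


def infer_class_names(entries: list[dict[str, Any]]) -> list[str]:
    """Return every label used in the file, sorted.

    A name's index in the returned list is its zero-based class id, so ids
    are shared by all images in the file rather than assigned per image.
    """
    try:
        labels = {
            annotation["label"]
            for entry in entries
            # A missing key or an explicit "annotations": null means no boxes.
            for annotation in entry.get("annotations") or []
        }
    except (KeyError, TypeError) as error:
        raise ValueError(f"Malformed CreateML annotation: {error!r}") from error
    return sorted(labels)


entries = [
    {"image": "a.jpg", "annotations": [{"label": "dog"}, {"label": "cat"}]},
    {"image": "b.jpg", "annotations": None},
    {"image": "c.jpg", "annotations": [{"label": "cat"}]},
]
classes = infer_class_names(entries)  # ["cat", "dog"]
class_ids = {name: i for i, name in enumerate(classes)}  # {"cat": 0, "dog": 1}
```

Deriving ids from the sorted set of all labels, rather than from the order labels first appear, keeps them stable regardless of how images are ordered in the file.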
Cast the JSON payload read via read_json_file to list[CreateMLDict] and
the data passed to save_json_file to dict[str, Any] (both helpers are
annotated for dict only), and iterate xyxy/class_id arrays directly so
the class_id None-guard narrows the loop variable for mypy.
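For context on the mypy change: `typing.cast` only changes what the type checker believes and performs no runtime checking. The sketch below shows what a `CreateMLDict` TypedDict might look like, with a hypothetical `parse_createml` helper; the field layout is an assumption based on the keys the loader reads, not the PR's actual definition.

```python
import json
from typing import Any, TypedDict, cast


# Assumed shape of one CreateML entry; the PR's own CreateMLDict may differ.
class CreateMLCoordinates(TypedDict):
    x: float
    y: float
    width: float
    height: float


class CreateMLAnnotation(TypedDict):
    label: str
    coordinates: CreateMLCoordinates


class CreateMLDict(TypedDict):
    image: str
    annotations: list[CreateMLAnnotation]


def parse_createml(text: str) -> list[CreateMLDict]:
    data: Any = json.loads(text)
    # cast() is a no-op at runtime: it only tells mypy to treat `data` as
    # list[CreateMLDict] and does not check that the JSON has that shape.
    return cast("list[CreateMLDict]", data)
```

Because `cast` is unchecked, explicit runtime validation of the JSON root and of each entry is still needed.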
@madhavcodez

Contributor Author

Rebased onto the latest develop, conflicts resolved, CI green. Ready for review whenever a maintainer has a moment.

Copilot AI left a comment

Contributor


Pull request overview

Adds first-class CreateML object-detection dataset support to supervision’s DetectionDataset, complementing existing COCO / YOLO / Pascal VOC loaders/exporters.

Changes:

  • Introduces a new CreateML format module with load/export + conversion helpers.
  • Adds DetectionDataset.from_createml() and DetectionDataset.as_createml() APIs.
  • Adds unit tests for conversions, loader/exporter behavior, round-trips, and path-safety checks; updates changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • src/supervision/dataset/formats/createml.py: implements CreateML JSON parsing/export and CreateML↔Detections conversions, including image-path validation.
  • src/supervision/dataset/core.py: wires CreateML support into DetectionDataset via new from_createml / as_createml methods.
  • tests/dataset/formats/test_createml.py: adds coverage for conversion correctness, round-trips, global class-id stability, and path-safety guards.
  • docs/changelog.md: documents the new CreateML dataset support in the unreleased changelog section.

Comment thread src/supervision/dataset/formats/createml.py
Comment thread src/supervision/dataset/core.py
Borda and others added 4 commits June 24, 2026 13:33
- add show_progress: bool = False to load_createml_annotations, save_dataset_images, from_createml, as_createml with tqdm.auto wrapping
- add Google-style docstrings to all four public callables in createml.py
- validate JSON root is list in load_createml_annotations; raise ValueError on dict
- guard missing 'image' key per entry; raise ValueError with full entry repr
- detect duplicate image entries; raise ValueError naming the duplicate
- wrap annotation field access in try/except KeyError/TypeError; raise ValueError("Malformed...")
- replace cast("dict[str, Any]", ...) with # type: ignore[arg-type] in save_createml_annotations

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
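
The validation rules in this commit (list root, required 'image' key, no duplicate images, ValueError for malformed annotation fields) could look roughly like this. `validate_createml_entries` is a hypothetical standalone helper, not the PR's code; it compares image names as raw strings, whereas a later commit in this PR normalises them through `Path` before the duplicate check.

```python
from typing import Any

Box = tuple[Any, Any, Any, Any, Any]  # (label, x, y, width, height)


def validate_createml_entries(data: Any) -> dict[str, list[Box]]:
    """Map image name -> parsed boxes, raising ValueError on malformed input."""
    if not isinstance(data, list):
        raise ValueError(
            f"CreateML JSON root must be a list, got {type(data).__name__}"
        )
    boxes_by_image: dict[str, list[Box]] = {}
    for entry in data:
        if not isinstance(entry, dict) or "image" not in entry:
            raise ValueError(f"CreateML entry is missing the 'image' key: {entry!r}")
        image = entry["image"]
        if image in boxes_by_image:
            raise ValueError(f"Duplicate image entry in CreateML file: {image!r}")
        boxes: list[Box] = []
        # An explicit "annotations": null is treated like an empty list.
        for annotation in entry.get("annotations") or []:
            try:
                coords = annotation["coordinates"]
                boxes.append(
                    (
                        annotation["label"],
                        coords["x"],
                        coords["y"],
                        coords["width"],
                        coords["height"],
                    )
                )
            except (KeyError, TypeError) as error:
                # Covers a missing key, or coordinates that are None/not a dict.
                raise ValueError(
                    f"Malformed CreateML annotation for {image!r}: {annotation!r}"
                ) from error
        boxes_by_image[image] = boxes
    return boxes_by_image
```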
- group tests into four classes: TestCreatemlAnnotationsToDetections, TestDetectionsToCreatemlAnnotations, TestLoadCreatemlAnnotations, TestSaveCreatemlAnnotations
- add pytest.param(id=...) slugs to all parametrize cases
- add one-line docstring to every test method
- add module-level docstring
- add error-path tests: missing 'coordinates' key, missing 'label' key, missing coordinate sub-key, coordinates=None, JSON root is dict, missing 'image' key, duplicate image entry
- total 21 tests (up from 9)

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

- update docs/datasets/core.md description frontmatter: include CreateML alongside YOLO/COCO/VOC
- update docs/llms.txt: add CreateML to format enumerations in two places
- update docs/how_to/process_datasets.md FAQ: add from_createml/as_createml to format list

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

Copilot AI left a comment

Contributor


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Comment thread src/supervision/dataset/formats/createml.py Outdated
Comment thread src/supervision/dataset/formats/createml.py
Comment thread src/supervision/dataset/formats/createml.py
Comment thread src/supervision/dataset/formats/createml.py
Comment thread docs/llms.txt Outdated
- Wrap class-discovery set comprehension in try/except to raise
  clear ValueError instead of KeyError/TypeError on missing 'label'
- Use `entry.get("annotations") or []` in both the class-scan and
  the per-entry loop to handle explicit JSON `"annotations": null`
- Normalise image_name via Path(image_name) in _resolve_image_path
  so `./a.jpg`-style segments cannot bypass duplicate-image checks
- Correct docstring: image_paths keys are joined (not fully resolved)
  paths, matching actual return value of _resolve_image_path

[resolve group] PR roboflow#2284 — items 3,4,5,6
Review suggestions by @Copilot

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
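
A sketch of the path guard described in this commit and in the PR description: reject absolute paths, `..` escapes, the images directory itself, and directory targets, while normalising `./a.jpg` to `a.jpg`. `resolve_image_path` is a hypothetical stand-in for `_resolve_image_path`, not its actual code; containment is checked on resolved paths, but, in line with the docstring correction above, the joined rather than fully resolved path is returned.

```python
from pathlib import Path


def resolve_image_path(images_dir: str, image_name: str) -> Path:
    """Join image_name onto images_dir, rejecting names that escape it."""
    relative = Path(image_name)  # pathlib drops "." parts: "./a.jpg" -> "a.jpg"
    if relative.is_absolute():
        raise ValueError(f"Absolute image paths are not allowed: {image_name!r}")
    root = Path(images_dir).resolve()
    candidate = (root / relative).resolve()  # collapses ".." and follows symlinks
    if candidate == root:
        raise ValueError(f"Image path is the images directory itself: {image_name!r}")
    if root not in candidate.parents:
        raise ValueError(f"Image path escapes the images directory: {image_name!r}")
    if candidate.is_dir():
        raise ValueError(f"Image path points to a directory: {image_name!r}")
    # Return the joined path, not the fully resolved one.
    return Path(images_dir) / relative
```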
Method list in the CreateML support FAQ entry omitted the new
from_createml() and as_createml() methods added by PR roboflow#2284.

[resolve group] PR roboflow#2284 — item 7
Review suggestion by @Copilot

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

Labels

None yet

Projects

None yet


3 participants