feat(dataset): add CreateML format support to DetectionDataset #2284
Open
madhavcodez wants to merge 9 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | develop | #2284 | +/- |
|---|---|---|---|
| Coverage | 82% | 82% | |
| Files | 66 | 67 | +1 |
| Lines | 9082 | 9166 | +84 |
| Hits | 7412 | 7485 | +73 |
| Misses | 1670 | 1681 | +11 |
0d2ae43 to 0c6adfa
Add DetectionDataset.from_createml and as_createml plus a new formats/createml.py module (load/save helpers), mirroring the existing COCO, YOLO, and Pascal VOC format support. Boxes use CreateML's pixel-space centre + width/height and are converted to/from xyxy; class names are inferred from the labels present in the file. Image paths are validated against the images directory, matching the COCO loader's path-traversal protection. Adds unit tests for the helpers, loader, exporter, integer/float round-trip, global class-id consistency, and the path-safety guards.
Cast the JSON payload read via read_json_file to list[CreateMLDict] and the data passed to save_json_file to dict[str, Any] (both helpers are annotated for dict only), and iterate xyxy/class_id arrays directly so the class_id None-guard narrows the loop variable for mypy.
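The box conversion described in the commit above can be sketched as follows. This is a minimal illustration, not the code from `createml.py`: the function names `center_to_xyxy` and `xyxy_to_center` are hypothetical, and the sketch assumes a CreateML `coordinates` object with `x`, `y` (box centre) and `width`, `height`, all in pixels.

```python
from typing import TypedDict


class Coordinates(TypedDict):
    """CreateML box: centre point plus size, in pixels."""

    x: float
    y: float
    width: float
    height: float


def center_to_xyxy(c: Coordinates) -> tuple[float, float, float, float]:
    # Hypothetical helper: centre + size -> (x_min, y_min, x_max, y_max).
    half_w = c["width"] / 2
    half_h = c["height"] / 2
    return (c["x"] - half_w, c["y"] - half_h, c["x"] + half_w, c["y"] + half_h)


def xyxy_to_center(
    x_min: float, y_min: float, x_max: float, y_max: float
) -> Coordinates:
    # Hypothetical helper: inverse of center_to_xyxy.
    return {
        "x": (x_min + x_max) / 2,
        "y": (y_min + y_max) / 2,
        "width": x_max - x_min,
        "height": y_max - y_min,
    }
```

Because the centre is computed by halving, integer corner coordinates can produce `.5` centres, which is presumably why the tests cover both integer and float round-trips.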
0c6adfa to 1dab055
Contributor Author
Rebased onto the latest
Contributor
Pull request overview
Adds first-class CreateML object-detection dataset support to supervision’s DetectionDataset, complementing existing COCO / YOLO / Pascal VOC loaders/exporters.
Changes:
- Introduces a new CreateML format module with load/export + conversion helpers.
- Adds `DetectionDataset.from_createml()` and `DetectionDataset.as_createml()` APIs.
- Adds unit tests for conversions, loader/exporter behavior, round-trips, and path-safety checks; updates changelog.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/supervision/dataset/formats/createml.py | Implements CreateML JSON parsing/export and CreateML↔Detections conversions, including image-path validation. |
| src/supervision/dataset/core.py | Wires CreateML support into DetectionDataset via new from_createml / as_createml methods. |
| tests/dataset/formats/test_createml.py | Adds coverage for conversion correctness, round-trips, global class-id stability, and path-safety guards. |
| docs/changelog.md | Documents the new CreateML dataset support in the unreleased changelog section. |
- add show_progress: bool = False to load_createml_annotations, save_dataset_images, from_createml, as_createml with tqdm.auto wrapping
- add Google-style docstrings to all four public callables in createml.py
- validate JSON root is list in load_createml_annotations; raise ValueError on dict
- guard missing 'image' key per entry; raise ValueError with full entry repr
- detect duplicate image entries; raise ValueError naming the duplicate
- wrap annotation field access in try/except KeyError/TypeError; raise ValueError("Malformed...")
- replace cast("dict[str, Any]", ...) with # type: ignore[arg-type] in save_createml_annotations
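The structural checks listed in this commit (list root, required `image` key, no duplicate images) might look roughly like the sketch below. The function name `validate_createml_payload` and the exact error messages are assumptions for illustration; the PR's `load_createml_annotations` may organise these checks differently.

```python
from typing import Any


def validate_createml_payload(payload: Any) -> list[dict[str, Any]]:
    """Hypothetical validator mirroring the checks described above."""
    # A CreateML annotation file is a JSON list of per-image entries.
    if not isinstance(payload, list):
        raise ValueError(
            f"CreateML JSON root must be a list, got {type(payload).__name__}"
        )

    seen: set[str] = set()
    for entry in payload:
        # Every entry must name its image; include the entry repr for debugging.
        if not isinstance(entry, dict) or "image" not in entry:
            raise ValueError(f"Entry is missing the 'image' key: {entry!r}")
        name = entry["image"]
        # The same image listed twice would silently overwrite annotations.
        if name in seen:
            raise ValueError(f"Duplicate image entry: {name}")
        seen.add(name)
    return payload
```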
---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
- group tests into four classes: TestCreatemlAnnotationsToDetections, TestDetectionsToCreatemlAnnotations, TestLoadCreatemlAnnotations, TestSaveCreatemlAnnotations
- add pytest.param(id=...) slugs to all parametrize cases
- add one-line docstring to every test method
- add module-level docstring
- add error-path tests: missing 'coordinates' key, missing 'label' key, missing coordinate sub-key, coordinates=None, JSON root is dict, missing 'image' key, duplicate image entry
- total 21 tests (up from 9)
---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
- update docs/datasets/core.md description frontmatter: include CreateML alongside YOLO/COCO/VOC
- update docs/llms.txt: add CreateML to format enumerations in two places
- update docs/how_to/process_datasets.md FAQ: add from_createml/as_createml to format list
---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
- Wrap class-discovery set comprehension in try/except to raise a clear ValueError instead of KeyError/TypeError on missing 'label'
- Use `entry.get("annotations") or []` in both the class-scan and the per-entry loop to handle explicit JSON `"annotations": null`
- Normalise image_name via Path(image_name) in _resolve_image_path so `./a.jpg` segments cannot bypass duplicate-image checks
- Correct docstring: image_paths keys are joined (not fully resolved) paths, matching the actual return value of _resolve_image_path
[resolve group] PR roboflow#2284 — items 3,4,5,6
Review suggestions by @Copilot
---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
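A small sketch of the three fixes in this commit, under stated assumptions: the helper names `annotations_of`, `normalized_name`, and `discover_classes` are hypothetical and do not appear in the PR, and the mapping of sorted labels to zero-based ids follows the PR description rather than the actual source.

```python
from pathlib import Path
from typing import Any


def annotations_of(entry: dict[str, Any]) -> list[dict[str, Any]]:
    # `or []` covers both a missing key and an explicit JSON null.
    return entry.get("annotations") or []


def normalized_name(image_name: str) -> str:
    # Path() drops "." segments, so "./a.jpg" and "a.jpg" compare equal.
    return Path(image_name).as_posix()


def discover_classes(entries: list[dict[str, Any]]) -> dict[str, int]:
    # Collect every label in the file, then assign sorted, zero-based ids.
    try:
        labels = {ann["label"] for e in entries for ann in annotations_of(e)}
    except (KeyError, TypeError) as exc:
        raise ValueError("Malformed CreateML annotation: missing 'label'") from exc
    return {name: class_id for class_id, name in enumerate(sorted(labels))}
```

Note that `Path` does not collapse `..` segments, so this normalisation only addresses the duplicate-image check; traversal has to be rejected separately.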
Method list in the CreateML support FAQ entry omitted the new from_createml() and as_createml() methods added by PR roboflow#2284.
[resolve group] PR roboflow#2284 - item 7
Review suggestion by @Copilot
---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Description
Adds CreateML object-detection format support to `DetectionDataset`, with `from_createml()` for loading and `as_createml()` for exporting. `supervision` already supports COCO, YOLO, and Pascal VOC; this fills the remaining common format with a symmetric loader/exporter that mirrors those implementations.
Type of Change
Motivation and Context
CreateML is a widely used object-detection annotation format (Apple Create ML, and one of Roboflow's dataset export options). `DetectionDataset` can already round-trip COCO, YOLO, and Pascal VOC, but not CreateML, so users exporting in that format have to convert manually before loading into supervision. This adds first-class support following the existing `from_<format>` / `as_<format>` convention.
No existing tracking issue; opening as a feature addition, and happy to file one if the maintainers prefer.
Changes Made
- `src/supervision/dataset/formats/createml.py`: new module with `load_createml_annotations`, `save_createml_annotations`, and the helpers `createml_annotations_to_detections` / `detections_to_createml_annotations`. Boxes use CreateML's pixel-space centre + width/height and are converted to/from `xyxy`. Class names are inferred from the labels present in the file and assigned sorted, zero-based ids. Image paths are validated against the images directory (rejecting `..` traversal, absolute paths, the directory itself, and directory targets), matching the COCO loader's protection.
- `src/supervision/dataset/core.py`: `DetectionDataset.from_createml()` and `DetectionDataset.as_createml()`, plus the format import. Method docstrings render automatically in the API docs.
- `tests/dataset/formats/test_createml.py`: unit tests for the conversion helpers, loader, exporter, save→load round-trip (integer and float coordinates), global class-id consistency across images, and the path-safety guards.
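For reviewers unfamiliar with the path-safety guards, the four rejections listed above could be implemented along these lines. This is a sketch under assumptions, not the PR's `_resolve_image_path`: the function name `resolve_image_path`, its signature, and the error messages are hypothetical; only the set of rejected cases and the "joined, not fully resolved" return value come from this PR's text.

```python
from pathlib import Path


def resolve_image_path(images_dir: str, image_name: str) -> Path:
    """Hypothetical guard: join image_name onto images_dir, refusing escapes."""
    base = Path(images_dir).resolve()
    relative = Path(image_name)

    # Absolute paths would ignore images_dir entirely.
    if relative.is_absolute():
        raise ValueError(f"Absolute image path not allowed: {image_name}")

    # Resolve only for the checks, so ".." and symlinks are accounted for.
    candidate = (base / relative).resolve()
    if candidate == base:
        raise ValueError(f"Image path refers to the images directory: {image_name!r}")
    if base not in candidate.parents:
        raise ValueError(f"Image path escapes the images directory: {image_name}")
    if candidate.is_dir():
        raise ValueError(f"Image path is a directory: {image_name}")

    # Return the joined (not resolved) path.
    return Path(images_dir) / relative
```

Resolving both sides before comparing is what catches `..` segments and symlinks that point outside the images directory; a plain string prefix check would miss both.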
Testing
Local run: `pytest tests/dataset/` passes (including the new `test_createml.py`), and `ruff check` / `ruff format --check` are clean on the changed files.