
Add benchmarking for NDD using NIM Endpoint#1495

Open
praateekmahajan wants to merge 1 commit into NVIDIA-NeMo:huvu/nemo_data_designer from praateekmahajan:praateek/ndd-benchmark

Conversation

@praateekmahajan
Contributor

praateekmahajan commented Feb 12, 2026

Description

I changed my dataset such that there are 50 rows per JSONL file (so the 863 rows in train become 18 files). Using the NVIDIA NIM endpoint, I get a throughput of 3.72 rows per second.

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Praateek <praateekm@gmail.com>
@greptile-apps
Contributor

greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

This PR adds benchmarking support for NeMo Data Designer (NDD) using the NVIDIA NIM endpoint. The changes include:

  • Added a new gretel_symptoms dataset configuration in nightly-benchmark.yaml
  • Created a new benchmark entry ndd_nvidia_nim with appropriate Ray configuration and Slack metrics reporting
  • Implemented ndd_benchmark.py script following the existing benchmark pattern from other scripts in the repository
  • The benchmark uses the DataDesignerStage with a medical notes generation task as a demonstration

The implementation follows established patterns in the codebase and integrates properly with the benchmarking framework. One concern is the TODO comment on line 201 suggesting that output metrics may not be fully implemented yet in the underlying data_designer.py module.

Confidence Score: 4/5

  • This PR is mostly safe to merge with minor concerns about metric extraction
  • The implementation follows established patterns and integrates properly with the benchmarking framework. However, there's a TODO comment indicating that output metrics might not be fully implemented yet, which could cause runtime errors. The benchmark also lacks requirements validation that other benchmarks have.
  • Pay attention to benchmarking/scripts/ndd_benchmark.py - verify that the metrics extraction on lines 199-203 works correctly at runtime

Important Files Changed

| Filename | Overview |
| --- | --- |
| benchmarking/nightly-benchmark.yaml | Added gretel_symptoms dataset and ndd_nvidia_nim benchmark entry with appropriate configuration |
| benchmarking/scripts/ndd_benchmark.py | New NDD benchmarking script for synthetic data generation; missing a requirements section and has a TODO for metric extraction |

Sequence Diagram

sequenceDiagram
    participant Orchestrator as Benchmark Orchestrator
    participant Script as ndd_benchmark.py
    participant Pipeline as NeMo Curator Pipeline
    participant NDD as DataDesignerStage
    participant NIM as NVIDIA NIM API
    participant Metrics as Metrics Collection

    Orchestrator->>Script: Execute with config from nightly-benchmark.yaml
    Script->>Script: Parse args (model-type, model-id, input-path, etc.)
    Script->>Script: Validate NVIDIA_API_KEY environment variable
    Script->>Script: Load input JSONL files (gretel_symptoms dataset)
    Script->>Pipeline: Create pipeline with JsonlReader, DataDesignerStage, JsonlWriter
    Script->>Pipeline: Run pipeline with executor (ray_data)
    
    loop For each input batch
        Pipeline->>NDD: Process batch with DataDesigner config
        NDD->>NDD: Build medical notes generation config
        NDD->>NIM: Send chat completion requests (gpt-oss-20b model)
        NIM-->>NDD: Return generated physician notes
        NDD-->>Pipeline: Return processed batch
    end
    
    Pipeline-->>Script: Return output tasks
    Script->>Metrics: Extract metrics (input/output row counts, chars, throughput)
    Script->>Metrics: Write benchmark results (params.json, metrics.json, tasks.pkl)
    Script-->>Orchestrator: Return success code with metrics
    Orchestrator->>Orchestrator: Report metrics to Slack sink


greptile-apps bot left a comment

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +201 to +203
# TODO: add this to data_designer.py
output_row_count = int(ndd_metrics["num_output_records"])
output_total_chars = int(ndd_metrics["output_total_chars"])

The TODO suggests these metrics might not be properly tracked yet in data_designer.py. Verify that num_output_records and output_total_chars are actually available in ndd_metrics before using them, otherwise this may raise a KeyError at runtime.
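Given the TODO, a defensive variant of the extraction on lines 201-203 could fail with an explicit message rather than a bare KeyError. The following is a minimal sketch: `extract_output_metrics` is a hypothetical helper, and only the two key names are taken from the snippet above.

```python
def extract_output_metrics(ndd_metrics: dict) -> tuple[int, int]:
    """Return (output_row_count, output_total_chars) from NDD metrics.

    Raises KeyError with a clear message if data_designer.py does not
    yet populate the expected keys (see the TODO in the reviewed snippet).
    """
    expected = ("num_output_records", "output_total_chars")
    missing = [k for k in expected if k not in ndd_metrics]
    if missing:
        raise KeyError(
            f"ndd_metrics is missing {missing}; "
            f"available keys: {sorted(ndd_metrics)}"
        )
    return (
        int(ndd_metrics["num_output_records"]),
        int(ndd_metrics["output_total_chars"]),
    )
```

Failing fast with the list of available keys makes it obvious whether data_designer.py has been updated to emit these metrics, instead of surfacing an opaque KeyError mid-benchmark.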

