
Add benchmarking for NDD using NIM Endpoint#1495

Open
praateekmahajan wants to merge 1 commit into NVIDIA-NeMo:huvu/nemo_data_designer from praateekmahajan:praateek/ndd-benchmark

Conversation

@praateekmahajan
Contributor

praateekmahajan commented Feb 12, 2026

Description

I changed my dataset such that there are 50 rows per JSONL file (so the 863 rows in train become 18 files). Using the NVIDIA NIM endpoint, I get a throughput of 3.72 rows per second.

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Praateek <praateekm@gmail.com>
@greptile-apps
Contributor

greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

This PR adds benchmarking support for NeMo Data Designer (NDD) using the NVIDIA NIM endpoint. The changes include:

  • Added a new gretel_symptoms dataset configuration in nightly-benchmark.yaml
  • Created a new benchmark entry ndd_nvidia_nim with appropriate Ray configuration and Slack metrics reporting
  • Implemented ndd_benchmark.py script following the existing benchmark pattern from other scripts in the repository
  • The benchmark uses the DataDesignerStage with a medical notes generation task as a demonstration

The implementation follows established patterns in the codebase and integrates properly with the benchmarking framework. One concern is the TODO comment on line 201 suggesting that output metrics may not be fully implemented yet in the underlying data_designer.py module.

Confidence Score: 4/5

  • This PR is mostly safe to merge with minor concerns about metric extraction
  • The implementation follows established patterns and integrates properly with the benchmarking framework. However, there's a TODO comment indicating that output metrics might not be fully implemented yet, which could cause runtime errors. The benchmark also lacks requirements validation that other benchmarks have.
  • Pay attention to benchmarking/scripts/ndd_benchmark.py - verify that the metrics extraction on lines 199-203 works correctly at runtime

Important Files Changed

| Filename | Overview |
| --- | --- |
| benchmarking/nightly-benchmark.yaml | Added gretel_symptoms dataset and ndd_nvidia_nim benchmark entry with appropriate configuration |
| benchmarking/scripts/ndd_benchmark.py | New NDD benchmarking script for synthetic data generation; missing a requirements section and has a TODO for metric extraction |

Sequence Diagram

sequenceDiagram
    participant Orchestrator as Benchmark Orchestrator
    participant Script as ndd_benchmark.py
    participant Pipeline as NeMo Curator Pipeline
    participant NDD as DataDesignerStage
    participant NIM as NVIDIA NIM API
    participant Metrics as Metrics Collection

    Orchestrator->>Script: Execute with config from nightly-benchmark.yaml
    Script->>Script: Parse args (model-type, model-id, input-path, etc.)
    Script->>Script: Validate NVIDIA_API_KEY environment variable
    Script->>Script: Load input JSONL files (gretel_symptoms dataset)
    Script->>Pipeline: Create pipeline with JsonlReader, DataDesignerStage, JsonlWriter
    Script->>Pipeline: Run pipeline with executor (ray_data)
    
    loop For each input batch
        Pipeline->>NDD: Process batch with DataDesigner config
        NDD->>NDD: Build medical notes generation config
        NDD->>NIM: Send chat completion requests (gpt-oss-20b model)
        NIM-->>NDD: Return generated physician notes
        NDD-->>Pipeline: Return processed batch
    end
    
    Pipeline-->>Script: Return output tasks
    Script->>Metrics: Extract metrics (input/output row counts, chars, throughput)
    Script->>Metrics: Write benchmark results (params.json, metrics.json, tasks.pkl)
    Script-->>Orchestrator: Return success code with metrics
    Orchestrator->>Orchestrator: Report metrics to Slack sink


greptile-apps bot left a comment

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +201 to +203
# TODO: add this to data_designer.py
output_row_count = int(ndd_metrics["num_output_records"])
output_total_chars = int(ndd_metrics["output_total_chars"])

The TODO suggests these metrics might not be properly tracked yet in data_designer.py. Verify that num_output_records and output_total_chars are actually available in ndd_metrics before using them, otherwise this may raise a KeyError at runtime.
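Given the TODO, a defensive variant of the extraction on lines 201-203 could fail with an explicit message rather than a bare KeyError. The following is a minimal sketch: `extract_output_metrics` is a hypothetical helper, and only the two key names are taken from the snippet above.

```python
def extract_output_metrics(ndd_metrics: dict) -> tuple[int, int]:
    """Return (output_row_count, output_total_chars) from NDD metrics.

    Raises KeyError with a clear message if data_designer.py does not
    yet populate the expected keys (see the TODO in the reviewed snippet).
    """
    expected = ("num_output_records", "output_total_chars")
    missing = [k for k in expected if k not in ndd_metrics]
    if missing:
        raise KeyError(
            f"ndd_metrics is missing {missing}; "
            f"available keys: {sorted(ndd_metrics)}"
        )
    return (
        int(ndd_metrics["num_output_records"]),
        int(ndd_metrics["output_total_chars"]),
    )
```

Failing fast with the list of available keys makes it obvious whether data_designer.py has been updated to emit these metrics, instead of surfacing an opaque KeyError mid-benchmark.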

