Add benchmarking for NDD using NIM Endpoint#1495
Open
praateekmahajan wants to merge 1 commit into NVIDIA-NeMo:huvu/nemo_data_designer from
Conversation
Contributor
Greptile Overview
Greptile Summary
This PR adds benchmarking support for NeMo Data Designer (NDD) using the NVIDIA NIM endpoint. The changes include:
The implementation follows established patterns in the codebase and integrates properly with the benchmarking framework. One concern is the TODO comment on line 201 suggesting that output metrics may not be fully implemented yet in the underlying
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Orchestrator as Benchmark Orchestrator
participant Script as ndd_benchmark.py
participant Pipeline as NeMo Curator Pipeline
participant NDD as DataDesignerStage
participant NIM as NVIDIA NIM API
participant Metrics as Metrics Collection
Orchestrator->>Script: Execute with config from nightly-benchmark.yaml
Script->>Script: Parse args (model-type, model-id, input-path, etc.)
Script->>Script: Validate NVIDIA_API_KEY environment variable
Script->>Script: Load input JSONL files (gretel_symptoms dataset)
Script->>Pipeline: Create pipeline with JsonlReader, DataDesignerStage, JsonlWriter
Script->>Pipeline: Run pipeline with executor (ray_data)
loop For each input batch
Pipeline->>NDD: Process batch with DataDesigner config
NDD->>NDD: Build medical notes generation config
NDD->>NIM: Send chat completion requests (gpt-oss-20b model)
NIM-->>NDD: Return generated physician notes
NDD-->>Pipeline: Return processed batch
end
Pipeline-->>Script: Return output tasks
Script->>Metrics: Extract metrics (input/output row counts, chars, throughput)
Script->>Metrics: Write benchmark results (params.json, metrics.json, tasks.pkl)
Script-->>Orchestrator: Return success code with metrics
Orchestrator->>Orchestrator: Report metrics to Slack sink
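The "Validate NVIDIA_API_KEY environment variable" step in the diagram could look roughly like the sketch below. This is not taken from ndd_benchmark.py; the helper name `require_api_key` and the error message are hypothetical, and only the variable name `NVIDIA_API_KEY` comes from the diagram.

```python
import os


def require_api_key(env=None, name="NVIDIA_API_KEY"):
    """Return the API key from the environment, failing fast if it is unset or blank.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        # Fail before building the pipeline so no requests are sent without credentials.
        raise RuntimeError(f"{name} must be set before running the benchmark")
    return key
```

Passing the environment mapping as a parameter keeps the check easy to exercise without touching the real process environment.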
Comment on lines +201 to +203
# TODO: add this to data_designer.py
output_row_count = int(ndd_metrics["num_output_records"])
output_total_chars = int(ndd_metrics["output_total_chars"])
Contributor
The TODO suggests these metrics might not be properly tracked yet in data_designer.py. Verify that num_output_records and output_total_chars are actually available in ndd_metrics before using them, otherwise this may raise a KeyError at runtime.
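One way to act on this comment, sketched under assumptions: check for both keys up front and raise an error that names what is missing, instead of letting a bare `KeyError` surface mid-benchmark. The helper `read_output_metrics` is hypothetical; only the key names and the `int(...)` conversions come from the quoted lines, and `ndd_metrics` is assumed to behave like a plain dict.

```python
def read_output_metrics(ndd_metrics):
    """Return (output_row_count, output_total_chars) from an NDD metrics mapping.

    Hypothetical helper: raises a descriptive KeyError if either metric is absent.
    """
    required = ("num_output_records", "output_total_chars")
    missing = [key for key in required if key not in ndd_metrics]
    if missing:
        raise KeyError(f"ndd_metrics is missing expected keys: {missing}")
    # Same conversions as the lines under review.
    output_row_count = int(ndd_metrics["num_output_records"])
    output_total_chars = int(ndd_metrics["output_total_chars"])
    return output_row_count, output_total_chars
```

Whether to fail hard or fall back to a default such as 0 is a design choice for the benchmark; failing hard avoids silently reporting zero throughput.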
Description
I changed my dataset so that each JSONL file has 50 rows (so the 863 rows in train become 18 files). Using the NVIDIA NIM endpoint, I get a throughput of 3.72 rows per second.
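As a quick sanity check on those numbers, here is a small sketch of the arithmetic. The function names are hypothetical, and the runtime estimate assumes the 3.72 rows per second figure holds across all 863 rows, which the description does not state explicitly.

```python
import math


def shard_count(total_rows, rows_per_file):
    """Number of files needed when each file holds at most rows_per_file rows."""
    return math.ceil(total_rows / rows_per_file)


def estimated_seconds(total_rows, rows_per_second):
    """Wall-clock estimate, assuming a constant throughput over the whole dataset."""
    return total_rows / rows_per_second


files = shard_count(863, 50)                 # 17 full files plus one partial file
last_file_rows = 863 - (files - 1) * 50      # rows left over for the final file
seconds = estimated_seconds(863, 3.72)       # roughly 232 seconds under the assumption above
```

So 863 rows at 50 per file gives 18 files, with the last one holding 13 rows.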
Usage
# Add snippet demonstrating usage
Checklist