Fix test_collect_hidden_states: use synthetic short conversations #1234
Conversation
In transformers 4.46+, apply_chat_template() with return_tensors="pt"
returns a BatchEncoding object that no longer subclasses dict. The
previous isinstance(tokenized, dict) guard evaluated to False and fell
through to tokenized (the BatchEncoding), causing input_ids.shape[1] to
call BatchEncoding.__getattr__("shape") and raise AttributeError.
Fix by checking isinstance(tokenized, torch.Tensor) instead, which
correctly handles both old transformers (plain tensor return) and new
transformers (BatchEncoding return).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
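The dict-vs-BatchEncoding pitfall described above is easy to reproduce with the standard library alone. This sketch uses a hypothetical stand-in class (`FakeBatchEncoding` is not the real `transformers.BatchEncoding`, but the real class behaves the same way because it subclasses `UserDict`, not `dict`):

```python
from collections import UserDict

class FakeBatchEncoding(UserDict):
    """Stand-in for transformers.BatchEncoding, which subclasses UserDict."""

enc = FakeBatchEncoding({"input_ids": [[1, 2, 3, 4]]})

# The old guard: UserDict subclasses are NOT dict instances, so this check is
# False and the code fell through to treating enc itself as the tensor.
assert not isinstance(enc, dict)

# The fixed guard keys off the tensor type instead; anything that is not a
# tensor is treated as a mapping and indexed by key. Here a plain list stands
# in for torch.Tensor so the sketch stays dependency-free.
def extract_input_ids(tokenized, tensor_type=list):
    if isinstance(tokenized, tensor_type):
        return tokenized          # old transformers: plain tensor return
    return tokenized["input_ids"]  # new transformers: BatchEncoding return

assert extract_input_ids(enc) == [[1, 2, 3, 4]]
assert extract_input_ids([[5, 6]]) == [[5, 6]]
```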
Since transformers 4.46+ is required, apply_chat_template always returns a BatchEncoding. Drop the torch.Tensor fallback and just index ["input_ids"] directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Without return_dict=True, apply_chat_template returns a raw torch.Tensor on transformers <5.0 (default return_dict=False) and a BatchEncoding on transformers >=5.0 (default changed to True). Subscripting a Tensor with ["input_ids"] raises TypeError on <5.0. Passing return_dict=True explicitly forces BatchEncoding on all versions (verified locally on 4.57.1 and 5.0.0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
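The failure mode from the commit message above — string-subscripting a raw tensor return — can be sketched with stdlib types only. A plain list stands in for the `torch.Tensor` return and a plain dict for the `return_dict=True` mapping; both stand-ins are illustrative assumptions:

```python
# Stand-in for a raw torch.Tensor of token ids (transformers <5.0 default
# return). Sequence types reject string keys, so ["input_ids"] raises.
raw_return = [[101, 2023, 102]]

raised = None
try:
    raw_return["input_ids"]
except TypeError as exc:
    raised = type(exc).__name__
print(raised)  # TypeError

# A return_dict=True result is a mapping, so the same subscript works on
# every version -- which is why the fix passes return_dict=True explicitly.
dict_return = {"input_ids": raw_return}
print(dict_return["input_ids"][0][:2])  # [101, 2023]
```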
The test was using real daring-anteater conversations (typically 1000+
tokens) with the default --max-seq-len 3072, but the tiny test model has
max_position_embeddings=32. Long conversations were being silently filtered
out ("Skipped N conversations due to length constraints"), producing zero
.pt files and failing the assertion.
Fix by:
- Adding a tiny_conversations_path fixture with synthetic short single-turn
conversations that tokenize well within max_position_embeddings=32
- Passing --max-seq-len 32 in the test to match the model's capacity
- Guarding tokenizer.chat_template.replace() against None chat_template
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
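A fixture like the one described above can be sketched as follows. The function name, JSONL layout, and ShareGPT-style field names (`conversation_id`, `conversations`, `role`, `content`) are assumptions for illustration, not necessarily the repo's exact schema:

```python
import json
import tempfile
from pathlib import Path

def write_tiny_conversations(path: Path, n: int = 4) -> Path:
    """Write n synthetic single-turn conversations, each only a few tokens
    long, so every one survives a --max-seq-len 32 filter."""
    rows = [
        {
            "conversation_id": str(i),
            "conversations": [
                {"role": "user", "content": f"Say {i}."},
                {"role": "assistant", "content": str(i)},
            ],
        }
        for i in range(n)
    ]
    with path.open("w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path

out = write_tiny_conversations(Path(tempfile.mkdtemp()) / "tiny.jsonl")
rows = [json.loads(line) for line in out.read_text().splitlines()]
print(len(rows), rows[0]["conversations"][0]["content"])  # 4 Say 0.
```

In a pytest fixture, `tempfile.mkdtemp()` would typically be replaced by the built-in `tmp_path` fixture so cleanup is automatic.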
📝 Walkthrough

A guard was added to prevent attribute errors on tokenizers without chat templates. Tokenization logic was updated to consistently use `return_dict=True` and index `["input_ids"]`.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py`:
- Around line 213-215: The code unconditionally calls
tokenizer.apply_chat_template which raises if tokenizer.chat_template is None;
update the tokenization path to check tokenizer.chat_template and use
apply_chat_template only when present, otherwise call tokenizer(conversations,
return_tensors="pt", return_dict=True, add_special_tokens=False) (or equivalent
non-chat tokenization) to produce the same return structure and extract
["input_ids"]; modify the block around tokenizer.apply_chat_template and ensure
downstream code still expects the same keys (e.g., input_ids) so
compute_hidden_states_hf.py's tokenization works for tokenizers both with and
without chat templates.
ℹ️ Review info: configuration from `.coderabbit.yaml`, review profile CHILL, plan Pro (Run ID: 5a8ffb0a-3aaf-4205-bd70-e6161935648a)
📒 Files selected for processing (3)
- examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py
- tests/examples/speculative_decoding/conftest.py
- tests/examples/speculative_decoding/test_eagle_offline_ptq.py
```python
input_ids = tokenizer.apply_chat_template(
    conversations, return_tensors="pt", return_dict=True, add_generation_template=False
)["input_ids"]
```
🧩 Analysis chain
🌐 Web query:
In Hugging Face Transformers, what is the expected behavior of tokenizer.apply_chat_template(...) when tokenizer.chat_template is None, and what fallback is recommended?
💡 Result:
When tokenizer.chat_template is None and no chat_template argument is passed to tokenizer.apply_chat_template(...), it raises a ValueError with the message: "Cannot use apply_chat_template because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating". The recommended fallback is to either set tokenizer.chat_template to a suitable Jinja template string (e.g., copy from a similar model or use ChatML format), or pass a chat_template string directly as a keyword argument to apply_chat_template. Official docs encourage setting tokenizer.chat_template explicitly and pushing to the Hub for chat models lacking one. Recent versions removed automatic class-level default templates to avoid issues.
Citations:
- 1: https://discuss.huggingface.co/t/chat-template-is-not-set-throwing-error/104095
- 2: ValueError: Cannot use apply_chat_template() because tokenizer.chat_template is not set huggingface/transformers#33246
- 3: https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py
- 4: https://huggingface.co/docs/transformers/v4.49.0/en/chat_template_advanced
- 5: https://huggingface.co/docs/transformers/v4.43.2/chat_templating
- 6: https://huggingface.co/docs/transformers/v4.36.1/chat_templating
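Following the recommended fallback above — setting a template explicitly when none exists — a minimal sketch. The ChatML-style template string and `DummyTokenizer` class are illustrative assumptions, not the library's or repo's actual code:

```python
# Minimal ChatML-style Jinja template, per the recommended fallback: set
# tokenizer.chat_template explicitly when the model ships without one.
CHATML = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
)

def ensure_chat_template(tokenizer):
    """Install the fallback template only when the tokenizer has none."""
    if getattr(tokenizer, "chat_template", None) is None:
        tokenizer.chat_template = CHATML
    return tokenizer

class DummyTokenizer:  # hypothetical stand-in for a loaded tokenizer
    chat_template = None

tok = ensure_chat_template(DummyTokenizer())
assert tok.chat_template is CHATML

existing = DummyTokenizer()
existing.chat_template = "{{ messages }}"  # already has a template
assert ensure_chat_template(existing).chat_template == "{{ messages }}"
```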
🏁 Script executed:
```shell
cat -n examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py | sed -n '135,220p'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 4355
Handle tokenizers without chat templates in the tokenization path.
Line 213 unconditionally calls apply_chat_template; if tokenizer.chat_template is None, this raises a ValueError at runtime even though a check exists at line 145. The line 145 guard only modifies an existing template and does not prevent this code path. Add a fallback for tokenizers without chat templates:
Proposed fix
```diff
-input_ids = tokenizer.apply_chat_template(
-    conversations, return_tensors="pt", return_dict=True, add_generation_template=False
-)["input_ids"]
+if tokenizer.chat_template is not None:
+    input_ids = tokenizer.apply_chat_template(
+        conversations,
+        return_tensors="pt",
+        return_dict=True,
+        add_generation_template=False,
+    )["input_ids"]
+else:
+    plain_text = "\n".join(
+        f"{msg.get('role', 'user')}: {msg.get('content', '')}" for msg in conversations
+    )
+    input_ids = tokenizer(plain_text, return_tensors="pt")["input_ids"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if tokenizer.chat_template is not None:
    input_ids = tokenizer.apply_chat_template(
        conversations,
        return_tensors="pt",
        return_dict=True,
        add_generation_template=False,
    )["input_ids"]
else:
    plain_text = "\n".join(
        f"{msg.get('role', 'user')}: {msg.get('content', '')}" for msg in conversations
    )
    input_ids = tokenizer(plain_text, return_tensors="pt")["input_ids"]
```
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1234      +/-   ##
==========================================
+ Coverage   75.68%   76.42%   +0.74%
==========================================
  Files         353      353
  Lines       40491    41375     +884
==========================================
+ Hits        30644    31621     +977
+ Misses       9847     9754      -93
```

☔ View full report in Codecov by Sentry.
Summary
- `test_collect_hidden_states` was using real daring-anteater conversations (typically 1000+ tokens) but the tiny test model has `max_position_embeddings=32`. Both sampled conversations exceeded the default `--max-seq-len 3072` filter, producing zero `.pt` files and failing the assertion.
- Added a `tiny_conversations_path` fixture with synthetic short single-turn conversations that tokenize within `max_position_embeddings=32`.
- Updated `test_collect_hidden_states` to use this fixture with `--max-seq-len 32`.
- Added a `None` guard for `tokenizer.chat_template.replace(...)` to avoid `AttributeError` when the tokenizer has no chat template.

Test plan

- `pytest tests/examples/speculative_decoding/test_eagle_offline_ptq.py::test_collect_hidden_states` passes
- The `speculative_decoding` CI job passes

🤖 Generated with Claude Code
Summary by CodeRabbit

Bug Fixes
- Tokenizers without chat templates no longer raise attribute errors during tokenization.

Tests
- Hidden-state collection test now uses synthetic short conversations that fit the tiny model's context window.