
Fix test_collect_hidden_states: use synthetic short conversations #1234

Merged
kevalmorabia97 merged 5 commits into main from yeyu/fix-test-collect-hidden-states-length-filter
Apr 10, 2026

Conversation


@yeyu-nvidia (Contributor) commented Apr 10, 2026

Summary

  • test_collect_hidden_states was using real daring-anteater conversations (typically 1000+ tokens) but the tiny test model has max_position_embeddings=32. Both sampled conversations exceeded the default --max-seq-len 3072 filter, producing zero .pt files and failing the assertion.
  • Added a tiny_conversations_path fixture with synthetic short single-turn conversations that tokenize within max_position_embeddings=32.
  • Changed test_collect_hidden_states to use this fixture with --max-seq-len 32.
  • Added a None guard for tokenizer.chat_template.replace(...) to avoid AttributeError when the tokenizer has no chat template.
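The None guard described above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: the helper name and the replacement strings are placeholders, since the changed code is not shown in this thread.

```python
def patch_chat_template(tokenizer, old: str, new: str) -> None:
    """Apply a string replacement to the tokenizer's chat template,
    skipping tokenizers whose chat_template is None (which would
    otherwise raise AttributeError on .replace())."""
    if tokenizer.chat_template is not None:
        tokenizer.chat_template = tokenizer.chat_template.replace(old, new)
    # else: leave the tokenizer untouched instead of crashing


# Minimal stand-in tokenizer for illustration only.
class _Tok:
    chat_template = None


t = _Tok()
patch_chat_template(t, "<cut>", "")  # no AttributeError on a template-less tokenizer
t.chat_template = "{{ messages }}<cut>"
patch_chat_template(t, "<cut>", "")
print(t.chat_template)  # -> {{ messages }}
```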

Test plan

  • pytest tests/examples/speculative_decoding/test_eagle_offline_ptq.py::test_collect_hidden_states passes
  • CI speculative_decoding job passes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Resolved compatibility issues when tokenizers do not have a chat template configuration by adding proper error handling.
    • Standardized tokenization input extraction logic across different transformer library versions for consistent behavior.
  • Tests

    • Enhanced test infrastructure with new conversation data fixtures and improved sequence length validation for speculative decoding examples.

yeyu-nvidia and others added 5 commits April 9, 2026 10:50
…emplate

In transformers 4.46+, apply_chat_template() with return_tensors="pt"
returns a BatchEncoding object that no longer subclasses dict. The
previous isinstance(tokenized, dict) guard evaluated to False and fell
through to tokenized (the BatchEncoding), causing input_ids.shape[1] to
call BatchEncoding.__getattr__("shape") and raise AttributeError.

Fix by checking isinstance(tokenized, torch.Tensor) instead, which
correctly handles both old transformers (plain tensor return) and new
transformers (BatchEncoding return).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Since transformers 4.46+ is required, apply_chat_template always
returns a BatchEncoding. Drop the torch.Tensor fallback and just
index ["input_ids"] directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Without return_dict=True, apply_chat_template returns a raw torch.Tensor
on transformers <5.0 (default return_dict=False) and a BatchEncoding on
transformers >=5.0 (default changed to True). Subscripting a Tensor with
["input_ids"] raises TypeError on <5.0.

Passing return_dict=True explicitly forces BatchEncoding on all versions
(verified locally on 4.57.1 and 5.0.0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
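The version dependence this commit describes can be illustrated with a toy stand-in. FakeTokenizer and FakeTensor below are hypothetical mocks, not the real transformers API; they only model the commit's claim that the old default returns a raw tensor (which rejects string subscripts) while return_dict=True always yields a mapping.

```python
class FakeTensor:
    """Mimics torch.Tensor's rejection of string subscripts."""

    def __getitem__(self, key):
        if isinstance(key, str):
            raise TypeError("only integers and slices are valid tensor indices")
        return key


class FakeTokenizer:
    """Toy model of apply_chat_template across transformers versions:
    <5.0 defaults to return_dict=False (raw tensor), >=5.0 to True."""

    def apply_chat_template(self, conversations, return_dict=False, **kwargs):
        encoding = {"input_ids": FakeTensor(), "attention_mask": FakeTensor()}
        return encoding if return_dict else encoding["input_ids"]


tok = FakeTokenizer()
# Old default: subscripting the raw tensor with a string raises TypeError.
try:
    tok.apply_chat_template([], return_dict=False)["input_ids"]
except TypeError as exc:
    print("old default fails:", exc)
# Passing return_dict=True explicitly works regardless of the default.
input_ids = tok.apply_chat_template([], return_dict=True)["input_ids"]
print("explicit return_dict=True works")
```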
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
The test was using real daring-anteater conversations (typically 1000+
tokens) with the default --max-seq-len 3072, but the tiny test model has
max_position_embeddings=32. Long conversations were being silently filtered
out ("Skipped N conversations due to length constraints"), producing zero
.pt files and failing the assertion.

Fix by:
- Adding a tiny_conversations_path fixture with synthetic short single-turn
  conversations that tokenize well within max_position_embeddings=32
- Passing --max-seq-len 32 in the test to match the model's capacity
- Guarding tokenizer.chat_template.replace() against None chat_template

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
@yeyu-nvidia yeyu-nvidia requested a review from a team as a code owner April 10, 2026 22:07
@yeyu-nvidia yeyu-nvidia requested a review from ChenhanYu April 10, 2026 22:07

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

A guard was added to prevent attribute errors on tokenizers without chat templates. Tokenization logic was updated to consistently use BatchEncoding with return_dict=True for normalized input extraction. A new pytest fixture for generating test conversations was introduced, and an existing test was refactored to use it with additional sequence length constraints.

Changes

  • Chat Template & Tokenization Updates — examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py:
    Added a None check for tokenizer.chat_template before modifying it. Updated submit_generates() to pass return_dict=True when applying the chat template, removing the conditional branching for input_ids extraction.
  • Test Fixture — tests/examples/speculative_decoding/conftest.py:
    Added a session-scoped pytest fixture tiny_conversations_path that writes a temporary JSONL file with 5 synthetic single-turn conversations (plus a new json import).
  • Test Integration — tests/examples/speculative_decoding/test_eagle_offline_ptq.py:
    Updated test_collect_hidden_states to read conversations from the new tiny_conversations_path fixture instead of tiny_daring_anteater_path, and added --max-seq-len 32 to the command invocation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 passed
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately summarizes the main change: fixing test_collect_hidden_states by using synthetic short conversations instead of real, lengthy ones.
  • Docstring Coverage — ✅ Passed: no functions found in the changed files to evaluate; docstring coverage check skipped.
  • Security Anti-Patterns — ✅ Passed: the PR adheres to SECURITY.md requirements. torch.load in the test file is explicitly exempted, trust_remote_code is caller-configurable and defaults to False, and no other security anti-patterns were found.



github-actions bot commented Apr 10, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-10 22:55 UTC

@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py`:
- Around line 213-215: The code unconditionally calls
tokenizer.apply_chat_template which raises if tokenizer.chat_template is None;
update the tokenization path to check tokenizer.chat_template and use
apply_chat_template only when present, otherwise call tokenizer(conversations,
return_tensors="pt", return_dict=True, add_special_tokens=False) (or equivalent
non-chat tokenization) to produce the same return structure and extract
["input_ids"]; modify the block around tokenizer.apply_chat_template and ensure
downstream code still expects the same keys (e.g., input_ids) so
compute_hidden_states_hf.py's tokenization works for tokenizers both with and
without chat templates.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5a8ffb0a-3aaf-4205-bd70-e6161935648a

📥 Commits

Reviewing files that changed from the base of the PR and between da0e8ff and 4e12bb4.

📒 Files selected for processing (3)
  • examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py
  • tests/examples/speculative_decoding/conftest.py
  • tests/examples/speculative_decoding/test_eagle_offline_ptq.py

Comment on lines +213 to +215
input_ids = tokenizer.apply_chat_template(
    conversations, return_tensors="pt", return_dict=True, add_generation_template=False
)["input_ids"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Hugging Face Transformers, what is the expected behavior of tokenizer.apply_chat_template(...) when tokenizer.chat_template is None, and what fallback is recommended?

💡 Result:

When tokenizer.chat_template is None and no chat_template argument is passed to tokenizer.apply_chat_template(...), it raises a ValueError with the message: "Cannot use apply_chat_template because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating". The recommended fallback is to either set tokenizer.chat_template to a suitable Jinja template string (e.g., copy from a similar model or use ChatML format), or pass a chat_template string directly as a keyword argument to apply_chat_template. Official docs encourage setting tokenizer.chat_template explicitly and pushing to the Hub for chat models lacking one. Recent versions removed automatic class-level default templates to avoid issues.


🏁 Script executed:

cat -n examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py | sed -n '135,220p'

Repository: NVIDIA/Model-Optimizer

Length of output: 4355


Handle tokenizers without chat templates in the tokenization path.

Line 213 unconditionally calls apply_chat_template; if tokenizer.chat_template is None, this raises a ValueError at runtime even though a check exists at line 145. The line 145 guard only modifies an existing template and does not prevent this code path. Add a fallback for tokenizers without chat templates:

Proposed fix
-            input_ids = tokenizer.apply_chat_template(
-                conversations, return_tensors="pt", return_dict=True, add_generation_template=False
-            )["input_ids"]
+            if tokenizer.chat_template is not None:
+                input_ids = tokenizer.apply_chat_template(
+                    conversations,
+                    return_tensors="pt",
+                    return_dict=True,
+                    add_generation_template=False,
+                )["input_ids"]
+            else:
+                plain_text = "\n".join(
+                    f"{msg.get('role', 'user')}: {msg.get('content', '')}" for msg in conversations
+                )
+                input_ids = tokenizer(plain_text, return_tensors="pt")["input_ids"]

@codecov

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.42%. Comparing base (14e9bea) to head (4e12bb4).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1234      +/-   ##
==========================================
+ Coverage   75.68%   76.42%   +0.74%     
==========================================
  Files         353      353              
  Lines       40491    41375     +884     
==========================================
+ Hits        30644    31621     +977     
+ Misses       9847     9754      -93     
Flag coverage:
  • examples: 44.49% <ø> (+2.76%) ⬆️
  • unit: 55.53% <ø> (+0.27%) ⬆️


@yeyu-nvidia yeyu-nvidia enabled auto-merge (squash) April 10, 2026 22:42
@kevalmorabia97 kevalmorabia97 disabled auto-merge April 10, 2026 22:48
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) April 10, 2026 22:48
@kevalmorabia97 kevalmorabia97 merged commit 9050188 into main Apr 10, 2026
42 checks passed
@kevalmorabia97 kevalmorabia97 deleted the yeyu/fix-test-collect-hidden-states-length-filter branch April 10, 2026 22:54