Fix test_collect_hidden_states: use synthetic short conversations #1234
Conversation
In transformers 4.46+, apply_chat_template() with return_tensors="pt"
returns a BatchEncoding object that no longer subclasses dict. The
previous isinstance(tokenized, dict) guard evaluated to False and fell
through to tokenized (the BatchEncoding), causing input_ids.shape[1] to
call BatchEncoding.__getattr__("shape") and raise AttributeError.
Fix by checking isinstance(tokenized, torch.Tensor) instead, which
correctly handles both old transformers (plain tensor return) and new
transformers (BatchEncoding return).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
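The dict-vs-BatchEncoding pitfall described above is easy to reproduce with the standard library alone. This sketch uses a hypothetical stand-in class (`FakeBatchEncoding` is not the real `transformers.BatchEncoding`, but the real class behaves the same way because it subclasses `UserDict`, not `dict`):

```python
from collections import UserDict

class FakeBatchEncoding(UserDict):
    """Stand-in for transformers.BatchEncoding, which subclasses UserDict."""

enc = FakeBatchEncoding({"input_ids": [[1, 2, 3, 4]]})

# The old guard: UserDict subclasses are NOT dict instances, so this check is
# False and the code fell through to treating enc itself as the tensor.
assert not isinstance(enc, dict)

# The fixed guard keys off the tensor type instead; anything that is not a
# tensor is treated as a mapping and indexed by key. Here a plain list stands
# in for torch.Tensor so the sketch stays dependency-free.
def extract_input_ids(tokenized, tensor_type=list):
    if isinstance(tokenized, tensor_type):
        return tokenized          # old transformers: plain tensor return
    return tokenized["input_ids"]  # new transformers: BatchEncoding return

assert extract_input_ids(enc) == [[1, 2, 3, 4]]
assert extract_input_ids([[5, 6]]) == [[5, 6]]
```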
Since transformers 4.46+ is required, apply_chat_template always returns a BatchEncoding. Drop the torch.Tensor fallback and just index ["input_ids"] directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Without return_dict=True, apply_chat_template returns a raw torch.Tensor on transformers <5.0 (default return_dict=False) and a BatchEncoding on transformers >=5.0 (default changed to True). Subscripting a Tensor with ["input_ids"] raises TypeError on <5.0. Passing return_dict=True explicitly forces BatchEncoding on all versions (verified locally on 4.57.1 and 5.0.0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
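The failure mode from the commit message above — string-subscripting a raw tensor return — can be sketched with stdlib types only. A plain list stands in for the `torch.Tensor` return and a plain dict for the `return_dict=True` mapping; both stand-ins are illustrative assumptions:

```python
# Stand-in for a raw torch.Tensor of token ids (transformers <5.0 default
# return). Sequence types reject string keys, so ["input_ids"] raises.
raw_return = [[101, 2023, 102]]

raised = None
try:
    raw_return["input_ids"]
except TypeError as exc:
    raised = type(exc).__name__
print(raised)  # TypeError

# A return_dict=True result is a mapping, so the same subscript works on
# every version -- which is why the fix passes return_dict=True explicitly.
dict_return = {"input_ids": raw_return}
print(dict_return["input_ids"][0][:2])  # [101, 2023]
```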
The test was using real daring-anteater conversations (typically 1000+
tokens) with the default --max-seq-len 3072, but the tiny test model has
max_position_embeddings=32. Long conversations were being silently filtered
out ("Skipped N conversations due to length constraints"), producing zero
.pt files and failing the assertion.
Fix by:
- Adding a tiny_conversations_path fixture with synthetic short single-turn
conversations that tokenize well within max_position_embeddings=32
- Passing --max-seq-len 32 in the test to match the model's capacity
- Guarding tokenizer.chat_template.replace() against None chat_template
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
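A fixture like the one described above can be sketched as follows. The function name, JSONL layout, and ShareGPT-style field names (`conversation_id`, `conversations`, `role`, `content`) are assumptions for illustration, not necessarily the repo's exact schema:

```python
import json
import tempfile
from pathlib import Path

def write_tiny_conversations(path: Path, n: int = 4) -> Path:
    """Write n synthetic single-turn conversations, each only a few tokens
    long, so every one survives a --max-seq-len 32 filter."""
    rows = [
        {
            "conversation_id": str(i),
            "conversations": [
                {"role": "user", "content": f"Say {i}."},
                {"role": "assistant", "content": str(i)},
            ],
        }
        for i in range(n)
    ]
    with path.open("w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path

out = write_tiny_conversations(Path(tempfile.mkdtemp()) / "tiny.jsonl")
rows = [json.loads(line) for line in out.read_text().splitlines()]
print(len(rows), rows[0]["conversations"][0]["content"])  # 4 Say 0.
```

In a pytest fixture, `tempfile.mkdtemp()` would typically be replaced by the built-in `tmp_path` fixture so cleanup is automatic.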
📝 Walkthrough

A guard was added to prevent attribute errors on tokenizers without chat templates. Tokenization logic was updated to consistently use `return_dict=True` and index `["input_ids"]`.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py`:
- Around line 213-215: The code unconditionally calls
tokenizer.apply_chat_template which raises if tokenizer.chat_template is None;
update the tokenization path to check tokenizer.chat_template and use
apply_chat_template only when present, otherwise call tokenizer(conversations,
return_tensors="pt", return_dict=True, add_special_tokens=False) (or equivalent
non-chat tokenization) to produce the same return structure and extract
["input_ids"]; modify the block around tokenizer.apply_chat_template and ensure
downstream code still expects the same keys (e.g., input_ids) so
compute_hidden_states_hf.py's tokenization works for tokenizers both with and
without chat templates.
ℹ️ Review info: configuration from `.coderabbit.yaml`, review profile CHILL, plan Pro (Run ID: 5a8ffb0a-3aaf-4205-bd70-e6161935648a)
📒 Files selected for processing (3)
- examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py
- tests/examples/speculative_decoding/conftest.py
- tests/examples/speculative_decoding/test_eagle_offline_ptq.py
```python
input_ids = tokenizer.apply_chat_template(
    conversations, return_tensors="pt", return_dict=True, add_generation_template=False
)["input_ids"]
```
🧩 Analysis chain
🌐 Web query:
In Hugging Face Transformers, what is the expected behavior of tokenizer.apply_chat_template(...) when tokenizer.chat_template is None, and what fallback is recommended?
💡 Result:
When tokenizer.chat_template is None and no chat_template argument is passed to tokenizer.apply_chat_template(...), it raises a ValueError with the message: "Cannot use apply_chat_template because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating". The recommended fallback is to either set tokenizer.chat_template to a suitable Jinja template string (e.g., copy from a similar model or use ChatML format), or pass a chat_template string directly as a keyword argument to apply_chat_template. Official docs encourage setting tokenizer.chat_template explicitly and pushing to the Hub for chat models lacking one. Recent versions removed automatic class-level default templates to avoid issues.
Citations:
- 1: https://discuss.huggingface.co/t/chat-template-is-not-set-throwing-error/104095
- 2: ValueError: Cannot use apply_chat_template() because tokenizer.chat_template is not set huggingface/transformers#33246
- 3: https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py
- 4: https://huggingface.co/docs/transformers/v4.49.0/en/chat_template_advanced
- 5: https://huggingface.co/docs/transformers/v4.43.2/chat_templating
- 6: https://huggingface.co/docs/transformers/v4.36.1/chat_templating
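Following the recommended fallback above — setting a template explicitly when none exists — a minimal sketch. The ChatML-style template string and `DummyTokenizer` class are illustrative assumptions, not the library's or repo's actual code:

```python
# Minimal ChatML-style Jinja template, per the recommended fallback: set
# tokenizer.chat_template explicitly when the model ships without one.
CHATML = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
)

def ensure_chat_template(tokenizer):
    """Install the fallback template only when the tokenizer has none."""
    if getattr(tokenizer, "chat_template", None) is None:
        tokenizer.chat_template = CHATML
    return tokenizer

class DummyTokenizer:  # hypothetical stand-in for a loaded tokenizer
    chat_template = None

tok = ensure_chat_template(DummyTokenizer())
assert tok.chat_template is CHATML

existing = DummyTokenizer()
existing.chat_template = "{{ messages }}"  # already has a template
assert ensure_chat_template(existing).chat_template == "{{ messages }}"
```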
🏁 Script executed:
```shell
cat -n examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py | sed -n '135,220p'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 4355
Handle tokenizers without chat templates in the tokenization path.
Line 213 unconditionally calls apply_chat_template; if tokenizer.chat_template is None, this raises a ValueError at runtime even though a check exists at line 145. The line 145 guard only modifies an existing template and does not prevent this code path. Add a fallback for tokenizers without chat templates:
Proposed fix
```diff
-input_ids = tokenizer.apply_chat_template(
-    conversations, return_tensors="pt", return_dict=True, add_generation_template=False
-)["input_ids"]
+if tokenizer.chat_template is not None:
+    input_ids = tokenizer.apply_chat_template(
+        conversations,
+        return_tensors="pt",
+        return_dict=True,
+        add_generation_template=False,
+    )["input_ids"]
+else:
+    plain_text = "\n".join(
+        f"{msg.get('role', 'user')}: {msg.get('content', '')}" for msg in conversations
+    )
+    input_ids = tokenizer(plain_text, return_tensors="pt")["input_ids"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if tokenizer.chat_template is not None:
    input_ids = tokenizer.apply_chat_template(
        conversations,
        return_tensors="pt",
        return_dict=True,
        add_generation_template=False,
    )["input_ids"]
else:
    plain_text = "\n".join(
        f"{msg.get('role', 'user')}: {msg.get('content', '')}" for msg in conversations
    )
    input_ids = tokenizer(plain_text, return_tensors="pt")["input_ids"]
```
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1234      +/-   ##
==========================================
+ Coverage   75.68%   76.42%   +0.74%
==========================================
  Files         353      353
  Lines       40491    41375     +884
==========================================
+ Hits        30644    31621     +977
+ Misses       9847     9754      -93
```

☔ View full report in Codecov by Sentry.
Summary
- `test_collect_hidden_states` was using real daring-anteater conversations (typically 1000+ tokens) but the tiny test model has `max_position_embeddings=32`. Both sampled conversations exceeded the default `--max-seq-len 3072` filter, producing zero `.pt` files and failing the assertion.
- Added a `tiny_conversations_path` fixture with synthetic short single-turn conversations that tokenize within `max_position_embeddings=32`.
- Updated `test_collect_hidden_states` to use this fixture with `--max-seq-len 32`.
- Added a `None` guard for `tokenizer.chat_template.replace(...)` to avoid `AttributeError` when the tokenizer has no chat template.

Test plan

- `pytest tests/examples/speculative_decoding/test_eagle_offline_ptq.py::test_collect_hidden_states` passes
- The `speculative_decoding` CI job passes

🤖 Generated with Claude Code
Summary by CodeRabbit

Bug Fixes
- Tokenizers without chat templates no longer raise attribute errors during tokenization.

Tests
- Hidden-state collection test now uses synthetic short conversations that fit the tiny model's context window.