Add clear error message for missing SentencePiece model in get_spm_processor (fix #41553) #41584

Open
AvinashDwivedi wants to merge 3 commits into huggingface:main from AvinashDwivedi:fix/41553-llama-tokenizer-clear-error
Conversation

@AvinashDwivedi

Fix: clearer error when SentencePiece model file is missing for Llama tokenizer

Fix:
Improved error handling in get_spm_processor to provide a clear, actionable message when the required SentencePiece model file is missing.

Before:
The tokenizer raised an unhelpful low-level error from the SentencePiece library when vocab_file was None:

TypeError: not a string

After:
It now raises a concise, human-readable error:

ValueError: Missing SentencePiece model file: 'None'. 
This tokenizer requires a .model file provided by the 'mistral-common' package. 
Install it with: pip install mistral-common
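
For illustration, here is a minimal sketch of the kind of guard described above. It is not the actual diff in this PR: `check_spm_model_file` is a hypothetical helper name, and the install hint is left out to keep the sketch generic. The idea is simply to validate `vocab_file` before it reaches SentencePiece, so the user sees a `ValueError` instead of the low-level `TypeError`.

```python
import os
from typing import Optional


def check_spm_model_file(vocab_file: Optional[str]) -> str:
    """Hypothetical helper: validate the model path before loading it.

    Raises ValueError if the path is None or does not point to a file,
    so the failure is reported before SentencePiece is ever called.
    """
    if vocab_file is None or not os.path.isfile(vocab_file):
        raise ValueError(
            f"Missing SentencePiece model file: '{vocab_file}'. "
            "This tokenizer requires a .model file."
        )
    return vocab_file


# Simulate the failing case from the issue: vocab_file is None.
try:
    check_spm_model_file(None)
except ValueError as err:
    message = str(err)
    print(message)
```

In a real tokenizer, a check like this would presumably run at the top of get_spm_processor, before the path is passed to the SentencePiece loader.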

Fixes: #41553

@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: llama

@AvinashDwivedi

Author

@ArthurZucker could you please help me understand why the checks are failing even though the changes are very small? I'm new to open source, so I'd really appreciate your help!

@Rocketknight1

Member

I think it's strange to reference mistral-common in the Llama tokenizer, but up to you @ArthurZucker @itazap!

@vasqu

vasqu commented Oct 15, 2025

Collaborator

Copy pasting a bit but let's coordinate #41553 (comment)

@AvinashDwivedi

Author

@ArthurZucker @Rocketknight1 As far as I can see, this issue is still not solved. Could you give me some idea of how you would like it handled? I'll implement it exactly the way you want.

Development

Successfully merging this pull request may close these issues.

Bad error message for AutoTokenizer loading Voxtral

3 participants