Qwen3 ASR and Forced Aligner #43838
Open: mbtariq82 wants to merge 148 commits into huggingface:main from mbtariq82:qwen3-asr
Commits
8a367b0 Create modular file and port processor (mbtariq82-dev)
a7d62a2 Test for pretrained, tokenizer and feature extractor (mbtariq82-dev)
9e2cfd5 add ProcessorTesterMixin to test class (mbtariq82-dev)
665d1fb add config classes (mbtariq82-dev)
3ce24d5 unable to pass test_apply_chat_template_audio, added debugging logic … (mbtariq82-dev)
3669d24 Add model and config classes (mbtariq82-dev)
ae7d1cb Add attn_implementation to configs (mbtariq82-dev)
26db1dd Fix tests by removing attentions hook and manually calculating attent… (mbtariq82-dev)
d4c307b Change model 'attentions' hook class from Qwen3ASRThinkerTextAttentio… (mbtariq82-dev)
0b3248d Architectural change inspired by test_generate_with_static_cache: Ali… (mbtariq82-dev)
fdfd969 Use modular transformers components to define Qwen3ASRAudioEncoderConfig (mbtariq82-dev)
6336f14 Use modular transformers to define Qwen3ASRTextConfig from Qwen3OmniM… (mbtariq82-dev)
72cd0f6 Comment about inherited class-level attributes for Qwen3ASRTextConfig (mbtariq82-dev)
86f4678 Use modular transformers to define Qwen3ASRThinkerConfig from Qwen3Om… (mbtariq82-dev)
e4f4e4f Remove comments (mbtariq82-dev)
2a0b543 Use modular transformers to define Qwen3ASRConfig from Qwen3OmniMoeCo… (mbtariq82-dev)
598e838 Import _get_feat_extract_output_lengths from Qwen3-Omni-Moe instead o… (mbtariq82-dev)
65ead7b Use modular transformers to define Qwen3ASRProcessor from Qwen3OmniMo… (mbtariq82-dev)
0d548a8 Change pipeline_model_mapping in model tests from 'automatic-speech-r… (mbtariq82-dev)
e6a75e6 Use modular transformers to define Qwen3ASRTextRMSNorm from Qwen3Omni… (mbtariq82-dev)
c36106a Import rotate_half, repeat_kv, apply_rotary_pos_emb, eager_attention_… (mbtariq82-dev)
c81f684 Use modular transformers to define Qwen3ASRTextAttention from Qwen3Om… (mbtariq82-dev)
fd12335 Use modular transformers to define Qwen3ASRTextMLP from Qwen3OmniMoeT… (mbtariq82-dev)
e4b7d93 Use modular transformers to define Qwen3ASRThinkerTextDecoderLayer fr… (mbtariq82-dev)
c64210c Import _get_feat_extract_output_lengths from Qwen3-Omni-Moe instead o… (mbtariq82-dev)
03d9fa6 Use modular transformers to define Qwen3ASRPreTrainedModelForConditio… (mbtariq82-dev)
77c11ee Use modular transformers to define Qwen3ASRAudioAttention from Qwen3O… (mbtariq82-dev)
c7bc5d1 Use modular transformers to define Qwen3ASRAudioEncoderLayer from Qwe… (mbtariq82-dev)
835b891 Import SinusoidsPositionEmbedding from Qwen3-Omni-Moe instead of rede… (mbtariq82-dev)
f3e6a8d Use modular transformers to define Qwen3ASRAudioEncoder from Qwen3Omn… (mbtariq82-dev)
de3fdf9 Use modular transformers to define Qwen3ASRThinkerTextRotaryEmbedding… (mbtariq82-dev)
077a52b Use modular transformers to define Qwen3ASRThinkerTextMLP directly fr… (mbtariq82-dev)
14735fd Use modular transformers to define Qwen3ASRThinkerTextRMSNorm directl… (mbtariq82-dev)
69ecc47 Use modular transformers to define Qwen3ASRThinkerTextModel from Qwen… (mbtariq82-dev)
4a8fb2b Use modular transformers to define Qwen3ASRThinkerForConditionalGener… (mbtariq82-dev)
4e14ff1 Update Qwen3ASRTextConfig modular according to convention. (ebezzam)
df87020 Nits (ebezzam)
805f1a0 Change Qwen3ASRProcessor inheritance from Qwen3OmniMoeProcessor to Au… (mbtariq82-dev)
0af1d92 Merge branch 'qwen3-asr' of https://github.com/mbtariq82/transformers… (mbtariq82-dev)
7d9c73d Comment about ThinkerConfig inheritance (mbtariq82-dev)
0d78599 Change Qwen3ASRProcessor to inherit directly - init no longer has to … (mbtariq82-dev)
a1e5f77 Remove torch.manual_seed from integration tests (mbtariq82-dev)
06250d9 Style: fix ruff lint issues and typing compliance (mbtariq82-dev)
d78e6c5 Add reproducer to programmatically update expected results for integr… (mbtariq82-dev)
9ad348b Add convert_qwen3_asr_to_hf.py (mbtariq82-dev)
54e5ad1 Remove Qwen3OmniMoeConfig inheritance from Qwen3ASRConfig (mbtariq82-dev)
1f01d00 Remove Qwen3OmniMoeThinkerConfig inheritance from Qwen3ASRThinkerConfig (mbtariq82-dev)
411c39c cleanup (mbtariq82-dev)
b8a6c38 Cleanup (mbtariq82)
69c3e26 Cleanup (mbtariq82)
28877a1 Cleanup (mbtariq82)
47dacb9 Cleanup (mbtariq82)
abefad7 Functional model conversion. (ebezzam)
69ccfae Cleanup (mbtariq82-dev)
ceb72ff Cleanup (mbtariq82-dev)
3ca90bf Cleanup (mbtariq82-dev)
086a464 Cleanup (mbtariq82-dev)
bef02e4 Add init_weights to Qwen3ASRPreTrainedModel to pass ModelTesterMixin:… (mbtariq82-dev)
581676b Cleanup (mbtariq82-dev)
d55747b Cleanup (mbtariq82-dev)
b9d83de Cleanup (mbtariq82-dev)
80ccd30 Use converted hf weights for integration tests (mbtariq82-dev)
e951ea5 Change Processor tests to use hf checkpoint (mbtariq82-dev)
f73117a Restore CI/github scripts to upstream versions (mbtariq82-dev)
948f40a Restore CI/github scripts to upstream versions (2) (mbtariq82-dev)
65b0a3c Restore CI/github scripts to upstream versions (3) (mbtariq82-dev)
e941a46 passing integration tests (ebezzam)
fa21c2e Standardize processor. (ebezzam)
13f7203 Cleanup and standardize modeling. (ebezzam)
78299be Remove rope deltas. (ebezzam)
a8b161f Stop tracking reproducer. (ebezzam)
6b03776 Merge branch 'qwen3-asr' of github.com:mbtariq82/transformers into qw… (ebezzam)
a23f637 Merge branch 'main' into qwen3-asr (ebezzam)
7ed8e54 Update config modular. (ebezzam)
224c7b3 Account for n_window in encoder length computation. (ebezzam)
7a58b9c Merge branch 'main' into qwen3-asr (ebezzam)
f6e97e5 Add qwen3asr (ebezzam)
a3aa053 Merge branch 'qwen3-asr' of github.com:mbtariq82/transformers into qw… (ebezzam)
c7e813c Nit (ebezzam)
401d869 Expose encoder from qwen3 omni, and cleaner modular. (ebezzam)
3ad04f6 DIrectly use language model from Qwen3. (ebezzam)
0139cfe Modular from other audio LMs. (ebezzam)
7197827 Shift flattening to processor. (ebezzam)
6a1308d Add docs and post-process methods. (ebezzam)
33cae66 Address model integration tests + style (ebezzam)
d711751 Processing tests. (ebezzam)
6bae830 Functional forced alignment in a single modular. (ebezzam)
c6250a3 Add reproducer for timestamps. (ebezzam)
5d12746 Remove processor from modular. (ebezzam)
3839910 Merge branch 'main' into qwen3-asr (ebezzam)
4d89dd2 Create base Qwen3ASR model like Llava. (ebezzam)
62d80ea Push timestamp fixtures. (ebezzam)
a5c5d60 Nits and style. (ebezzam)
502ff64 Forced aligner refactor: new auto class and better naming. (ebezzam)
67c1f52 Forced alignmnet nits. (ebezzam)
e0d751e Create audio encoder that is more in line with other and torch compil… (ebezzam)
9b582c0 Small fixes for tests. (ebezzam)
81b8bba add torch compil forced aligner example, and small fix for compile (ebezzam)
50962ae Modeling nits. (ebezzam)
0b932ec undo exposure of omni audio encoder, doc/style nits (ebezzam)
61d0ba2 Add note on attention's k_proj bias. (ebezzam)
ffa7915 Cleaner init. (ebezzam)
f344601 Apply suggestion from @vasqu (ebezzam)
4159fc0 Apply suggestion from @vasqu (ebezzam)
f85234b Doc improvements, and conversion fix. (ebezzam)
fb0c006 Merge branch 'qwen3-asr' of github.com:mbtariq82/transformers into qw… (ebezzam)
d568035 Simplify conversion script. (ebezzam)
2e02d0a Apply suggestion from @vasqu (ebezzam)
94239ae Apply suggestion from @vasqu (ebezzam)
48fdcf9 Better encoder config in modular. (ebezzam)
09b5d9f Merge branch 'qwen3-asr' of github.com:mbtariq82/transformers into qw… (ebezzam)
ce6f4df Add default method to SinusoidsPositionEmbedding, and generate from m… (ebezzam)
8a5f845 Refactor forced aligner. Use GenericForTokenClassification. (ebezzam)
02383ee Address processor comments. (ebezzam)
d904134 Add support for language codes. (ebezzam)
51253d7 Address comments for token classification. (ebezzam)
371da13 Better modular for attention and token classification. (ebezzam)
e303059 Merge branch 'main' into qwen3-asr (ebezzam)
cb42572 Modular after merge. (ebezzam)
b12c76b Use new ALM testing classes. (ebezzam)
3392aa9 Update src/transformers/models/qwen3_asr/feature_extraction_qwen3_asr.py (ebezzam)
ecf3f74 Address review comments: create make_list_of_audio_chat_template util… (ebezzam)
1396cde merge (ebezzam)
deb037b Merge branch 'main' into qwen3-asr (ebezzam)
6053739 Modular after merge. (ebezzam)
3d47bb2 Address unprotected torch import. (ebezzam)
eb5ccc4 Introduce score_bias for GenericForTokenClassification. (ebezzam)
41125d7 Refactor token classification bias. (ebezzam)
7e3fbc9 Merge branch 'main' into qwen3-asr (ebezzam)
cdb6639 Refactor processsing like AudioFlamingo3 with submethods. (ebezzam)
8034275 Use windowed attention like in Qwen 3 Omni. (ebezzam)
bbe486c Add multimodal projector, and small refactor. (ebezzam)
b1aae95 Better max_source_positions, style fixes. (ebezzam)
1a6d2a5 Merge branch 'main' into qwen3-asr (ebezzam)
7bac079 Update modular after ALM refactor. (ebezzam)
8ef687a Merge branch 'main' into qwen3-asr (ebezzam)
1c7f736 check repo (ebezzam)
46c5596 Apply post-processing like original implementation. (ebezzam)
bca869f Set default max new tokens like original, and nits. (ebezzam)
cf31d4b Zero pad to min length like original (ebezzam)
6812afe Remove padding mask update for min length (like original) (ebezzam)
753a0b7 Refactor, and update padding mask. (ebezzam)
cf861cf revert mask update, hurts AMI performance (ebezzam)
f13c378 Merge branch 'main' into qwen3-asr (ebezzam)
5be33e7 feature extractor nits (ebezzam)
e4f0704 Merge branch 'main' into qwen3-asr (ebezzam)
3fecf7c Renaming with hf suffix. (ebezzam)
a2ec912 Merge branch 'qwen3-asr' of github.com:mbtariq82/transformers into qw… (ebezzam)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -259,7 +259,7 @@ def __post_init__(self, **kwargs):` |
| 259 | 259 | ``         # Our configs prev wouldn't save `id2label` for 2 labels because it is the default. In all other`` |
| 260 | 260 | ``         # cases we expect the config dict to have an `id2label` field if it's a clf model, or not otherwise`` |
| 261 | 261 | `         if self.id2label is None:` |
| 262 | | `-            self.num_labels = kwargs.get("num_labels", 2)` |
| | 262 | `+            self.num_labels = kwargs.get("num_labels", self.num_labels if self.num_labels is not None else 2)` |
| 263 | 263 | `         else:` |
| 264 | 264 | `             if kwargs.get("num_labels") is not None and len(self.id2label) != kwargs.get("num_labels"):` |
| 265 | 265 | `                 logger.warning(` |

Review comment (Contributor) on the added line: I added this because otherwise an existing …
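The effect of the one-line change in this hunk can be sketched in isolation. The two functions below are hypothetical stand-ins, not the actual `__post_init__` code: `current` plays the role of `self.num_labels`, `kwargs` the keyword arguments, and both cover only the branch where `id2label` is `None`.

```python
def resolve_num_labels_old(current, kwargs):
    """Old behaviour: ignore any value already set and fall back to 2."""
    return kwargs.get("num_labels", 2)


def resolve_num_labels_new(current, kwargs):
    """New behaviour: keep an already-set value, fall back to 2 only if it is None."""
    return kwargs.get("num_labels", current if current is not None else 2)


# A value already present on the config, with no `num_labels` kwarg passed:
print(resolve_num_labels_old(5, {}))  # 2 (the existing value is overwritten)
print(resolve_num_labels_new(5, {}))  # 5 (the existing value is kept)

# An explicit kwarg wins in both versions, and an unset value still defaults to 2.
print(resolve_num_labels_new(5, {"num_labels": 3}))  # 3
print(resolve_num_labels_new(None, {}))  # 2
```

In this sketch the only case where the two versions differ is the one where a value is already set and no `num_labels` kwarg is passed.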
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,29 @@` |
| | 1 | `+# Copyright 2026 The HuggingFace Team. All rights reserved.` |
| | 2 | `+#` |
| | 3 | `+# Licensed under the Apache License, Version 2.0 (the "License");` |
| | 4 | `+# you may not use this file except in compliance with the License.` |
| | 5 | `+# You may obtain a copy of the License at` |
| | 6 | `+#` |
| | 7 | `+# http://www.apache.org/licenses/LICENSE-2.0` |
| | 8 | `+#` |
| | 9 | `+# Unless required by applicable law or agreed to in writing, software` |
| | 10 | `+# distributed under the License is distributed on an "AS IS" BASIS,` |
| | 11 | `+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.` |
| | 12 | `+# See the License for the specific language governing permissions and` |
| | 13 | `+# limitations under the License.` |
| | 14 | `+from typing import TYPE_CHECKING` |
| | 15 | `+` |
| | 16 | `+from ...utils import _LazyModule` |
| | 17 | `+from ...utils.import_utils import define_import_structure` |
| | 18 | `+` |
| | 19 | `+` |
| | 20 | `+if TYPE_CHECKING:` |
| | 21 | `+    from .configuration_qwen3_asr import *` |
| | 22 | `+    from .feature_extraction_qwen3_asr import *` |
| | 23 | `+    from .modeling_qwen3_asr import *` |
| | 24 | `+    from .processing_qwen3_asr import *` |
| | 25 | `+else:` |
| | 26 | `+    import sys` |
| | 27 | `+` |
| | 28 | `+    _file = globals()["__file__"]` |
| | 29 | `+    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)` |
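The new `__init__.py` uses a lazy-import pattern: static type checkers see the star imports under `TYPE_CHECKING`, while at runtime the package module is swapped for a `_LazyModule`. As a rough illustration of the general idea only, here is a minimal sketch. `TinyLazyModule` and its `import_structure` mapping are hypothetical names, and this is a simplified stand-in, not the `_LazyModule` implementation or the output format of `define_import_structure`.

```python
import importlib
import types


class TinyLazyModule(types.ModuleType):
    """Simplified stand-in: imports a submodule only when one of its names is first used."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert {module: [names]} into {name: module} for quick lookup.
        self._name_to_module = {
            attr: module for module, attrs in import_structure.items() for attr in attrs
        }
        self.loaded = []  # records which modules were actually imported

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        if attr not in self._name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module_name = self._name_to_module[attr]
        module = importlib.import_module(module_name)
        self.loaded.append(module_name)
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Standard-library modules stand in for the package's submodules here.
lazy = TinyLazyModule("demo", {"math": ["sqrt"], "json": ["dumps"]})
print(lazy.loaded)     # nothing imported yet
print(lazy.sqrt(9.0))  # triggers the import of `math`
print(lazy.loaded)
```

The design point is that accessing one exported name pays only for the submodule that defines it, which matters when a submodule pulls in a heavy dependency.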