Skip to content

[ASR pipline] fix with datasets 4.0#39504

Merged
eustlb merged 7 commits into
huggingface:mainfrom
eustlb:pipeline-fixes-torchcodec
Jul 30, 2025
Merged

[ASR pipline] fix with datasets 4.0#39504
eustlb merged 7 commits into
huggingface:mainfrom
eustlb:pipeline-fixes-torchcodec

Conversation

@eustlb

@eustlb eustlb commented Jul 18, 2025

Copy link
Copy Markdown
Collaborator

What does this PR do?

Cf code comments!

@eustlb eustlb requested a review from ydshieh July 18, 2025 15:03
@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

# torchcodec always returns (num_channels, num_samples)
# while before (datasets < 4.0) we had (2, num_samples) if stereo, (num_samples,) if mono
_array = _audio_samples.data
_array = _array[0] if _array.shape[0] == 1 else _array

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is to make it (num_samples,) as we know it is mono, right

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

euh, wait.

Would it possible be (num_samples, ) with num_samples=1 (for datasets < 4.0 and mono) ...

probably not a realistic situation

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edge case wizard here! nice catch, should never happen but still can be easily handled , added a change 😊

Comment on lines +438 to +440
if inputs.ndim != 1:
logger.warning(f"We expect a single channel audio input for AutomaticSpeechRecognitionPipeline, got {inputs.ndim}. Taking the mean of the channels for mono conversion.")
inputs = inputs.mean(axis=0)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are certain this is what we want, OK for me.

(but previously, it worked with (2, num_samples) if stereo here?)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! as long as we warn the user it is okay, such meaning is commonly done 😉

@ydshieh ydshieh left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@ydshieh

ydshieh commented Jul 29, 2025

Copy link
Copy Markdown
Collaborator

Just wondering if #39309 is still relevant after this PR ?

@eustlb eustlb enabled auto-merge (squash) July 29, 2025 16:36
@eustlb eustlb disabled auto-merge July 29, 2025 16:42
@eustlb eustlb enabled auto-merge (squash) July 29, 2025 16:43
@eustlb eustlb disabled auto-merge July 29, 2025 16:43
@eustlb eustlb enabled auto-merge (squash) July 30, 2025 08:00
@eustlb eustlb merged commit 01d5f94 into huggingface:main Jul 30, 2025
25 checks passed
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants