[ASR pipeline] fix with datasets 4.0 #39504
# torchcodec always returns (num_channels, num_samples)
# while before (datasets < 4.0) we had (2, num_samples) if stereo, (num_samples,) if mono
_array = _audio_samples.data
_array = _array[0] if _array.shape[0] == 1 else _array
So this is to make it (num_samples,), since we know it is mono, right?
Uh, wait.
Could it possibly be (num_samples,) with num_samples=1 (for datasets < 4.0 and mono)? In that case shape[0] == 1 too ...
Probably not a realistic situation.
Edge case wizard here! Nice catch. It should never happen, but it can still be easily handled, so I added a change 😊
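For readers following the thread, here is a minimal sketch of the edge case discussed above. It is not the code that was merged: `to_legacy_layout` is a hypothetical helper name, and the extra `ndim == 2` guard is just one assumed way to keep a one-sample mono array of shape `(1,)` from being collapsed to a scalar.

```python
import numpy as np


def to_legacy_layout(array: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map (num_channels, num_samples) to the pre-4.0 layout.

    Mono audio of shape (1, num_samples) becomes (num_samples,);
    stereo audio of shape (2, num_samples) is returned unchanged.
    """
    # Only squeeze when the array really is 2-D with a single channel.
    # Without the ndim check, a 1-D array of shape (1,) would also satisfy
    # shape[0] == 1 and be reduced to a 0-D scalar.
    if array.ndim == 2 and array.shape[0] == 1:
        return array[0]
    return array


mono = to_legacy_layout(np.zeros((1, 16000)))    # shape (16000,)
stereo = to_legacy_layout(np.zeros((2, 16000)))  # shape (2, 16000)
single = to_legacy_layout(np.zeros((1,)))        # stays shape (1,)
```

The guard costs one extra comparison and leaves the stereo and ordinary mono paths unchanged.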
if inputs.ndim != 1:
    logger.warning(f"We expect a single channel audio input for AutomaticSpeechRecognitionPipeline, got {inputs.ndim}. Taking the mean of the channels for mono conversion.")
    inputs = inputs.mean(axis=0)
If you are certain this is what we want, OK for me.
(but previously, it worked with (2, num_samples) if stereo here?)
Yes! As long as we warn the user it is okay; averaging the channels like this is commonly done 😉
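To make the behaviour under discussion concrete, here is a small standalone sketch of the mean-based downmix with a warning. It assumes a channels-first NumPy array; `downmix_to_mono` is a hypothetical name used only for illustration, and the logger setup is not the pipeline's own.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def downmix_to_mono(inputs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: average channels when the input is not 1-D.

    Assumes a channels-first layout, i.e. (num_channels, num_samples).
    """
    if inputs.ndim != 1:
        logger.warning(
            "Expected single channel audio, got an array with %d dimensions. "
            "Taking the mean of the channels for mono conversion.",
            inputs.ndim,
        )
        # Averaging over axis 0 collapses (num_channels, num_samples)
        # into (num_samples,).
        inputs = inputs.mean(axis=0)
    return inputs


stereo = np.array([[1.0, 3.0], [3.0, 5.0]])
print(downmix_to_mono(stereo))  # [2. 4.]
```

A 1-D input passes through untouched, so callers that already supply mono audio see no warning and no change.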
Just wondering if #39309 is still relevant after this PR?
* fix * handle edge case * make
What does this PR do?
Cf. code comments!