Use ONNX Runtime to transcribe audio into text efficiently.

Single Model:
- SenseVoiceSmall
- Whisper-Large-V3 / [Custom fine tuned]
- Whisper-Large-V2 / [Custom fine tuned]
- Paraformer-Small-Chinese
- Paraformer-Large-Chinese
- Paraformer-Large-English
- Paraformer-Online-Streaming-Chinese
- FireRedASR-AED
- Dolphin
- Fun-ASR-Nano-2512

Combined Models (ASR + Speaker Identification):

- End-to-end speech recognition with built-in STFT processing. Input: audio file. Output: transcription result.
- Seamlessly integrate with these additional tools for improved performance:
- The Whisper models here do not support automatic language detection. Specify the target language explicitly.
- Visit the project overview for further details.
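
Because the STFT is described as built into the models, raw waveform samples can in principle be passed to the model without a separate feature-extraction step. The following is a minimal sketch of that flow, not the project's own script. The helper names `read_pcm16_mono` and `run_model`, the int16 input of shape `(1, 1, num_samples)`, and the single-input layout are all assumptions; check the exported model's actual inputs before relying on it.

```python
import array
import sys
import wave


def read_pcm16_mono(wav_path):
    """Read a mono 16-bit PCM WAV file and return (samples, sample_rate)."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, sample_rate


def run_model(model_path, wav_path):
    """Feed raw audio to an ONNX model and return its raw outputs.

    Assumption: the model takes one int16 tensor shaped (1, 1, num_samples).
    Turning the outputs into text depends on the specific model export.
    """
    # Imported lazily so the WAV helper works without these packages.
    import numpy as np
    import onnxruntime as ort

    samples, _ = read_pcm16_mono(wav_path)
    audio = np.asarray(samples, dtype=np.int16).reshape(1, 1, -1)
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: audio})
```

Calling `run_model("model.onnx", "audio.wav")` (both paths hypothetical) returns the list of output arrays produced by the session. Decoding them into text is model-specific and is left out here.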

| OS | Device | Backend | Model | Real-Time Factor (Chunk Size: 128000 or 8 s) |
|---|---|---|---|---|
| Ubuntu 24.04 | Laptop | CPU i5-7300HQ | SenseVoiceSmall f32 | 0.037 |
| Ubuntu 24.04 | Laptop | CPU i5-7300HQ | SenseVoiceSmall q8f32 | 0.075 |
| Ubuntu 24.04 | Desktop | CPU i3-12300 | SenseVoiceSmall f32 | 0.019 |
| Ubuntu 24.04 | Desktop | CPU i3-12300 | SenseVoiceSmall q8f32 | 0.022 |
| Ubuntu 24.04 | Desktop | CPU i3-12300 | SenseVoiceSmall + ERes2NetV2_w24s4ep4 f32 | 0.10 |
| Ubuntu 24.04 | Desktop | CPU i3-12300 | Whisper-Large-v3-en q8f32 | 0.15 |
| Ubuntu 24.04 | Desktop | CPU i3-12300 | Whisper-Large-v3-Turbo-en q8f32 | 0.073 |
| Ubuntu 24.04 | Laptop | CPU i5-7300HQ | Paraformer-Small-Chinese f32 | 0.04 |
| Ubuntu 24.04 | Laptop | CPU i5-7300HQ | Paraformer-Large-English q8f32 | 0.14 |
| Ubuntu 24.04 | Desktop | CPU i3-12300 | Paraformer-Large-Streaming-Chinese f32 | 0.06 (Chunk Size: 8800) |
| Ubuntu 24.04 | Laptop | CPU i3-12300 | FireRedASR-AED-L-Chinese q8f32 | 0.17 |
| Ubuntu 24.04 | Laptop | CPU i7-1165G7 | Dolphin-Small q8f32 | 0.14 |
| Ubuntu 24.04 | Laptop | CPU i7-1165G7 | Fun-ASR-Nano q4f32 | 0.11 |
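
The real-time factor (RTF) in the table is the processing time divided by the duration of the audio processed, so values below 1 mean faster than real time. The sketch below shows one way such a figure could be measured over fixed-size chunks. It is an illustration, not the benchmark script behind the table: the function names are hypothetical, `transcribe` stands for any callable that processes one chunk, and the 16 kHz sample rate is an assumption (it is the rate at which 128000 samples equal 8 s).

```python
import time


def split_into_chunks(samples, chunk_size=128000):
    """Split a sample sequence into consecutive chunks; the last may be shorter."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, samples, sample_rate=16000, chunk_size=128000):
    """Time a per-chunk callable over the whole signal and return its RTF."""
    start = time.perf_counter()
    for chunk in split_into_chunks(samples, chunk_size):
        transcribe(chunk)  # hypothetical callable: runs the model on one chunk
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

For example, a model that needs 0.3 s to process an 8 s chunk has an RTF of 0.3 / 8 = 0.0375, close to the first row of the table.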
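
Use ONNX Runtime to transcribe audio into text efficiently.

Single Model:
- SenseVoiceSmall
- Whisper-Large-V3 / [Custom fine tuned]
- Whisper-Large-V2 / [Custom fine tuned]
- Paraformer-Small-Chinese
- Paraformer-Large-Chinese
- Paraformer-Large-English
- Paraformer-Online-Streaming-Chinese
- FireRedASR-AED
- Dolphin
- Fun-ASR-Nano-2512

Combined Models (ASR + Speaker Identification):

- End-to-end speech recognition with built-in STFT processing. Input: audio file. Output: transcription result.
- Recommended companion tools for better performance:
- The Whisper models here do not support automatic language detection. Specify the target language explicitly.
- Visit the project overview for more information.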