GitHub - DakeQQ/Automatic-Speech-Recognition-ASR-ONNX: Utilizes ONNX Runtime to transcribe audio into text.

Automatic-Speech-Recognition-ASR-ONNX

Transcribe audio into text with ONNX Runtime.

Supported Models

Single Model:
- SenseVoiceSmall
- Whisper-Large-V3 / [Custom fine-tuned]
- Whisper-Large-V2 / [Custom fine-tuned]
- Paraformer-Small-Chinese
- Paraformer-Large-Chinese
- Paraformer-Large-English
- Paraformer-Online-Streaming-Chinese
- FireRedASR-AED
- Dolphin
- Fun-ASR-Nano-2512
Combined Models (ASR + Speaker Identification):
- SenseVoiceSmall + ERes2NetV2
- SenseVoiceSmall + ERes2NetV2_w24s4ep4
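
The combined models pair a transcription model with a speaker model. As a rough illustration of how speaker identification is commonly done with a speaker-embedding model, the sketch below compares one embedding against a set of enrolled speakers using cosine similarity. This is an assumption about the general technique, not a description of this repository's code: the function names, the `enrolled` dictionary, and the `threshold` value of 0.5 are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(embedding, enrolled, threshold=0.5):
    """Return the enrolled name with the highest similarity, or None.

    `enrolled` maps a speaker name to a reference embedding.
    `threshold` is a hypothetical cutoff; tune it on real data.
    """
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 2-dimensional embeddings; real speaker embeddings are much longer.
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(identify_speaker([0.9, 0.1], enrolled))
```

An embedding that matches no enrolled speaker above the threshold yields `None`, which a caller could treat as an unknown speaker.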

Features

End-to-end speech recognition with built-in STFT processing.
Input: Audio file
Output: Transcription result
For better results, pair the models with these additional tools:
- Voice Activity Detection (VAD)
- Audio Denoiser
The Whisper models here do not support automatic language detection. Specify the target language explicitly.
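
Because the STFT is built into the models, the input is a raw waveform rather than precomputed features. The sketch below shows one way to prepare that input with the Python standard library: it reads a 16-bit mono WAV file and splits it into fixed-size chunks. The chunk length of 128000 samples comes from the Performance section; the function names and the zero-padding of the last chunk are assumptions for illustration, not the repository's own preprocessing.

```python
import array
import sys
import wave

CHUNK_SAMPLES = 128000  # chunk size quoted in the Performance section


def read_pcm16_mono(path):
    """Return (samples, sample_rate) from a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, sample_rate


def split_into_chunks(samples, chunk_samples=CHUNK_SAMPLES):
    """Split samples into equal-length chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(samples), chunk_samples):
        chunk = samples[start:start + chunk_samples]
        padding = chunk_samples - len(chunk)
        if padding:
            chunk.extend([0] * padding)  # assumption: pad with silence
        chunks.append(chunk)
    return chunks
```

Each chunk could then be converted to the tensor type the chosen model expects and passed to an inference session; that step depends on the specific exported model and is not shown here.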

Learn More

Visit the project overview for further details.

Performance

OS	Device	Backend	Model	Real-Time Factor (Chunk Size: 128000 samples, i.e. 8 s at 16 kHz, unless noted)
Ubuntu 24.04	Laptop	CPU i5-7300HQ	SenseVoiceSmall f32	0.037
Ubuntu 24.04	Laptop	CPU i5-7300HQ	SenseVoiceSmall q8f32	0.075
Ubuntu 24.04	Desktop	CPU i3-12300	SenseVoiceSmall f32	0.019
Ubuntu 24.04	Desktop	CPU i3-12300	SenseVoiceSmall q8f32	0.022
Ubuntu 24.04	Desktop	CPU i3-12300	SenseVoiceSmall + ERes2NetV2_w24s4ep4 f32	0.10
Ubuntu 24.04	Desktop	CPU i3-12300	Whisper-Large-v3-en q8f32	0.15
Ubuntu 24.04	Desktop	CPU i3-12300	Whisper-Large-v3-Turbo-en q8f32	0.073
Ubuntu 24.04	Laptop	CPU i5-7300HQ	Paraformer-Small-Chinese f32	0.04
Ubuntu 24.04	Laptop	CPU i5-7300HQ	Paraformer-Large-English q8f32	0.14
Ubuntu 24.04	Desktop	CPU i3-12300	Paraformer-Large-Streaming-Chinese f32	0.06 Chunk Size: 8800
Ubuntu 24.04	Laptop	CPU i3-12300	FireRedASR-AED-L-Chinese q8f32	0.17
Ubuntu 24.04	Laptop	CPU i7-1165G7	Dolphin-Small q8f32	0.14
Ubuntu 24.04	Laptop	CPU i7-1165G7	Fun-ASR-Nano q4f32	0.11
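
For readers unfamiliar with the metric, the real-time factor (RTF) is the processing time divided by the duration of the audio processed, so values below 1 mean faster than real time. The sketch below computes it for one chunk. The function name is hypothetical, and the default 16 kHz sample rate is inferred from the table header (128000 samples for 8 s).

```python
def real_time_factor(processing_seconds, num_samples, sample_rate=16000):
    """Processing time divided by audio duration; lower is faster."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# An 8 s chunk (128000 samples at 16 kHz) processed in 0.296 s
# corresponds to the 0.037 figure in the first table row.
print(round(real_time_factor(0.296, 128000), 3))
```

To measure it in practice, wrap the inference call with `time.perf_counter()` and pass the elapsed time as `processing_seconds`.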

Coming Soon 🚀


Folders and files

- Dolphin
- FireRedASR
- Fun_ASR_Nano
- Paraformer_Chinese
- Paraformer_English
- Paraformer_Streaming_Chinese
- SenseVoice
- SenseVoice_Plus_Speaker_Identify
- Whisper_V2_V3
- LICENSE
- README.md
