SenseVoice

FunASR SenseVoice on Axera, official repo: https://github.com/FunAudioLLM/SenseVoice

TODO

  • Support AX630C
  • Support C++
  • Support FastAPI

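Features

  • Speech recognition
  • Automatic language identification (Chinese, English, Cantonese, Japanese, and Korean)
  • Emotion recognition
  • Automatic punctuation
  • Streaming recognition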

Supported platforms

  • AX650N
  • AX630C

Environment setup

Python==3.12

sudo apt-get install libsndfile-dev
pip install -r requirements.txt

Installing pyaxengine

Follow https://github.com/AXERA-TECH/pyaxengine to install the NPU Python API.

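This project was tested with version 0.1.3rc2, which can be installed with:

pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl

To use a different release, change the version number in the URL.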

Usage

Python

cd python
python3 main.py --input ../example/en.mp3
[INFO] Available providers:  ['AxEngineExecutionProvider']
{'input': '../example/en.mp3', 'language': 'auto', 'streaming': False}
......
RTF: 0.036785734138361184    Latency: 0.2639744281768799s  Total length: 7.176s
ASR result: the tribal chieftain called for the boy and presented him with fifty pieces of gold

Command-line arguments:

Argument         Description                                           Default
--input/-i       Input audio file
--language/-l    Recognition language: auto, zh, en, yue, ja, or ko    auto
--streaming      Enable streaming recognition
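
The arguments above could be declared roughly as follows. This is a minimal sketch using the standard `argparse` module, not the actual contents of `main.py`; the function name `build_parser` is hypothetical, and treating `--input` as required is an assumption, since the list above gives no default for it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real main.py)."""
    parser = argparse.ArgumentParser(description="SenseVoice CLI sketch")
    # Assumption: the input file is mandatory because no default is documented.
    parser.add_argument("--input", "-i", required=True, help="input audio file")
    parser.add_argument(
        "--language", "-l",
        default="auto",
        choices=["auto", "zh", "en", "yue", "ja", "ko"],
        help="recognition language",
    )
    # A boolean switch: absent means non-streaming recognition.
    parser.add_argument("--streaming", action="store_true",
                        help="enable streaming recognition")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-i", "../example/en.mp3"])
    print(vars(args))
```

Parsing `-i ../example/en.mp3` with this sketch yields the same three keys (`input`, `language`, `streaming`) that the sample log prints on startup.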

CPP

  • AX650
./cpp/ax650/test_sensevoice -a example/zh.mp3 -p sensevoice_ax650/
Init asr success, take 0.2130seconds
Result: 开饭时间早上九点至下午五点
RTF(0.21 / 5.62) = 0.0372
  • AX630C
./cpp/ax630c/test_sensevoice -a example/zh.mp3 -p sensevoice_ax630c/

The corresponding source code is on GitHub.

Examples

Test audio files are provided in the example directory.

For example, to test Chinese recognition:

cd python
python main.py -i ../example/zh.mp3

Output:

RTF: 0.04386647134764582    Latency: 0.2463541030883789s  Total length: 5.616s
ASR result: 开饭时间早上九点至下午五点
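
The RTF (real-time factor) in this output is consistent with the reported latency divided by the audio length, so values below 1 mean recognition runs faster than real time. The sketch below reproduces that arithmetic from the two printed numbers; the function name `real_time_factor` is hypothetical and not part of the repository.

```python
def real_time_factor(latency_s: float, audio_length_s: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_length_s <= 0:
        raise ValueError("audio length must be positive")
    return latency_s / audio_length_s


# Values copied from the sample output above.
rtf = real_time_factor(0.2463541030883789, 5.616)
print(f"RTF: {rtf:.4f}")
```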

Streaming recognition

python main.py -i ../example/zh.mp3 --streaming

Output:

{'timestamps': [540], 'text': '开'}
{'timestamps': [540, 780, 1080], 'text': '开放时'}
{'timestamps': [540, 780, 1080, 1260, 1740], 'text': '开放时间早'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340], 'text': '开放时间早上9'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640], 'text': '开放时间早上9点'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060], 'text': '开放时间早上9点至'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060, 3780, 4020], 'text': '开放时间早上9点至下午'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060, 3780, 4020, 4440, 4620], 'text': '开放时间早上9点至下午五点'}
RTF: 0.03678379235444246
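
Each streaming line is a Python dict literal holding the partial text and a list of timestamps. In this sample there is one timestamp per character; the sketch below pairs them up under two assumptions that the source does not confirm: that the timestamps are in milliseconds, and that the one-to-one pairing holds in general. The function name `align` is hypothetical.

```python
import ast


def align(line: str) -> list[tuple[float, str]]:
    """Pair each character with its timestamp, converted to seconds."""
    result = ast.literal_eval(line)  # safely parse the printed dict literal
    timestamps, text = result["timestamps"], result["text"]
    # Assumption: one timestamp per character, as in the sample output.
    if len(timestamps) != len(text):
        raise ValueError("timestamp count does not match character count")
    # Assumption: timestamps are expressed in milliseconds.
    return [(ms / 1000.0, char) for ms, char in zip(timestamps, text)]


line = "{'timestamps': [540, 780, 1080], 'text': '开放时'}"
for seconds, char in align(line):
    print(f"{seconds:.2f}s {char}")
```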

Gradio DEMO

cd python
python3 gradio_demo.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
* Running on local URL:  https://xxx.xxx.xxx.xxx:7861
* Running on local URL:  https://172.18.0.1:7861
* Running on local URL:  https://172.17.0.1:7861
* Running on local URL:  https://0.0.0.0:7861
* To create a public link, set `share=True` in `launch()`.


Accuracy

Word error rate (WER) is used as the evaluation metric.

WER = 2.0%
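
As a reminder of how the metric works, WER is the edit distance (substitutions, insertions, and deletions) between the recognized text and the reference, divided by the reference length. The sketch below is a generic illustration, not the logic of `test_wer.py`. It compares individual characters, which is an assumption about how Chinese text would be tokenized, and it treats the non-streaming transcript from the examples as the reference purely for demonstration.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length, one token per character."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative pair taken from the two sample transcripts in this README.
rate = error_rate("开饭时间早上九点至下午五点", "开放时间早上9点至下午五点")
print(f"error rate: {rate:.2%}")
```

The two strings differ in two of thirteen characters, so this single utterance scores well above the 2.0% figure, which is measured over a whole dataset.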

Reproducing the test results

./download_datasets.sh
python test_wer.py -d aishell -g datasets/ground_truth.txt --language zh

Technical discussion

  • GitHub issues
  • QQ group: 139953715