CAM++ CoreML (Apple Neural Engine)
CoreML conversion of FunASR's CAM++ speaker-embedding model (~7.2M params), for on-device speaker verification / diarization on Apple Silicon. Upstream: iic/speech_campplus_sv_zh-cn_16k-common.
Files
| File | Precision | Compute unit | Role |
|---|---|---|---|
| `CamPlusPreprocessor.mlmodelc` | FP32 | CPU | waveform → 80-d fbank features |
| `CamPlusPlus.mlmodelc` | FP16 | ANE | fbank → 192-d speaker embedding |
Pipeline
waveform → [Preprocessor fp32/CPU] → fbank [1,T,80]
         → [CAM++ fp16/ANE] → embedding [1,192] (L2-normalize, then cosine for verification/clustering)
CAM++ normalizes the fbank internally. The 192-d embedding is used with cosine similarity for speaker verification and diarization clustering.
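The scoring step can be sketched in plain Python as follows. This is a minimal illustration, not code shipped with the models: the function names and the `threshold=0.5` default are assumptions chosen for the example, and a real threshold should be tuned on your own data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_score(emb_a, emb_b):
    """Cosine similarity of two embeddings (dot product after L2 normalization)."""
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    return sum(x * y for x, y in zip(a, b))


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Verification decision. The threshold is a hypothetical placeholder."""
    return cosine_score(emb_a, emb_b) >= threshold
```

In practice `emb_a` and `emb_b` would be the 192 values of the CAM++ output flattened to a list. For diarization, the same `cosine_score` can serve as the pairwise affinity fed to a clustering step.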
Benchmark: AISHELL-1 speaker verification
| Metric | Value |
|---|---|
| EER | 0.48% (20 speakers, 6000 same / 6000 diff trials) |
| same-speaker cosine | 0.805 |
| different-speaker cosine | 0.256 |
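AISHELL-1 (clean, read Mandarin) is an easier test set than the official CN-Celeb benchmark (~6-7% EER), so the numbers above are not directly comparable to it. Conversion parity: cosine similarity between CoreML and PyTorch embeddings is 0.9997-0.99999.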
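For readers unfamiliar with the metric, EER is the operating point where the false-accept rate on different-speaker trials equals the false-reject rate on same-speaker trials. The sketch below shows one common way to approximate it from two lists of cosine scores. It is an illustration under that assumption, not the evaluation script used for the table, and `equal_error_rate` is a hypothetical name.

```python
from bisect import bisect_left


def equal_error_rate(same_scores, diff_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold.
    """
    if not same_scores or not diff_scores:
        raise ValueError("both score lists must be non-empty")
    same_sorted = sorted(same_scores)
    diff_sorted = sorted(diff_scores)
    n_same, n_diff = len(same_sorted), len(diff_sorted)

    best_gap, eer = float("inf"), None
    for t in sorted(set(same_sorted) | set(diff_sorted)):
        # False rejects: same-speaker trials scoring below the threshold.
        frr = bisect_left(same_sorted, t) / n_same
        # False accepts: different-speaker trials at or above the threshold.
        far = (n_diff - bisect_left(diff_sorted, t)) / n_diff
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With the trial setup described above, `same_scores` would hold the 6000 same-speaker cosines and `diff_scores` the 6000 different-speaker cosines.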
License
Weights derive from FunASR's CAM++; upstream license applies. Format conversion only.