CAM++: CoreML (Apple Neural Engine)

CoreML conversion of FunASR's CAM++ speaker-embedding model (~7.2M params), for on-device speaker verification / diarization on Apple Silicon. Upstream: iic/speech_campplus_sv_zh-cn_16k-common.

Files

| File | Precision | Compute unit | Role |
| --- | --- | --- | --- |
| CamPlusPreprocessor.mlmodelc | FP32 | CPU | waveform → 80-d fbank features |
| CamPlusPlus.mlmodelc | FP16 | ANE | fbank → 192-d speaker embedding |

Pipeline

waveform → [Preprocessor fp32/CPU] → fbank [1,T,80]
        → [CAM++ fp16/ANE] → embedding [1,192]  (L2-normalize, then cosine for verification/clustering)

CAM++ normalizes the fbank internally. The 192-d embedding is used with cosine similarity for speaker verification and diarization clustering.
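The scoring step described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, assuming the embeddings arrive as plain sequences of floats. The function names and the default threshold of 0.5 are hypothetical and are not part of this repository; the threshold in particular should be tuned on your own trials.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, computed as the dot product of unit vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the pair when the score reaches the threshold.

    The default threshold is an illustrative placeholder, not a value
    published with this model.
    """
    return cosine_score(a, b) >= threshold
```

In practice `a` and `b` would be the 192-d outputs of the CAM++ model for two utterances. For diarization, the same `cosine_score` can serve as the pairwise affinity passed to a clustering step.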

Benchmark: AISHELL-1 speaker verification

| Metric | Value |
| --- | --- |
| EER | 0.48% (20 speakers, 6000 same / 6000 diff trials) |
| same-speaker cosine | 0.805 |
| different-speaker cosine | 0.256 |

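AISHELL-1 (clean, read Mandarin) is an easier test set than CN-Celeb, the official benchmark, where the EER is roughly 6-7%, so the figure above should not be compared directly with CN-Celeb results. Cosine similarity between the CoreML and PyTorch embeddings is 0.9997-0.99999.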
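For readers who want to reproduce an EER figure on their own trial list, the metric can be sketched as follows. This is one common way to compute it, not necessarily the script used for the numbers above: it sweeps every observed score as a threshold and reports the point where the false-accept and false-reject rates are closest. The function name and the example scores are hypothetical.

```python
import bisect
from typing import Sequence, Tuple


def equal_error_rate(same_scores: Sequence[float],
                     diff_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for same-speaker and different-speaker scores.

    A trial is accepted when its score is >= threshold.
    """
    if not same_scores or not diff_scores:
        raise ValueError("both score lists must be non-empty")
    same_sorted = sorted(same_scores)
    diff_sorted = sorted(diff_scores)
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(same_sorted) | set(diff_sorted)):
        # False reject: same-speaker trials scoring below the threshold.
        frr = bisect.bisect_left(same_sorted, thr) / len(same_sorted)
        # False accept: different-speaker trials scoring at or above it.
        far = (len(diff_sorted)
               - bisect.bisect_left(diff_sorted, thr)) / len(diff_sorted)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2.0, thr
    return best_eer, best_thr


# Toy scores for illustration only (not taken from the benchmark).
eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
print(f"EER = {eer:.2%} at threshold {thr}")
```

With a trial list like the one in the benchmark (6000 same and 6000 different pairs), the two arguments would be the cosine scores of each group.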

License

Weights derive from FunASR's CAM++; upstream license applies. Format conversion only.
