Papers
arxiv:2412.06965

Improving Music Source Separation with Diffusion and Consistency Refinement

Published on Apr 26
Authors:
,
,
,

Abstract

Generative diffusion models are used for music source separation with consistency distillation enabling fast inference while maintaining quality across different architectures.

In this work, we propose an approach to music source separation that uses a generative diffusion model as a last-stage refinement on top of a deterministic separator, progressively enhancing the separated sources through iterative denoising. While the diffusion refinement yields measurable quality gains, it requires iterative steps at inference, increasing computational cost. To speed up the inference process, we apply consistency distillation, reducing inference to a single step while maintaining quality; with two or more steps, the distilled model even surpasses the diffusion-based approach. Crucially, our method is architecture-agnostic: we demonstrate state-of-the-art results when applied to both a custom U-Net-based separator on Slakh2100 and the state-of-the-art BS-RoFormer model on MUSDB18, showing that the refinement generalizes across backbone architectures. Sound examples are available at: https://consistency-separation.github.io/.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2412.06965
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2412.06965 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2412.06965 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.