Woosh — Sony AI Sound-Effect Foundation Model (Mirror)
This repository is a community mirror of the open weights released by Sony Research for Woosh — a foundation model for sound-effect generation supporting text-to-audio (T2A) and video-to-audio (V2A) synthesis.
All files here are a one-to-one copy of Sony's v1.0.0 GitHub release, repackaged into a single browsable Hugging Face repo for convenience.
License — CC-BY-NC 4.0 (Non-Commercial)
All open weights in this repository are released by Sony Research under the CC-BY-NC 4.0 license. Generated outputs inherit the non-commercial restriction. You may not use model outputs in commercial products, paid releases, or client work. The upstream project's source code is released separately under MIT / Apache-2.0.
If you need to attribute: Sony Research — Woosh (arXiv / paper, GitHub: SonyResearch/Woosh).
Model Suite
Woosh is a multi-model suite. All components are required together — the generative backbones depend on the shared AE, CLAP, and text conditioners at inference time.
Shared infrastructure
| Folder | Role | File(s) |
|---|---|---|
| checkpoints/Woosh-AE/ | Audio encoder / decoder producing high-quality latents | weights.safetensors, config.yaml |
| checkpoints/Woosh-CLAP/ | Multimodal text-audio alignment model (audio + text encoders) | weights_audio.safetensors, weights_text.safetensors, config.yaml |
| checkpoints/TextConditionerA/ | Text conditioner for the T2A path (pairs with Flow / DFlow) | weights.safetensors, config.yaml |
| checkpoints/TextConditionerV/ | Text conditioner for the V2A path (pairs with VFlow / DVFlow) | weights.safetensors, config.yaml |
Generative backbones
| Folder | Task | Notes |
|---|---|---|
| checkpoints/Woosh-Flow/ | Text → Audio | Full-quality T2A latent diffusion |
| checkpoints/Woosh-DFlow/ | Text → Audio | Distilled T2A: fewer steps, faster inference |
| checkpoints/Woosh-VFlow-8s/ | Video → Audio | V2A latent diffusion, fixed 8-second output |
| checkpoints/Woosh-DVFlow-8s/ | Video → Audio | Distilled V2A: fewer steps, fixed 8-second output |
Every weight file ships as safetensors. No .pt / .ckpt / .bin in this mirror.
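Because every weight file is safetensors, tensor names, dtypes, and shapes can be listed without loading any weights: the format begins with an 8-byte little-endian length, followed by a JSON header of that length. The sketch below uses only the standard library. The helper names `read_safetensors_header` and `summarize` are illustrative and not part of the upstream woosh package.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict.

    Only the header is read; the tensor data stays on disk.
    """
    with Path(path).open("rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path):
    """Map tensor name -> (dtype, shape), skipping the optional metadata entry."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }

# Example call (hypothetical local path, adjust to your download location):
#   summarize("checkpoints/Woosh-AE/weights.safetensors")
```

This can be a quick sanity check that a download is intact before starting the full inference stack.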
Layout
checkpoints/
├── Woosh-AE/
│ ├── weights.safetensors
│ └── config.yaml
├── Woosh-CLAP/
│ ├── weights_audio.safetensors
│ ├── weights_text.safetensors
│ └── config.yaml
├── TextConditionerA/
│ ├── weights.safetensors
│ └── config.yaml
├── TextConditionerV/
│ ├── weights.safetensors
│ └── config.yaml
├── Woosh-Flow/
│ ├── weights.safetensors
│ └── config.yaml
├── Woosh-DFlow/
│ ├── weights.safetensors
│ └── config.yaml
├── Woosh-VFlow-8s/
│ ├── weights.safetensors
│ └── config.yaml
└── Woosh-DVFlow-8s/
  ├── weights.safetensors
  └── config.yaml
Directory names match Sony's release zip layout exactly so the upstream inference code finds its configs without modification.
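Since the upstream code relies on this exact layout, it may help to confirm that a local copy is complete before running inference. The following is a minimal sketch: the `EXPECTED` table is copied from the layout listing above, and the function name `missing_files` is hypothetical, not something the upstream package provides.

```python
from pathlib import Path

# Expected files per folder, copied from the layout listing in this README.
_SINGLE = ["weights.safetensors", "config.yaml"]
EXPECTED = {
    "Woosh-AE": _SINGLE,
    "Woosh-CLAP": ["weights_audio.safetensors", "weights_text.safetensors", "config.yaml"],
    "TextConditionerA": _SINGLE,
    "TextConditionerV": _SINGLE,
    "Woosh-Flow": _SINGLE,
    "Woosh-DFlow": _SINGLE,
    "Woosh-VFlow-8s": _SINGLE,
    "Woosh-DVFlow-8s": _SINGLE,
}


def missing_files(checkpoints_dir):
    """Return sorted relative paths of expected files that are absent."""
    root = Path(checkpoints_dir)
    return sorted(
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    )

# Example call (assumes the mirror was downloaded into the current directory):
#   missing_files("checkpoints")
```

An empty list means every file named in the layout is present; it does not verify file contents.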
Usage
This mirror is intended to be consumed by Sony's upstream woosh package. Clone and install the upstream repo, then point it at a local copy of this mirror's checkpoints/ directory.
# Clone upstream
git clone https://github.com/SonyResearch/Woosh.git
cd Woosh
# Sony's suggested env setup (uses uv)
uv sync
uv pip install -e .
# Pull weights from this mirror
hf download AEmotionStudio/woosh-models --local-dir ./
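The same download can be scripted from Python with `huggingface_hub.snapshot_download`, which is the library call behind the `hf download` command. This is a sketch rather than an upstream-documented workflow: the helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical, and the optional `folders` filter is only an illustration. Because the backbones depend on the shared components, any subset should still include the AE, CLAP, and the matching text conditioner.

```python
def checkpoint_patterns(folders=None):
    """Build allow-patterns for the download.

    None selects the whole checkpoints/ tree; otherwise one pattern
    per requested folder name (e.g. "Woosh-AE").
    """
    if folders is None:
        return ["checkpoints/*"]
    return [f"checkpoints/{name}/*" for name in folders]


def download_checkpoints(local_dir=".", folders=None):
    """Fetch checkpoints from the mirror into local_dir and return the path."""
    # Imported lazily so checkpoint_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AEmotionStudio/woosh-models",
        local_dir=local_dir,
        allow_patterns=checkpoint_patterns(folders),
    )

# Example call (requires network access and `pip install huggingface_hub`):
#   download_checkpoints("./")
```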
Acknowledgements
All credit for the Woosh models belongs to Sony Research. This mirror exists solely to make the CC-BY-NC open weights easier to fetch and integrate. Please cite the upstream project and respect the non-commercial license.
- Upstream: https://github.com/SonyResearch/Woosh
- Release: https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0
- License: CC-BY-NC 4.0