# Woosh — Sony AI Sound-Effect Foundation Model (Mirror)

This repository is a community mirror of the open weights released by Sony Research for Woosh — a foundation model for sound-effect generation supporting text-to-audio (T2A) and video-to-audio (V2A) synthesis.

All files here are a one-to-one copy of Sony's v1.0.0 GitHub release, repackaged into a single browsable HF repo for convenience.

## License — CC-BY-NC 4.0 (Non-Commercial)

All open weights in this repository are released by Sony Research under the CC-BY-NC 4.0 license. Generated outputs inherit the non-commercial restriction. You may not use model outputs in commercial products, paid releases, or client work. The upstream project's source code is released separately under MIT / Apache-2.0.

For attribution: Sony Research — Woosh (arXiv / paper, GitHub: `SonyResearch/Woosh`).

## Model Suite

Woosh is a multi-model suite. All components are required together — the generative backbones depend on the shared AE, CLAP, and text conditioners at inference time.

### Shared infrastructure

| Folder | Role | File(s) |
|---|---|---|
| `checkpoints/Woosh-AE/` | Audio encoder / decoder producing high-quality latents | `weights.safetensors`, `config.yaml` |
| `checkpoints/Woosh-CLAP/` | Multimodal text–audio alignment model (audio + text encoders) | `weights_audio.safetensors`, `weights_text.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerA/` | Text conditioner for the T2A path (pairs with Flow / DFlow) | `weights.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerV/` | Text conditioner for the V2A path (pairs with VFlow / DVFlow) | `weights.safetensors`, `config.yaml` |

### Generative backbones

| Folder | Task | Notes |
|---|---|---|
| `checkpoints/Woosh-Flow/` | Text → Audio | Full-quality T2A latent diffusion |
| `checkpoints/Woosh-DFlow/` | Text → Audio | Distilled T2A — fewer steps, faster inference |
| `checkpoints/Woosh-VFlow-8s/` | Video → Audio | V2A latent diffusion — fixed 8-second output |
| `checkpoints/Woosh-DVFlow-8s/` | Video → Audio | Distilled V2A — fewer steps, fixed 8-second output |

Every weight file ships as `safetensors`. There are no `.pt` / `.ckpt` / `.bin` files in this mirror.

## Layout

```
checkpoints/
├── Woosh-AE/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-CLAP/
│   ├── weights_audio.safetensors
│   ├── weights_text.safetensors
│   └── config.yaml
├── TextConditionerA/
│   ├── weights.safetensors
│   └── config.yaml
├── TextConditionerV/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-Flow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-DFlow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-VFlow-8s/
│   ├── weights.safetensors
│   └── config.yaml
└── Woosh-DVFlow-8s/
    ├── weights.safetensors
    └── config.yaml
```

Directory names match Sony's release zip layout exactly so the upstream inference code finds its configs without modification.

## Usage

This mirror is intended to be consumed by Sony's upstream woosh package. Clone and install the upstream repo, then point it at a local copy of this mirror's checkpoints/ directory.

```bash
# Clone upstream
git clone https://github.com/SonyResearch/Woosh.git
cd Woosh

# Sony's suggested env setup (uses uv)
uv sync
uv pip install -e .

# Pull weights from this mirror
hf download AEmotionStudio/woosh-models --local-dir ./
```

## Acknowledgements

All credit for the Woosh models belongs to Sony Research. This mirror exists solely to make the CC-BY-NC open weights easier to fetch and integrate. Please cite the upstream project and respect the non-commercial license.
