CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging

CheXmix is a unified early-fusion generative model trained on a large corpus of chest X-rays paired with radiology reports. This Hugging Face repository provides the model weights and an example file for the paper (CVPR Findings 2026).

[💻 GitHub] [📄 Paper]

⚡️ Installation

For an editable installation, use the following commands to clone and install this repository.

git clone https://github.com/StanfordMIMI/CheXmix.git
cd CheXmix
pip install -e .

For usage instructions, please visit the GitHub repository.

📁 Project Structure

.
├── README.md
├── model.safetensors   <CheXmix (S1 + S2) checkpoint>
└── vqgan.ckpt          <Image Tokenizer checkpoint>
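
A minimal sketch of fetching the two checkpoint files listed above. It assumes the `huggingface_hub` package is installed and that the repository id is `stanfordmimi/CheXmix`, as shown on the model page. The helper name `fetch_checkpoints` and its `download` parameter are illustrative and not part of the CheXmix codebase.

```python
from pathlib import Path

# Assumption: repository id as displayed on the Hugging Face model page.
REPO_ID = "stanfordmimi/CheXmix"
# File names taken from the project structure above.
CHECKPOINT_FILES = ("model.safetensors", "vqgan.ckpt")


def fetch_checkpoints(repo_id=REPO_ID, filenames=CHECKPOINT_FILES, download=None):
    """Download each checkpoint file and return a {filename: local Path} mapping.

    `download` is a hypothetical hook for swapping in another downloader
    (for example in tests); by default huggingface_hub.hf_hub_download is used.
    """
    if download is None:
        # Imported lazily so the module can be loaded without huggingface_hub.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    return {
        name: Path(download(repo_id=repo_id, filename=name))
        for name in filenames
    }
```

Calling `fetch_checkpoints()` with no arguments downloads both files into the local Hugging Face cache and returns their paths. Loading them into the model is covered by the usage instructions in the GitHub repository.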

📎 Citation

If you find this repository useful for your work, please cite the paper:

@inproceedings{kumar2026chexmix,
  author    = {Kumar, Ashwin and Holland, Robbie and Barrett, Corey and Kim, Jangwon and Varma, Maya and Chen, Zhihong and Gao, Yunhe and Zaharchuk, Greg and Taghavi, Tara and Kenthapadi, Krishnaram and Chaudhari, Akshay},
  title     = {CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  pages     = {9466--9476},
  year      = {2026},
  note      = {arXiv preprint arXiv:2604.22989}
}

Model size: 3B params (Safetensors, tensor type BF16)
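
At 2 bytes per BF16 value, 3B parameters correspond to roughly 6 GB of weights (3 × 10^9 × 2 bytes), before activations or runtime overhead; the figure is approximate because "3B" is rounded. The exact numbers can be read from the checkpoint header without loading any tensors. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function names are illustrative, and the path to a downloaded `model.safetensors` is an assumption.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict of tensor entries."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata is not a tensor entry, so drop it.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Return (parameter count, tensor bytes, sorted dtype names) for a header."""
    n_params = sum(prod(t["shape"]) for t in header.values())
    n_bytes = sum(t["data_offsets"][1] - t["data_offsets"][0] for t in header.values())
    dtypes = sorted({t["dtype"] for t in header.values()})
    return n_params, n_bytes, dtypes
```

For example, `summarize(read_safetensors_header("model.safetensors"))` would report the parameter count, the on-disk tensor size in bytes, and the dtypes present in the checkpoint.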

Paper for stanfordmimi/CheXmix: CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging (arXiv:2604.22989, published Apr 24)