Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
Abstract
A commercial-scale virtual try-on system achieves high success rates, photorealistic results, and real-time performance through integrated system design and multi-stage training.
Recent advances in image generation and editing have opened new opportunities for virtual try-on. However, existing methods still struggle to meet complex real-world demands. We present Tstars-Tryon 1.0, a commercial-scale virtual try-on system that is robust, realistic, versatile, and highly efficient. First, our system maintains a high success rate across challenging cases such as extreme poses, severe illumination variations, motion blur, and other in-the-wild conditions. Second, it delivers highly photorealistic results with fine-grained details, faithfully preserving garment texture, material properties, and structural characteristics while largely avoiding common AI-generated artifacts. Third, beyond apparel try-on, our model supports flexible multi-image composition (up to 6 reference images) across 8 fashion categories, with coordinated control over person identity and background. Fourth, to overcome the latency bottlenecks of commercial deployment, our system is heavily optimized for inference speed, delivering near real-time generation for a seamless user experience. These capabilities are enabled by an integrated system design spanning an end-to-end model architecture, a scalable data engine, robust infrastructure, and a multi-stage training paradigm. Extensive evaluation and large-scale product deployment demonstrate that Tstars-Tryon 1.0 achieves leading overall performance. To support future research, we also release a comprehensive benchmark. The model has been deployed at industrial scale on the Taobao App, serving millions of users with tens of millions of requests.
Community
The multi-image diffusion approach in MMDiT, coordinating up to 6 references while keeping identity and background stable, stands out in this space. One thing I'm curious about is how you resolve conflicting cues from multiple references when textures and lighting disagree across garments. I would love to see a clean ablation showing how fidelity and artifact rates scale as the number of references varies from 1 to 6. By the way, the arxivlens breakdown helped me parse the method details.
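For anyone wondering how coordinating multiple references might look mechanically, here is a minimal, hypothetical PyTorch sketch of MMDiT-style token concatenation with per-reference source embeddings. The class and parameter names (`MultiRefJointAttention`, `max_refs`, `ref_embed`) are illustrative assumptions; the abstract does not disclose the actual Tstars-Tryon architecture.

```python
import torch
import torch.nn as nn

class MultiRefJointAttention(nn.Module):
    """Minimal sketch: joint self-attention over noisy-latent tokens and
    tokens from up to N reference images (MMDiT-style token concatenation).
    Hypothetical illustration only; not the published Tstars-Tryon design."""

    def __init__(self, dim: int = 512, num_heads: int = 8, max_refs: int = 6):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned embedding tagging which source each token came from
        # (index 0 = latent stream, 1..max_refs = reference images), so
        # per-garment cues such as texture and lighting stay separable.
        self.ref_embed = nn.Embedding(max_refs + 1, dim)

    def forward(self, latent_tokens: torch.Tensor,
                ref_token_list: list[torch.Tensor]) -> torch.Tensor:
        # latent_tokens: (B, L, D); each entry of ref_token_list: (B, R_i, D)
        parts = [latent_tokens + self.ref_embed.weight[0]]
        for i, ref in enumerate(ref_token_list, start=1):
            parts.append(ref + self.ref_embed.weight[i])
        seq = torch.cat(parts, dim=1)             # (B, L + sum R_i, D)
        out, _ = self.attn(seq, seq, seq)         # joint attention over all tokens
        return out[:, : latent_tokens.shape[1]]   # keep only the latent stream
```

Tagging each reference's tokens with a learned source embedding is one plausible way to keep conflicting cues from different garments separable inside joint attention; the real system may resolve such conflicts differently.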
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories (2026)
- Dress-ED: Instruction-Guided Editing for Virtual Try-On and Try-Off (2026)
- GEditBench v2: A Human-Aligned Benchmark for General Image Editing (2026)
- PROMO: Promptable Outfitting for Efficient High-Fidelity Virtual Try-On (2026)
- A$^2$-Edit: Precise Reference-Guided Image Editing of Arbitrary Objects and Ambiguous Masks (2026)
- VTEdit-Bench: A Comprehensive Benchmark for Multi-Reference Image Editing Models in Virtual Try-On (2026)
- Kling-MotionControl Technical Report (2026)
