Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
Abstract
SATO introduces a novel token ordering strategy for autoregressive transformers that preserves edge flow and semantic layout in mesh generation through triangle strip-based sequences.
Recent advancements in autoregressive transformers have demonstrated remarkable potential for generating artist-quality meshes. However, the token ordering strategies employed by existing methods typically fail to meet professional artist standards, where coordinate-based sorting yields inefficiently long sequences, and patch-based heuristics disrupt the continuous edge flow and structural regularity essential for high-quality modeling. To address these limitations, we propose Strips as Tokens (SATO), a novel framework with a token ordering strategy inspired by triangle strips. By constructing the sequence as a connected chain of faces that explicitly encodes UV boundaries, our method naturally preserves the organized edge flow and semantic layout characteristic of artist-created meshes. A key advantage of this formulation is its unified representation, enabling the same token sequence to be decoded into either a triangle or quadrilateral mesh. This flexibility facilitates joint training on both data types: large-scale triangle data provides fundamental structural priors, while high-quality quad data enhances the geometric regularity of the outputs. Extensive experiments demonstrate that SATO consistently outperforms prior methods in terms of geometric quality, structural coherence, and UV segmentation.
Community
Today we're releasing Strips as Tokens (SATO), a new autoregressive framework for artist mesh generation with native UV segmentation.
Most existing mesh generators use token orderings that do not match how artists actually build meshes. Coordinate-based sequences are often too long, while patch-based heuristics can break edge flow, topology regularity, and UV structure.
SATO takes a different path.
Inspired by triangle strips, it represents a mesh as a connected chain of faces that also encodes UV boundaries directly in the token sequence. This gives the model a much more natural way to capture organized edge flow, semantic structure, and clean UV layouts during generation.
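The strip idea can be sketched roughly as follows. This is an illustrative toy, not the paper's actual scheme: the control-token names (`<S>`, `<UV>`) and the walk order are assumptions; the key point is that after the first face, each step in a strip emits only the one new vertex, plus a marker when the shared edge crosses a UV island boundary.

```python
# Hypothetical sketch of strip-style tokenization (illustrative, not SATO's
# exact vocabulary). Faces are walked as a chain sharing edges; each step
# emits only the new vertex, plus control tokens for strip starts and seams.

START_STRIP = "<S>"   # begin a new strip (hypothetical control token)
UV_SEAM = "<UV>"      # next shared edge lies on a UV island boundary

def tokenize_strip(faces, seam_edges):
    """faces: triangles (v0, v1, v2) ordered so consecutive faces share an
    edge; seam_edges: set of frozenset({a, b}) edges on UV chart boundaries."""
    tokens = [START_STRIP]
    prev = None
    for face in faces:
        if prev is None:
            tokens.extend(face)            # first face: emit all 3 vertices
        else:
            shared = set(face) & set(prev)
            if frozenset(shared) in seam_edges:
                tokens.append(UV_SEAM)     # mark the UV boundary crossing
            (new_v,) = set(face) - shared  # only the new vertex is emitted
            tokens.append(new_v)
        prev = face
    return tokens

# Example: three triangles forming one strip, with a seam on edge {1, 2}.
faces = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print(tokenize_strip(faces, {frozenset({1, 2})}))
# ['<S>', 0, 1, 2, '<UV>', 3, 4]
```

Note how the sequence length grows by roughly one token per face rather than three, which is where the efficiency over coordinate-sorted orderings comes from.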
A key feature of SATO is that the same token sequence can be decoded into either a triangle mesh or a quad mesh. This provides one unified representation for both mesh types, making it possible to jointly learn from large-scale triangle data and high-quality quad data in a single framework.
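The dual decoding can be sketched with the classic triangle-strip convention: every sliding triple of strip vertices is a triangle, and merging each pair of adjacent strip triangles yields a quad. The pairing rule below is an assumption for illustration, not SATO's actual decoder:

```python
def decode(tokens, quads=False):
    """Decode a strip vertex sequence into faces. With quads=False each
    sliding triple forms a triangle; with quads=True each pair of adjacent
    strip triangles is merged into one quad (hypothetical decoding rule)."""
    v = tokens
    if not quads:
        # triangle strip: every consecutive triple is one face
        return [(v[i], v[i + 1], v[i + 2]) for i in range(len(v) - 2)]
    # quad decoding: vertices i, i+1, i+3, i+2 form one quad, stepping by 2
    return [(v[i], v[i + 1], v[i + 3], v[i + 2])
            for i in range(0, len(v) - 3, 2)]

strip = [0, 1, 2, 3, 4, 5]
print(decode(strip))              # 4 triangles: (0,1,2) (1,2,3) (2,3,4) (3,4,5)
print(decode(strip, quads=True))  # 2 quads: (0,1,3,2) (2,3,5,4)
```

Under this convention, the same six-vertex sequence yields either four triangles or two quads, which is what lets one sequence serve both training corpora.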
Given an input point cloud, SATO autoregressively generates artist-style meshes together with native UV segmentation. In other words, the model produces not only geometry but also UV charts with clean and meaningful island boundaries, making the outputs much more practical for downstream texturing and real content creation workflows.
Across extensive experiments, SATO achieves strong results on triangle mesh generation, quad mesh generation, and UV-aware mesh generation within one unified framework. We believe high-quality mesh generation should move closer to real artist workflows, where geometry and UV layout are designed together rather than treated separately.
Geometry matters. Topology matters. UVs matter too.
SATO is a step toward generative models that understand all three.
🎥 Video (YouTube): https://youtu.be/Mc9skirm8cg
🎥 Video (Bilibili): https://www.bilibili.com/video/BV13eQ8BAEiA/
🌐 Project Page: https://ruixu.me/html/SATO/index.html
📄 Paper: https://arxiv.org/abs/2604.09132
💻 Code: https://github.com/Xrvitd/SATO
🤗 Models / Demo: COMING SOON
The strip-based tokenization, where faces grow along shared edges like a zipper and the UV island boundaries are baked into the token stream, is the clever core here. My worry is how the 512^3 quantization interacts with real artists' fine edge details, and whether tiny UV seam noise could ripple into topology errors or tri/quad decoding ambiguity. The arxivLens breakdown helped me parse the token vocabulary and the start-of-strip and UV-transition tokens, which is nontrivial to implement cleanly in practice. An ablation worth doing would be varying quantization levels and decoding stride to see if the unified tri/quad output stays robust under leaner token budgets.
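For context on that quantization concern, here is a generic 512-level uniform quantizer over a unit-normalized coordinate axis. This sketches the standard approach, not necessarily SATO's exact codec; the point is that worst-case per-axis rounding error is half a bin:

```python
# Generic 512-level uniform coordinate quantization (standard approach,
# not necessarily SATO's exact codec). Meshes are assumed normalized to a
# unit cube, so each axis has 1/512 resolution and worst-case per-axis
# rounding error of half a bin.

N = 512  # quantization levels per axis

def quantize(x):          # x in [0, 1]  ->  integer bin in [0, N-1]
    return min(int(x * N), N - 1)

def dequantize(q):        # bin center, back in [0, 1]
    return (q + 0.5) / N

x = 0.123456
q = quantize(x)
err = abs(dequantize(q) - x)
print(q, err)             # error is at most 1/(2*512) ~ 0.000977
```

At this resolution, fine details smaller than about a thousandth of the bounding box simply cannot be represented, which is exactly where edge-detail concerns would show up.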