arxiv:2603.01960

TiledAttention: a CUDA Tile SDPA Kernel for PyTorch

Published on May 11

Authors:

Abstract

TiledAttention is a PyTorch-callable scaled dot-product attention implementation that offers customizable performance tuning through cuTile/TileIR while maintaining competitive speed relative to established attention backends.

AI-generated summary

TiledAttention is a scaled dot-product attention (SDPA) forward operator for SDPA research on NVIDIA GPUs. Implemented in cuTile Python (TileIR) and exposed as a PyTorch-callable function, it is easier to modify than low-level CUDA templates while retaining realistic behavior via online softmax and tiled K,V streaming. Algorithmically, TiledAttention follows the established FlashAttention-style online-softmax formulation; our novelty is the cuTile/TileIR implementation strategy, schedule-level modifiability, and reproducible benchmarking/profiling workflow. The approach is both performant and directly editable at the schedule level from Python (tile shapes, staging, shared-memory layout), enabling rapid, reproducible kernel research without template-heavy CUDA/CUTLASS rewrites. We benchmark TiledAttention on an NVIDIA DGX GB10 node with a reproducible harness and compare against PyTorch SDPA (auto-dispatch), explicit unfused baselines (torch_sdpa_math, standard eager attention), and forced backend probes (FlashAttention2, EffecientAttention, CuDNN Attention) across sequence length, head dimension, and precision (FP16/BF16). While production fused baselines remain stronger overall, TiledAttention delivers large speedups over standard eager attention paths and is available for direct use within PyTorch workflows, providing a practical balance between performance and customizability.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2603.01960

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.01960 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2603.01960 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.01960 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.