arxiv:2605.25969

Triplet-Block Diffusion RWKV

Published on May 25

Abstract

B³D-RWKV combines diffusion and RWKV architectures to achieve parallel, bidirectional processing with improved decoding speed while maintaining competitive accuracy.

AI-generated summary

Causal Transformer language models suffer from strictly sequential decoding and a quadratic per-step attention cost. While linear-time causal models and discrete diffusion models each address one of these weaknesses, their integration remains inherently inconsistent: diffusion requires bidirectional attention, while causal models are unidirectional. To unify these architectures, we propose B³D-RWKV, a diffusion RWKV variant that integrates the model's O(L) inference efficiency with parallel, bidirectional discrete diffusion through a triplet-block layout method. B³D-RWKV-7.2B reaches accuracy comparable to existing models on an 8-task suite while significantly outperforming baselines in decoding throughput, with an average 1.6× speedup.
