RetMask Collection Trained checkpoints for the paper "From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models" • 4 items • Updated 19 days ago
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5 Text Generation • 8B • Updated Jun 25, 2025 • 2.33k • • 19
tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4 Text Generation • 71B • Updated Jul 1, 2025 • 215 • • 13
tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3 Text Generation • 71B • Updated Apr 2, 2025 • 434 • • 13
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 Text Generation • 8B • Updated Apr 2, 2025 • 2.96k • • 24
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2 Text Generation • 8B • Updated Apr 2, 2025 • 125 • • 16