Cosmo

cosmojg

https://cosmo.red

AI & ML interests

Machine learning and computational neuroscience

Recent Activity

liked a model 2 days ago

talkie-lm/talkie-web-13b-base

liked a model 2 days ago

talkie-lm/talkie-1930-13b-it

liked a model 2 days ago

talkie-lm/talkie-1930-13b-base

Organizations

None yet

upvoted a collection 2 days ago

talkie-13b

talkie-1930-13b is a vintage language model trained on pre-1931 English-language text. See https://github.com/talkie-lm/talkie to run talkie. • 3 items • Updated 9 days ago • 34

upvoted a collection 23 days ago

DFlash

Block Diffusion for Flash Speculative Decoding • 15 items • Updated 6 days ago • 92

upvoted a collection 27 days ago

Gemma 4

Gemma 4 is Google's new model family, including E2B, E4B, 26B-A4B, and 31B. • 28 items • Updated 8 days ago • 166

upvoted an article 28 days ago

KV Caching Explained: Optimizing Transformer Inference Efficiency

Article • Jan 30, 2025 • 313

upvoted a collection 28 days ago

Gemma 4

8 items • Updated 28 days ago • 703

upvoted a collection about 2 months ago

Qwen3.5

21 items • Updated Mar 9 • 1.59k

upvoted a paper 2 months ago

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12, 2025 • 77

upvoted a collection 2 months ago

BD3-LMs

https://m-arriola.com/bd3lms/ • 4 items • Updated 19 days ago • 31

upvoted 2 collections 3 months ago

Trinity-Large

8 items • Updated Mar 30 • 42

Qwen3-TTS

7 items • Updated Jan 22 • 352

upvoted a paper 4 months ago

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Paper • 2512.23705 • Published Dec 29, 2025 • 45

upvoted 3 collections 4 months ago

Olmo 3.1

The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets... • 9 items • Updated Dec 23, 2025 • 51

Molmo2

Artifacts for the Molmo2 release • 5 items • Updated Mar 2 • 35

Bolmo

Artifacts for the Bolmo release: https://allenai.org/papers/bolmo. • 4 items • Updated Dec 23, 2025 • 11

upvoted 4 collections 5 months ago

Jan-v2-VL

Jan-v2-VL: a family of VLMs focused on reliable, many-step task execution. • 9 items • Updated Mar 13 • 40

Mistral Large 3

A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture. • 4 items • Updated Dec 2, 2025 • 99

Ministral 3

A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated Dec 2, 2025 • 166

FLUX.2

Our second generation of FLUX • 21 items • Updated 24 days ago • 203

upvoted 2 collections 7 months ago

Granite 4.0 Language Models

Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 11 items • Updated 1 day ago • 220

Granite Quantized Models

Quantized versions of IBM Granite models. • 44 items • Updated 1 day ago • 34