
Pramodith Ballapuram

Pramodith


AI & ML interests

NLP

Recent Activity

commented on an article about 1 month ago

Unlocking asynchronicity in continuous batching

upvoted an article about 1 month ago

Unlocking asynchronicity in continuous batching

published a model about 2 months ago

Geometric-AI/geometric-ai-kernels



commented on Unlocking asynchronicity in continuous batching about 1 month ago

Wonderful blog! I had a clarifying question about the benefits of a shared memory pool for CUDA graphs. If my understanding is correct, the shared memory pool would be the memory needed for the inputs/outputs of Batch(N) + Batch(N+1). So each CUDA graph presumably captures this memory and reuses it (albeit in different slots). Is the benefit of the shared memory pool that it reduces additional CUDA graph overhead?

Another question: wouldn't it be common for the batch sizes of N and N+1 to differ? If so, how is the memory needed for batch N+1 pre-allocated accurately? Is it just sized for the maximum batch size?

upvoted an article about 1 month ago

Article

Unlocking asynchronicity in continuous batching

ror, pcuenq, ariG23498 +1 • May 14 • 61

published a model about 2 months ago

Geometric-AI/geometric-ai-kernels

updated a model about 2 months ago

Geometric-AI/geometric-ai-kernels

New activity in Geometric-AI/geometric-ai-kernels about 2 months ago

Final Kernel versions

#1 opened about 2 months ago by

upvoted an article 3 months ago

Article

Welcome Gemma 4: Frontier multimodal intelligence on device

merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift +5 • Apr 2 • 909

liked a Space 4 months ago

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Who needs 1T parameters? Olympiad proofs with a 4B model

updated a Space 6 months ago

Trackio

Show video feed with object tracking

published a Space 6 months ago

Trackio

Show video feed with object tracking

upvoted an article 6 months ago

Article

Gotchas in Tokenizer Behavior Every Developer Should Know

qgallouedec • Apr 18, 2025 • 72

liked a Space 9 months ago

The Ultra-Scale Playbook

The ultimate guide to training LLM on large GPU Clusters

updated a model 11 months ago

Pramodith/topN_sigma_generation

Text Generation • Updated Aug 5, 2025 • 8 • 2

published a model 11 months ago

Pramodith/topN_sigma_generation

Text Generation • Updated Aug 5, 2025 • 8 • 2

New activity in google/gemma-3n-E2B-it 12 months ago

Multimodal queries not working when hosted on Inference Endpoints.

#11 opened 12 months ago by

upvoted an article about 1 year ago

Article

Learn the Hugging Face Kernel Hub in 5 Minutes

drbh, danieldk, Narsil, pcuenq, pagezyhf, merve, reach-vb +5 • Jun 12, 2025 • 164

updated a model about 1 year ago

Pramodith/riddle_qwen2.5-3B

Text Generation • Updated May 23, 2025 • 4

published a model about 1 year ago

Pramodith/riddle_qwen2.5-3B

Text Generation • Updated May 23, 2025 • 4

updated a model about 1 year ago

Pramodith/riddle_qwen2.5-1.5B

Text Generation • 2B • Updated Apr 11, 2025 • 4

published a model about 1 year ago

Pramodith/riddle_qwen2.5-1.5B

Text Generation • 2B • Updated Apr 11, 2025 • 4

upvoted an article almost 2 years ago

Article

Uncensor any LLM with abliteration

mlabonne • Jun 13, 2024 • 868