Jinki Jeong PRO

Anserwise


AI & ML interests

None yet

Organizations

None yet

Recent Activity

liked a Space 7 days ago

POCKET-Image Studio

🖼

Perfect text in any language, in any image

liked a model 7 days ago

FINAL-Bench/POCKET-Image-Zimage

Text-to-Image • 3B • Updated 5 days ago • 95 • 28

reacted to SeaWolf-AI's post with 🔥 8 days ago

Post

3115

POCKET now speaks Gemma 4 — a 26B model that loads in every app, and runs on your PC with no GPU

We're adding a Gemma-4 sibling to POCKET: POCKET-26B, built from Google's Gemma-4-26B-A4B (Apache-2.0). Our flagship POCKET-35B is a Qwen-family MoE and needs a recent llama.cpp; POCKET-26B trades a little size for the thing people kept asking for — it just loads, everywhere, today: Ollama, LM Studio, PocketPal, MLX, any stock llama.cpp. No fork, no bleeding-edge runtime, no CUDA, no cloud.

It's a sparse Mixture-of-Experts (25.2B total, ~4B active per token), so the work per token stays small — a real 26B that generates on a CPU with no graphics card.
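As a rough illustration of why sparsity keeps per-token work small, the two figures quoted above can be turned into an active fraction. This is a back-of-the-envelope sketch: it assumes per-token compute scales with active parameters and ignores attention cost and memory bandwidth, and "~4B" is itself approximate.

```python
# Figures quoted in the post; "~4B active" is approximate, so the result is too.
TOTAL_PARAMS_B = 25.2
ACTIVE_PARAMS_B = 4.0


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the weights that take part in generating one token."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"active per token: {fraction:.1%} of the weights")
# Simplifying assumption: work per token is proportional to active parameters.
print(f"a dense model of the same size would touch about {1 / fraction:.1f}x more weights per token")
```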

Two things make it stand out:

1) Universal compatibility. Gemma 4 is a standard, widely supported architecture, so POCKET-26B runs on the tools you already have, with no waiting for your app to add a new model type.

2) Quality that survives compression. Measured GPQA-Diamond (198 q, greedy):
• Full base: 67.7%
• POCKET-26B Q4_K_M (17 GB): 67.7% (matches the full base on this benchmark)
• POCKET-26B Q2_K (11 GB): 67.2% (0.5 points below the full base)
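To put those percentages in terms of questions, the sketch below recovers the number of correct answers behind each score. It assumes each figure is plain accuracy over the 198 questions, rounded to one decimal place; the post does not state the raw counts.

```python
N_QUESTIONS = 198  # GPQA-Diamond size quoted in the post

REPORTED = {
    "Full base": 67.7,
    "POCKET-26B Q4_K_M": 67.7,
    "POCKET-26B Q2_K": 67.2,
}


def correct_count(percent: float, n: int) -> int:
    """Return the only answer count whose accuracy rounds to `percent`."""
    matches = [k for k in range(n + 1)
               if abs(round(100 * k / n, 1) - percent) < 1e-9]
    if len(matches) != 1:
        raise ValueError(f"{percent}% is ambiguous or impossible for n={n}")
    return matches[0]


counts = {name: correct_count(pct, N_QUESTIONS) for name, pct in REPORTED.items()}
for name, k in counts.items():
    print(f"{name}: {k}/{N_QUESTIONS} correct")

gap = counts["Full base"] - counts["POCKET-26B Q2_K"]
print(f"Q2_K gap vs full base: {gap} question(s)")
```

Under that assumption, the 0.5-point drop for Q2_K corresponds to a single question out of 198.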

Live, on a CPU-only box (our demo Space — POCKET-26B vs Bonsai-27B, same machine, same stock llama.cpp): POCKET-26B ≈ 19 tok/s vs Bonsai ≈ 6 tok/s → about 3× faster generation, no GPU. (Honest notes: shared CPU box, sequential race; a dedicated machine is faster.)

Where it fits in the family:
• POCKET-35B (Qwen MoE) — bigger, top-tier, needs a recent llama.cpp.
• POCKET-26B (Gemma 4) — loads in any app, quality-robust when compressed. The demo runs the Q4_K_M build; Q2_K (11 GB) is the smallest footprint. For a true ≤8 GB phone, the 5 GB POCKET-KR (Qwen) is still the pick.
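The file sizes above can be read as an effective bits-per-weight figure and as a crude "does it fit" check. This is a sketch under stated assumptions: 1 GB is taken as 10^9 bytes, the whole file is treated as weights, and the helper names are hypothetical. A file that fits by size alone still needs extra memory for the KV cache and the runtime.

```python
TOTAL_PARAMS = 25.2e9  # POCKET-26B total parameters, from the post

# File sizes in GB as quoted in the post (assumed to mean 10**9 bytes).
BUILDS_GB = {
    "POCKET-26B Q4_K_M": 17,
    "POCKET-26B Q2_K": 11,
    "POCKET-KR": 5,
}


def bits_per_weight(size_gb: float, params: float) -> float:
    """Effective bits stored per parameter, treating the whole file as weights."""
    return size_gb * 1e9 * 8 / params


def builds_that_fit(ram_gb: float) -> list[str]:
    """Builds whose file is smaller than the given memory budget (size only)."""
    return [name for name, size in BUILDS_GB.items() if size < ram_gb]


for name in ("POCKET-26B Q4_K_M", "POCKET-26B Q2_K"):
    bpw = bits_per_weight(BUILDS_GB[name], TOTAL_PARAMS)
    print(f"{name}: about {bpw:.1f} bits per weight")

print("fits under 8 GB by file size:", builds_that_fit(8))
```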

Try it and grab it:
🖥️ Live demo (Gemma4-based, answering on a CPU, no GPU): https://huggingface.co/spaces/FINAL-Bench/POCKET-26B-CPU
📦 POCKET-26B-GGUF (Q4_K_M 17 GB · Q2_K 11 GB): https://huggingface.co/FINAL-Bench/POCKET-26B-GGUF
📚 POCKET collection: https://huggingface.co/collections/FINAL-Bench/pocket-models

3 replies

liked a Space 8 days ago

POCKET-26B vs Bonsai · CPU (Gemma4-based)

⚔

BONSAI vs POCKET — 26B Gemma4 MoE out-runs a 27B on CPU

liked a model 8 days ago

FINAL-Bench/POCKET-26B-GGUF

Text Generation • 25B • Updated 4 days ago • 10.3k • 36

upvoted a paper 10 days ago

Quantum Cryptanalysis on IBM Quantum Hardware: Extending Even-Mansour Period Recovery from N=4 to N=10

Paper • 2607.18340 • Published 13 days ago • 21

upvoted an article 10 days ago

Article

POCKET: a 35-billion-parameter model that runs on your iPhone — and on your PC with no GPU

FINAL-Bench • 10 days ago • 12

liked a Space 10 days ago

POCKET vs Bonsai · CPU

⚔

BONSAI vs POCKET — 35B MoE out-runs the top 1-bit 27B on CPU

liked 4 models 10 days ago

upvoted a collection 10 days ago

POCKET-MODELs

Collection

9 items • Updated 7 days ago • 22

reacted to SeaWolf-AI's post with 👍 10 days ago

Post

4023

📱 POCKET — a 35-billion-parameter model that runs on your iPhone, and on your PC with no GPU

We're releasing POCKET, VIDRAFT's flagship Darwin-36B-Opus compressed for on-device use. No fork, no CUDA, no cloud — it runs on stock llama.cpp. It's a sparse Mixture-of-Experts model (256 experts, only 8 active per token), so the file can be large while the work per token stays small. That's what lets a 35B model run on a phone, and generate fast on a CPU with no graphics card.
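The "256 experts, only 8 active per token" idea can be illustrated with a toy top-k router. This is a generic sketch, not POCKET's actual routing code: the post does not describe the router, so the random logits, the softmax over the selected experts, and the `route` helper are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 256  # from the post
TOP_K = 8          # from the post


def route(router_logits: list[float], top_k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the top_k experts and softmax their logits into mixing weights."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)[:top_k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Stand-in router scores for one token; a real model computes these from the
# token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
print(f"{len(selected)} of {NUM_EXPERTS} experts run for this token")
print("mixing weights sum to", round(sum(w for _, w in selected), 6))
```

Only the selected experts' weights are used for that token, which is why the file can be large while the per-token work stays small.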

Measured (POCKET-35B IQ1_M vs Bonsai-27B Q1_0):
• CPU generate (Xeon, 16 threads): 27.0 vs 10.1 tok/s → 2.69× faster
• GPU generate (H100): 197 vs 89 tok/s → 2.22× faster
• GPU prompt processing (H100): 753 vs 1816 tok/s → 0.41× (Bonsai wins this one: MoE prefill wakes every expert, so sparsity stops helping there. We say so.)
• Quality (HellaSwag, 400 q): 61.0% vs 60.0% → a tie (confidence intervals overlap)
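The "confidence intervals overlap" remark on the HellaSwag line can be checked with a quick calculation. The sketch below assumes a 95% normal-approximation interval on accuracy over 400 questions; the post does not say which interval it used, so treat this as one reasonable reading.

```python
import math


def accuracy_interval(percent: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for an accuracy score."""
    p = percent / 100
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


N = 400  # HellaSwag subset size quoted in the post
pocket = accuracy_interval(61.0, N)
bonsai = accuracy_interval(60.0, N)

# Two intervals overlap when each one starts before the other ends.
overlap = pocket[0] <= bonsai[1] and bonsai[0] <= pocket[1]

print(f"POCKET-35B: {pocket[0]:.3f} to {pocket[1]:.3f}")
print(f"Bonsai-27B: {bonsai[0]:.3f} to {bonsai[1]:.3f}")
print("intervals overlap:", overlap)
```

With a half-width of roughly 5 points at this sample size, a 1-point gap cannot separate the two models.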

On a real consumer laptop — MacBook M3 Pro (18 GB) — POCKET wins every axis, prompt processing included:
• Metal generate: 25.4 vs 12.8 tok/s → 1.99×
• CPU generate: 13.8 vs 4.4 tok/s → 3.13×
• Metal prompt: 240.7 vs 73.4 tok/s → 3.28×
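The speedup ratios in both measurement lists can be recomputed from the throughput figures as quoted. In the sketch below the numbers are copied from the post and the row labels are shorthand. Small differences in the second decimal would be consistent with the stated ratios having been computed from unrounded measurements, though that is an assumption.

```python
# (label, POCKET tok/s, Bonsai tok/s, ratio stated in the post)
MEASUREMENTS = [
    ("Xeon CPU generate", 27.0, 10.1, 2.69),
    ("H100 generate", 197.0, 89.0, 2.22),
    ("H100 prompt", 753.0, 1816.0, 0.41),
    ("M3 Pro Metal generate", 25.4, 12.8, 1.99),
    ("M3 Pro CPU generate", 13.8, 4.4, 3.13),
    ("M3 Pro Metal prompt", 240.7, 73.4, 3.28),
]


def speedup(pocket_tps: float, bonsai_tps: float) -> float:
    """Throughput ratio; a value above 1 means POCKET is faster."""
    return pocket_tps / bonsai_tps


recomputed = {label: speedup(p, b) for label, p, b, _ in MEASUREMENTS}
for label, _, _, stated in MEASUREMENTS:
    diff = recomputed[label] - stated
    print(f"{label}: recomputed {recomputed[label]:.2f}x, stated {stated:.2f}x, diff {diff:+.3f}")
```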

One more quiet fact: the same-size, quality-oriented rival Ternary-Bonsai-27B (7.2 GB) fails to load in upstream llama.cpp at all — it needs the PrismML fork. POCKET runs on the tools you already have: LM Studio, Ollama, PocketPal, MLX.

📖 Full story (tech, measurements, recipes): https://huggingface.co/blog/FINAL-Bench/pocket

Models:
📦 POCKET-35B-GGUF (PC / server, no GPU): FINAL-Bench/POCKET-35B-GGUF
🇰🇷 POCKET-KR-GGUF (Android): FINAL-Bench/POCKET-KR-GGUF
🍎 POCKET-KR-MLX (iPhone / Mac): FINAL-Bench/POCKET-KR-MLX
🌍 POCKET-EN-GGUF (English phone / PC): FINAL-Bench/POCKET-EN-GGUF
🖥️ Live demo (answering on a CPU, no GPU): FINAL-Bench/POCKET-35B-CPU
📚 Collection: FINAL-Bench/pocket-models-6a618ee5d23eafb7e185a5c6