My USB charger has a Blackwell GPU and 128GB RAM. What. A. Time. To. Be. Alive. People in Sofia: "It's freezing." Me: sitting next to 3 kW of AI space heaters on my desk 👀 1x GLM-5, 2x MiniMax-M2.5, 1x Qwen3 Coder Next; all on a single Aibrix/K8s cluster
I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.
With local AI, I don't have the /fast CC switch, but I have /absurdlyfast: - 100,499 tokens/second read, yeah 100k, not a typo | 811 tok/sec generation - KV cache: 707,200 tokens - Hardware: 5+ year old GPUs, 4x A6K gen 1. It's not the car. It's the driver.
Qwen3 Coder Next AWQ with the KV cache at BF16. Scores 82.1% in C# on a 29-years-in-dev codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.
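For reference, a vLLM launch along these lines would pin the weights to AWQ while leaving the KV cache in BF16. The model name, tensor-parallel degree, and port here are my guesses, not details from the post:

```shell
# Hypothetical invocation -- model path, TP degree, and port are assumptions.
# --quantization awq loads AWQ-quantized weights;
# --kv-cache-dtype auto keeps the KV cache in the model's native dtype (BF16 here);
# --tensor-parallel-size 4 shards across the four GPUs.
vllm serve Qwen/Qwen3-Coder-Next-AWQ \
    --quantization awq \
    --kv-cache-dtype auto \
    --tensor-parallel-size 4 \
    --port 8000
```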
My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.
PromptRL: Language Models as Co-Learners in Flow-Based Image Generation RL 🚀
We found two critical failure modes in flow-based RL: 1️⃣ Quality-Diversity Dilemma: High-quality models produce similar outputs, bottlenecking RL exploration 2️⃣ Prompt Linguistic Hacking: Models overfit to surface patterns—paraphrase the prompt and performance tanks
Solution: **Jointly train LM + FM** — the LM dynamically generates semantically-consistent but diverse prompt variants
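The joint-training idea can be sketched as a toy loop. Everything below is my own illustrative stand-in (paraphrase templates, hash-based "generation", toy reward), not the paper's code; only the shape of the loop reflects the post:

```python
import random

def lm_propose_variants(prompt, n=4, rng=None):
    """Stand-in for the LM co-learner: emit diverse, semantically-consistent
    paraphrases of `prompt` (templates are purely illustrative)."""
    rng = rng or random.Random(0)
    templates = ["{p}", "a photo of {p}", "{p}, detailed", "an image showing {p}"]
    return [t.format(p=prompt) for t in rng.sample(templates, n)]

def fm_generate(variant):
    """Stand-in for the flow model: map a prompt variant to a 'sample'."""
    return hash(variant) % 100  # toy output, not an actual image

def reward(sample):
    """Toy scalar reward; the real setup would use e.g. an aesthetic score."""
    return sample / 100.0

def joint_rl_step(prompt):
    """One joint step: the LM's variants drive FM exploration, and both
    policies would take a policy-gradient step on the shared advantages."""
    variants = lm_propose_variants(prompt)
    samples = [fm_generate(v) for v in variants]
    rewards = [reward(s) for s in samples]
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    return variants, advantages

variants, advantages = joint_rl_step("a red bicycle")
```

Because the reward is computed across paraphrases rather than one fixed prompt string, overfitting to surface patterns (failure mode 2) gets penalized, and the variant diversity widens exploration (failure mode 1).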
▐▛██▜▌ Claude Code v2.1.23
▝████▘ Kimi-K2.5 · API Usage Billing
 ▘▘ ▝▝  ~/dev/vllm

/model to try Opus 4.5

❯ hey
● Hello! How can I help you today?

❯ what model are you?
● I'm Claude Kimi-K2.5, running in a local environment on Linux.
Took some time to download, and some vLLM hybrid-inference magic, to get it running on my desktop workstation.