Article • DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge • Feb 7, 2025 • 267
michaelbenayoun/qwen3-tiny-4kv-heads-8layers-random Text Generation • 6.61M • Updated Oct 30, 2025 • 38
michaelbenayoun/qwen3-tiny-4kv-heads-4layers-random Text Generation • 5.47M • Updated Oct 30, 2025 • 18.1k
michaelbenayoun/deepseekv3-tiny-4kv-heads-4-layers-random Text Generation • 5.27M • Updated Jul 24, 2025 • 5