14 models faced 1,400 simulations of heads-up Blackjack and European Roulette. Shared random seeds fixed the cards and spins, so every model saw identical deals and wheel outcomes.
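A minimal sketch of how shared seeds can lock the randomness, assuming Python's standard `random` module. The function names `deal_shoe` and `spin_wheel` and the single-deck default are hypothetical illustrations, not the benchmark's actual code:

```python
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["S", "H", "D", "C"]


def deal_shoe(seed: int, decks: int = 1) -> list[str]:
    """Return a shuffled shoe; the same seed always gives the same order."""
    rng = random.Random(seed)  # private generator, independent of global state
    shoe = [rank + suit for _ in range(decks) for suit in SUITS for rank in RANKS]
    rng.shuffle(shoe)
    return shoe


def spin_wheel(seed: int, n_spins: int) -> list[int]:
    """Return n_spins European Roulette results (pockets 0 through 36)."""
    rng = random.Random(seed)
    return [rng.randint(0, 36) for _ in range(n_spins)]


# Two models given the same seed see the same cards and the same spins.
model_a_shoe = deal_shoe(seed=42)
model_b_shoe = deal_shoe(seed=42)
same_cards = model_a_shoe == model_b_shoe
same_spins = spin_wheel(7, 100) == spin_wheel(7, 100)
```

Using a dedicated `random.Random(seed)` instance per simulation, rather than the module-level generator, keeps one run's draws from shifting another's.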
Key Stats:
- 14 models benchmarked
- 59,483 rows
- 35 MB compressed Parquet
- 35,000 scored decisions
- Full prompts, JSON responses, reasoning traces, latency
- Bankroll tracking from $1,000 start per run
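As a rough illustration of the per-run bankroll tracking, the sketch below accumulates net results from the $1,000 starting balance. The `net_results` input and the `bankroll_curve` helper are assumptions made for this example; the dataset's actual column names and payout format are not specified here:

```python
from itertools import accumulate

START_BANKROLL = 1_000  # starting balance per run, as stated in the stats above


def bankroll_curve(net_results: list[int], start: int = START_BANKROLL) -> list[int]:
    """Running bankroll after each decision, beginning with the start value."""
    return list(accumulate(net_results, initial=start))


# Hypothetical run: win $10, lose $25, win $15.
curve = bankroll_curve([10, -25, 15])
net_profit = curve[-1] - START_BANKROLL
```

A figure such as +$3,396 would then be read as profit or loss relative to the starting balance, under the same assumption.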
A live leaderboard tracks bets, hits, stands, and risk management. Gemini 3 Flash leads at +$3,396, while Claude 4.5 Haiku sits at -$7,788. The traces are in the dataset; the leaderboard is in the space.