TradingAgents: Multi-Agents LLM Financial Trading Framework Paper • 2412.20138 • Published Dec 28, 2024 • 79
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published 3 days ago • 44
MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning Paper • 2605.14212 • Published 9 days ago • 17
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Text Generation • 33B • Updated Feb 24, 2025 • 697k • 1.56k
stillarrow/qwen2.5-coder-1.5b-instruct__scpo_no_std_code_hidden_only_shortcut_guard Text Generation • 2B • Updated 13 days ago • 153 • 1
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 232
stillarrow/qwen2.5-coder-1.5b-instruct__grpo_no_std_code_hidden_only_shortcut_guard Updated 16 days ago • 28
stillarrow/qwen2.5-coder-1.5b-instruct__jspo_no_std_code_hidden_only_shortcut_guard Updated 16 days ago • 7
stillarrow/qwen2.5-math-7b__math_subject_proportional_cluster-246fecfa-et_mix_lambda_no_drift_off_ratio_100 Updated 16 days ago • 55
stillarrow/qwen2.5-math-7b__skill_accuracy_binning_max_entrop-0939fc56-policy_lambda_no_drift_off_ratio_100 Updated 16 days ago • 51
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning Paper • 2602.10090 • Published Feb 10 • 53
stillarrow/qwen2.5-math-7b__skill_accuracy_binning_max_entrop-6bc47709-et_mix_lambda_no_drift_off_ratio_100 Updated 17 days ago • 54
stillarrow/qwen2.5-math-7b__skill_accuracy_binning_max_entrop-aabaf976-policy_lambda_no_drift_off_ratio_100 Updated 17 days ago • 40