RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej Text Generation • 8B • Updated May 21, 2025 • 7 • 1
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data Text Generation • 8B • Updated May 10, 2025 • 2.48k • 37
RLHFlow/Decision-Tree-Reward-Gemma-2-27B Text Classification • 27B • Updated Jan 24, 2025 • 296 • 8
RLHFlow/Decision-Tree-Reward-Llama-3.1-8B Text Classification • 8B • Updated Jan 24, 2025 • 302 • 7
RLHFlow/Llama3.1-8B-PRM-Mistral-Data Text Generation • 8B • Updated Nov 9, 2024 • 22 • 10