meituan-longcat/LongCat-Flash-Thinking-2601 Text Generation • 562B • Updated about 1 month ago • 5.21k • 102
OpenAssistant/reward-model-deberta-v3-large-v2 Text Classification • Updated Feb 1, 2023 • 15.2k • 244
nvidia/Nemotron-RL-instruction_following-structured_outputs Viewer • Updated Jan 12 • 9.95k • 474 • 34
instruction-pretrain/general-instruction-augmented-corpora Preview • Updated about 5 hours ago • 31.2k • 20
Article • Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment • Feb 11, 2025 • 106