Lei Wang
demolei
AI & ML interests
LLMs
Recent Activity
upvoted a paper about 22 hours ago
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling
upvoted a paper about 22 hours ago
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
upvoted a paper 13 days ago
Training AI Co-Scientists Using Rubric Rewards