charliezhang

Clockz

AI & ML interests

None yet

Recent Activity

Organizations

upvoted a paper 6 days ago

The Physics of Multi-Turn Long-Horizon Planning: From Pre-training to Post-training via Single- and Multi-Teacher On-Policy Agentic Distillation

Paper • 2607.24720 • Published 7 days ago • 26

upvoted a paper about 1 month ago

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

Paper • 2606.26300 • Published Jun 24 • 51

upvoted a paper about 2 months ago

RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality

Paper • 2506.07171 • Published Jun 8, 2025 • 1

authored a paper about 2 months ago

Agents' Last Exam

Paper • 2606.05405 • Published Jun 3 • 387

upvoted 2 papers about 2 months ago

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

Paper • 2606.12191 • Published Jun 10 • 70

Agents' Last Exam

Paper • 2606.05405 • Published Jun 3 • 387

upvoted a paper 2 months ago

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Paper • 2605.25624 • Published May 25 • 35

upvoted a paper 3 months ago

DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

Paper • 2604.25914 • Published Apr 28 • 42

upvoted 2 papers 4 months ago

Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

Paper • 2604.02288 • Published Apr 2 • 34

From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space

Paper • 2604.14142 • Published Apr 15 • 30

updated 2 models 4 months ago

Interplay-LM-Reasoning/extrapolation_midtrain

Updated Apr 8

Interplay-LM-Reasoning/context_pretrain_2

Updated Apr 7

published a model 4 months ago

Interplay-LM-Reasoning/context_pretrain_2

Updated Apr 7

updated 2 models 4 months ago

Interplay-LM-Reasoning/context_pretrain

Updated Apr 7

Interplay-LM-Reasoning/extrapolation_rl

Updated Apr 6

upvoted a paper 4 months ago

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Paper • 2603.17024 • Published Mar 17 • 110

upvoted 4 papers 5 months ago
