garyzhang

xiaoniqiu

garyzhang99

AI & ML interests

LLM, Agents

Recent Activity

authored a paper 3 days ago

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

authored a paper 3 days ago

Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

authored a paper 3 days ago

Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models

authored 5 papers 3 days ago

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Paper • 2407.08583 • Published Jul 11, 2024 • 13

Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

Paper • 2505.17826 • Published May 23, 2025 • 10

Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models

Paper • 2501.14755 • Published Dec 23, 2024

Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends

Paper • 2509.24203 • Published Sep 29, 2025 • 8

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Paper • 2602.03392 • Published 10 days ago • 52

upvoted a paper 8 days ago

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Paper • 2602.03392 • Published 10 days ago • 52

upvoted a paper 2 months ago

Multi-Docker-Eval: A 'Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering

Paper • 2512.06915 • Published Dec 7, 2025 • 12

updated a dataset 4 months ago

datajuicer/geometry_sft

Viewer • Updated Oct 27, 2025 • 300 • 57

published a dataset 4 months ago

datajuicer/geometry_sft

Viewer • Updated Oct 27, 2025 • 300 • 57

upvoted a paper 4 months ago

Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends

Paper • 2509.24203 • Published Sep 29, 2025 • 8

upvoted an article 5 months ago

Gaia2 and ARE: Empowering the community to study agents

Article • Sep 22, 2025 • 127

updated a dataset 5 months ago

datajuicer/Trinity-ToolAce-SFT-split

Viewer • Updated Sep 19, 2025 • 498 • 4

published a dataset 5 months ago

datajuicer/Trinity-ToolAce-SFT-split

Viewer • Updated Sep 19, 2025 • 498 • 4

updated a dataset 5 months ago

datajuicer/Trinity-ToolAce-RL-split

Viewer • Updated Sep 19, 2025 • 4.93k • 7

published a dataset 5 months ago

datajuicer/Trinity-ToolAce-RL-split

Viewer • Updated Sep 19, 2025 • 4.93k • 7

commented on 2 papers 6 months ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15, 2025 • 8

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15, 2025 • 8

authored a paper 6 months ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15, 2025 • 8

upvoted a paper 6 months ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15, 2025 • 8

upvoted a paper 9 months ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 188
