
shipeng luo

luoagent

AI & ML interests

ML AI

Recent Activity

upvoted an article 3 days ago

Fine-tune Llama 2 with DPO

upvoted a paper 4 days ago

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

upvoted a paper 4 days ago

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation


Organizations

None yet

upvoted an article 3 days ago

Article

Fine-tune Llama 2 with DPO

Aug 8, 2023


upvoted 8 papers 4 days ago

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Paper • 2603.17051 • Published 15 days ago • 106

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published 9 days ago • 28

Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

Paper • 2603.22446 • Published 9 days ago • 7

MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

Paper • 2603.18718 • Published 13 days ago • 9

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

Paper • 2603.25744 • Published 6 days ago • 11

upvoted 11 papers 5 days ago

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Paper • 2603.23483 • Published 8 days ago • 59

Qwen3-Coder-Next Technical Report

Paper • 2603.00729 • Published Feb 28 • 62

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Paper • 2603.21065 • Published 10 days ago • 77

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published 6 days ago • 123

Mixture-of-Depths Attention

Paper • 2603.15619 • Published 16 days ago • 79

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

Paper • 2603.16932 • Published 18 days ago • 85

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published 26 days ago • 117

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published 22 days ago • 147

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published 16 days ago • 148

Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published 21 days ago • 152

OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

Paper • 2603.02138 • Published 30 days ago • 150
