Jailbreak Distillation: Renewable Safety Benchmarking Paper • 2505.22037 • Published May 28, 2025 • 1
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9, 2025 • 41
Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models Paper • 2510.21978 • Published Oct 24, 2025 • 16
Reasoning over mathematical objects: on-policy reward modeling and test-time aggregation Paper • 2603.18886 • Published Mar 19, 2026 • 6
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts Paper • 2401.13136 • Published Jan 23, 2024
Certified Mitigation of Worst-Case LLM Copyright Infringement Paper • 2504.16046 • Published Apr 22, 2025 • 13
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data Paper • 2404.03862 • Published Apr 5, 2024
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Paper • 2410.08968 • Published Oct 11, 2024 • 14
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Paper • 2410.01044 • Published Oct 1, 2024 • 35