Safe RLHF: Safe Reinforcement Learning from Human Feedback (arXiv:2310.12773, published Oct 19, 2023)
Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game (arXiv:2305.12872, published May 22, 2023)
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models (arXiv:2311.05997, published Nov 10, 2023)
Panacea: Pareto Alignment via Preference Adaptation for LLMs (arXiv:2402.02030, published Feb 3, 2024)
In-Context Editing: Learning Knowledge from Self-Induced Distributions (arXiv:2406.11194, published Jun 17, 2024)
ProgressGym: Alignment with a Millennium of Moral Progress (arXiv:2406.20087, published Jun 28, 2024)
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset (arXiv:2307.04657, published Jul 10, 2023)
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents (arXiv:2401.10568, published Jan 19, 2024)
Safe DreamerV3: Safe Reinforcement Learning with World Models (arXiv:2307.07176, published Jul 14, 2023)
Maximum Entropy Heterogeneous-Agent Reinforcement Learning (arXiv:2306.10715, published Jun 19, 2023)
Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction (arXiv:2402.02416, published Feb 4, 2024)
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents (arXiv:2403.12835, published Mar 19, 2024)
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems (arXiv:2205.15434, published May 30, 2022)
A Review of Safe Reinforcement Learning: Methods, Theory and Applications (arXiv:2205.10330, published May 20, 2022)
Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects (arXiv:2402.12907, published Feb 20, 2024)
ProAgent: Building Proactive Cooperative AI with Large Language Models (arXiv:2308.11339, published Aug 22, 2023)