Safe RLHF: Safe Reinforcement Learning from Human Feedback (arXiv:2310.12773, published Oct 19, 2023)
Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game (arXiv:2305.12872, published May 22, 2023)
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models (arXiv:2311.05997, published Nov 10, 2023)
Panacea: Pareto Alignment via Preference Adaptation for LLMs (arXiv:2402.02030, published Feb 3, 2024)
In-Context Editing: Learning Knowledge from Self-Induced Distributions (arXiv:2406.11194, published Jun 17, 2024)
ProgressGym: Alignment with a Millennium of Moral Progress (arXiv:2406.20087, published Jun 28, 2024)
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset (arXiv:2307.04657, published Jul 10, 2023)
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents (arXiv:2401.10568, published Jan 19, 2024)
Safe DreamerV3: Safe Reinforcement Learning with World Models (arXiv:2307.07176, published Jul 14, 2023)
Maximum Entropy Heterogeneous-Agent Reinforcement Learning (arXiv:2306.10715, published Jun 19, 2023)
Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction (arXiv:2402.02416, published Feb 4, 2024)
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents (arXiv:2403.12835, published Mar 19, 2024)
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems (arXiv:2205.15434, published May 30, 2022)
A Review of Safe Reinforcement Learning: Methods, Theory and Applications (arXiv:2205.10330, published May 20, 2022)
Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects (arXiv:2402.12907, published Feb 20, 2024)
ProAgent: Building Proactive Cooperative AI with Large Language Models (arXiv:2308.11339, published Aug 22, 2023)