From Data to Behavior: Predicting Unintended Model Behaviors Before Training Paper • 2602.04735 • Published 6 days ago • 14 • 4
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics Paper • 2602.02343 • Published 8 days ago • 13
Aligning Agentic World Models via Knowledgeable Experience Learning Paper • 2601.13247 • Published 22 days ago • 15
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency Paper • 2601.05905 • Published Jan 9 • 18