arxiv:2605.09608

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

Published on May 10 · Submitted by Yuanyi Wang on May 12
Abstract

This work investigates how task geometry influences continual post-training of large language models, identifying geometry conflict as both a cause of forgetting and a control signal for update integration.

AI-generated summary

Continual post-training aims to extend large language models (LLMs) with new knowledge, skills, and behaviors, yet it remains unclear when sequential updates enable capability transfer and when they cause catastrophic forgetting. Existing methods mitigate forgetting through sequential fine-tuning, replay, regularization, or model merging, but offer limited criteria for determining when incorporating new updates is beneficial or harmful. In this work, we study LLM continual post-training through three questions: What drives forgetting? When do sequentially acquired capabilities transfer or interfere? How can compatibility be used to control update integration? We address these questions through task geometry: we represent each post-training task by its parameter update and study the covariance geometry induced by the update. Our central finding is that forgetting is a state-relative update-integration failure: it arises when the covariance geometries induced by tasks misalign with the geometry of the evolving model state. Sequential updates transfer when they remain compatible with the model state shaped by previous updates, and interfere when state-relative geometry conflict becomes high. Motivated by this finding, we propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free update-integration method that constructs a shared Wasserstein metric via Gaussian Wasserstein barycenters and uses geometry conflict to gate geometry-aware correction. Across Qwen3 0.6B--14B on domain-continual and capability-continual settings, GCWM consistently outperforms data-free baselines, improving retention and final performance without replay data. These results identify geometry conflict as both an explanatory signal for forgetting and a practical control signal for LLM continual post-training.
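The abstract mentions Gaussian Wasserstein barycenters as the basis for GCWM's shared metric. As a rough illustration only (not the paper's construction; all function names here are ours), the following NumPy sketch shows the standard fixed-point iteration for the 2-Wasserstein barycenter of zero-mean Gaussians given their covariance matrices:

```python
import numpy as np

def sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def wasserstein_barycenter(covs, weights=None, iters=50):
    """Fixed-point iteration for the 2-Wasserstein barycenter of
    zero-mean Gaussians with the given covariance matrices."""
    covs = [np.asarray(C, dtype=float) for C in covs]
    w = (np.full(len(covs), 1.0 / len(covs))
         if weights is None else np.asarray(weights, dtype=float))
    S = sum(wi * C for wi, C in zip(w, covs))  # start at the Euclidean mean
    for _ in range(iters):
        root = sqrtm_psd(S)
        inv_root = np.linalg.inv(root + 1e-12 * np.eye(S.shape[0]))
        T = sum(wi * sqrtm_psd(root @ C @ root) for wi, C in zip(w, covs))
        S = inv_root @ T @ T @ inv_root
    return S
```

As a sanity check, for two 1-D Gaussians with variances 1 and 4, the barycenter variance is ((sqrt(1) + sqrt(4)) / 2)^2 = 2.25.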

Community

Paper author and submitter

How should we continually post-train LLMs without causing catastrophic forgetting?

We find that forgetting is not simply caused by large parameter updates. Instead, it is better understood as a state-relative update-integration failure: harmful steps occur when a new task update becomes geometrically incompatible with the current LLM state shaped by previous updates.

We formalize this through geometry conflict, a signal based on the covariance geometry induced by task updates. Across Qwen3 0.6B--14B, we compare geometry conflict with update norm, subspace alignment, and gradient conflict, and show that state-relative geometry conflict better captures when transfer or interference occurs.
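To make the idea of comparing covariance geometries concrete, here is a minimal sketch of one plausible conflict score: the squared Bures-Wasserstein distance between zero-mean Gaussians whose covariances are induced by two task updates. This is an assumption for illustration, not the paper's exact definition of geometry conflict:

```python
import numpy as np

def sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def update_covariance(delta):
    """Covariance geometry induced by one layer's parameter update.
    delta: (out_dim, in_dim) update matrix; we use the normalized
    Gram matrix delta^T delta / out_dim as its covariance (illustrative)."""
    return delta.T @ delta / delta.shape[0]

def geometry_conflict(cov_a, cov_b):
    """Squared Bures-Wasserstein distance between zero-mean Gaussians
    with covariances cov_a and cov_b: zero iff the geometries coincide."""
    root_a = sqrtm_psd(cov_a)
    cross = sqrtm_psd(root_a @ cov_b @ root_a)
    return float(np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))
```

A score near zero indicates the two updates induce compatible geometries; larger values indicate misalignment.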

Based on this insight, we propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free continual update-integration method that uses geometry conflict to gate Wasserstein-based merge correction. GCWM improves retention and final performance across domain-continual and capability-continual settings without replay data.
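The gating idea can be caricatured in a few lines: integrate an update fully when its conflict with the current state is low, and shrink it as conflict grows past a threshold. This toy rule is our own hedged sketch, not GCWM's actual correction:

```python
def gated_merge(base, update, conflict, tau=1.0):
    """Toy conflict-gated update integration (illustrative, not GCWM).
    base: current model state; update: candidate task update;
    conflict: nonnegative geometry-conflict score; tau: gate threshold.
    Below tau the update is applied fully; above it, the step shrinks."""
    gate = 1.0 / (1.0 + max(conflict - tau, 0.0))  # in (0, 1]
    return base + gate * update
```

For example, with `conflict=0` the update is applied as-is, while with `conflict=3` and `tau=1` only a third of the step is taken.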

The main takeaway: continual post-training should not only control how far an LLM moves, but whether each new update remains geometrically compatible with the evolving model state.

Impressive!


