arxiv:2605.09608

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

Published on May 10 · Submitted by Yuanyi Wang on May 12
Abstract

This work investigates how task geometry influences continual post-training of large language models, identifying geometry conflict as both a cause of forgetting and a control signal for update integration.

AI-generated summary

Continual post-training aims to extend large language models (LLMs) with new knowledge, skills, and behaviors, yet it remains unclear when sequential updates enable capability transfer and when they cause catastrophic forgetting. Existing methods mitigate forgetting through sequential fine-tuning, replay, regularization, or model merging, but offer limited criteria for determining when incorporating new updates is beneficial or harmful. In this work, we study LLM continual post-training through three questions: What drives forgetting? When do sequentially acquired capabilities transfer or interfere? How can compatibility be used to control update integration? We address these questions through task geometry: we represent each post-training task by its parameter update and study the covariance geometry induced by the update. Our central finding is that forgetting is a state-relative update-integration failure: it arises when the covariance geometries induced by tasks misalign with the geometry of the evolving model state. Sequential updates transfer when they remain compatible with the model state shaped by previous updates, and interfere when state-relative geometry conflict becomes high. Motivated by this finding, we propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free update-integration method that constructs a shared Wasserstein metric via Gaussian Wasserstein barycenters and uses geometry conflict to gate geometry-aware correction. Across Qwen3 0.6B--14B on domain-continual and capability-continual settings, GCWM consistently outperforms data-free baselines, improving retention and final performance without replay data. These results identify geometry conflict as both an explanatory signal for forgetting and a practical control signal for LLM continual post-training.
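The abstract mentions Gaussian Wasserstein barycenters as the basis for GCWM's shared metric. As a rough illustration only (not the paper's construction; all function names here are ours), the following NumPy sketch shows the standard fixed-point iteration for the 2-Wasserstein barycenter of zero-mean Gaussians given their covariance matrices:

```python
import numpy as np

def sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def wasserstein_barycenter(covs, weights=None, iters=50):
    """Fixed-point iteration for the 2-Wasserstein barycenter of
    zero-mean Gaussians with the given covariance matrices."""
    covs = [np.asarray(C, dtype=float) for C in covs]
    w = (np.full(len(covs), 1.0 / len(covs))
         if weights is None else np.asarray(weights, dtype=float))
    S = sum(wi * C for wi, C in zip(w, covs))  # start at the Euclidean mean
    for _ in range(iters):
        root = sqrtm_psd(S)
        inv_root = np.linalg.inv(root + 1e-12 * np.eye(S.shape[0]))
        T = sum(wi * sqrtm_psd(root @ C @ root) for wi, C in zip(w, covs))
        S = inv_root @ T @ T @ inv_root
    return S
```

As a sanity check, for two 1-D Gaussians with variances 1 and 4, the barycenter variance is ((sqrt(1) + sqrt(4)) / 2)^2 = 2.25.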

Community

Paper author and submitter

How should we continually post-train LLMs without causing catastrophic forgetting?

We find that forgetting is not simply caused by large parameter updates. Instead, it is better understood as a state-relative update-integration failure: harmful steps occur when a new task update becomes geometrically incompatible with the current LLM state shaped by previous updates.

We formalize this through geometry conflict, a signal based on the covariance geometry induced by task updates. Across Qwen3 0.6B--14B, we compare geometry conflict with update norm, subspace alignment, and gradient conflict, and show that state-relative geometry conflict better captures when transfer or interference occurs.
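To make the idea of comparing covariance geometries concrete, here is a minimal sketch of one plausible conflict score: the squared Bures-Wasserstein distance between zero-mean Gaussians whose covariances are induced by two task updates. This is an assumption for illustration, not the paper's exact definition of geometry conflict:

```python
import numpy as np

def sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def update_covariance(delta):
    """Covariance geometry induced by one layer's parameter update.
    delta: (out_dim, in_dim) update matrix; we use the normalized
    Gram matrix delta^T delta / out_dim as its covariance (illustrative)."""
    return delta.T @ delta / delta.shape[0]

def geometry_conflict(cov_a, cov_b):
    """Squared Bures-Wasserstein distance between zero-mean Gaussians
    with covariances cov_a and cov_b: zero iff the geometries coincide."""
    root_a = sqrtm_psd(cov_a)
    cross = sqrtm_psd(root_a @ cov_b @ root_a)
    return float(np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))
```

A score near zero indicates the two updates induce compatible geometries; larger values indicate misalignment.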

Based on this insight, we propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free continual update-integration method that uses geometry conflict to gate Wasserstein-based merge correction. GCWM improves retention and final performance across domain-continual and capability-continual settings without replay data.
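The gating idea can be caricatured in a few lines: integrate an update fully when its conflict with the current state is low, and shrink it as conflict grows past a threshold. This toy rule is our own hedged sketch, not GCWM's actual correction:

```python
def gated_merge(base, update, conflict, tau=1.0):
    """Toy conflict-gated update integration (illustrative, not GCWM).
    base: current model state; update: candidate task update;
    conflict: nonnegative geometry-conflict score; tau: gate threshold.
    Below tau the update is applied fully; above it, the step shrinks."""
    gate = 1.0 / (1.0 + max(conflict - tau, 0.0))  # in (0, 1]
    return base + gate * update
```

For example, with `conflict=0` the update is applied as-is, while with `conflict=3` and `tau=1` only a third of the step is taken.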

The main takeaway: continual post-training should not only control how far an LLM moves, but whether each new update remains geometrically compatible with the evolving model state.

Impressive!


