arxiv:2602.15727

Spanning the Visual Analogy Space with a Weight Basis of LoRAs

Published on Feb 17 · Submitted by Hila Manor on Feb 23
Abstract

AI-generated summary

Visual analogy learning via dynamic composition of learned LoRA transformation primitives enables flexible image manipulation with improved generalization over fixed adaptation modules.

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet {a, a', b}, the goal is to generate b' such that a : a' :: b : b'. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules that spans the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate that our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation. Code and data are available at https://research.nvidia.com/labs/par/lorweb
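
To make the two components above concrete, the inference-time composition can be summarized as follows; the notation here is assumed for illustration and may differ from the paper's own formulation.

```latex
% Assumed notation: {(A_i, B_i)}_{i=1}^{K} is the learned LoRA basis for a
% weight matrix W of the frozen backbone, and E is the lightweight encoder
% applied to the example pair (a, a').
\mathbf{w} = E(a, a') \in \mathbb{R}^{K}, \qquad
\Delta W = \sum_{i=1}^{K} w_i \, B_i A_i, \qquad
W' = W + \Delta W .
```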

Community

Paper author · Paper submitter

TLDR: We propose a novel modular framework that learns to dynamically mix low-rank adapters (LoRAs) to improve visual analogy learning, enabling flexible and generalizable image edits based on example transformations.

We present LoRWeB (LoRA Weight Basis), a novel modular framework for dynamically mixing low-rank adapters (LoRAs).
We use LoRWeB for visual analogy learning: given an example pair of "before" and "after" images (a and a'), we want to edit a new image (b) in the same manner, applying the same visual transformation to produce b'. Rather than using a single fixed adapter, we learn a basis of LoRAs together with a lightweight encoder that selects and weighs these basis LoRAs to construct an edit LoRA. This enables better generalization to unseen visual transformations and achieves state-of-the-art performance on visual analogy tasks without requiring test-time optimization.
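
Below is a minimal, illustrative PyTorch sketch of this mixing scheme. The class names, dimensions, rank, and the softmax normalization of the coefficients are assumptions made for the example, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class LoRABasis(nn.Module):
    """K low-rank adapters (A_i, B_i) for one frozen weight W of shape (out_dim, in_dim)."""
    def __init__(self, num_basis: int, out_dim: int, in_dim: int, rank: int = 4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(num_basis, rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_basis, out_dim, rank))

    def compose(self, coeffs: torch.Tensor) -> torch.Tensor:
        # coeffs: (K,) mixing weights from the encoder.
        # delta_W = sum_i coeffs[i] * B_i @ A_i  -> shape (out_dim, in_dim)
        deltas = torch.einsum("kor,kri->koi", self.B, self.A)
        return torch.einsum("k,koi->oi", coeffs, deltas)

class AnalogyEncoder(nn.Module):
    """Lightweight encoder mapping features of (a, a') to K basis coefficients."""
    def __init__(self, feat_dim: int, num_basis: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.GELU(), nn.Linear(256, num_basis)
        )

    def forward(self, feat_a: torch.Tensor, feat_a_prime: torch.Tensor) -> torch.Tensor:
        # Softmax normalization is an assumption; other weighting schemes are possible.
        return self.mlp(torch.cat([feat_a, feat_a_prime], dim=-1)).softmax(dim=-1)

# Toy usage on a single linear layer of a frozen text-to-image backbone.
K, out_dim, in_dim, feat_dim = 8, 64, 64, 128
basis = LoRABasis(K, out_dim, in_dim)
encoder = AnalogyEncoder(feat_dim, K)

feat_a, feat_a_prime = torch.randn(feat_dim), torch.randn(feat_dim)  # features of a, a'
coeffs = encoder(feat_a, feat_a_prime)   # (K,) point in the "space of LoRAs"
delta_w = basis.compose(coeffs)          # analogy-specific LoRA update
frozen_w = torch.randn(out_dim, in_dim)  # stands in for a frozen backbone weight
adapted_w = frozen_w + delta_w           # injected at inference time to produce b'
```

Because the backbone stays frozen and only the low-rank update is composed, a new edit LoRA can be assembled per analogy pair at inference time, which is what removes the need for test-time optimization.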


arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/spanning-the-visual-analogy-space-with-a-weight-basis-of-loras-588-64eacd61

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
