arxiv:2602.15727

Spanning the Visual Analogy Space with a Weight Basis of LoRAs

Published on Feb 17 · Submitted by Hila Manor on Feb 23
Abstract

AI-generated summary

Visual analogy learning via dynamic composition of learned LoRA transformation primitives enables flexible image manipulation with improved generalization over fixed adaptation modules.

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet {a, a', b}, the goal is to generate b' such that a : a' :: b : b'. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules that spans the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate that our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation. Code and data are available at https://research.nvidia.com/labs/par/lorweb
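
To make the two components above concrete, the inference-time composition can be summarized as follows; the notation here is assumed for illustration and may differ from the paper's own formulation.

```latex
% Assumed notation: {(A_i, B_i)}_{i=1}^{K} is the learned LoRA basis for a
% weight matrix W of the frozen backbone, and E is the lightweight encoder
% applied to the example pair (a, a').
\mathbf{w} = E(a, a') \in \mathbb{R}^{K}, \qquad
\Delta W = \sum_{i=1}^{K} w_i \, B_i A_i, \qquad
W' = W + \Delta W .
```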

Community

Paper author · Paper submitter

TLDR: We propose a novel modular framework that learns to dynamically mix low-rank adapters (LoRAs) to improve visual analogy learning, enabling flexible and generalizable image edits based on example transformations.

We present LoRWeB (LoRA Weight Basis), a novel modular framework for dynamically mixing low-rank adapters (LoRAs).
We use LoRWeB for visual analogy learning: given an example pair of "before" and "after" images (a and a'), we want to edit a new image (b) in the same manner, applying the same visual transformation to produce b'. Rather than using a single fixed adapter, we learn a basis of LoRAs together with a lightweight encoder that selects and weighs these basis LoRAs to construct an edit LoRA. This enables better generalization to unseen visual transformations and achieves state-of-the-art performance on visual analogy tasks without requiring test-time optimization.
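
Below is a minimal, illustrative PyTorch sketch of this mixing scheme. The class names, dimensions, rank, and the softmax normalization of the coefficients are assumptions made for the example, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class LoRABasis(nn.Module):
    """K low-rank adapters (A_i, B_i) for one frozen weight W of shape (out_dim, in_dim)."""
    def __init__(self, num_basis: int, out_dim: int, in_dim: int, rank: int = 4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(num_basis, rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_basis, out_dim, rank))

    def compose(self, coeffs: torch.Tensor) -> torch.Tensor:
        # coeffs: (K,) mixing weights from the encoder.
        # delta_W = sum_i coeffs[i] * B_i @ A_i  -> shape (out_dim, in_dim)
        deltas = torch.einsum("kor,kri->koi", self.B, self.A)
        return torch.einsum("k,koi->oi", coeffs, deltas)

class AnalogyEncoder(nn.Module):
    """Lightweight encoder mapping features of (a, a') to K basis coefficients."""
    def __init__(self, feat_dim: int, num_basis: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.GELU(), nn.Linear(256, num_basis)
        )

    def forward(self, feat_a: torch.Tensor, feat_a_prime: torch.Tensor) -> torch.Tensor:
        # Softmax normalization is an assumption; other weighting schemes are possible.
        return self.mlp(torch.cat([feat_a, feat_a_prime], dim=-1)).softmax(dim=-1)

# Toy usage on a single linear layer of a frozen text-to-image backbone.
K, out_dim, in_dim, feat_dim = 8, 64, 64, 128
basis = LoRABasis(K, out_dim, in_dim)
encoder = AnalogyEncoder(feat_dim, K)

feat_a, feat_a_prime = torch.randn(feat_dim), torch.randn(feat_dim)  # features of a, a'
coeffs = encoder(feat_a, feat_a_prime)   # (K,) point in the "space of LoRAs"
delta_w = basis.compose(coeffs)          # analogy-specific LoRA update
frozen_w = torch.randn(out_dim, in_dim)  # stands in for a frozen backbone weight
adapted_w = frozen_w + delta_w           # injected at inference time to produce b'
```

Because the backbone stays frozen and only the low-rank update is composed, a new edit LoRA can be assembled per analogy pair at inference time, which is what removes the need for test-time optimization.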


arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/spanning-the-visual-analogy-space-with-a-weight-basis-of-loras-588-64eacd61

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
