arxiv:2603.25739

MegaFlow: Zero-Shot Large Displacement Optical Flow

Published on Mar 26
Abstract

MegaFlow employs pre-trained Vision Transformer features to address large displacement optical flow estimation through global matching and iterative refinement, achieving superior zero-shot performance across multiple benchmarks.

AI-generated summary

Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow, a simple yet powerful model for zero-shot large displacement optical flow. Rather than relying on highly complex, task-specific architectural designs, MegaFlow adapts powerful pre-trained vision priors to produce temporally consistent motion fields. In particular, we formulate flow estimation as a global matching problem by leveraging pre-trained global Vision Transformer features, which naturally capture large displacements. This is followed by a few lightweight iterative refinements to further improve sub-pixel accuracy. Extensive experiments demonstrate that MegaFlow achieves state-of-the-art zero-shot performance across multiple optical flow benchmarks. Moreover, our model also delivers highly competitive zero-shot performance on long-range point tracking benchmarks, demonstrating its robust transferability and suggesting a unified paradigm for generalizable motion estimation. Our project page is at: https://kristen-z.github.io/projects/megaflow.
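To make the "flow as global matching" formulation concrete, here is a minimal sketch of the general idea: correlate dense features from two frames globally, then take a soft-argmax over the match distribution to read off a displacement per pixel. This is not the authors' implementation; the function name, normalization, and temperature parameter are illustrative assumptions based only on the abstract's description (similar global-correlation formulations appear in prior matching-based flow methods).

```python
import torch
import torch.nn.functional as F

def global_matching_flow(feat1, feat2, temperature=0.1):
    """Illustrative optical flow via global feature matching.

    feat1, feat2: [C, H, W] dense feature maps (e.g. from a pre-trained ViT).
    Returns a [2, H, W] flow field: for each pixel in frame 1, the expected
    displacement to its soft-argmax match in frame 2.
    """
    C, H, W = feat1.shape
    # Flatten and L2-normalize per-pixel features so correlation = cosine similarity.
    f1 = F.normalize(feat1.reshape(C, H * W), dim=0)        # [C, N]
    f2 = F.normalize(feat2.reshape(C, H * W), dim=0)        # [C, N]
    # Global correlation: every pixel in frame 1 vs every pixel in frame 2,
    # so arbitrarily large displacements are representable.
    corr = f1.t() @ f2                                      # [N, N]
    prob = F.softmax(corr / temperature, dim=1)             # matching distribution
    # Coordinate grid of frame-2 pixel positions.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    coords = torch.stack([xs, ys], dim=0).float().reshape(2, H * W)  # [2, N]
    # Soft-argmax: expected matched coordinate for each frame-1 pixel.
    matched = prob @ coords.t()                             # [N, 2]
    flow = matched.t().reshape(2, H, W) - coords.reshape(2, H, W)
    return flow
```

In a full pipeline this coarse flow would then be polished by a few lightweight iterative refinement steps for sub-pixel accuracy, as the abstract describes; those steps are omitted here.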

Community

The zero-shot angle is compelling — most optical flow methods struggle when you move beyond their training domain. The large displacement problem is particularly interesting for video understanding in agentic systems where camera motion is unpredictable. Curious if MegaFlow handles occlusion boundaries differently than RAFT or GMFlow? The tradeoff between iteration count and displacement range is something we've wrestled with in real-time video pipelines. Any benchmarks on inference speed vs accuracy at 4K resolution?


Get this paper in your agent:

hf papers read 2603.25739
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0


Spaces citing this paper 1

Collections including this paper 0
