Fangyun Wei

fangyunwei

https://scholar.google.com/citations?user=-ncz2s8AAAAJ&hl=

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

upvoted a paper 1 day ago

Spatia: Video Generation with Updatable Spatial Memory

upvoted a paper 1 day ago

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

authored 20 papers 1 day ago

Two-shot Video Object Segmentation

Paper • 2303.12078 • Published Mar 21, 2023

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

Paper • 1904.11492 • Published Apr 25, 2019

Global Context Networks

Paper • 2012.13375 • Published Dec 24, 2020

A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars

Paper • 2401.04730 • Published Jan 9, 2024

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Paper • 2401.15077 • Published Jan 26, 2024 • 20

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

Paper • 2402.04253 • Published Feb 6, 2024

Improving Continuous Sign Language Recognition with Cross-Lingual Signs

Paper • 2308.10809 • Published Aug 21, 2023

RAIN: Your Language Models Can Align Themselves without Finetuning

Paper • 2309.07124 • Published Sep 13, 2023 • 3

Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model

Paper • 2203.14940 • Published Mar 28, 2022

Unsupervised Prompt Learning for Vision-Language Models

Paper • 2204.03649 • Published Apr 7, 2022

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Paper • 2309.02186 • Published Sep 5, 2023 • 23

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

Paper • 2403.07874 • Published Mar 12, 2024

Rethinking Generative Large Language Model Evaluation for Semantic Comprehension

Paper • 2403.07872 • Published Mar 12, 2024

Attentive Mask CLIP

Paper • 2212.08653 • Published Dec 16, 2022 • 1

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

Paper • 2406.16858 • Published Jun 24, 2024 • 1

Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

Paper • 2406.11837 • Published Jun 17, 2024

End-to-End Semi-Supervised Object Detection with Soft Teacher

Paper • 2106.09018 • Published Jun 16, 2021

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Paper • 2411.19650 • Published Nov 29, 2024

RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

Paper • 2010.15831 • Published Oct 29, 2020

Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

Paper • 2406.16866 • Published Jun 24, 2024