OpenGVLab

community

https://github.com/opengvlab

AI & ML interests

Computer Vision

Recent Activity

ganlinyang authored a paper 2 days ago

EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

shepnerd updated a collection 3 days ago

InternVideo3

Papers

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

RIVER: A Real-Time Interaction Benchmark for Video LLMs

shepnerd

updated a collection 3 days ago

InternVideo3

Collection

InternVideo3 enhances long-horizon multimodal tasks through Multimodal Contextual Reasoning and efficient attention mechanisms • 3 items • Updated 3 days ago • 1

KingNish

posted an update 11 days ago

Post

4249

We trained an open-source, Mythos-like cybersecurity LLM for the Build Small Hackathon: meet OpenMythos.

Trained in two stages: SFT on ~1.84K filtered ArXiv cs.CR papers + real CVE data, then RLVR using GitHub repos paired with past vulnerabilities, with a verifier model checking outputs against ground truth.

Trained on: H100s from Modal

The RLVR stage made the biggest difference: responses got more precise and less prone to confusing similar vulnerability classes.
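For readers unfamiliar with RLVR, here is a minimal sketch of a verifiable reward. It is not the OpenMythos training code: the post describes a verifier model, while this sketch swaps in a simple rule-based check, and the `Sample` class, the CWE label format, and the function names are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One hypothetical training item: a prompt plus its known label."""
    prompt: str
    ground_truth_cwe: str  # e.g. "CWE-79"; this label format is an assumption


def extract_cwe(response: str) -> Optional[str]:
    """Return the first CWE identifier found in a response, or None."""
    match = re.search(r"CWE-\d+", response, flags=re.IGNORECASE)
    return match.group(0).upper() if match else None


def verifiable_reward(sample: Sample, response: str) -> float:
    """Return 1.0 when the predicted class matches ground truth, else 0.0."""
    predicted = extract_cwe(response)
    return 1.0 if predicted == sample.ground_truth_cwe else 0.0
```

A binary reward like this penalizes answers that name a neighbouring but wrong vulnerability class, which is one plausible reason such a stage could reduce confusion between similar classes.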

Everything is open:
🤖 Demo → build-small-hackathon/OpenMythos
🧠 Model → build-small-hackathon/OpenMythos
📦 CVE Dataset → build-small-hackathon/CVE_Vulnerailities_Detailed
📄 ArXiv Dataset → himanshu17HF/ArvixImport-Filtered-Final

Try it out and let us know where it breaks 🙏

Abhaykoul

posted an update 11 days ago

Post

211

Shipped v0.1.2 of vtx — a minimalist coding agent for the terminal.

Most agentic CLIs ship 10k+ token system prompts. Vtx is ~2,200. Less prompt overhead means more room for your code in the model's context window.

Vtx is a from-scratch Python implementation of the design philosophy behind pi-mono — same principles, pure Python, no transpiled runtime.

What ships out of the box:

→ Textual TUI + headless CLI (vtx -p "fix the failing test")
→ 49 LLM provider gateways, all declared in a single provider.yaml
→ 5 core tools (read / edit / write / bash / find) plus web search and fetch
→ Session tree with compaction, handoff, and resume
→ AGENTS.md / CLAUDE.md auto-discovery
→ Skills system — drop SKILL.md files in .agents/skills/ and they become slash commands
→ Two OAuth flows (GitHub Copilot device flow, OpenAI Codex PKCE)
→ Two-mode permissions: prompt (default) or auto, with a safe-command allowlist

This release adds a proper extension system. Register new LLM-callable tools, intercept tool calls, hook lifecycle events, and add slash commands from a single register(api) function in a Python file under ~/.vtx/agent/extensions/. Extensions can override built-in tools by name and chain handler logic across subscribers.
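As a rough illustration of the pattern described above, here is a self-contained toy sketch. It is not vtx's actual API: the `ExtensionAPI` class and its `register_tool`, `on_tool_call`, and `call_tool` methods are hypothetical names invented for this example. Only the idea of a single `register(api)` entry point comes from the post.

```python
from typing import Callable, Dict, List, Optional

ToolFn = Callable[[dict], str]
# An interceptor may return replacement args, or None to leave them unchanged.
Interceptor = Callable[[str, dict], Optional[dict]]


class ExtensionAPI:
    """Hypothetical host-side object handed to each extension."""

    def __init__(self) -> None:
        self.tools: Dict[str, ToolFn] = {}
        self.interceptors: List[Interceptor] = []

    def register_tool(self, name: str, fn: ToolFn) -> None:
        # Registering an existing name overrides the earlier tool.
        self.tools[name] = fn

    def on_tool_call(self, handler: Interceptor) -> None:
        # Handlers run in registration order, each seeing the previous result.
        self.interceptors.append(handler)

    def call_tool(self, name: str, args: dict) -> str:
        for handler in self.interceptors:
            updated = handler(name, args)
            if updated is not None:
                args = updated
        return self.tools[name](args)


def register(api: ExtensionAPI) -> None:
    """Entry point an extension file would expose in this sketch."""
    api.register_tool("shout", lambda args: args["text"].upper())
    api.on_tool_call(lambda name, args: {**args, "text": args["text"].strip()})


api = ExtensionAPI()
register(api)
print(api.call_tool("shout", {"text": "  hello "}))
```

Keeping tools in a dictionary keyed by name makes override-by-name a plain reassignment, and keeping interceptors in an ordered list is one simple way to chain handler logic across subscribers.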

Licensed under Apache 2.0. Run uv tool install vtx-coding-agent and you're up and running.

GitHub: https://github.com/OEvortex/vtx-coding-agent
PyPI: https://pypi.org/project/vtx-coding-agent

Built in the open. Feedback, extensions, and PRs welcome.

prithivMLmods

posted an update 12 days ago

Post

4171

Wan2.2-I2V-Fast with highly upscaled sequential frame sampling is now available as a Spaces demo, built using Wan2.2-I2V and FLUX.2-Klein. Try the demo using the links below.👇

➠ wan2.2-i2v-fast : prithivMLmods/wan2.2-i2v-fast
➠ github: https://github.com/prithivsakthiur/wan2.2-i2v-fast
➠ collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

⤷ To learn more, visit the app page or the respective model pages.

qishisuren

submitted a paper to Daily Papers 12 days ago

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Paper • 2605.30789 • Published 25 days ago • 26

yanziang

authored a paper 16 days ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Paper • 2606.12195 • Published 17 days ago • 23

Eurayka

authored a paper 16 days ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Paper • 2606.12195 • Published 17 days ago • 23

linghan199

authored a paper 16 days ago

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Paper • 2606.12195 • Published 17 days ago • 23

wzk1015

in OpenGVLab/Mono-InternVL-2B 20 days ago

Fix remaining Transformers v5 crash: guard llm_config and to_dict() for None (follow-up to `e980c02`)

#13 opened 20 days ago by KBayoud

Fix KeyError in __init__ when vision_config is empty (Transformers v5 compatibility)

#12 opened 20 days ago by KBayoud

Eurayka

authored a paper 22 days ago

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Paper • 2606.05769 • Published 23 days ago • 6

Eurayka

submitted a paper to Daily Papers 22 days ago

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Paper • 2606.05769 • Published 23 days ago • 6

prithivMLmods

posted an update 27 days ago

Post

2190

Dropping the collection of Qwen 3.5/3.6 MTP GGUF quants. 🤗

🔗 Collection 1: https://huggingface.co/collections/prithivMLmods/mtp-qwen-35-36-moe-stable

🔗 Collection 2: https://huggingface.co/collections/prithivMLmods/mtp-qwen-35-36-stable

> To learn more, visit the respective model pages.

heroding77

submitted a paper to Daily Papers 29 days ago

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Paper • 2605.28424 • Published May 27 • 32

prithivMLmods

posted an update 30 days ago

Post

6191

PiD — Pixel Diffusion Decoder Image Edit Upscale and Image Generation Upscale, an all-in-one demo, is now live on Spaces! Great improvements in realism-based image generation and editing are powered by FLUX.2-Klein, while image generation is paired with Z-Image, and upscaling is enabled by default!

🤗 Space: prithivMLmods/PiD-Image-Upscaler
🔗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 > To learn more, visit the app page or the respective model pages.

Kaining

authored a paper about 1 month ago

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Paper • 2605.25874 • Published May 25 • 103

Kaining

submitted a paper to Daily Papers about 1 month ago

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Paper • 2605.25874 • Published May 25 • 103

prithivMLmods

posted an update about 1 month ago

Post

5597

I've made 8 Spaces in the Qwen-Image-Edit series, and out of them, 5 Spaces reached “Space of the Week”! A few Spaces are still topping the list even after many months.

Cumulatively, the series has crossed 8.2 million+ ZeroGPU runs and nearly 4 million visitors overall.

Thanks for all the community support! 🤗❤️

🔗 Spaces: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

Team members 118
