Visual Intelligence, Pretrained Vision-and-Language Models, Embodied AI, Collaborative Agents, Vision Tasks (Object Detection, Segmentation)

Yong-Ju updated a collection about 1 month ago

etri-vilab's collections (4)

etri-vilab/MultiHopSpatial-Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Mar 20 • 22
etri-vilab/MultihopSpatial

Viewer • Updated Mar 20 • 11.3k • 1.42k • 2
MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model

Paper • 2603.18892 • Published Mar 19 • 1

Safe-VLM 🚀

Running • 32
Answer questions about images with text prompts
etri-vilab/SafeLLaVA-7B

Image-Text-to-Text • 7B • Updated Dec 5, 2025 • 23 • 3
etri-vilab/SafeLLaVA-13B

Image-Text-to-Text • 13B • Updated Nov 24, 2025 • 9 • 3
etri-vilab/SafeQwen2.5-VL-7B

Image-Text-to-Text • 8B • Updated Nov 17, 2025 • 228 • 3
