MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model
AI & ML interests
Visual Intelligence, Pretrained Vision-and-Language Models, Embodied AI, Collaborative Agents, Vision Tasks (Object Detection, Segmentation)
We are the Visual Intelligence Research Section in the Superintelligence Creative Research Laboratory, Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea.
| Safe LLaVA / Safe Qwen / Safe Gemma: AI safety-tuned vision-language models |
|---|
| KOALA: text-to-image generation | Ko-LLaVA: Korean vision-language model |
|---|---|
| (feat. knowledge-distillation-based Stable Diffusion XL) | (feat. Korean Large Language and Vision Assistant) |
Models (16)
etri-vilab/MultiHopSpatial-Qwen3-VL-4B-Instruct
Image-Text-to-Text • 4B • Updated • 10
etri-vilab/SafeLLaVA-7B
Image-Text-to-Text • 7B • Updated • 21 • 3
etri-vilab/SafeLLaVA-13B
Image-Text-to-Text • 13B • Updated • 20 • 3
etri-vilab/SafeQwen2.5-VL-32B
Image-Text-to-Text • 33B • Updated • 188 • 3
etri-vilab/SafeQwen2.5-VL-7B
Image-Text-to-Text • 8B • Updated • 216 • 3
etri-vilab/SafeGem-27B
Image-Text-to-Text • 27B • Updated • 10 • 3
etri-vilab/SafeGem-12B
Image-Text-to-Text • 12B • Updated • 12 • 3
etri-vilab/koala-lightning-1.7b
Text-to-Image • Updated • 6 • 2
etri-vilab/koala-lightning-1b
Text-to-Image • Updated • 6 • 9
etri-vilab/koala-lightning-700m
Text-to-Image • Updated • 38 • 9


