Towards Open-ended Visual Quality Comparison

Paper: arXiv:2402.16641
This model fine-tunes LLaVA-1.5-7B on the Co-Instruct-562K dataset, for users who prefer the LLaVA architecture.
It is notably less accurate than the main version (https://huggingface.co/q-future/co-instruct); please use that checkpoint if you need a more accurate model.
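A minimal usage sketch, assuming this checkpoint loads through the standard LLaVA-1.5 path in `transformers`. The repo id, image URL, and prompt below are placeholders for illustration, not taken from this card; substitute this model's actual Hub id.

```python
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Placeholder repo id: replace with this model's actual id on the Hub.
model_id = "q-future/co-instruct-llava"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA-1.5 conversation template: "USER: <image>\n{question} ASSISTANT:"
image = Image.open(
    requests.get("https://example.com/photo.jpg", stream=True).raw  # placeholder URL
)
prompt = "USER: <image>\nRate the overall visual quality of this image. ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```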
Preliminary Results:
We are working on improving it, but we also caution that this architecture (direct visual projection, as in LLaVA) may not be well suited to multi-image scenarios.
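For reference, a hedged sketch of a pairwise-comparison prompt, reusing the `processor` and `model` from the sketch above and the same assumptions; given the caveat above, quality of multi-image outputs may be limited. The file names are placeholders.

```python
from PIL import Image

# Placeholder local files: the two images to compare.
image_a = Image.open("image_a.jpg")
image_b = Image.open("image_b.jpg")

# One <image> token per input image, following the LLaVA-1.5 template.
prompt = (
    "USER: The first image: <image>\nThe second image: <image>\n"
    "Which image has better visual quality? ASSISTANT:"
)

inputs = processor(
    text=prompt, images=[image_a, image_b], return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```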