Generate detailed answers from images or videos
Generate text and audio responses from images and videos