Missing example for running the model

by molntamas - opened Apr 11, 2023

Apr 11, 2023

This model needs a bounding box to specify which widget to describe.
But there is no example for this on the model card.
It is unclear how the bounding box should be specified.

As I understand it, the code should look something like this:

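from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-widget-captioning-base")
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-widget-captioning-base")

# Placeholder path: any app screenshot that contains the widget to describe
image = Image.open("screenshot.png")

# Open question: how should the bounding box be encoded here?
question = "? bounding box ?"

inputs = processor(images=image, text=question, return_tensors="pt")

predictions = model.generate(**inputs)
print(processor.decode(predictions[0], skip_special_tokens=True))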
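
One unconfirmed possibility is that the box is not meant to go into the text prompt at all, but should be drawn onto the screenshot before it reaches the processor. The sketch below only illustrates that idea. The helper names `to_pixel_box` and `mark_widget`, the normalized (left, top, right, bottom) box format, the red outline, and the line width are all assumptions for illustration, not something documented on the model card.

```python
def to_pixel_box(box, width, height):
    """Convert a normalized (left, top, right, bottom) box to pixel coordinates.

    Assumption: the box is given as fractions of the image size in [0, 1].
    Values outside the image are clamped to the nearest valid pixel.
    """
    left, top, right, bottom = box
    x0 = min(max(round(left * width), 0), width - 1)
    y0 = min(max(round(top * height), 0), height - 1)
    x1 = min(max(round(right * width), 0), width - 1)
    y1 = min(max(round(bottom * height), 0), height - 1)
    return (x0, y0, x1, y1)


def mark_widget(image, box, color="red", line_width=3):
    """Return a copy of a PIL image with the target widget outlined.

    Assumption: outline color and width are guesses; the rendering the
    model was trained on may differ.
    """
    from PIL import ImageDraw  # deferred so to_pixel_box works without Pillow

    marked = image.copy()  # leave the original screenshot untouched
    pixel_box = to_pixel_box(box, *marked.size)  # PIL size is (width, height)
    ImageDraw.Draw(marked).rectangle(pixel_box, outline=color, width=line_width)
    return marked
```

If this guess is right, the result of `mark_widget(image, box)` would be passed as `images=` in place of the raw screenshot. Whether the model responds to it would still need to be checked by comparing captions for different boxes on the same image.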

sunjae1294

Jun 27, 2023

Same issue here.
The model seems to return the same caption regardless of the bounding box.

HaiminWang

Oct 13, 2024

Has anyone solved it yet?
