How to use SmartPy/MultiModal-Text-Image-DeBerta-ViT with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="SmartPy/MultiModal-Text-Image-DeBerta-ViT")
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("SmartPy/MultiModal-Text-Image-DeBerta-ViT")
model = AutoModelForMaskedLM.from_pretrained("SmartPy/MultiModal-Text-Image-DeBerta-ViT")
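
Once a pipeline is loaded, querying it might look like the sketch below. It assumes the checkpoint really does load under the "fill-mask" task as shown above and that a single mask is used per input. The helper name `top_predictions` and the `{mask}` placeholder convention are hypothetical, introduced only for illustration. The `top_k` argument and the `token_str` / `score` result keys are those of the standard Transformers fill-mask pipeline.

```python
def top_predictions(pipe, template, k=3):
    """Fill the `{mask}` placeholder in `template` and return (token, score) pairs.

    `pipe` is expected to behave like a Transformers fill-mask pipeline.
    This helper is illustrative and not part of the model card.
    """
    # Use the tokenizer's own mask token instead of hard-coding "[MASK]",
    # since the literal mask string differs between tokenizers.
    text = template.format(mask=pipe.tokenizer.mask_token)

    # With exactly one mask in the input, a fill-mask pipeline returns a
    # list of dicts, one per candidate token.
    results = pipe(text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


# Example usage (downloads the model, so it is left commented out):
# from transformers import pipeline
# pipe = pipeline("fill-mask", model="SmartPy/MultiModal-Text-Image-DeBerta-ViT")
# for token, score in top_predictions(pipe, "Paris is the {mask} of France."):
#     print(token, score)
```

Reading the mask token from `pipe.tokenizer` keeps the helper independent of which tokenizer the checkpoint ships with.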