How to use witiko/mathberta with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="witiko/mathberta")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("witiko/mathberta")
model = AutoModelForMaskedLM.from_pretrained("witiko/mathberta")
```
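For example, the pipeline can be used to fill a masked token (a minimal sketch; the example sentence is illustrative, and the mask token is taken from the pipeline's own tokenizer rather than hard-coded):

```python
from transformers import pipeline

pipe = pipeline("fill-mask", model="witiko/mathberta")

# Use the tokenizer's own mask token so this works regardless
# of whether the model uses <mask>, [MASK], etc.
mask = pipe.tokenizer.mask_token
for prediction in pipe(f"The {mask} of a function measures its rate of change."):
    print(prediction["token_str"], prediction["score"])
```

Each prediction is a dict with the proposed token and its score, sorted from most to least likely.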
Prediction size vs tokenizer size
#3
by surya-narayanan - opened
Hi, the model seems to output a tensor of size batch size × sentence length × 78672, but the tokenizer vocab size is 50265. Any idea why there's this discrepancy?
Hi @surya-narayanan , MathBERTa's tokenizer was substantially extended to cover the math vocabulary. At the time of training, this was not fully supported by transformers, so inconsistencies like this can still occasionally be found. FWIW, we later opened and merged a related PR to the transformers library.
Anyway, I've checked for you that the model's config matches len(tokenizer.vocab), so the current model and tokenizer should be safe to use together as-is.
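The agreement described above can be verified directly (a minimal sketch; it compares the vocab size in the model's config, which determines the size of the output logits, against the tokenizer's full vocabulary including added tokens):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("witiko/mathberta")
model = AutoModelForMaskedLM.from_pretrained("witiko/mathberta")

# config.vocab_size sizes the masked-LM output layer;
# len(tokenizer) counts the base vocabulary plus added tokens.
# If these differ, the checkpoint and tokenizer are mismatched.
print(model.config.vocab_size, len(tokenizer))
```

If the two numbers disagree, `model.resize_token_embeddings(len(tokenizer))` is the usual way to bring the embedding and output layers in line with the tokenizer.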
Hmm, still facing an error. Should I re-install transformers?