how did you convert `transformers.PreTrainedTokenizer` to ggml format?

by keunwoochoi - opened Jun 8, 2023

Can you share how you did it? I am trying to convert my custom language model to ggml, but I also use a `tokenizers.Tokenizer` that I trained on my own corpus.
I could get merges.txt and vocab.json, but I don't know how to convert them into a tokenizer.model file, which seems to be the only format the ggml converter is compatible with.

thanks!

NeoDim

Owner Jun 16, 2023

You need to add support for your model architecture to ggml; see the examples at https://github.com/ggerganov/ggml/tree/master/examples
There is no magic recipe. You can also look at https://github.com/OpenNMT/CTranslate2 as an alternative.
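
To make the tokenizer part a bit more concrete: a rough sketch, assuming you write your own converter and choose to embed the vocabulary in the model file as length-prefixed byte strings instead of going through tokenizer.model. The function names and the binary layout here are hypothetical, not a ggml API; your loader code on the C/C++ side would have to read the same layout. For a byte-level BPE vocab you may also need to map the tokens back to raw bytes first, which this sketch does not do.

```python
import io
import json
import struct


def load_vocab(vocab_json_text):
    """Return token strings ordered by token id (vocab.json maps token -> id)."""
    vocab = json.loads(vocab_json_text)
    return [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]


def load_merges(merges_text):
    """Parse merges.txt into (left, right) pairs, skipping the header line."""
    merges = []
    for line in merges_text.splitlines():
        parts = line.split()
        if line.startswith("#version") or len(parts) != 2:
            continue
        merges.append((parts[0], parts[1]))
    return merges


def write_vocab(out, tokens):
    """Write the vocab in a hypothetical layout: count, then (length, bytes) per token."""
    out.write(struct.pack("<i", len(tokens)))
    for tok in tokens:
        data = tok.encode("utf-8")
        out.write(struct.pack("<i", len(data)))
        out.write(data)
```

With real files you would pass in `open("vocab.json", encoding="utf-8").read()` and write to a file opened in `"wb"` mode; the merges would need a similar section of their own if your loader has to run BPE itself.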
