Tags: Transformers, English, tokenization
A newer version of this model is available: qikp/pika-4

pika

🎉 You are looking at pika 3, which uses the wordmix dataset!

pika is a simple tokenizer released under public-domain-like terms.

Special Tokens

  • End-of-Sequence token: [EOS] (ID 0)
  • Padding token: [PAD] (ID 1)
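
As a rough illustration of how these two IDs are typically used, the sketch below appends [EOS] to each sequence and pads a batch to a common length with [PAD]. Only the IDs 0 and 1 come from this card; the helper name `finalize_batch`, the choice of right-padding, and the example token IDs are assumptions made for illustration.

```python
from typing import List

EOS_ID = 0  # [EOS], as listed above
PAD_ID = 1  # [PAD], as listed above


def finalize_batch(batch: List[List[int]]) -> List[List[int]]:
    """Append EOS to every sequence, then right-pad to the longest length."""
    with_eos = [seq + [EOS_ID] for seq in batch]
    width = max((len(seq) for seq in with_eos), default=0)
    return [seq + [PAD_ID] * (width - len(seq)) for seq in with_eos]


# Hypothetical token IDs; real IDs depend on the pika vocabulary.
padded = finalize_batch([[17, 42, 8], [5]])
print(padded)  # [[17, 42, 8, 0], [5, 0, 1, 1]]
```

Right-padding is only one convention; some decoder-only setups pad on the left instead, so adjust to whatever the downstream model expects.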

Training

pika was trained on qikp/wordmix.

Limitations

Some uncommon special tokens aren't present; add them manually if you need them.
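
One possible way to add missing tokens is sketched below. It assumes the tokenizer loads through the Transformers `AutoTokenizer` class and relies on the standard `get_vocab` and `add_tokens` methods of Transformers tokenizers. The helper name `add_missing_special_tokens` and the example tokens `[BOS]` and `[SEP]` are hypothetical and not part of pika.

```python
from typing import Iterable, List


def add_missing_special_tokens(tokenizer, wanted: Iterable[str]) -> List[str]:
    """Register tokens from `wanted` that are absent from the vocabulary.

    Returns the tokens that were actually added.
    """
    vocab = tokenizer.get_vocab()  # maps token string -> ID
    missing = [tok for tok in wanted if tok not in vocab]
    if missing:
        # special_tokens=True marks the new entries as special tokens
        tokenizer.add_tokens(missing, special_tokens=True)
    return missing


def demo() -> None:
    """Example usage; needs the `transformers` package and network access."""
    from transformers import AutoTokenizer

    # Assumption: the repository can be loaded with AutoTokenizer.
    tokenizer = AutoTokenizer.from_pretrained("qikp/pika-3")
    added = add_missing_special_tokens(tokenizer, ["[BOS]", "[SEP]"])
    print("added:", added)
```

Calling `demo()` downloads the tokenizer and prints whichever of the example tokens were not already in the vocabulary. New tokens receive IDs after the existing vocabulary, so any model paired with the tokenizer would need its embedding table resized to match.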
