Chronos πŸ•°οΈ β€” Your 20th-Century Historian

"Those who cannot remember the past are condemned to repeat it." β€” George Santayana, philosopher, and apparently someone who never met a hallucinating LLM.

Chronos is a retrieval-augmented generation (RAG) AI that knows the 20th century the way your history teacher wished they did β€” but without the monotone voice and the overhead projector.

It pairs a Qwen 2.5 3B language model with a FAISS-powered knowledge base built from hundreds of Wikipedia articles: World War I, World War II, the Cold War, the Space Race, major political upheavals, key inventions, and everything in between. Ask it something historical and it digs through its archives like a librarian who actually enjoys their job. Ask it something casual and it just... talks to you. Like a person. Imagine that.


What Chronos actually is

Most history bots either hallucinate confidently or refuse to answer anything fun. Chronos tries to do neither. It was built with one goal: give accurate, evidence-backed historical answers while still being a conversation worth having.

It will not make up that Churchill and Stalin went to the same barber. It will tell you, with sources, what actually happened at Yalta. And if you just want to say hi, it'll say hi back.


🧠 How it works

Chronos runs a two-layer architecture depending on what you ask:

Layer 1 β€” Casual chat For greetings, small talk, or anything outside the historical lane, the Qwen 3B model answers directly. No retrieval, no database, just the raw language model being friendly. This is also the layer that handles "who are you?" before the bot accidentally goes looking through WWII chunks for an answer about itself. (Yes, that happened. No, it was not funny at the time.)

Layer 2 β€” Historical RAG When the question touches 20th-century history, the pipeline kicks in:

  1. A keyword detector flags the query as historical
  2. The e5-base-v2 bi-encoder retrieves 30 candidate chunks from the FAISS index
  3. A cross-encoder (ms-marco-MiniLM-L-12-v2) re-ranks them and keeps the top 4
  4. Those 4 chunks are packed into a prompt alongside the question
  5. Qwen generates an answer grounded in the retrieved context

There's also a confidence threshold β€” if even the best-ranked chunk scores too low, Chronos says "I don't have enough information" rather than inventing something. This is called honesty. More AI systems should try it.
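In rough outline, the retrieve, re-rank, and abstain logic looks like this. This is a minimal, self-contained sketch: the scoring functions are crude keyword-overlap placeholders standing in for the e5/FAISS retrieval and the ms-marco cross-encoder, and names like `retrieve_candidates` and `CONFIDENCE_THRESHOLD` are illustrative, not the actual `pipeline.py` API.

```python
# Sketch of the Layer-2 flow. The scoring functions below are placeholders
# standing in for FAISS + e5 retrieval and the ms-marco cross-encoder.
CONFIDENCE_THRESHOLD = 0.35  # illustrative value, not the shipped config

def retrieve_candidates(query, chunks, k=30):
    """Stand-in for FAISS + e5: crude keyword-overlap scoring."""
    q_words = set(query.lower().split())
    def score(chunk):
        return len(q_words & set(chunk.lower().split()))
    return sorted(chunks, key=score, reverse=True)[:k]

def rerank(query, candidates, top_n=4):
    """Stand-in for the cross-encoder: returns (chunk, score) pairs."""
    q_words = set(query.lower().split())
    def score(chunk):
        return len(q_words & set(chunk.lower().split())) / max(len(q_words), 1)
    scored = sorted(((c, score(c)) for c in candidates),
                    key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def answer(query, chunks):
    candidates = retrieve_candidates(query, chunks)
    top = rerank(query, candidates)
    # Abstain rather than hallucinate when even the best chunk scores low.
    if not top or top[0][1] < CONFIDENCE_THRESHOLD:
        return "I don't have enough information."
    context = "\n".join(chunk for chunk, _ in top)
    # The real pipeline packs `context` + `query` into a prompt for Qwen here.
    return f"[grounded answer using {len(top)} chunks]"
```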

A small hard-coded safety net handles a handful of ultra-high-stakes questions (think: "Who led Nazi Germany?") before retrieval even begins, guaranteeing accuracy on the facts that really cannot be wrong.
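In spirit, the safety net is just a lookup that runs before anything else. The question list, normalization, and wording below are illustrative, not the exact `pipeline.py` contents.

```python
# Illustrative sketch of the pre-retrieval safety net; the real question
# list and answer strings in pipeline.py may differ.
CRITICAL_FACTS = {
    "who led nazi germany": "Adolf Hitler led Nazi Germany from 1933 to 1945.",
    "who was the leader of nazi germany": "Adolf Hitler led Nazi Germany from 1933 to 1945.",
}

def safety_net(query):
    """Return a hard-coded answer for ultra-high-stakes questions, else None."""
    key = query.lower().strip().rstrip("?")
    return CRITICAL_FACTS.get(key)
```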


πŸ“¦ What's inside

| Component | What it is |
|---|---|
| Base LLM | `Qwen/Qwen2.5-3B-Instruct` (4-bit quantized) |
| Bi-encoder | `intfloat/e5-base-v2` |
| Cross-encoder | `cross-encoder/ms-marco-MiniLM-L-12-v2` |
| FAISS index | `jjk_index.faiss` — historical Wikipedia chunks |
| Chunks | `chunks.txt` (~12 MB, one paragraph per line) |
| Pipeline | `pipeline.py` — one class, one `.ask()` method, done |
| Config | `rag_config.json` |

πŸš€ Quick start

```python
from huggingface_hub import snapshot_download
from pipeline import Chronos

model_dir = snapshot_download("QuantaSparkLabs/Chronos-3B")
bot = Chronos(model_dir)

# historical question
print(bot.ask("What caused World War I?"))

# casual
print(bot.ask("Hey, what's up?"))
```

Requirements: 4-bit quantization means you need bitsandbytes and a GPU with at least 6 GB VRAM. CPU inference works but you'll age noticeably while waiting.


πŸ“Š Evaluation Results

These are internal evaluations run manually on Chronos-3B β€” no benchmark leaderboard, no cherry-picked test set, just honest testing across real question types. Take them at face value.

| Category | Score | Notes |
|---|---|---|
| Factual Accuracy (hard facts) | ✅ 10/10 | Critical questions — leaders, dates, core events — are answered instantly by the built-in safety net with zero error |
| Historical Knowledge (open-ended) | 🔶 8/10 | Most open questions (causes of wars, event explanations, country lists) are answered correctly via the knowledge base. Occasionally the confidence filter returns "I don't know" when the retrieval score is borderline |
| Hallucination Control | ✅ 9/10 | The confidence threshold + cross-encoder combination means Chronos almost never invents false history. It prefers to admit ignorance over guessing |
| Casual Friendliness | ✅ 9/10 | Greetings, identity questions, and small talk are handled with a warm, lively personality. It never feels robotic |
| Consistency (multi-turn) | 🔶 7/10 | Follow-up questions work, but the model can lose the thread across a long conversation. Multi-turn memory is a known limitation and a future goal |
| Speed (T4 GPU) | ✅ 8/10 | Answers generate in 2–5 seconds. Initial model download is heavy (~6 GB), but subsequent inferences are fast |

Overall: 8/10

Chronos is reliable, personable, and historically faithful. It occasionally needs a second prompt on very obscure questions, but it does not fabricate harmful falsehoods. The known shortcomings β€” multi-turn coherence and occasional over-caution on borderline retrievals β€” are well-understood and can be tuned further.

These results reflect self-evaluation on a curated internal test set. Independent benchmarking on formal QA datasets (e.g. TriviaQA, NaturalQuestions) is a planned next step.


πŸ”§ The bugs we fought (and eventually won)

Look, no project ships clean. Here's what actually happened during development, because pretending otherwise helps nobody.


Bug 1 β€” The cryptic list comparison crash

```
TypeError: '<=' not supported between instances of 'list' and 'int'
```

Every single answer crashed with this. Took an embarrassing amount of time to realize Gradio's ChatInterface passes (message, history) to your function, and our function was accidentally catching the history list as max_new_tokens. The fix was three words: fix the signature. The debugging took considerably longer.
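In miniature, the bug reproduces like this (a simplified stand-in; the real handler called the model instead of returning a placeholder string):

```python
# Buggy version: Gradio's ChatInterface calls fn(message, history),
# so the chat history (a list) silently lands in max_new_tokens.
def buggy_chat(message, max_new_tokens=512):
    if max_new_tokens <= 0:  # TypeError: list vs int
        raise ValueError("max_new_tokens must be positive")
    return f"[answer to {message!r}]"

# Fixed version: accept (message, history), as Gradio expects.
def chat(message, history):
    return f"[answer to {message!r}]"
```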


Bug 2 β€” Qwen's corrupted generation config

The upstream Qwen 3B repo had max_new_tokens stored as a list in generation_config.json instead of an integer. This is the kind of bug that makes you question everything you know about software. We fixed it by loading a clean GenerationConfig manually and overwriting the bad file in our upload. Not glamorous. Worked perfectly.
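The actual fix went through `transformers.GenerationConfig`; as a rough equivalent, the same repair can be sketched directly on the raw JSON file. The function name and the set of keys checked here are illustrative.

```python
import json

def sanitize_generation_config(path):
    """Coerce list-valued generation params (like max_new_tokens) back to ints.

    Sketch of the repair: the real fix loaded a clean
    transformers.GenerationConfig and re-saved it over the bad file.
    """
    with open(path) as f:
        cfg = json.load(f)
    for key in ("max_new_tokens", "max_length"):
        if isinstance(cfg.get(key), list) and cfg[key]:
            cfg[key] = int(cfg[key][0])  # take the first element as the value
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```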


Bug 3 β€” The LFS ghost files

Upload kept failing with "your push was rejected because an LFS pointer pointed to a file that does not exist." The cause was leftover LFS metadata from a previous interrupted upload haunting the repository like a very technical ghost. Solution: nuke the repo, start fresh, upload every file individually with upload_file instead of upload_folder. Tedious. Effective.
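The per-file upload loop amounts to walking the release folder and calling `upload_file` once per file. The helper below is a hypothetical sketch (folder name and repo layout assumed); the actual `huggingface_hub` calls are shown commented out since they need authentication and network access.

```python
from pathlib import Path

def iter_repo_files(folder):
    """Yield (local_path, path_in_repo) pairs for every file under `folder`."""
    root = Path(folder)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path, str(path.relative_to(root))

# The actual upload loop (requires huggingface_hub and a logged-in token):
# from huggingface_hub import HfApi
# api = HfApi()
# for local, in_repo in iter_repo_files("chronos_release"):
#     api.upload_file(path_or_fileobj=str(local),
#                     path_in_repo=in_repo,
#                     repo_id="QuantaSparkLabs/Chronos-3B")
```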


Bug 4 β€” The Finnish shipwreck incident

Asked "Who was the leader of Germany during WWI?" and got a confident paragraph about a Finnish shipwreck. This is what happens when a retriever fetches irrelevant chunks and a language model tries to connect them anyway. The fix was the confidence threshold β€” if the cross-encoder score is too low, Chronos admits it doesn't know. Hallucinations are not a feature.

(We still don't know where the shipwreck came from. Some mysteries are better left unsolved.)


Bug 5 β€” The identity crisis

"Who are you?" triggered a historical retrieval search, found nothing relevant, and the bot replied "I don't have enough information." Chronos literally did not know who it was. We fixed this by adding an identity handler at the very top of ask(), before any retrieval logic runs. An AI having an existential crisis is only funny in retrospect.


🌐 Run it locally

```python
import gradio as gr
from pipeline import Chronos

bot = Chronos("path/to/downloaded/model")

def chat(message, history):
    return bot.ask(message)

gr.ChatInterface(fn=chat, title="Chronos 🕰️").launch()
```

Or deploy on Hugging Face Spaces — the repo already includes `pipeline.py`. Point `app_file` at it and you're done.


🀝 Contributing

Found a historical inaccuracy? Want to add more chunks to the knowledge base? Think Chronos got something wrong about the Battle of Stalingrad?

  • Open a discussion on the Community Tab
  • Submit a PR with new or corrected chunks
  • Flag wrong answers in any Gradio demo β€” we review them periodically

All contributions welcome. History is big. The knowledge base can always be bigger.


A final note

This project took longer than expected, broke in ways that felt personal, and shipped anyway. That's the job. Chronos is dedicated to everyone who has ever stared at a stack trace at 2am and decided to keep going β€” and to everyone who genuinely loves history and thinks it deserves better than a model that makes things up.

It does. You do. Here it is.


Built with perseverance, caffeine, and a deep respect for the 20th century. QuantaSparkLabs
