Archaea-74M
Archaea-74M is a decoder-only causal language model with approximately 74 million parameters, pretrained from scratch on BetterDataset-2M. The model uses a LLaMA-style architecture with Grouped Query Attention (GQA) and was trained using BF16 mixed precision.
This release is an intermediate checkpoint: approximately 1.23 billion of a planned 1.6 billion training tokens (about 77%) have been processed, so the pretraining run is not yet complete.
Model Card
| Attribute | Value |
|---|---|
| Model ID | GODELEV/Archaea-74M |
| Parameters | ~74 Million |
| Architecture | Decoder-only Transformer (LLaMA-style) |
| Attention | Grouped Query Attention (GQA) |
| Context Length | 1024 |
| Tokenizer | GPT-2 |
| Training Precision | BF16 |
| Framework | PyTorch + Transformers |
| License | MIT |
Architecture
Transformer Configuration
| Parameter | Value |
|---|---|
| Hidden Size | 512 |
| Intermediate Size | 1408 |
| Layers | 8 |
| Attention Heads | 8 |
| KV Heads | 2 |
| GQA Ratio | 4:1 |
| Activation | SiLU |
| Normalization | RMSNorm |
| Context Length | 1024 |
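The configuration above can be checked against the stated ~74M parameter figure with a rough count. This is a minimal sketch under assumptions the card does not state: a vocabulary of 50,257 tokens (the standard GPT-2 tokenizer size), no bias terms, standard LLaMA-style blocks, and a head dimension of hidden size divided by attention heads. The function name `count_parameters` is illustrative.

```python
# Rough parameter count for the configuration in the table above.
# Assumptions (not stated in the model card): GPT-2 vocabulary of 50,257
# tokens, no bias terms, standard LLaMA-style blocks with RMSNorm weights.
HIDDEN = 512
INTERMEDIATE = 1408
LAYERS = 8
HEADS = 8
KV_HEADS = 2
VOCAB = 50_257  # assumed: size of the standard GPT-2 tokenizer


def count_parameters(tie_embeddings: bool = False) -> int:
    head_dim = HIDDEN // HEADS                # 64
    kv_dim = KV_HEADS * head_dim              # 128 with 2 KV heads
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q, o + k, v
    mlp = 3 * HIDDEN * INTERMEDIATE           # gate, up, and down projections
    norms = 2 * HIDDEN                        # two RMSNorm weights per block
    per_layer = attention + mlp + norms
    embeddings = VOCAB * HIDDEN
    lm_head = 0 if tie_embeddings else VOCAB * HIDDEN
    return LAYERS * per_layer + embeddings + lm_head + HIDDEN  # + final norm


untied = count_parameters(tie_embeddings=False)
tied = count_parameters(tie_embeddings=True)
print(f"untied: {untied / 1e6:.1f}M, tied: {tied / 1e6:.1f}M")
```

Under these assumptions the untied variant comes out at roughly 74.0M parameters, which is consistent with the stated size; the released `config.json` remains the authoritative source.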
The model implements Grouped Query Attention: 8 query heads share 2 key/value heads, which cuts KV-cache memory to a quarter of what standard multi-head attention would need at the same head dimension.
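The KV-cache saving can be estimated with simple arithmetic. The sketch below assumes a batch size of 1, a head dimension of 64 (hidden size 512 divided by 8 attention heads), and 2 bytes per cached value (BF16 or FP16); the helper `kv_cache_bytes` is illustrative and not part of any library.

```python
def kv_cache_bytes(kv_heads, layers=8, head_dim=64, seq_len=1024, bytes_per_value=2):
    """Approximate KV-cache size for one sequence (assumed batch size of 1)."""
    # Two cached tensors (keys and values) per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


gqa_bytes = kv_cache_bytes(kv_heads=2)  # this model: 2 KV heads
mha_bytes = kv_cache_bytes(kv_heads=8)  # hypothetical multi-head baseline

print(f"GQA: {gqa_bytes / 2**20:.0f} MiB, MHA: {mha_bytes / 2**20:.0f} MiB")
```

At the full 1024-token context this gives about 4 MiB for the GQA layout versus about 16 MiB for a multi-head layout with 8 KV heads.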
Training
Dataset
Archaea-74M was pretrained on GODELEV/BetterDataset-2M, a multi-source corpus composed of:
- General web text
- Conversational content
- Knowledge-focused material
- Educational content
- Instruction-like examples
- Technical and programming text
The complete corpus contains approximately 1.6 billion tokens.
Training Progress
| Metric | Value |
|---|---|
| Planned Tokens | ~1.6B |
| Tokens Trained | ~1.23B |
| Completion | ~77% |
| Planned Steps | 25,000 |
| Completed Steps | 18,800 |
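The token counts can be cross-checked against the step counts. This sketch assumes that the effective batch size of 64 listed under Optimization counts sequences of 1024 tokens per optimizer step; the card does not state this explicitly.

```python
SEQ_LEN = 1024
EFFECTIVE_BATCH = 64      # assumed: sequences per optimizer step
PLANNED_STEPS = 25_000
COMPLETED_STEPS = 18_800

tokens_per_step = SEQ_LEN * EFFECTIVE_BATCH
planned_tokens = tokens_per_step * PLANNED_STEPS
trained_tokens = tokens_per_step * COMPLETED_STEPS
step_fraction = COMPLETED_STEPS / PLANNED_STEPS

print(f"tokens per step: {tokens_per_step:,}")
print(f"trained: {trained_tokens / 1e9:.2f}B of {planned_tokens / 1e9:.2f}B")
print(f"completion by steps: {step_fraction:.1%}")
```

Under this assumption the completed steps correspond to about 1.23B tokens, matching the table. Measured by steps the run is about 75% complete; the ~77% figure follows from dividing the rounded token counts (1.23B / 1.6B).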
Optimization
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Scheduler | OneCycleLR |
| Peak Learning Rate | 6e-4 |
| Weight Decay | 0.1 |
| Gradient Clipping | 1.0 |
| Sequence Length | 1024 |
| Effective Batch Size | 64 |
| Precision | BF16 |
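To give a feel for the learning-rate shape implied by a one-cycle schedule, here is a simplified pure-Python sketch. It assumes PyTorch's documented defaults for `OneCycleLR` (cosine annealing, `pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`) and the planned 25,000 steps; the card does not say which values were actually used, so the numbers are illustrative only.

```python
import math


def one_cycle_lr(step, total_steps=25_000, max_lr=6e-4,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Simplified two-phase cosine one-cycle schedule (assumed defaults)."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    peak_step = pct_start * total_steps - 1   # last step of the warm-up phase
    last_step = total_steps - 1

    def cosine(start, end, pct):
        # Cosine interpolation from start (pct=0) to end (pct=1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= peak_step:
        return cosine(initial_lr, max_lr, step / peak_step)
    return cosine(max_lr, min_lr, (step - peak_step) / (last_step - peak_step))


for s in (0, 7_499, 18_800, 24_999):
    print(s, f"{one_cycle_lr(s):.3e}")
```

With these assumed settings the rate warms up from 2.4e-5 to the 6e-4 peak over the first 30% of steps and then decays, so a checkpoint taken at step 18,800 would still be in the decay phase rather than at the final minimum.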
Training Statistics
| Metric | Value |
|---|---|
| Initial Loss | 10.9223 |
| Final Loss | 2.9488 |
| Best Loss | 2.8071 |
| Final Perplexity | 19.08 |
| Best Perplexity | 16.56 |
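The perplexity values follow from the loss values if the loss is the mean per-token cross-entropy in nats, in which case perplexity is `exp(loss)`. The sketch below checks that relationship; the vocabulary size of 50,257 used for the uniform-guess baseline is an assumption based on the standard GPT-2 tokenizer.

```python
import math

final_ppl = math.exp(2.9488)   # from the Final Loss row
best_ppl = math.exp(2.8071)    # from the Best Loss row

# Loss of a uniform guess over an assumed 50,257-token vocabulary.
uniform_loss = math.log(50_257)

print(f"final perplexity: {final_ppl:.2f}")
print(f"best perplexity:  {best_ppl:.2f}")
print(f"uniform-guess loss: {uniform_loss:.2f}")
```

The uniform-guess loss is roughly 10.82, close to the reported initial loss of 10.9223, as expected for a freshly initialized model.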
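Training Loss Curve
[Figure: training loss curve, Archaea74M_Training_Loss_Curve.png]
Learning Rate Schedule
[Figure: learning rate schedule, Archaea74M_Learning_Rate_Schedule.png]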
Evaluation
All benchmarks were run with the EleutherAI LM Evaluation Harness.
Benchmark Results
All results are zero-shot.
| Benchmark | Metric | Score |
|---|---|---|
| HellaSwag | acc_norm | 27.31% |
| PIQA | acc_norm | 58.54% |
| WinoGrande | acc | 51.54% |
| BoolQ | acc | 56.33% |
| ARC-Easy | acc_norm | 39.06% |
| ARC-Challenge | acc_norm | 22.70% |
| OpenBookQA | acc_norm | 26.00% |
| CommonsenseQA | acc | 19.66% |
| LAMBADA | acc | 18.01% |
| BLiMP | acc | 74.91% |
| MMLU | acc | 25.07% |
| SciQ | acc_norm | 57.70% |
| COPA | acc | 61.00% |
| RACE | acc | 24.78% |
| SWAG | acc_norm | 41.98% |
| TruthfulQA MC2 | acc | 46.46% |
| WikiText-2 | Word Perplexity | 68.06 |
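For multiple-choice tasks, a score is easier to read next to the random-guess baseline. The sketch below subtracts an approximate chance level from each accuracy in the table. The number of answer options per task is an assumption based on the standard task formats (ARC, for instance, has a small number of questions with a different option count), and LAMBADA, TruthfulQA MC2, and WikiText-2 are left out because they are not simple fixed-choice accuracy.

```python
# (score in %, assumed number of answer options)
results = {
    "HellaSwag": (27.31, 4),
    "PIQA": (58.54, 2),
    "WinoGrande": (51.54, 2),
    "BoolQ": (56.33, 2),
    "ARC-Easy": (39.06, 4),
    "ARC-Challenge": (22.70, 4),
    "OpenBookQA": (26.00, 4),
    "CommonsenseQA": (19.66, 5),
    "BLiMP": (74.91, 2),
    "MMLU": (25.07, 4),
    "SciQ": (57.70, 4),
    "COPA": (61.00, 2),
    "RACE": (24.78, 4),
    "SWAG": (41.98, 4),
}

# Margin over uniform random guessing, in percentage points.
margins = {name: round(score - 100.0 / n, 2) for name, (score, n) in results.items()}

for name, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {margin:+6.2f} points vs. chance")
```

Under these assumptions, several knowledge-heavy tasks (MMLU, RACE, ARC-Challenge, CommonsenseQA) sit at or slightly below chance, while tasks such as SciQ, BLiMP, and SWAG show a clear margin.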
Evaluation Notes
The following tasks failed in previous runs and are not included in the results above:
- SocialIQA: dataset loading incompatibility.
- Arithmetic benchmark: task identifier changes in recent lm-eval versions.
Sample Generations
The following examples were generated directly from the pretrained base model.
Example 1
Input
Once upon a time
Output
Once upon a time
In the days after, I was told that I could not make a living. So I said to myself,
“I want to go back to my family.” I said, “This is a good one. You can take me to the doctor
or something and do it with you.”
The doctor asked me how I felt about this. He said he did. But I said,
“You can't have a baby.”
Example 2
Input
The horizon looked
Output
the horizon looked like a big black hole. The sun was shining and the wind was blowing.
The sun was shining with the sun, but it was still there. The stars were on top of each other.
The sun was falling as a result of the weather. The sun was going up.
Example 3
Input
what is the future of ai
Output
what is the future of Ai?
I don't know what you're talking about, but I'm sure that's a good reason for you to be optimistic.
It doesn't matter if it doesn't take away your wandering in a positive direction.
The answer is simple: I'll leave you with a feeling of uncertainty and optimism.
These outputs illustrate the model's ability to continue prompts coherently while also demonstrating typical limitations of small-scale pretrained language models, including repetition, topic drift, and inconsistent factual reasoning.
Usage
Installation
pip install torch transformers accelerate
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "GODELEV/Archaea-74M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in half precision on GPU and full precision on CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # requires the accelerate package
)
Text Generation
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "GODELEV/Archaea-74M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a continuation; no gradients are needed for inference.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.8,
        do_sample=True,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizer has no pad token
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
Repository Structure
Archaea-74M/
├── config.json
├── generation_config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
├── Archaea74M_Training_Loss_Curve.png
├── Archaea74M_Learning_Rate_Schedule.png
└── README.md
Limitations
Archaea-74M is a base pretrained model and has not undergone:
- Instruction tuning
- RLHF
- Preference optimization
- Safety alignment
Known limitations:
- Hallucinations and factual inaccuracies
- Limited reasoning due to model scale
- Sensitivity to prompt phrasing
- Fixed 1024-token context window
- Not suitable for high-stakes applications
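Because the context window is fixed at 1024 tokens, the prompt and the generated continuation have to share that budget. The following is a minimal sketch of one way to trim an over-long prompt by keeping its most recent tokens; the helper `fit_prompt` is hypothetical and works on plain lists of token IDs, so it is independent of any particular tokenizer.

```python
CONTEXT_LENGTH = 1024  # from the model card


def fit_prompt(token_ids, max_new_tokens=200, context_length=CONTEXT_LENGTH):
    """Keep only the most recent prompt tokens so prompt + generation fits."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return token_ids[-budget:]


long_prompt = list(range(2000))   # stand-in for 2000 token IDs
trimmed = fit_prompt(long_prompt)
print(len(trimmed))               # 824 prompt tokens leave room for 200 new ones
```

Keeping the tail of the prompt is only one possible policy; for some inputs it may be preferable to keep the beginning or to summarize instead.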
Future Work
- Instruction tuning
- Expanded benchmark coverage
- Longer context lengths
- Improved data quality and curriculum design
Citation
@misc{archaea74m,
  title={Archaea-74M},
  author={Akshit Kumar},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/GODELEV/Archaea-74M}
}