SykoLLM-V6.0-Test

Model Overview

SykoLLM-V6.0-Test is an up-scaled, structurally expanded version of the previous SykoLLM models. Developed by SykoSLM, it is currently in the experimental/testing phase.

The primary objective of this release is to provide a structurally larger foundation model by expanding both the depth (number of layers) and the width (intermediate size / MLP capacity) of the previous architecture, without losing the pre-trained knowledge.

Architectural Expansion (Up-Scaling)

To overcome the "knowledge interference" (capacity bottleneck) observed in previous iterations, significant architectural changes have been applied to this model:

  • Depth Up-Scaling (DUS): The number of hidden layers has been increased to 24 by carefully duplicating and mapping the existing layers, preserving the model's logical and syntactic capabilities.
  • Width Expansion (MLP Scaling): The intermediate_size has been expanded to 3072. To prevent catastrophic forgetting, the newly added weights in the feed-forward networks were initialized to exactly zero (0.0). The new units therefore contribute nothing to the initial forward pass, so the expanded block computes the same function as before the expansion.
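The function-preserving property of both steps can be sketched with NumPy. This is a toy illustration, not the model's actual code: the predecessor's layer count (12 here), the tiny matrix sizes, and the plain ReLU MLP (in place of the model's real activation) are all assumptions for demonstration purposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Depth up-scaling (hypothetical): duplicate each of 12 old layers in place
# to reach the card's stated 24. The predecessor's real layer count is not
# given in the card, so 12 is an assumption.
old_depth = 12
layer_map = [i for i in range(old_depth) for _ in range(2)]
assert len(layer_map) == 24

# Width expansion: grow the MLP intermediate size with zero-initialized
# weights. Toy sizes here; the card's real intermediate_size is 3072.
hidden, inter_old, inter_new = 8, 16, 24
W_up = rng.normal(size=(inter_old, hidden))    # up-projection
W_down = rng.normal(size=(hidden, inter_old))  # down-projection

def ffn(x, w_up, w_down):
    # Plain ReLU MLP stands in for the model's actual feed-forward block.
    return w_down @ np.maximum(w_up @ x, 0.0)

# New rows of W_up and new columns of W_down start at exact zero.
W_up_big = np.vstack([W_up, np.zeros((inter_new - inter_old, hidden))])
W_down_big = np.hstack([W_down, np.zeros((hidden, inter_new - inter_old))])

x = rng.normal(size=hidden)
# The zeroed units contribute nothing, so outputs match exactly.
assert np.allclose(ffn(x, W_up, W_down), ffn(x, W_up_big, W_down_big))
```

Because the zeroed down-projection columns multiply whatever the new units produce, the output is preserved regardless of the activation function used.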

⚠️ Important Notice: Status of the Model

This model is currently UNTRAINED on the newly added parameters.

The expansion was done solely to save pre-training time and preserve existing knowledge. While the model retains the capabilities of its predecessor, the newly added parameters (roughly 100M) are currently dormant (zeroed out).

To fully utilize the expanded capacity and activate the new parameters, fine-tuning is required. Used in its current state, the model will behave much like the previous, smaller version, since the new structural capacity has not yet been trained on any data.

Why This Approach?

Training a Large Language Model from scratch requires immense computational resources and time. By applying Net2Net-style (function-preserving model expansion) principles:

  1. We preserve the billions of tokens worth of knowledge already embedded in the model.
  2. We provide the model with a much larger "encyclopedic" memory (the MLP expansion), reducing interference between stored facts and the hallucinations such interference can cause.
  3. We drastically reduce the time required to achieve a higher parameter count.

Usage

You can load the model using the transformers library, but please keep in mind that it requires further fine-tuning for optimal performance.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SykoSLM/SykoLLM-V6.0-Test"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Quick smoke test. Until the new parameters are fine-tuned, expect output
# comparable to the previous, smaller model.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Developed by SykoSLM

Model size: 0.4B params · F16 · Safetensors
