Instructions for using Soofi-Project/Soofi-S-Instruct-Preview with libraries and local apps.

Libraries

How to use Soofi-Project/Soofi-S-Instruct-Preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Soofi-Project/Soofi-S-Instruct-Preview", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Soofi-Project/Soofi-S-Instruct-Preview", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Soofi-Project/Soofi-S-Instruct-Preview", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use Soofi-Project/Soofi-S-Instruct-Preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Soofi-Project/Soofi-S-Instruct-Preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Soofi-Project/Soofi-S-Instruct-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
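The curl call above can also be made from Python. The sketch below uses only the standard library (`urllib.request`, `json`) and assumes a server started as shown, listening on port 8000 (vLLM) or 30000 (SGLang). `build_chat_request` and `chat` are illustrative helper names, not part of either project's API.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible server started as shown above (vLLM default port 8000).
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Soofi-Project/Soofi-S-Instruct-Preview"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("What is the capital of France?")` targets vLLM, and `chat("What is the capital of France?", base_url="http://localhost:30000/v1")` targets SGLang. Both servers also work with the official `openai` Python client pointed at the same base URL.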

Use Docker

docker model run hf.co/Soofi-Project/Soofi-S-Instruct-Preview

SGLang

How to use Soofi-Project/Soofi-S-Instruct-Preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Soofi-Project/Soofi-S-Instruct-Preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Soofi-Project/Soofi-S-Instruct-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
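If `"stream": true` is added to the request body, OpenAI-compatible servers such as vLLM and SGLang return server-sent events instead of one JSON object. A minimal parser, assuming the standard `data: {...}` / `data: [DONE]` framing of OpenAI chat-completion chunks; `iter_stream_deltas` is a hypothetical helper name:

```python
import json
from typing import Iterable, Iterator


def iter_stream_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style server-sent-event lines.

    Assumes each event line looks like 'data: {...chat.completion.chunk...}'
    and the stream is terminated by 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # the first chunk often carries only the role
                yield text
```

With `urllib`, the lines can come straight from an open response, e.g. `iter_stream_deltas(line.decode("utf-8") for line in resp)`.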

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Soofi-Project/Soofi-S-Instruct-Preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Soofi-Project/Soofi-S-Instruct-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Soofi-Project/Soofi-S-Instruct-Preview with Docker Model Runner:
```
docker model run hf.co/Soofi-Project/Soofi-S-Instruct-Preview
```
For llama.cpp, Ollama, LM Studio, or other compatible apps, use one of the quantized variants (see Related models).

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Soofi-S-Instruct-Preview Overview

⚠️ Preview / internal checkpoint. Weights and metadata may still change.

Description

Soofi-S-Instruct-Preview generates text responses for general assistant and instruction-following tasks. It is the instruction-tuned variant of SOOFI-S, a sovereign, open-source language model developed by a German research consortium. SOOFI (Sovereign Open Source Foundation Models) is designed to provide a secure, European open-source alternative to US and Chinese AI models for industrial use, featuring strong reasoning and AI agent capabilities.

For explicit chain-of-thought reasoning, see the thinking variants Soofi-S-Isar-Preview and Soofi-S-Rhine-Preview.

This model is for research and development only (Preview).

License/Terms of Use

Released under a custom license ("Other"). TODO: add the full license text / link — the official card references a License section that is not yet filled in.

Deployment Geography

Global (open release on the Hugging Face Hub). Development and training infrastructure are located in Europe (see Computational Load).

Use Case

Enterprise developers and researchers seeking a sovereign, European open-source LLM for industrial use: general assistant tasks, instruction following, and AI-agent / tool-use workflows. English and German are the primary languages.

Release Date

Hugging Face Hub — Preview at https://huggingface.co/Soofi-Project/Soofi-S-Instruct-Preview. TODO: final release date (MM/DD/YYYY).

Reference(s)

Project: https://soofi.info
Related models: see the Related models section below.
TODO: link the technical report / paper once published.

Model Architecture

Architecture Type: Transformer-based hybrid Mixture-of-Experts (MoE) with Mamba-2 state-space (SSM) layers and attention layers.
Network Architecture: Custom hybrid Mamba-2/MoE (Nemotron-style), designed from scratch: 23 Mamba-2/MoE layers + 6 attention layers. Each MoE layer has 128 routed experts plus 1 shared expert; 6 of the routed experts are activated per token.
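To make the routing figures concrete, here is a toy top-k Mixture-of-Experts layer in plain Python using the card's numbers (128 routed experts, 6 active per token, 1 shared expert). It is a generic sketch, not SOOFI-S's implementation: the softmax over the selected logits, the scalar stand-in for a token, and the helper names `route` and `moe_layer` are all assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

NUM_ROUTED_EXPERTS = 128  # routed experts per MoE layer (from the card)
TOP_K = 6                 # experts activated per token (from the card)


def route(router_logits: Sequence[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) for the k highest-scoring experts.

    Weights are a softmax over the selected logits only; this renormalisation
    is a common choice but an assumption here, not SOOFI-S's documented gating.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)           # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x: float,
              router_logits: Sequence[float],
              experts: Sequence[Callable[[float], float]],
              shared_expert: Callable[[float], float]) -> float:
    """Shared expert output plus the weighted sum of the top-k routed experts."""
    routed = sum(w * experts[i](x) for i, w in route(router_logits))
    return shared_expert(x) + routed
```

Only the selected experts (plus the shared one) are evaluated for a given token, which is why the active parameter count is far below the total.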

This model was developed from scratch (no base model).
Number of model parameters: 3.0×10^10 total (30B), with ~3.5B active parameters during inference.
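Although only ~3.5B parameters are active per token, all 30B must be resident in memory. A back-of-the-envelope calculation of the weights-only footprint, assuming 2 bytes per parameter for bf16 and 1 byte for FP8, and ignoring KV cache, Mamba-2 state, activations, and framework overhead:

```python
TOTAL_PARAMS = 30e9    # 3.0×10^10 total parameters (from the card)
ACTIVE_PARAMS = 3.5e9  # ~3.5B active parameters per token (from the card)

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(dtype: str, params: float = TOTAL_PARAMS) -> float:
    """Weights-only footprint in GB (10^9 bytes); excludes KV cache, SSM state, activations."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"bf16 weights: {weight_memory_gb('bf16'):.0f} GB")  # 60 GB
print(f"fp8 weights:  {weight_memory_gb('fp8'):.0f} GB")   # 30 GB
print(f"active share per token: {active_fraction:.1%}")    # 11.7%
```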

Computational Load

Cumulative Compute: TODO.
Estimated Energy and Emissions for Model Training: TODO. Training infrastructure is hosted entirely in Europe on T-Systems' Industrial AI Cloud (Deutsche Telekom) to ensure data sovereignty.

Input

Input Type(s): Text
Input Format(s): String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Chat/ChatML-style messages via the embedded chat template. No system prompt is required (none is injected by default). Context length: see config.json (TODO: confirm maximum context).
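For orientation, this is roughly what a ChatML-style prompt looks like once rendered. The sketch assumes generic ChatML markers (`<|im_start|>`, `<|im_end|>`); the model's actual template and special tokens are defined by its tokenizer, so use `tokenizer.apply_chat_template` in real code. `render_chatml` is a hypothetical helper.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in generic ChatML form (illustration only, not the model's template)."""
    allowed = {"system", "user", "assistant"}
    parts = []
    for msg in messages:
        if msg["role"] not in allowed:
            raise ValueError(f"unexpected role: {msg['role']!r}")
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to write its reply
    return "".join(parts)


# No system message: the card states none is required or injected by default.
print(render_chatml([{"role": "user", "content": "Who are you?"}]))
```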

Output

Output Type(s): Text
Output Format(s): String
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Non-thinking by default (no explicit reasoning trace). Supports the model's native tool-calling format.
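Tool definitions are commonly supplied as OpenAI-style JSON schemas, which transformers accepts via `tokenizer.apply_chat_template(..., tools=tools)` and OpenAI-compatible servers accept in the `tools` request field (servers may need extra tool-parsing configuration that this card does not document). Whether and how the SOOFI-S chat template renders these schemas is an assumption to verify; `get_weather` and `run_tool_call` are hypothetical examples.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: return a canned weather report instead of calling a real API."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})


# OpenAI-style JSON-schema description of the tool, passed to the model as `tools`.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string", "description": "City name"}},
                "required": ["city"],
            },
        },
    }
]


def run_tool_call(name: str, arguments: str) -> str:
    """Dispatch a parsed tool call (arguments as a JSON string) to local Python code."""
    registry = {"get_weather": get_weather}
    return registry[name](**json.loads(arguments))
```

After the model emits a tool call, its name and JSON arguments can be passed to `run_tool_call`, and the result appended to `messages` before generating again; the exact shape of that tool-result message depends on the chat template.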

Software Integration

Runtime Engine(s):

Hugging Face transformers (trust_remote_code=True)
vLLM, llama.cpp/Ollama via the quantized variants (see Related models)

Supported Hardware Microarchitecture Compatibility:

NVIDIA GPUs (Ampere and newer recommended)

Preferred/Supported Operating System(s):

Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment.

Model Version(s)

Soofi-S-Instruct-Preview — bf16 safetensors, unquantized (this repo).
Quantized derivatives: …-GGUF (llama.cpp/Ollama) and …-FP8 (vLLM); see Related models.

Installation & Usage

SOOFI-S ships with custom modeling code. You must load it using trust_remote_code=True with transformers.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Soofi-Project/Soofi-S-Instruct-Preview"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# No system prompt is required (none is injected by default).
messages = [{"role": "user", "content": "Briefly explain the concept of AI sovereignty."}]
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,      # returns input_ids and attention_mask
    return_tensors="pt",
).to(model.device)

# Sampling defaults come from generation_config.json; cap the reply length explicitly.
out = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))

Training, Testing, and Evaluation Datasets

Dataset Overview

Total Size: ~2.5×10^13 tokens (25 trillion).
Languages: English, German (primary); French, Italian, Spanish (limited). English acts as the pivot language.
Knowledge Cutoff: End of 2025.
Training Start: April 2026.
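The dataset figures imply a very high token-to-parameter ratio. The sketch below computes it, plus a rough training-compute estimate using the common C ≈ 6·N·D heuristic with N taken as the active parameter count. This is an order-of-magnitude approximation derived from the card's numbers, not a reported compute figure.

```python
TOTAL_TOKENS = 2.5e13  # ~25T training tokens (from the card)
TOTAL_PARAMS = 30e9    # 30B total parameters (from the card)
ACTIVE_PARAMS = 3.5e9  # ~3.5B active parameters per token (from the card)

tokens_per_param = TOTAL_TOKENS / TOTAL_PARAMS

# Rule of thumb: training FLOPs ≈ 6 × (parameters used per token) × (training tokens).
# Ignores attention/SSM overheads; an approximation only.
approx_train_flops = 6 * ACTIVE_PARAMS * TOTAL_TOKENS

print(f"tokens per total parameter: {tokens_per_param:.0f}")
print(f"approx. training compute: {approx_train_flops:.2e} FLOPs")
```

That is roughly 833 tokens per parameter, far above the ~20 tokens per parameter of the Chinchilla compute-optimal heuristic.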

Training Dataset

Link: TODO.
Data Modality: Text.
Text Training Data Size: More than 10 Trillion Tokens (~25T).
Data Collection Method by dataset: Hybrid (freely available, high-quality sources). TODO: refine.
Labeling Method by dataset: TODO.
Properties: Trained entirely from scratch on freely available, high-quality tokens.

Testing Dataset

Link: TODO.
Properties: TODO.

Evaluation Dataset

Link: TODO.
Benchmark Score: TODO — add key benchmarks (e.g. reasoning, multilingual) once available.
Properties: TODO.

Inference

Acceleration Engine: transformers; vLLM / llama.cpp via quantized variants.
Specific Test Hardware: TODO.

Ethical Considerations

The SOOFI consortium believes Trustworthy AI is a shared responsibility and has established policies and practices to enable development for a wide array of AI applications. When downloaded or used, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information, see the Model Card++ subcards below. Please report model quality, risk, security vulnerabilities, or concerns to contact@soofi.info.

Bias Subcard

Field	Response
Participation considerations from adversely impacted groups in model design and testing	TODO
Measures taken to mitigate against unwanted bias	TODO
Bias Metric (if measured)	TODO

Explainability Subcard

Field	Response
Intended Task/Domain	General assistant, instruction following, AI-agent/tool use
Model Type	Hybrid Mixture-of-Experts (MoE) autoregressive language model
Intended Users	Enterprise developers and researchers
Output	Text (String)
Describe how the model works	Generates text autoregressively; in each MoE layer a router activates 6 of 128 routed experts per token (plus 1 shared expert), and these MoE layers are combined with Mamba-2 and attention layers
Technical Limitations	Preview checkpoint; non-primary languages (FR/IT/ES) are limited; may produce inaccurate or outdated content (knowledge cutoff end of 2025)
Verified to have met prescribed quality standards	TODO
Performance Metrics	TODO (see Evaluation Dataset)
Potential Known Risks and Mitigation	May generate incorrect, biased, or unsafe content; apply use-case-specific testing and guardrails before deployment
Terms of Use/Licensing	Other (see License/Terms of Use)

Privacy Subcard

Field	Response
Generatable or reverse engineerable personal data?	TODO
Personal data used to create this model?	TODO
Was consent obtained for any personal data used?	TODO
How often is dataset reviewed?	TODO
Was data from user interactions with the AI model used to train the model?	No
Is there provenance for all datasets used in training?	TODO
Applicable Privacy Policy	TODO

Safety & Security Subcard

Field	Response
Model Application Field(s)	Industrial use; customer service; general-purpose assistant and agent applications
Describe the life critical impact (if present)	None intended. Not for use in life-critical or safety-critical decision-making without independent validation
Use Case Restrictions	Abide by the applicable license agreement (see License/Terms of Use)
Model and dataset restrictions	TODO

Related models

Reasoning variants: Soofi-Project/Soofi-S-Isar-Preview and Soofi-Project/Soofi-S-Rhine-Preview
GGUF quantizations (llama.cpp/Ollama): Soofi-Project/Soofi-S-Instruct-Preview-GGUF
FP8 quantization (vLLM): Soofi-Project/Soofi-S-Instruct-Preview-FP8

Citation

@misc{soofi_s_instruct_preview,
  title  = {Soofi-S-Instruct-Preview},
  author = {SOOFI Consortium},
  year   = {2026},
  url    = {https://huggingface.co/Soofi-Project/Soofi-S-Instruct-Preview}
}
