Soofi-S-Instruct-Preview Overview
⚠️ Preview / internal checkpoint. Weights and metadata may still change.
Description
Soofi-S-Instruct-Preview generates text responses for general assistant and instruction-following tasks. It is the instruction-tuned variant of SOOFI-S, a sovereign, open-source language model developed by a German research consortium. SOOFI (Sovereign Open Source Foundation Models) is designed to provide a secure, European open-source alternative to US and Chinese AI models for industrial use, featuring strong reasoning and AI agent capabilities.
For explicit chain-of-thought reasoning, see the thinking variants Soofi-S-Isar-Preview and Soofi-S-Rhine-Preview.
This model is for research and development only (Preview).
License/Terms of Use
Released under a custom license ("Other"). TODO: add the full license text / link — the official card references a License section that is not yet filled in.
Deployment Geography
Global (open release on the Hugging Face Hub). Development and training infrastructure are located in Europe (see Computational Load).
Use Case
Enterprise developers and researchers seeking a sovereign, European open-source LLM for industrial use: general assistant tasks, instruction following, and AI-agent / tool-use workflows. English and German are the primary languages.
Release Date
Hugging Face Hub — Preview at https://huggingface.co/Soofi-Project/Soofi-S-Instruct-Preview. TODO: final release date (MM/DD/YYYY).
Reference(s)
- Project: https://soofi.info
- Related models: see the Related models section below.
- TODO: link the technical report / paper once published.
Model Architecture
Architecture Type: Transformer-based hybrid Mixture-of-Experts (MoE) with
Mamba-2 state-space (SSM) layers and attention layers.
Network Architecture: Custom hybrid Mamba-2/MoE (Nemotron-style), designed
from scratch: 23 Mamba-2/MoE layers + 6 attention layers; 128 routed experts
+ 1 shared expert per MoE layer; 6 experts activated per token.
This model was developed from scratch (no base model).
Number of model parameters: 3.0×10^10 total (30B), with ~3.5B active
parameters during inference.
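To make the "6 of 128 experts per token" figure concrete, here is a minimal sketch of top-k expert routing in plain Python. It is illustrative only: the function name `route_token`, the random logits, and the softmax-over-selected-experts normalisation are assumptions, not documented details of the SOOFI-S router. The shared expert is not routed (it is applied to every token), so it does not appear in the selection.

```python
import math
import random

NUM_ROUTED_EXPERTS = 128  # from this card
TOP_K = 6                 # experts activated per token, from this card


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This normalisation is an assumption, not a confirmed SOOFI-S detail.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    max_logit = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - max_logit) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
selection = route_token(logits)

print(selection)
print(f"routed experts used per token: {TOP_K / NUM_ROUTED_EXPERTS:.1%}")
print(f"active parameter share: {3.5e9 / 3.0e10:.1%}")
```

Only a small fraction of the routed experts run for any given token, which is why the active parameter count (~3.5B) is far below the total (30B).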
Computational Load
Cumulative Compute: TODO.
Estimated Energy and Emissions for Model Training: TODO. Training
infrastructure is hosted entirely in Europe on T-Systems' Industrial AI Cloud
(Deutsche Telekom) to ensure data sovereignty.
Input
Input Type(s): Text
Input Format(s): String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Chat/ChatML-style messages via the
embedded chat template. No system prompt is required (none is injected by
default). Context length: see config.json (TODO: confirm maximum context).
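The message structure described above can be sketched as a small helper that builds the list passed to the chat template. The helper name `build_messages` and the set of allowed roles are assumptions for illustration; the embedded chat template is the authority on which roles are accepted.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed ChatML-style role names


def build_messages(user_text, history=None, system_prompt=None):
    """Return a new messages list that ends with a user turn.

    The system prompt is optional, matching the card's note that none is required.
    """
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    for turn in history or []:
        if turn["role"] not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {turn['role']!r}")
        messages.append({"role": turn["role"], "content": turn["content"]})
    messages.append({"role": "user", "content": user_text})
    return messages


single_turn = build_messages("Who are you?")
multi_turn = build_messages(
    "And in German?",
    history=[
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "I am an AI assistant."},
    ],
)
print(single_turn)
print(multi_turn)
```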
Output
Output Type(s): Text
Output Format(s): String
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Non-thinking by default (no explicit
reasoning trace). Supports the model's native tool-calling format.
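This card does not document the native tool-calling format, so the following is only a sketch of the application side of a tool-use loop. It assumes tools are described with the JSON-schema "function" layout that `transformers` accepts through the `tools` argument of `apply_chat_template`, and that a model response has already been parsed into a `{"name": ..., "arguments": {...}}` dict. The tool `get_weather`, the helper `dispatch`, and the parsed-call shape are all hypothetical.

```python
import json

# Hypothetical tool description in JSON-schema "function" form.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Berlin"},
            },
            "required": ["city"],
        },
    },
}


def get_weather(city):
    """Placeholder implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "unknown"}


def dispatch(tool_call, registry):
    """Run a parsed tool call of the assumed form {"name": ..., "arguments": {...}}."""
    func = registry[tool_call["name"]]
    return func(**tool_call["arguments"])


tools = [get_weather_tool]
registry = {"get_weather": get_weather}
result = dispatch({"name": "get_weather", "arguments": {"city": "Berlin"}}, registry)

print(json.dumps(tools, indent=2))
print(result)
```

If the embedded chat template supports tools, the `tools` list would be passed alongside the messages when rendering the prompt; check the template in the repository before relying on this.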
Software Integration
Runtime Engine(s):
- Hugging Face transformers (trust_remote_code=True)
- vLLM, llama.cpp/Ollama via the quantized variants (see Related models)
Supported Hardware Microarchitecture Compatibility:
- NVIDIA GPUs (Ampere and newer recommended)
Preferred/Supported Operating System(s):
- Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment.
Model Version(s)
- Soofi-S-Instruct-Preview — bf16 safetensors, unquantized (this repo).
- Quantized derivatives: …-GGUF (llama.cpp/Ollama) and …-FP8 (vLLM); see Related models.
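As a rough sizing aid for the variants listed above, the weight footprint can be estimated from the parameter count and the bytes per parameter. This is back-of-the-envelope arithmetic, not a measured figure: it assumes every weight is stored at the stated precision (an FP8 checkpoint may keep some tensors at higher precision) and it ignores activations, the attention KV cache, and Mamba-2 state. Although only ~3.5B parameters are active per token, all experts normally need to be resident in memory.

```python
TOTAL_PARAMS = 3.0e10  # from this card

# Bytes per parameter for each storage precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(TOTAL_PARAMS, dtype):.0f} GB of weights")
```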
Installation & Usage
SOOFI-S ships with custom modeling code. You must load it using trust_remote_code=True with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Soofi-Project/Soofi-S-Instruct-Preview"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# No system prompt is required (none is injected by default).
messages = [{"role": "user", "content": "Briefly explain the concept of AI sovereignty."}]

# return_dict=True yields input_ids and attention_mask, so generate() can take them directly.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs)  # sampling defaults come from generation_config.json
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
Training, Testing, and Evaluation Datasets
Dataset Overview
- Total Size: ~2.5×10^13 tokens (25 trillion).
- Languages: English, German (primary); French, Italian, Spanish (limited).
English acts as the pivot language.
- Knowledge Cutoff: End of 2025.
- Training Start: April 2026.
Training Dataset
Link: TODO.
Data Modality: Text.
Text Training Data Size: More than 10 Trillion Tokens (~25T).
Data Collection Method by dataset: Hybrid (freely available, high-quality
sources). TODO: refine.
Labeling Method by dataset: TODO.
Properties: Trained entirely from scratch on freely available, high-quality
tokens.
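Combining the ~25T training tokens with the parameter counts stated under Model Architecture gives a feel for the training scale. The sketch below computes tokens per parameter and applies the common "6 FLOPs per active parameter per token" rule of thumb. That rule was derived for dense Transformers, so applying it to a hybrid Mamba-2/MoE model is an assumption; the result is an order-of-magnitude estimate, not a reported figure, and the official Cumulative Compute value is still TODO.

```python
TOTAL_PARAMS = 3.0e10     # from this card
ACTIVE_PARAMS = 3.5e9     # approximate, from this card
TRAINING_TOKENS = 2.5e13  # approximate, from this card

tokens_per_total_param = TRAINING_TOKENS / TOTAL_PARAMS
tokens_per_active_param = TRAINING_TOKENS / ACTIVE_PARAMS

# Rule-of-thumb estimate (assumption): ~6 FLOPs per active parameter per token.
approx_training_flops = 6 * ACTIVE_PARAMS * TRAINING_TOKENS

print(f"tokens per total parameter:  {tokens_per_total_param:,.0f}")
print(f"tokens per active parameter: {tokens_per_active_param:,.0f}")
print(f"rough training compute:      {approx_training_flops:.2e} FLOPs")
```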
Testing Dataset
Link: TODO.
Properties: TODO.
Evaluation Dataset
Link: TODO.
Benchmark Score: TODO — add key benchmarks (e.g. reasoning, multilingual)
once available.
Properties: TODO.
Inference
Acceleration Engine: transformers; vLLM / llama.cpp via quantized
variants.
Specific Test Hardware: TODO.
Ethical Considerations
The SOOFI consortium believes Trustworthy AI is a shared responsibility and has established policies and practices to enable development for a wide array of AI applications. When downloaded or used, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information, see the Model Card++ subcards below. Please report model quality, risk, security vulnerabilities, or concerns to contact@soofi.info.
Bias Subcard
| Field | Response |
|---|---|
| Participation considerations from adversely impacted groups in model design and testing | TODO |
| Measures taken to mitigate against unwanted bias | TODO |
| Bias Metric (if measured) | TODO |
Explainability Subcard
| Field | Response |
|---|---|
| Intended Task/Domain | General assistant, instruction following, AI-agent/tool use |
| Model Type | Hybrid Mixture-of-Experts (MoE) autoregressive language model |
| Intended Users | Enterprise developers and researchers |
| Output | Text (String) |
| Describe how the model works | Generates text autoregressively; a router activates 6 of 128 experts per token across hybrid Mamba-2/MoE and attention layers |
| Technical Limitations | Preview checkpoint; non-primary languages (FR/IT/ES) are limited; may produce inaccurate or outdated content (knowledge cutoff end of 2025) |
| Verified to have met prescribed quality standards | TODO |
| Performance Metrics | TODO (see Evaluation Dataset) |
| Potential Known Risks and Mitigation | May generate incorrect, biased, or unsafe content; apply use-case-specific testing and guardrails before deployment |
| Terms of Use/Licensing | Other (see License/Terms of Use) |
Privacy Subcard
| Field | Response |
|---|---|
| Generatable or reverse engineerable personal data? | TODO |
| Personal data used to create this model? | TODO |
| Was consent obtained for any personal data used? | TODO |
| How often is dataset reviewed? | TODO |
| Was data from user interactions with the AI model used to train the model? | No |
| Is there provenance for all datasets used in training? | TODO |
| Applicable Privacy Policy | TODO |
Safety & Security Subcard
| Field | Response |
|---|---|
| Model Application Field(s) | Industrial use; customer service; general-purpose assistant and agent applications |
| Describe the life critical impact (if present) | None intended. Not for use in life-critical or safety-critical decision-making without independent validation |
| Use Case Restrictions | Abide by the applicable license agreement (see License/Terms of Use) |
| Model and dataset restrictions | TODO |
Related models
- Reasoning variants: Soofi-Project/Soofi-S-Isar-Preview and Soofi-Project/Soofi-S-Rhine-Preview
- GGUF quantizations (llama.cpp/Ollama): Soofi-Project/Soofi-S-Instruct-Preview-GGUF
- FP8 quantization (vLLM): Soofi-Project/Soofi-S-Instruct-Preview-FP8
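For the vLLM route, a client call might look like the sketch below. It assumes the FP8 variant is already being served locally through vLLM's OpenAI-compatible server on its default port 8000 (for example with `vllm serve` and the model ID); this card does not give a serve command, so the URL, the `max_tokens` value, and the helper names `build_chat_request` and `chat` are assumptions. Only the Python standard library is used.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local vLLM server
MODEL_ID = "Soofi-Project/Soofi-S-Instruct-Preview-FP8"


def build_chat_request(messages, max_tokens=256):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": MODEL_ID, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, timeout=60):
    """Send the request and return the assistant text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(messages), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request does not touch the network; calling chat() does.
request = build_chat_request([{"role": "user", "content": "What is the capital of France?"}])
print(request.full_url)
print(request.data.decode("utf-8"))
```

With a server running, `chat([{"role": "user", "content": "Who are you?"}])` would return the generated reply as a string.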
Citation
@misc{soofi_s_instruct_preview,
title = {Soofi-S-Instruct-Preview},
author = {SOOFI Consortium},
year = {2026},
url = {https://huggingface.co/Soofi-Project/Soofi-S-Instruct-Preview}
}