Carballo-Science

Model description

Carballo-Science is a specialized 7B-parameter instruction-tuned model designed for scientific text understanding and generation in Galician (GL) and Spanish (ES).

It is based on the foundation model BSC-LT/salamandra-7b-instruct and has been further trained on high-quality scientific corpora extracted from diverse sources.

Intended uses and limitations

Intended uses

  • Scientific-oriented text generation (summaries, rephrasing, explanations).
  • Chat-style scientific assistance (non-professional).

Limitations

  • May produce incomplete or incorrect scientific statements.
  • Not suitable for high-stakes settings or scientific decision-making.
  • Works best in GL and ES; other languages are not reinforced in this checkpoint.

How to use

from datetime import datetime

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "proxectonos/Carballo-Science"

# Example prompt in Galician: "What do you know about Proxecto Nós?"
text = "Qué sabes sobre o Proxecto Nós?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Build the chat prompt; the template expects the current date.
message = [{"role": "user", "content": text}]
date_string = datetime.today().strftime("%Y-%m-%d")

prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
    date_string=date_string,
)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=200)

# Decode only the newly generated tokens and cut at the end-of-turn marker.
generated_tokens = outputs[0][len(inputs[0]):]
response = tokenizer.decode(generated_tokens, skip_special_tokens=False).strip()
response = response.split("<|reserved_token_1|>")[0].strip()
print(response)
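
Alternatively, recent versions of transformers can apply the chat template automatically through the text-generation pipeline. A minimal sketch, assuming a transformers version with chat-aware pipelines:

import torch
from transformers import pipeline

# The pipeline applies the model's chat template to the message list.
pipe = pipeline(
    "text-generation",
    model="proxectonos/Carballo-Science",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Qué sabes sobre o Proxecto Nós?"}]
out = pipe(messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])  # assistant reply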

Training

Training data

The model was trained on a mixture of general instructions and domain-specific scientific texts.

Dataset type        Languages              Sources
Instruction set     GL, ES, PT, CAT, EN    Galician Instruction Datasets
Scientific corpus   GL, ES                 Wikipedia, PhD theses

Training hyperparameters

  • epochs: 0.5
  • dtype: bf16
  • block size: 2048
  • total batch size: 128
  • learning rate: 2e-6
  • scheduler: Linear
  • optimizations:
    • gradient checkpointing: True
    • flash attention: True
    • liger kernels: True
    • DeepSpeed stage: 2
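
For orientation, these hyperparameters map roughly onto a standard transformers training configuration. A minimal sketch, assuming a transformers/TRL-style fine-tuning script; the per-device batch size and gradient-accumulation split are assumptions (only their product, 128, is stated above), and the DeepSpeed config path is hypothetical:

from transformers import TrainingArguments

# Illustrative only: the actual training script is not published.
training_args = TrainingArguments(
    output_dir="carballo-science-sft",  # hypothetical output path
    num_train_epochs=0.5,
    bf16=True,                          # dtype: bf16
    per_device_train_batch_size=8,      # assumption: 8 per GPU
    gradient_accumulation_steps=4,      # 8 x 4 GPUs x 4 steps = 128 total
    learning_rate=2e-6,
    lr_scheduler_type="linear",
    gradient_checkpointing=True,
    deepspeed="ds_zero2_config.json",   # hypothetical DeepSpeed stage-2 config
)
# Block size 2048 would be the trainer's maximum sequence length (e.g.
# max_seq_length in TRL's SFTTrainer); flash attention is enabled when
# loading the model (attn_implementation="flash_attention_2"), and Liger
# kernels via the training framework's liger integration.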

Framework

Training was performed at the Galician Supercomputing Center (CESGA) on 2 nodes, each with 2× NVIDIA A100 40 GB GPUs (4 GPUs in total), over 2 days.

Evaluation

Formal evaluation is in progress. Early observations show improved handling of scientific terminology, structured documents, and technical phrasing in GL and ES.

Additional information

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.

Cite this model

Please cite the model as follows:

@misc{carballo_science_2025,
    title     = {Carballo-Science: A Science Domain Instruction-Tuned Model for Galician and Spanish},
    author    = {Proxecto Nós Team},
    year      = {2025},
    publisher = {HuggingFace},
    howpublished = {\url{https://huggingface.co/proxectonos/Carballo-Science}},
}