# Gemmagain Multimodal
Gemma3 multimodal model with layer looping support for the text decoder. This allows running the same physical text decoder layers multiple times in sequence, enabling parameter-efficient deep networks while leaving the vision tower unchanged.
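To make the idea concrete, here is a minimal, self-contained sketch (a toy module with made-up names, not the actual Gemmagain implementation) of what layer looping means: the execution order revisits some physical layers, so effective depth exceeds the number of parameterized layers.

```python
import torch
import torch.nn as nn

class ToyLoopedDecoder(nn.Module):
    """Toy illustration of layer looping: the same physical layers can appear
    more than once in the execution order, adding depth without adding weights."""
    def __init__(self, num_physical_layers=4, hidden=32):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_physical_layers))
        # Run layers 0-1 once, then layers 2-3 twice (analogous to a layer_sequence)
        self.execution_order = [0, 1, 2, 3, 2, 3]

    def forward(self, x):
        for layer_idx in self.execution_order:
            x = torch.relu(self.layers[layer_idx](x))  # repeated indices reuse the same weights
        return x

toy = ToyLoopedDecoder()
print(len(toy.layers), "physical layers,", len(toy.execution_order), "effective layers")
toy(torch.randn(1, 32))
```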
## Features

- **Layer looping for the text decoder only** - the vision tower (SiglipVisionModel) is unchanged
- **100% weight compatible** with `unsloth/gemma-3-4b-pt` and other Gemma3 multimodal models
- **Supports generation with KV caching** - cache slots are properly allocated for looped layers
- **Flexible layer sequence format** - specify which layers to loop and how many times
## Usage

```python
import torch
from transformers import AutoConfig, Gemma3ForConditionalGeneration

# Load the config with layer looping support
config = AutoConfig.from_pretrained('rpDungeon/gemmagain-mm', trust_remote_code=True)

# Configure layer looping: layers 0-9 once, layers 10-27 twice, layers 28-33 once
config.text_config.layer_sequence = [[0, 10], [10, 28, 2], [28, 34]]

# Import and create the model
from modeling_gemmagain import GemmagainForConditionalGeneration
model = GemmagainForConditionalGeneration(config)

# Load weights from any Gemma3 multimodal checkpoint
orig = Gemma3ForConditionalGeneration.from_pretrained(
    'unsloth/gemma-3-4b-pt',
    torch_dtype=torch.bfloat16,
)
model.load_state_dict(orig.state_dict())
del orig

model = model.to(dtype=torch.bfloat16, device='cuda')
```
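Once the weights are loaded, generation goes through the standard `generate()` API. The snippet below is a sketch continuing from the code above; the text-only prompt, tokenizer choice, and generation settings are illustrative assumptions.

```python
from transformers import AutoTokenizer

# Tokenizer from the base checkpoint (assumes the looped model keeps the same vocabulary)
tokenizer = AutoTokenizer.from_pretrained('unsloth/gemma-3-4b-pt')

inputs = tokenizer('Layer looping lets a decoder', return_tensors='pt').to('cuda')
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```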
## Layer Sequence Format

The `layer_sequence` config accepts a flexible format:

| Format | Example | Meaning |
|---|---|---|
| Integer | `5` | Single layer 5 |
| 2-element list | `[4, 20]` | Layers 4-19 (end exclusive) |
| 3-element list | `[10, 28, 2]` | Layers 10-27, repeated 2 times |
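As a rough sketch of how these entries can be interpreted (the helper below is illustrative; its name and exact behavior are assumptions, not the model's actual code), each entry expands into a run of physical layer indices, optionally repeated:

```python
def expand_layer_sequence(layer_sequence):
    """Hypothetical helper: flatten layer_sequence entries into the order in
    which physical layers are executed."""
    order = []
    for entry in layer_sequence:
        if isinstance(entry, int):                 # single layer, run once
            order.append(entry)
        elif len(entry) == 2:                      # [start, end) run once
            order.extend(range(entry[0], entry[1]))
        else:                                      # [start, end, repeats]
            start, end, repeats = entry
            for _ in range(repeats):
                order.extend(range(start, end))
    return order

order = expand_layer_sequence([[0, 10], [10, 28, 2], [28, 34]])
print(order[:12])   # [0, 1, ..., 11] - layers 10-27 will appear twice later in the order
print(len(order))   # 52 execution steps from 34 physical layers
```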
Example configurations:

```python
# Default: all 34 layers once
config.text_config.layer_sequence = [[0, 34, 1]]

# Loopstral-style: loop the middle layers twice
# Physical: 34 layers, Effective: 52 layers
config.text_config.layer_sequence = [[0, 10], [10, 28, 2], [28, 34]]

# Loop all layers twice (2x depth, same params)
config.text_config.layer_sequence = [[0, 34, 2]]
```
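As a quick check of the depth arithmetic (illustrative only), the effective depth of a configuration is the sum of `(end - start) * repeats` over its entries:

```python
def effective_depth(layer_sequence):
    # Illustrative arithmetic: a bare int counts as one layer, a missing repeat count as 1.
    total = 0
    for entry in layer_sequence:
        if isinstance(entry, int):
            total += 1
        elif len(entry) == 2:
            total += entry[1] - entry[0]
        else:
            total += (entry[1] - entry[0]) * entry[2]
    return total

print(effective_depth([[0, 34, 1]]))                      # 34 (default)
print(effective_depth([[0, 10], [10, 28, 2], [28, 34]]))  # 52 (Loopstral-style)
print(effective_depth([[0, 34, 2]]))                      # 68 (all layers twice)
```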
## Architecture

```
GemmagainForConditionalGeneration
├── model (GemmagainModel)
│   ├── vision_tower (SiglipVisionModel)      # Unchanged from Gemma3
│   ├── multi_modal_projector                 # Unchanged from Gemma3
│   └── language_model (GemmagainTextModel)   # Layer looping support
│       ├── embed_tokens
│       ├── layers[0..33]                     # Physical layers
│       ├── _layer_sequence                   # Execution order with loops
│       └── norm
└── lm_head
```
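The `_layer_sequence` attribute above holds the expanded execution order, and the KV cache is sized by that order rather than by the physical layer count. A hedged sketch of the bookkeeping (the names here are assumptions, not the repository's actual implementation):

```python
# Hypothetical cache bookkeeping for a looped decoder: each execution step gets
# its own key/value slot, even when two steps share the same physical layer.
execution_order = [0, 1, 2, 3, 2, 3]   # physical layer index visited at each step

kv_cache = {}                          # step index -> cached keys/values for that step
for step, layer_idx in enumerate(execution_order):
    # The second visit to layer 2 or 3 stores under a new step index, so it does
    # not overwrite the keys/values cached on the first visit.
    kv_cache[step] = f"kv from physical layer {layer_idx} at step {step}"

print(f"{len(set(execution_order))} physical layers -> {len(kv_cache)} cache slots")
```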
## License
Apache 2.0 (same as Gemma3)