
# Gemmagain Multimodal

Gemma3 multimodal model with layer looping support for the text decoder. This allows running the same physical text decoder layers multiple times in sequence, enabling parameter-efficient deep networks while leaving the vision tower unchanged.

## Features

- **Layer looping for the text decoder only** - the vision tower (`SiglipVisionModel`) is unchanged
- **100% weight compatible** with `unsloth/gemma-3-4b-pt` and other Gemma3 multimodal checkpoints
- **Generation with KV caching** - cache slots are properly allocated for looped layers (see the sketch after this list)
- **Flexible layer sequence format** - specify which layers to loop and how many times
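
Conceptually, the looped forward pass is just an execution order over a shared `ModuleList`. Below is a minimal, self-contained sketch of the idea (illustrative only, not this repo's actual implementation):

```python
import torch
import torch.nn as nn

class LoopedDecoderSketch(nn.Module):
    """Toy decoder that re-runs physical layers according to an execution order."""

    def __init__(self, num_layers=4, dim=16, order=None):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        # e.g. [0, 1, 2, 1, 2, 3]: layers 1-2 run twice, sharing one set of weights
        self.order = order if order is not None else list(range(num_layers))

    def forward(self, h):
        for cache_slot, idx in enumerate(self.order):
            # In a real KV-cached model, each execution step needs its own
            # cache slot (`cache_slot` here), even when `idx` repeats,
            # because the hidden states differ on every pass.
            h = torch.relu(self.layers[idx](h))
        return h

sketch = LoopedDecoderSketch(order=[0, 1, 2, 1, 2, 3])
out = sketch(torch.randn(2, 16))  # 6 effective layers from 4 physical layers
```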

## Usage

```python
import torch
from transformers import AutoConfig, Gemma3ForConditionalGeneration

# Load config with layer looping
config = AutoConfig.from_pretrained('rpDungeon/gemmagain-mm', trust_remote_code=True)

# Configure layer looping: layers 0-9 once, layers 10-27 twice, layers 28-33 once
config.text_config.layer_sequence = [[0, 10], [10, 28, 2], [28, 34]]

# Import and create the model
from modeling_gemmagain import GemmagainForConditionalGeneration

model = GemmagainForConditionalGeneration(config)

# Load weights from any Gemma3 multimodal checkpoint
orig = Gemma3ForConditionalGeneration.from_pretrained(
    'unsloth/gemma-3-4b-pt',
    torch_dtype=torch.bfloat16,
)
model.load_state_dict(orig.state_dict())
del orig

model = model.to(dtype=torch.bfloat16, device='cuda')
```
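
Generation then goes through the standard `generate` API. A minimal sketch, assuming the processor from the base checkpoint is compatible (the weights and tokenizer are stock Gemma3) and using a text-only prompt for brevity:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained('unsloth/gemma-3-4b-pt')

inputs = processor(text='The capital of France is', return_tensors='pt').to('cuda')
output = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output[0], skip_special_tokens=True))
```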

## Layer Sequence Format

The `layer_sequence` config accepts a flexible format:

| Format         | Example       | Meaning                        |
|----------------|---------------|--------------------------------|
| Integer        | `5`           | Single layer 5                 |
| 2-element list | `[4, 20]`     | Layers 4-19 (end exclusive)    |
| 3-element list | `[10, 28, 2]` | Layers 10-27, repeated 2 times |
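
To make these semantics concrete, here is a short sketch of how entries could expand into a flat execution order, assuming block-level repetition for 3-element entries (`expand_layer_sequence` is a hypothetical helper, not part of this repo):

```python
def expand_layer_sequence(seq):
    """Expand layer_sequence entries into a flat list of physical layer indices."""
    order = []
    for entry in seq:
        if isinstance(entry, int):      # single layer
            order.append(entry)
        elif len(entry) == 2:           # [start, end): end exclusive
            order.extend(range(entry[0], entry[1]))
        else:                           # [start, end, repeat]: block repeated
            start, end, repeat = entry
            for _ in range(repeat):
                order.extend(range(start, end))
    return order

order = expand_layer_sequence([[0, 10], [10, 28, 2], [28, 34]])
print(len(order))    # 52 effective layers from 34 physical layers
print(order[10:14])  # [10, 11, 12, 13] -- first pass over the looped block
```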

Example configurations:

```python
# Default: all 34 layers once
config.text_config.layer_sequence = [[0, 34, 1]]

# Loopstral-style: loop middle layers twice
# Physical: 34 layers, Effective: 52 layers
config.text_config.layer_sequence = [[0, 10], [10, 28, 2], [28, 34]]

# Loop all layers twice (2x depth, same params)
config.text_config.layer_sequence = [[0, 34, 2]]
```

## Architecture

```
GemmagainForConditionalGeneration
├── model (GemmagainModel)
│   ├── vision_tower (SiglipVisionModel)      # Unchanged from Gemma3
│   ├── multi_modal_projector                 # Unchanged from Gemma3
│   └── language_model (GemmagainTextModel)   # Layer looping support
│       ├── embed_tokens
│       ├── layers[0..33]                     # Physical layers
│       ├── _layer_sequence                   # Execution order with loops
│       └── norm
└── lm_head
```
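
The parameter-efficiency claim is easy to sanity-check; a short sketch, assuming the attribute paths shown in the tree above:

```python
# Physical weights are independent of layer_sequence: looping reuses them.
print(len(model.model.language_model.layers))      # 34 physical layers
print(sum(p.numel() for p in model.parameters()))  # unchanged by looping
```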

## License

Apache 2.0 (same as Gemma3)
