# gemma4-31b-ja-agent-coder

Japanese-enhanced agentic coding model — a fine-tune of gemma4-31b-it for autonomous coding agents with Japanese language support.
## Highlights
- Agentic behavior: ReAct reasoning, multi-step tool calling, self-correction
- Japanese coding: Code generation, review, debugging in Japanese
- Claude Code compatible: Designed as a local subagent for Claude Code via MCP
- Function calling: Native Ollama/OpenAI tool use format
- Zero API cost: Runs locally on 20GB+ VRAM
## Benchmark Results

Evaluated on 12 task categories covering agentic coding capabilities. Each criterion is scored 0 or 1; criterion scores are averaged within a category and scaled to 0-10.
| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
|---|---|---|---|
| ReAct Tool Call | 10.0 | 10.0 | — |
| Function Calling | 8.0 | 10.0 | +2.0 |
| Multi-step ReAct | 8.0 | 10.0 | +2.0 |
| JP Code Gen (API) | 10.0 | 10.0 | — |
| JP Code Gen (Algorithm) | 10.0 | 10.0 | — |
| JP Code Gen (Database) | 9.0 | 10.0 | +1.0 |
| JP Debug (TypeError) | 10.0 | 10.0 | — |
| JP Debug (KeyError) | 10.0 | 10.0 | — |
| JP Code Review | 8.0 | 10.0 | +2.0 |
| JP Git Strategy | 10.0 | 10.0 | — |
| JP Self-correction | 10.0 | 10.0 | — |
| JP Documentation | 10.0 | 10.0 | — |
| Overall | 9.4 | 10.0 | +0.6 |
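The scoring scheme above can be sketched in a few lines. This is an illustration of the stated scheme only (the criterion counts per category are hypothetical, not taken from the actual evaluation harness):

```python
def category_score(criteria):
    """Average binary (0/1) criterion scores for one category and
    scale the result to the 0-10 range used in the table above."""
    return round(10 * sum(criteria) / len(criteria), 1)

# Hypothetical category with 5 criteria, 4 of which pass:
print(category_score([1, 1, 0, 1, 1]))  # → 8.0
print(category_score([1, 1, 1]))        # → 10.0
```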
## Key Improvements
- Function Calling: Clean `<tool_call>` JSON format output (the base model adds extra explanation)
- Multi-step ReAct: Structured JSON reasoning with a proper Thought/Action/Observation flow
- Code Review: Parameterized query suggestions for SQL injection fixes
- Database CRUD: Complete Create/Read/Update/Delete coverage
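On the client side, a clean `<tool_call>` block is straightforward to consume. The exact payload shape (`name`/`arguments` keys) is an assumption for illustration, not a guarantee of this model's serialization; a minimal extraction sketch:

```python
import json
import re

def extract_tool_call(text):
    """Pull the first <tool_call>...</tool_call> JSON payload out of a
    model response. Returns None when no well-formed call is present."""
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if not m:
        return None
    try:
        return json.loads(m.group(1))
    except json.JSONDecodeError:
        return None

# Hypothetical clean output from the fine-tuned model:
reply = '<tool_call>{"name": "read_file", "arguments": {"path": "README.md"}}</tool_call>'
print(extract_tool_call(reply)["name"])  # → read_file
```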
## Inference Test Results (v2 adapter)
| Test | Input | Result |
|---|---|---|
| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" (create a health-check endpoint with FastAPI) | Clean Python with /healthz endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" (cause and fix for the error) | Japanese explanation + fix code |
| Function Calling | "Use read_file to read README.md" | Clean <tool_call> JSON format |
## Training Details
| Parameter | Value |
|---|---|
| Base model | google/gemma-4-31b-it |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 133M / 31B (0.43%) |
| Training data | 1,546 custom samples (v2) |
| Epochs | 2 (3rd epoch interrupted, checkpoint-388 used) |
| Learning rate | 1.5e-4 (cosine) |
| Final loss | 0.98 |
| Token accuracy | 96.8% |
| Training time | ~1.5 hours |
| Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |
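The adapter size in the table follows directly from the LoRA setup: each targeted linear layer gains two low-rank matrices, A (d_in × r) and B (r × d_out). A quick sanity-check sketch (the layer dimensions below are hypothetical, not the actual gemma4-31b shapes):

```python
def lora_params(d_in, d_out, rank):
    """Extra trainable parameters LoRA adds to one linear layer:
    A is (d_in x rank), B is (rank x d_out), so rank * (d_in + d_out)."""
    return rank * (d_in + d_out)

# Hypothetical 4096x4096 projection at rank 16, as in the table above:
print(lora_params(4096, 4096, 16))  # → 131072
```

Summing this over all targeted projections (q/k/v/o and gate/up/down in every layer) is what yields an adapter in the ~100M-parameter range while the 31B base stays frozen.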
## Training Data Categories
| Category | Samples | Description |
|---|---|---|
| ReAct Tool Calling | ~120 | Single/chained tool calls |
| Multi-step Agentic Trajectory | ~100 | Plan→Tool→Observe→Correct→Answer loops |
| Self-correction | ~40 | Error recovery patterns |
| Function Calling | ~50 | Ollama native tool format |
| Japanese Code Generation | ~200 | JP instruction → Python/TS code |
| Japanese Code Review | ~100 | Security, refactoring, best practices |
| Japanese Error Explanation | ~80 | Error → JP diagnosis + fix |
| Japanese Comprehension | ~50 | Reading, reasoning, summarization |
| Debugging & Troubleshooting | ~100 | Error analysis → root cause → fix |
| Git & CI/CD | ~80 | Branch strategy, PR, GitHub Actions |
| Project Planning | ~80 | Requirements → task decomposition |
| Technical Documentation | ~80 | README, API docs, specs |
| Algorithms & Data Structures | ~200 | Binary search, DP, graph, sorting |
| Web Frameworks | ~200 | FastAPI, Django, React, Next.js |
| Database Operations | ~150 | SQLAlchemy, PostgreSQL, Redis |
| Testing & DevOps | ~150 | pytest, Docker, K8s, Terraform |
## Use with Ollama

```bash
# After GGUF conversion
ollama create gemma4-ja-agent-coder -f Modelfile
ollama run gemma4-ja-agent-coder
```
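A minimal Modelfile sketch to pair with the commands above. The GGUF filename, sampling parameter, and template are placeholders — adjust them to the actual converted file and the Gemma chat template you use:

```
FROM ./gemma4-31b-ja-agent-coder.Q4_K_M.gguf
PARAMETER temperature 0.2
TEMPLATE """<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
"""
```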
## Use with helix-agents (Claude Code MCP)
Reduce Claude Code API token consumption by delegating routine tasks to this local model.
```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```
## Use with transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31b-it", quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
```
Note: Gemma4 uses `Gemma4ClippableLinear`, which requires a PEFT monkey-patch. See this gist for the workaround.
## License
Apache 2.0 (same as base model)
## Author
tsunamayo7 — Builder of helix-agents, a local LLM delegation framework for Claude Code.