# gemma4-31b-ja-agent-coder

Japanese-enhanced agentic coding model — a fine-tune of gemma4-31b-it for autonomous coding agents with Japanese language support.
## Highlights
- Agentic behavior: ReAct reasoning, multi-step tool calling, self-correction
- Japanese coding: Code generation, review, debugging in Japanese
- Claude Code compatible: Designed as a local subagent for Claude Code via MCP
- Function calling: Native Ollama/OpenAI tool use format
- Zero API cost: Runs locally on 20GB+ VRAM
## Benchmark Results

Evaluated on 12 task categories covering agentic coding capabilities. Each criterion is scored 0 or 1; criterion scores are averaged within a category and scaled to 0-10.
| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
|---|---|---|---|
| ReAct Tool Call | 10.0 | 10.0 | — |
| Function Calling | 8.0 | 10.0 | +2.0 |
| Multi-step ReAct | 8.0 | 10.0 | +2.0 |
| JP Code Gen (API) | 10.0 | 10.0 | — |
| JP Code Gen (Algorithm) | 10.0 | 10.0 | — |
| JP Code Gen (Database) | 9.0 | 10.0 | +1.0 |
| JP Debug (TypeError) | 10.0 | 10.0 | — |
| JP Debug (KeyError) | 10.0 | 10.0 | — |
| JP Code Review | 8.0 | 10.0 | +2.0 |
| JP Git Strategy | 10.0 | 10.0 | — |
| JP Self-correction | 10.0 | 10.0 | — |
| JP Documentation | 10.0 | 10.0 | — |
| Overall | 9.4 | 10.0 | +0.6 |
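The scoring scheme above can be sketched in a few lines. This is an illustration of the stated scheme only (the criterion counts per category are hypothetical, not taken from the actual evaluation harness):

```python
def category_score(criteria):
    """Average binary (0/1) criterion scores for one category and
    scale the result to the 0-10 range used in the table above."""
    return round(10 * sum(criteria) / len(criteria), 1)

# Hypothetical category with 5 criteria, 4 of which pass:
print(category_score([1, 1, 0, 1, 1]))  # → 8.0
print(category_score([1, 1, 1]))        # → 10.0
```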
## Key Improvements
- Function Calling: Clean `<tool_call>` JSON format output (the base model adds extra explanation)
- Multi-step ReAct: Structured JSON reasoning with a proper Thought/Action/Observation flow
- Code Review: Parameterized query suggestions for SQL injection fixes
- Database CRUD: Complete Create/Read/Update/Delete coverage
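On the client side, a clean `<tool_call>` block is straightforward to consume. The exact payload shape (`name`/`arguments` keys) is an assumption for illustration, not a guarantee of this model's serialization; a minimal extraction sketch:

```python
import json
import re

def extract_tool_call(text):
    """Pull the first <tool_call>...</tool_call> JSON payload out of a
    model response. Returns None when no well-formed call is present."""
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if not m:
        return None
    try:
        return json.loads(m.group(1))
    except json.JSONDecodeError:
        return None

# Hypothetical clean output from the fine-tuned model:
reply = '<tool_call>{"name": "read_file", "arguments": {"path": "README.md"}}</tool_call>'
print(extract_tool_call(reply)["name"])  # → read_file
```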
## Inference Test Results (v2 adapter)
| Test | Input | Result |
|---|---|---|
| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" (create a health-check endpoint with FastAPI) | Clean Python with /healthz endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" (cause and fix for the error) | Japanese explanation + fix code |
| Function Calling | "Use read_file to read README.md" | Clean <tool_call> JSON format |
## Training Details
| Parameter | Value |
|---|---|
| Base model | google/gemma-4-31b-it |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 133M / 31B (0.43%) |
| Training data | 1,546 custom samples (v2) |
| Epochs | 2 (3rd epoch interrupted, checkpoint-388 used) |
| Learning rate | 1.5e-4 (cosine) |
| Final loss | 0.98 |
| Token accuracy | 96.8% |
| Training time | ~1.5 hours |
| Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |
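The adapter size in the table follows directly from the LoRA setup: each targeted linear layer gains two low-rank matrices, A (d_in × r) and B (r × d_out). A quick sanity-check sketch (the layer dimensions below are hypothetical, not the actual gemma4-31b shapes):

```python
def lora_params(d_in, d_out, rank):
    """Extra trainable parameters LoRA adds to one linear layer:
    A is (d_in x rank), B is (rank x d_out), so rank * (d_in + d_out)."""
    return rank * (d_in + d_out)

# Hypothetical 4096x4096 projection at rank 16, as in the table above:
print(lora_params(4096, 4096, 16))  # → 131072
```

Summing this over all targeted projections (q/k/v/o and gate/up/down in every layer) is what yields an adapter in the ~100M-parameter range while the 31B base stays frozen.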
## Training Data Categories
| Category | Samples | Description |
|---|---|---|
| ReAct Tool Calling | ~120 | Single/chained tool calls |
| Multi-step Agentic Trajectory | ~100 | Plan→Tool→Observe→Correct→Answer loops |
| Self-correction | ~40 | Error recovery patterns |
| Function Calling | ~50 | Ollama native tool format |
| Japanese Code Generation | ~200 | JP instruction → Python/TS code |
| Japanese Code Review | ~100 | Security, refactoring, best practices |
| Japanese Error Explanation | ~80 | Error → JP diagnosis + fix |
| Japanese Comprehension | ~50 | Reading, reasoning, summarization |
| Debugging & Troubleshooting | ~100 | Error analysis → root cause → fix |
| Git & CI/CD | ~80 | Branch strategy, PR, GitHub Actions |
| Project Planning | ~80 | Requirements → task decomposition |
| Technical Documentation | ~80 | README, API docs, specs |
| Algorithms & Data Structures | ~200 | Binary search, DP, graph, sorting |
| Web Frameworks | ~200 | FastAPI, Django, React, Next.js |
| Database Operations | ~150 | SQLAlchemy, PostgreSQL, Redis |
| Testing & DevOps | ~150 | pytest, Docker, K8s, Terraform |
## Use with Ollama

```bash
# After GGUF conversion
ollama create gemma4-ja-agent-coder -f Modelfile
ollama run gemma4-ja-agent-coder
```
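A minimal Modelfile sketch to pair with the commands above. The GGUF filename, sampling parameter, and template are placeholders — adjust them to the actual converted file and the Gemma chat template you use:

```
FROM ./gemma4-31b-ja-agent-coder.Q4_K_M.gguf
PARAMETER temperature 0.2
TEMPLATE """<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
"""
```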
## Use with helix-agents (Claude Code MCP)
Reduce Claude Code API token consumption by delegating routine tasks to this local model.
```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```
## Use with transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31b-it", quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
```
Note: Gemma4 uses `Gemma4ClippableLinear`, which requires a PEFT monkey-patch. See this gist for the workaround.
## License
Apache 2.0 (same as base model)
## Author
tsunamayo7 — Builder of helix-agents, a local LLM delegation framework for Claude Code.