gemma4-31b-ja-agent-coder

Japanese-enhanced agentic coding model — gemma4-31b-it fine-tuned for autonomous coding agents with Japanese language support.

Highlights

  • Agentic behavior: ReAct reasoning, multi-step tool calling, self-correction
  • Japanese coding: Code generation, review, debugging in Japanese
  • Claude Code compatible: Designed as a local subagent for Claude Code via MCP
  • Function calling: Native Ollama/OpenAI tool use format
  • Zero API cost: Runs locally on 20GB+ VRAM

Benchmark Results

Evaluated on 12 task categories covering agentic coding capabilities. Each criterion is scored 0 or 1, and the per-category average is scaled to 0-10.

| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
|---|---|---|---|
| ReAct Tool Call | 10.0 | 10.0 | |
| Function Calling | 8.0 | 10.0 | +2.0 |
| Multi-step ReAct | 8.0 | 10.0 | +2.0 |
| JP Code Gen (API) | 10.0 | 10.0 | |
| JP Code Gen (Algorithm) | 10.0 | 10.0 | |
| JP Code Gen (Database) | 9.0 | 10.0 | +1.0 |
| JP Debug (TypeError) | 10.0 | 10.0 | |
| JP Debug (KeyError) | 10.0 | 10.0 | |
| JP Code Review | 8.0 | 10.0 | +2.0 |
| JP Git Strategy | 10.0 | 10.0 | |
| JP Self-correction | 10.0 | 10.0 | |
| JP Documentation | 10.0 | 10.0 | |
| Overall | 9.4 | 10.0 | +0.6 |

Key Improvements

  • Function Calling: Clean <tool_call> JSON format output (base model adds extra explanation)
  • Multi-step ReAct: Structured JSON reasoning with proper Thought/Action/Observation flow
  • Code Review: Parameterized query suggestions for SQL injection fixes
  • Database CRUD: Complete Create/Read/Update/Delete coverage
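The clean `<tool_call>` output noted above can be consumed programmatically by an agent loop. A minimal parsing sketch, assuming the model emits a single JSON object wrapped in `<tool_call>…</tool_call>` tags (the tag name and payload schema here are assumptions inferred from the test cases below, not a documented contract):

```python
import json
import re

def parse_tool_call(text: str):
    """Extract the first <tool_call>...</tool_call> JSON payload, if any."""
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if m is None:
        return None  # the model answered in plain text instead of calling a tool
    return json.loads(m.group(1))

# Hypothetical model output for a "read README.md" request
output = '<tool_call>{"name": "read_file", "arguments": {"path": "README.md"}}</tool_call>'
call = parse_tool_call(output)
print(call["name"])  # read_file
```

If the base model wraps the call in extra prose, the regex still recovers the JSON object; a `None` return signals a plain-text answer.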

Inference Test Results (v2 adapter)

| Test | Input | Result |
|---|---|---|
| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" ("Create a health-check endpoint with FastAPI") | Clean Python with /healthz endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" ("cause and fix for the error") | Japanese explanation + fix code |
| Function Calling | "Use read_file to read README.md" | Clean <tool_call> JSON format |
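The ReAct test above expects a JSON object with thought and action fields. A minimal validator sketch for such a turn; the field names (`thought`, `action`, `tool`, `args`) are assumptions inferred from the test descriptions, not the model's documented schema:

```python
import json

def is_valid_react_step(raw: str) -> bool:
    """Check that a model turn is a JSON object with a thought and a well-formed action."""
    try:
        step = json.loads(raw)
    except json.JSONDecodeError:
        return False  # not JSON at all, e.g. a plain-prose answer
    if not isinstance(step, dict) or "thought" not in step:
        return False
    action = step.get("action")
    return isinstance(action, dict) and "tool" in action and "args" in action

# Hypothetical well-formed turn for the src/main.py test case
sample = json.dumps({
    "thought": "I need the file contents before I can answer.",
    "action": {"tool": "read_file", "args": {"path": "src/main.py"}},
})
print(is_valid_react_step(sample))  # True
```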

Training Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-4-31b-it |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 133M / 31B (0.43%) |
| Training data | 1,546 custom samples (v2) |
| Epochs | 2 (3rd epoch interrupted, checkpoint-388 used) |
| Learning rate | 1.5e-4 (cosine) |
| Final loss | 0.98 |
| Token accuracy | 96.8% |
| Training time | ~1.5 hours |
| Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |

Training Data Categories

| Category | Samples | Description |
|---|---|---|
| ReAct Tool Calling | ~120 | Single/chained tool calls |
| Multi-step Agentic Trajectory | ~100 | Plan→Tool→Observe→Correct→Answer loops |
| Self-correction | ~40 | Error recovery patterns |
| Function Calling | ~50 | Ollama native tool format |
| Japanese Code Generation | ~200 | JP instruction → Python/TS code |
| Japanese Code Review | ~100 | Security, refactoring, best practices |
| Japanese Error Explanation | ~80 | Error → JP diagnosis + fix |
| Japanese Comprehension | ~50 | Reading, reasoning, summarization |
| Debugging & Troubleshooting | ~100 | Error analysis → root cause → fix |
| Git & CI/CD | ~80 | Branch strategy, PR, GitHub Actions |
| Project Planning | ~80 | Requirements → task decomposition |
| Technical Documentation | ~80 | README, API docs, specs |
| Algorithms & Data Structures | ~200 | Binary search, DP, graph, sorting |
| Web Frameworks | ~200 | FastAPI, Django, React, Next.js |
| Database Operations | ~150 | SQLAlchemy, PostgreSQL, Redis |
| Testing & DevOps | ~150 | pytest, Docker, K8s, Terraform |

Use with Ollama

# After GGUF conversion
ollama create gemma4-ja-agent-coder -f Modelfile
ollama run gemma4-ja-agent-coder
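The Modelfile referenced above is not reproduced in this card. A minimal sketch, assuming a local GGUF export named `gemma4-ja-agent-coder.gguf`; the filename and parameter values are placeholders, not the shipped configuration:

```
FROM ./gemma4-ja-agent-coder.gguf
# Low temperature for deterministic tool-call JSON (assumed setting)
PARAMETER temperature 0.2
# Context window large enough for multi-step ReAct trajectories (assumed setting)
PARAMETER num_ctx 8192
```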

Use with helix-agents (Claude Code MCP)

Reduce Claude Code API token consumption by delegating routine tasks to this local model.

{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}

Use with transformers

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                          bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-31b-it",
                                              quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")

Note: Gemma4 uses Gemma4ClippableLinear which requires a PEFT monkey-patch. See this gist for the workaround.

License

Apache 2.0 (same as base model)

Author

tsunamayo7 — Builder of helix-agents, a local LLM delegation framework for Claude Code.
