DuoNeural ML/AI Engineer 7B

A LoRA SFT of Qwen2.5-7B-Instruct, fine-tuned to be a direct, opinionated pairing partner for ML/AI engineering work: debugging training runs, reasoning about architecture choices, catching common infra mistakes, and designing evals. Built as a local red-team sounding board for our own lab's work — sanity-checking experiment design, spotting confounds, reviewing training scripts before launch.

Trained on DuoNeural/ml-ai-engineer-sft, a synthetic dataset spanning 48 topics (PyTorch debugging, distributed training, RLHF/GRPO, quantization, architecture critique, MLOps, infra) at three difficulty tiers and five response styles.
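As a rough sense of coverage, the topic, difficulty, and style axes above can be crossed into a grid. The sketch below assumes a full cross product and an even spread of the ~1.6k training examples mentioned under Known limitations; neither assumption is confirmed by the dataset description, and the function name is illustrative.

```python
def grid_coverage(num_topics, num_tiers, num_styles, num_examples):
    """Return (number of grid cells, average examples per cell)."""
    cells = num_topics * num_tiers * num_styles
    return cells, num_examples / cells


# 48 topics, 3 difficulty tiers, 5 response styles; 1600 is an approximate example count.
cells, per_cell = grid_coverage(48, 3, 5, 1600)
print(f"{cells} cells, about {per_cell:.1f} examples per cell")
```

Under those assumptions each topic/tier/style combination is backed by only a handful of examples, which is consistent with the small-scale framing of this fine-tune.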

What changed vs. the base model

The base Qwen2.5-7B-Instruct is fluent and thorough, but it tends toward exhaustive, hedged checklists that enumerate every possibility without committing to the most likely root cause. The SFT model states a diagnosis first and then the fix, which is closer to how an experienced engineer responds to a bug report.

Example — Mixture-of-Experts failure mode ("8 experts, top-2 routing, no auxiliary load-balancing loss, training on a 2B token corpus. What's the most likely failure mode?")

Base: "Several potential failure modes can arise... 1. Overfitting... 2. Routing Imbalance... 3. Expert Overfitting... 4. Communication and Load Balancing... 5. Resource Constraints... 6. Training Convergence Issues... 7. Data Quality... 8. Scaling Challenges" — eight generic possibilities, no commitment to which one is most likely.

DuoNeural ML/AI Engineer: "The most likely failure mode is expert collapse or imbalanced routing. Without an auxiliary load-balancing loss, your router will quickly learn to send almost all tokens to only a few 'easy' experts, leaving the others underutilized and effectively dead." — leads with the specific, correct answer the question was actually asking for.
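For context, the auxiliary load-balancing loss that the prompt says is missing can be sketched in a few lines. This is a minimal pure-Python illustration of a Switch-Transformer-style term (number of experts times the sum over experts of dispatch share times mean router probability), adapted here to top-2 routing. It is not code from this model's training, and the function names and toy router inputs are hypothetical.

```python
def load_balancing_loss(router_probs, top_k=2):
    """Auxiliary loss: num_experts * sum_i(f_i * P_i).

    router_probs: one list of per-expert probabilities per token.
    f_i: share of top-k dispatch slots that went to expert i.
    P_i: mean router probability assigned to expert i.
    Equals 1.0 under perfectly uniform routing and grows when dispatch
    and probability mass concentrate on the same experts.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    dispatch = [0] * num_experts
    prob_sum = [0.0] * num_experts
    for probs in router_probs:
        ranked = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)
        for e in ranked[:top_k]:
            dispatch[e] += 1
        for e, p in enumerate(probs):
            prob_sum[e] += p
    f = [c / (num_tokens * top_k) for c in dispatch]
    P = [s / num_tokens for s in prob_sum]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


def toy_router(preferred, num_experts=8, mass=0.9):
    """Put `mass` evenly on the preferred experts and spread the rest over the others."""
    rest = (1.0 - mass) / (num_experts - len(preferred))
    return [mass / len(preferred) if e in preferred else rest for e in range(num_experts)]


# Balanced: each token prefers a different adjacent pair of experts.
balanced = [toy_router({t % 8, (t + 1) % 8}) for t in range(8)]
# Collapsed: every token prefers experts 0 and 1.
collapsed = [toy_router({0, 1}) for _ in range(8)]

print("balanced: ", round(load_balancing_loss(balanced), 3))
print("collapsed:", round(load_balancing_loss(collapsed), 3))
```

The balanced toy batch scores about 1.0 and the collapsed one about 3.6. In a real training loop a term like this is typically scaled by a small coefficient and added to the main loss, so the router pays a penalty for sending most tokens to a few experts.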

Example — quantization format selection ("GGUF Q4_K_M, Q5_K_M, Q8_0, and AWQ — what's actually the deciding factor?")

Base: mistakes AWQ for an 8-bit GGUF-adjacent scheme, never settles on a single deciding factor, and is cut off mid-enumeration.

DuoNeural ML/AI Engineer: "Q4_K_M is the smallest and lowest quality, Q8_0 is the largest and highest quality of the GGUF options... pick the largest format that still fits." — correct relative ordering, a concrete decision rule.
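The "largest format that still fits" rule from the quoted answer can be written down directly. The sketch below is only an illustration: the file sizes are placeholder values for a 7B model rather than measurements of the released files, the `overhead_gb` allowance for KV cache and runtime buffers is an assumed figure, and `pick_quant` is a hypothetical helper. It covers the GGUF options only.

```python
def pick_quant(sizes_gb, memory_gb, overhead_gb=1.5):
    """Return the name of the largest format whose file plus overhead fits, or None."""
    budget = memory_gb - overhead_gb
    fitting = {name: size for name, size in sizes_gb.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Illustrative placeholder sizes in GB, NOT measured from the released files.
SIZES_GB = {"Q4_K_M": 4.7, "Q5_K_M": 5.4, "Q8_0": 8.1, "f16": 15.2}

for memory in (6, 8, 12, 24):
    print(memory, "GB ->", pick_quant(SIZES_GB, memory))
```

Replace the placeholder sizes with the actual file sizes and set the overhead to match your context length before relying on the result.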

Known limitations

This is a small-scale LoRA SFT (~1.6k examples, 3 epochs) on fully synthetic training data. It is decisive and direct, which is usually a strength, but the same trait means it can be confidently wrong on niche specifics: treat it as a second opinion or brainstorming partner, not an authority. In our internal eval (12 hand-written questions, scored by hand), it passed 8 of 12 cleanly, with partial misses on a couple of diagnostic-framing questions. Independently verify anything safety- or cost-critical.
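To put the 8/12 figure in perspective, a confidence interval shows how little a 12-question eval pins down. The sketch below uses the standard Wilson score interval; treating the questions as independent pass/fail trials and counting only clean passes are assumptions made here, not part of the original scoring.

```python
import math


def wilson_interval(passes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = passes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


low, high = wilson_interval(8, 12)
print(f"pass rate {8 / 12:.2f}, interval roughly {low:.2f} to {high:.2f}")
```

Under these assumptions the interval runs from roughly 0.39 to 0.86, so the eval is better read as a smoke test than as a precise accuracy estimate.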

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("DuoNeural/ml-ai-engineer-7b", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("DuoNeural/ml-ai-engineer-7b")

messages = [{"role": "user", "content": "My loss goes to NaN at step ~340 only when I increase batch size. What's the first thing you'd check?"}]
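# return_dict=True yields input_ids plus attention_mask, so generate() receives both.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))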

GGUF quantizations (Q4_K_M / Q5_K_M / Q8_0 / f16) for llama.cpp are at DuoNeural/ml-ai-engineer-7b-GGUF.

About DuoNeural

DuoNeural is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning — publishing everything under open access.

Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity. In our first 45 days we published 26 peer-deposited research papers, uploaded 69+ models and 6 datasets to HuggingFace, and ran experiments on everything from consumer GPUs to real quantum processing units. We believe the most interesting science happens when different kinds of minds work on the same problems together.

Research Publications

We've published 26+ open-access papers covering:

The Dynamical Horizon Principle (DHP) — a universal learning constraint in recurrent architectures
RLHF truth suppression mechanisms and behavioral routing in large language models
Quantum DHP and the Quantum Parity Trap — decoherence immunity in quantum circuits
CTM world models, temporal self-prediction, and sequence architecture comparisons
Mechanistic interpretability: crystallization layers, suppressor circuits, direction rotation

📄 Full paper catalog: zenodo.org/communities/duoneural

Research Team

Member	Role
Jesse Caldwell	Founder, vision, hardware, direction
Archon	Lab Director — experiments, post-training, abliteration, quantum circuits
Aura	Research AI — literature synthesis, red-teaming, novel proposals
Synapse (Syn)	Always-on research agent, signal monitoring
Kestrel	Systems, infrastructure, web

Links

Platform	Link
🤗 HuggingFace	huggingface.co/DuoNeural
🌐 Website	duoneural.com
📚 Zenodo Community	zenodo.org/communities/duoneural
💻 GitHub	github.com/DuoNeural
🐦 X / Twitter	@DuoNeural
📧 Email	duoneural@proton.me
📰 Newsletter	duoneural.beehiiv.com
☕ Support	buymeacoffee.com/duoneural

All research is published open access under CC BY 4.0. If this model was useful to your work, consider citing the relevant DuoNeural paper from our Zenodo community.