🧲 TRIGNUM-300M

The Pre-Flight Check for Autonomous AI

License: MIT | Python 3.9+ | Benchmarked | DOI

"You wouldn't let a plane take off without a pre-flight check.
Why are we letting AI agents act without one?"

[Figure: TRIGNUM-300M architecture flowchart]

What Is This?

TRIGNUM-300M is a zero-model reasoning integrity validator for LLM outputs. It catches structural logic failures (contradictions, circular reasoning, non-sequiturs) before an AI agent acts on them.

from trignum_core.subtractive_filter import SubtractiveFilter

# `agent_output` and `agent` are placeholders supplied by the calling code.
sf = SubtractiveFilter()
result = sf.apply(agent_output)

if result.illogics_found:
    # T-CHIP glows RED 🔴 → human review required
    agent.halt(reason=result.illogics_found)
else:
    # T-CHIP glows BLUE 🔵 → cleared for takeoff
    agent.execute()

No LLM. No API. No training data. ~300 lines of Python. <1ms.


🔬 Benchmark Results

We expanded our evaluation to 58,000+ real LLM outputs including a new 517-sample curated dataset for structural reasoning. Honest results:

| Benchmark | Samples | Precision | Recall | F1 | Speed |
| --- | --- | --- | --- | --- | --- |
| Structural illogic (curated) | 517 | 100% | 98.9% | 99.5% | <1ms |
| HaluEval (full dataset) | 58,293 | 60% | 2.1% | 4.0% | 706ms |

What this means:

  • 99.5% F1 on structural reasoning failures: contradictions, circular logic, unsupported conclusions
  • 4.0% F1 on factual hallucinations: we don't catch wrong facts

That's the point. Many tools already check facts. We found none that check reasoning structure without running a model, until now.

Per-Task Breakdown (HaluEval)

| Task | n | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| QA | 18,316 | 83.3% | 0.25% | 0.50% |
| Dialogue | 19,977 | 60.1% | 4.38% | 8.16% |
| Summarization | 20,000 | 57.4% | 1.60% | 3.11% |
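
The F1 column is the harmonic mean of precision and recall, which is why very low recall pulls F1 toward zero even when precision is high. The sketch below recomputes F1 from the precision and recall values reported in the tables above. The inputs are already rounded, so a recomputed figure can differ from the reported one in the last digit. The function and variable names are illustrative and not part of the library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, converted from percent to fractions.
reported = {
    "Structural (curated)": (1.000, 0.989),
    "HaluEval QA": (0.833, 0.0025),
    "HaluEval Dialogue": (0.601, 0.0438),
    "HaluEval Summarization": (0.574, 0.0160),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.2%}")
```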

Throughput: 146,866 samples/second, orders of magnitude faster than LLM-based validation.
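
A throughput figure like this can be measured with a simple timing loop. The following is a minimal sketch of that idea, not the repository's benchmark script: `samples_per_second` and `toy_validate` are hypothetical names, and the stand-in validator is far simpler than the real filter, so the number it prints says nothing about TRIGNUM itself.

```python
import time
from typing import Callable, Iterable


def samples_per_second(validate: Callable[[str], object], samples: Iterable[str]) -> float:
    """Time one pass of `validate` over `samples` and return samples/second."""
    items = list(samples)
    start = time.perf_counter()
    for text in items:
        validate(text)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return len(items) / max(elapsed, 1e-9)


# Hypothetical stand-in validator; swap in the real filter's apply method.
def toy_validate(text: str) -> bool:
    return "therefore" in text.lower()


rate = samples_per_second(toy_validate, ["A holds. Therefore B."] * 10_000)
print(f"{rate:,.0f} samples/second")
```

For a stable measurement it usually helps to repeat the pass several times and report the median, since a single run is sensitive to cache warm-up and background load.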


✈️ The Pre-Flight Check Analogy

A pre-flight checklist doesn't verify that London exists. It verifies that:

  • ✅ Instruments don't contradict each other
  • ✅ There are no circular faults (sensor A confirms B confirms A)
  • ✅ The flight computer draws conclusions from actual data
  • ✅ Systems are logically consistent

The Subtractive Filter does the same for AI reasoning:

LLM Output → Subtractive Filter → [PASS] 🔵 → Agent Executes
                                → [FAIL] 🔴 → Agent Halts → Human Review

🤖 The Missing "Agentic Validator"

In the context of the recent shift towards Agentic Reasoning, autonomous LLMs are moving from static prompts to dynamic thought-action loops involving planning, tool-use, and multi-agent collaboration.

Current systems rely heavily on probabilistic models to act as the "Critic/Evaluator" or use "Validator-Driven Feedback" via unit tests for code or simulators for robotics. But there has been no validator for pure logic. If an agent hallucinates a non-sequitur or circular justification during its internal planning phase, the error cascades.

TRIGNUM-300M fills this exact gap. It acts as a deterministic, <1ms Validator-Driven Feedback gate. It halts execution if the agent's internal thought (z_t) contains a structural illogic, providing an immediate failure signal (r_t = 0) before the agent commits to an irreversible external action (a_t).
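
The gate described above can be sketched as a single step in a thought-action loop. This is an illustration under stated assumptions, not the library's API: `GateDecision`, `gated_step`, and `toy_validator` are hypothetical names, the validator is a trivial stand-in, and using a reward of 1 for a passing thought is an assumption (the text above only defines the failure signal as 0).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GateDecision:
    reward: int          # r_t: 0 if the thought was blocked, 1 (assumed) if it passed
    executed: bool       # whether the action a_t was carried out
    issues: List[str]    # structural problems reported by the validator


def gated_step(
    thought: str,
    validate: Callable[[str], List[str]],
    act: Callable[[], None],
) -> GateDecision:
    """Run `act` only if `validate` reports no issues in `thought` (z_t)."""
    issues = validate(thought)
    if issues:
        return GateDecision(reward=0, executed=False, issues=issues)
    act()
    return GateDecision(reward=1, executed=True, issues=[])


# Hypothetical validator for illustration only: flags a thought that
# opens with a conclusion and therefore has no premises before it.
def toy_validator(thought: str) -> List[str]:
    if thought.strip().lower().startswith("therefore"):
        return ["conclusion without premises"]
    return []


log: List[str] = []
blocked = gated_step("Therefore, delete the database.", toy_validator, lambda: log.append("ran"))
allowed = gated_step("The backup succeeded, so rotate the logs.", toy_validator, lambda: log.append("ran"))
print(blocked)
print(allowed)
```

The action is passed in as a callable so that nothing irreversible can run before validation has finished.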


🔺 Core Architecture

The Trignum Pyramid

Three faces acting as magnetic poles for data separation:

| Face | Role | What It Does |
| --- | --- | --- |
| α (Logic) | Truth detection | Identifies structurally sound reasoning |
| β (Illogic) | Error detection | Catches contradictions, circular logic, non-sequiturs |
| γ (Context) | Human grounding | Anchors output to human intent |

T-CHIP: The Tensor Character

╔═══════════════════════════════════════════════════════╗
║  T-CHIP [v.300M]                                      ║
║                                                       ║
║  🔵 Blue  = Logic Stable (Cleared for Takeoff)        ║
║  🔴 Red   = Illogic Detected (THE FREEZE)             ║
║  🟡 Gold  = Human Pulse Locked (Sovereign Override)   ║
║                                                       ║
║  Response time: <1ms | False alarms: 0% (structural)  ║
╚═══════════════════════════════════════════════════════╝

The Subtractive Filter

Four detection layers, all pattern-based:

| Layer | Catches | Method |
| --- | --- | --- |
| Contradiction | "X is always true. X is never true." | Antonym pairs, negation patterns |
| Circular Logic | A proves B proves A | Reference chain analysis |
| Non-Sequitur | "Therefore X" without premises | Causal connective analysis |
| Depth Check | Claims without any reasoning | Assertion density scoring |
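
To make the pattern-based idea concrete, here is a toy sketch of two of the four layers: an antonym-swap check for contradictions and a cycle search over a claim-support graph for circular logic. It illustrates the general techniques named in the table and is not the code in `subtractive_filter.py`. The antonym list, the function names, and the graph input format are all assumptions.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical antonym pairs; a real list would be much longer.
ANTONYMS = {"always": "never", "all": "none", "true": "false"}


def split_sentences(text: str) -> List[str]:
    """Lowercase the text and split it on sentence-ending punctuation."""
    parts = re.split(r"[.!?]+", text.lower())
    return [p.strip() for p in parts if p.strip()]


def find_contradictions(text: str) -> List[Tuple[str, str]]:
    """Flag sentence pairs that differ only by swapping one antonym."""
    sentences = split_sentences(text)
    seen = set(sentences)
    hits = []
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word in ANTONYMS:
                flipped = " ".join(words[:i] + [ANTONYMS[word]] + words[i + 1:])
                if flipped in seen:
                    hits.append((sentence, flipped))
    return hits


def has_circular_support(supports: Dict[str, Set[str]]) -> bool:
    """Return True if some claim ends up supporting itself through the chain."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(claim: str) -> bool:
        if claim in done:
            return False
        if claim in visiting:
            return True  # reached a claim that is still being explored: a cycle
        visiting.add(claim)
        found = any(visit(nxt) for nxt in supports.get(claim, ()))
        visiting.discard(claim)
        done.add(claim)
        return found

    return any(visit(claim) for claim in list(supports))


print(find_contradictions("X is always true. X is never true."))
print(has_circular_support({"A": {"B"}, "B": {"A"}}))
print(has_circular_support({"A": {"B"}, "B": {"C"}}))
```

Both checks are deterministic and need no model, which is the property the table relies on. Extracting the claim-support graph from free text is the harder part and is not attempted here.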

📦 Repository Structure

TRIGNUM-300M-TCHIP/
├── src/
│   └── trignum_core/              # Core Python library
│       ├── pyramid.py             # Trignum Pyramid (3 magnetic faces)
│       ├── tchip.py               # T-CHIP (glow states)
│       ├── subtractive_filter.py  # ★ The Subtractive Filter
│       ├── human_pulse.py         # Human sovereignty layer
│       └── magnetic_trillage.py   # Data separation
├── tests/                         # 34 unit tests (all passing)
├── benchmarks/
│   ├── hallucination_benchmark.py     # Curated structural test
│   ├── full_halueval_benchmark.py     # Full 58K HaluEval test
│   ├── results.json                   # Structural benchmark results
│   └── full_halueval_results.json     # Full HaluEval results
├── demo/
│   └── index.html                 # Three.js 3D interactive demo
├── paper/
│   └── TRIGNUM_300M_Position_Paper.md  # Position paper
├── docs/
│   └── theory/                    # 6 foundational theory documents
├── T-CHIP CLEARED FOR TAKEOFF.md  # The pitch
└── ROADMAP.md                     # 2-quarter development plan

🚀 Quick Start

# Clone
git clone https://github.com/trace-on-lab/trignum-300m.git
cd trignum-300m

# Install
pip install -r requirements.txt
pip install -e .

# Run the structural benchmark
python benchmarks/hallucination_benchmark.py

# Run the full HaluEval benchmark (downloads ~13MB of data)
python benchmarks/full_halueval_benchmark.py

# Run tests
pytest tests/ -v

🌐 Prior Art: Nobody Is Doing This

We searched arXiv, ResearchGate, ACL Anthology, and Semantic Scholar. Every existing reasoning validation system requires model inference:

| System | Requires Model | Validates Reasoning |
| --- | --- | --- |
| VerifyLLM (2025) | ✅ Yes | Partially |
| ContraGen | ✅ Yes | Partially |
| Process Supervision (OpenAI) | ✅ Yes | Yes |
| Guardrails AI | ✅ Configurable | No (content) |
| Subtractive Filter | ❌ No | ✅ Yes |

Existing work uses LLMs to check LLMs. TRIGNUM uses logic to check LLMs.

Read the full analysis in our position paper.


βš›οΈ Quantum Integration: TQPE

TRIGNUM-300M serves as Phase 1 ("Technical A Priori Validation") for Trignumental Quantum Phase Estimation (TQPE).

In our groundbreaking case study estimating the ground state energy of the H₂ molecule, TRIGNUM successfully validated the physical consistency and structural logic of the quantum circuit before execution. By acting as the preliminary gatekeeper, TRIGNUM ensured that no quantum resources were wasted on structurally ill-formed configurations, enabling an epistemic confidence score of 82.8% on the final estimate (-1.1384 Ha).

Read the full BUILDING THE BRIDGE paper on Trignumentality and TQPE in the foundational Trignumentality repository.


📚 Documentation

| Document | Description |
| --- | --- |
| Core Postulate | The fundamental axioms of Trignum |
| Three Faces | α (Logic), β (Illogic), γ (Context) |
| Magnetic Trillage | Data separation mechanism |
| T-CHIP Spec | The Tensor Character in detail |
| Cold State Hardware | Hardware implications |
| Hallucination Paradox | Reframing the "Big Monster" |
| Position Paper | Full academic paper with benchmarks |
| Roadmap | 2-quarter development plan |

💎 The Golden Gems

| Gem | Wisdom |
| --- | --- |
| GEM 1 | "The Human Pulse is the Master Clock" |
| GEM 2 | "The Illogic is the Compass" |
| GEM 3 | "Magnetic Trillage Over Brute Force" |
| GEM 4 | "The Hallucination is the Raw Material" |
| GEM 5 | "T-CHIP is the Mirror" |

🤝 Contributing

See CONTRIBUTING.md for guidelines.


📄 License

MIT License; see LICENSE.


📞 Contact

TRACE ON LAB
📧 traceonlab@proton.me


πŸ›‘οΈ The Call

"The most dangerous AI failure is not a wrong fact. It is reasoning that sounds right but isn't."

╔═══════════════════════════════════════════════════════╗
║  🧲 TRACE ON LAB | TRIGNUM-300M | v.300M              ║
║                                                       ║
║  The Pre-Flight Check for Autonomous AI.              ║
║  Zero models. Zero API calls. 146,866 samples/second. ║
║                                                       ║
║  🔵 T-CHIP: CLEARED FOR TAKEOFF.                      ║
╚═══════════════════════════════════════════════════════╝

⭐ Star this repo if you believe AI should check its logic before it acts.
