VORTEXRAG: 7-Layer RAG — Eliminates Semantic Drift & Context Poisoning (EM 74.8, +13.6 vs Naive RAG)

#27, opened by vigneshwar234

Hi all! Sharing VORTEXRAG, a new 7-layer RAG framework that is relevant if you use this model for retrieval-augmented generation.

The problem: cosine-similarity retrieval cannot distinguish causally relevant chunks from chunks that are merely topically similar. This causes two failure modes: semantic drift (the wrong chunks end up in the context) and context window poisoning (irrelevant chunks drown out the right one).
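For reference, the baseline retrieval being criticised can be sketched in a few lines of Python: rank chunks by cosine similarity to the query and keep the top k. The 3-dimensional vectors and the chunk labels below are made up for illustration. The point is that the score depends only on vector geometry, so a chunk that shares the query's topic but does not answer it can outrank the one that does.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank chunk ids by cosine similarity to the query, highest first, and keep k."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]


# Toy embeddings (hypothetical): "topical" shares the query's wording but does
# not answer it; "causal" is the chunk that actually answers it.
query = [1.0, 0.9, 0.1]
chunks = {
    "topical": [1.0, 1.0, 0.0],
    "causal": [0.6, 0.4, 0.9],
    "unrelated": [0.0, 0.1, 1.0],
}
print(top_k(query, chunks, k=1))  # ['topical']
```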

VORTEXRAG encodes text as an 864-dimensional tri-vector (semantic + syntactic + causal components), then runs it through a 7-layer pipeline. Key layers include:

  • SDC (causal drift gate): keeps only chunks with SDS = 1 − tanh(‖D‖/τ) ≥ 0.72
  • CPG: purges chunks from the context until ESR ≥ 3.5, using a greedy algorithm claimed to be provably optimal
  • FV: post-generation faithfulness check requiring ΔR = 1 − ROUGE-L × NLI ≤ 0.15
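The three gates listed above reduce to threshold checks. Below is a minimal Python sketch, not the VORTEXRAG implementation, under several assumptions: the drift vector D for each chunk is given (the post does not say how it is computed), τ defaults to a hypothetical 1.0, ESR is a caller-supplied function because the post does not define it, ROUGE-L is a plain whitespace-token LCS F1, and the NLI factor is an entailment probability in [0, 1] passed in directly. The removal rule in cpg_purge (drop the chunk whose removal raises ESR most) is an illustrative choice and is not claimed to be the provably optimal procedure from the paper. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence

# Thresholds quoted in the post.
SDS_MIN = 0.72      # SDC: keep a chunk only if SDS >= 0.72
ESR_MIN = 3.5       # CPG: stop purging once ESR >= 3.5
DELTA_R_MAX = 0.15  # FV: accept an answer only if Delta_R <= 0.15


def sds(drift: Sequence[float], tau: float = 1.0) -> float:
    """SDS = 1 - tanh(||D|| / tau); tau = 1.0 is a hypothetical default."""
    norm = math.sqrt(sum(d * d for d in drift))
    return 1.0 - math.tanh(norm / tau)


def sdc_filter(drifts: dict[str, Sequence[float]], tau: float = 1.0) -> list[str]:
    """Keep the ids of chunks whose drift vector D passes the SDS gate."""
    return [cid for cid, d in drifts.items() if sds(d, tau) >= SDS_MIN]


def cpg_purge(context: list[str], esr: Callable[[list[str]], float]) -> list[str]:
    """Greedily drop the chunk whose removal raises ESR most, until ESR >= ESR_MIN.

    `esr` is supplied by the caller (and must accept an empty list),
    because the post does not define how ESR is computed.
    """
    context = list(context)
    while context and esr(context) < ESR_MIN:
        best = max(range(len(context)), key=lambda i: esr(context[:i] + context[i + 1:]))
        del context[best]
    return context


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Whitespace-token ROUGE-L F1, based on the longest common subsequence."""
    a, b = candidate.split(), reference.split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(a), lcs / len(b)
    return 2 * precision * recall / (precision + recall)


def fv_passes(answer: str, evidence: str, nli_entail: float) -> bool:
    """FV gate: Delta_R = 1 - ROUGE-L * NLI must not exceed DELTA_R_MAX.

    `nli_entail` (0..1) stands in for an NLI model's entailment score.
    """
    delta_r = 1.0 - rouge_l_f1(answer, evidence) * nli_entail
    return delta_r <= DELTA_R_MAX


# Toy usage with made-up inputs.
print(sdc_filter({"c1": [0.1, 0.2], "c2": [0.9, 0.4], "c3": [0.05, 0.0]}))  # ['c1', 'c3']

signal = {"c1", "c3"}


def toy_esr(ctx: list[str]) -> float:
    """Hypothetical ESR: signal chunks per noise chunk (inf when no noise)."""
    noise = sum(cid not in signal for cid in ctx)
    return float("inf") if noise == 0 else (len(ctx) - noise) / noise


print(cpg_purge(["c1", "c2", "c3", "c4"], toy_esr))  # ['c1', 'c3']
print(fv_passes("the cat sat on the mat", "the cat sat on the mat", 0.95))  # True
print(fv_passes("a dog ran", "the cat sat on the mat", 0.9))  # False
```

Two consequences of the quoted thresholds are worth noting. SDS ≥ 0.72 is equivalent to ‖D‖/τ ≤ artanh(0.28) ≈ 0.288. And since ROUGE-L and the NLI score are both at most 1, ΔR ≤ 0.15 requires their product to be at least 0.85, so only answers that both overlap heavily with the evidence and are strongly entailed pass FV.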

Results vs baselines:

System       EM     F1     Faithfulness
VORTEXRAG    74.8   82.6   0.94
Self-RAG     68.4   77.1   0.81
Naive RAG    61.2   69.4   0.71
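The title's "+13.6 vs Naive RAG" follows directly from the EM column. A small Python sketch that computes the absolute gain per metric, with values copied from the table (the helper name deltas is hypothetical):

```python
# EM, F1 and faithfulness as reported in the results table.
results = {
    "VORTEXRAG": (74.8, 82.6, 0.94),
    "Self-RAG": (68.4, 77.1, 0.81),
    "Naive RAG": (61.2, 69.4, 0.71),
}


def deltas(system: str, baseline: str) -> tuple[float, ...]:
    """Absolute gain of `system` over `baseline` per metric, rounded to 2 decimals."""
    return tuple(round(s - b, 2) for s, b in zip(results[system], results[baseline]))


print(deltas("VORTEXRAG", "Naive RAG"))  # (13.6, 13.2, 0.23)
print(deltas("VORTEXRAG", "Self-RAG"))   # (6.4, 5.5, 0.13)
```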

If you use this model in a RAG pipeline, VORTEXRAG may substantially reduce hallucinations: in the results above, faithfulness rises from 0.71 (Naive RAG) and 0.81 (Self-RAG) to 0.94.

📄 Paper: https://doi.org/10.5281/zenodo.20579702
💻 Code (229 tests, MIT): https://github.com/vignesh2027/VORTEXRAG
🚀 Demo: https://huggingface.co/spaces/vigneshwar234/VORTEXRAG

vigneshwar234 changed discussion status to closed
