VORTEXRAG: 7-Layer RAG — Eliminates Semantic Drift & Context Poisoning (EM 74.8, +13.6 vs Naive RAG)

#27

by vigneshwar234 - opened 2 days ago

Hi all! Sharing VORTEXRAG — a new 7-layer RAG framework specifically relevant to this model.

The problem: Cosine-similarity retrieval cannot distinguish causally relevant chunks from merely topically similar ones. The result is semantic drift (the wrong chunks enter the context) and context window poisoning (irrelevant chunks drown out the right one).

VORTEXRAG encodes text as an 864-dimensional tri-vector (semantic + syntactic + causal), then runs a 7-layer pipeline, including:

SDC filters chunks with SDS = 1−tanh(‖D‖/τ) ≥ 0.72 (causal drift gate)
CPG purges context until ESR ≥ 3.5 (provably optimal greedy algorithm)
FV runs a post-generation faithfulness check: ΔR = 1 − ROUGE-L × NLI ≤ 0.15
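
To make the two closed-form thresholds above concrete, here is a minimal Python sketch. It is not taken from the VORTEXRAG repo: the function names are hypothetical, ‖D‖ is assumed to be the Euclidean norm of a drift vector, τ is left as a free parameter because the post gives no value for it, and the ROUGE-L and NLI scores are assumed to lie in [0, 1] and to be computed elsewhere.

```python
import math
from typing import Sequence

SDS_THRESHOLD = 0.72  # drift gate threshold quoted in the post
DELTA_R_MAX = 0.15    # faithfulness threshold quoted in the post


def sds(drift: Sequence[float], tau: float) -> float:
    """SDS = 1 - tanh(||D|| / tau). Euclidean norm is an assumption."""
    norm = math.sqrt(sum(x * x for x in drift))
    return 1.0 - math.tanh(norm / tau)


def passes_drift_gate(drift: Sequence[float], tau: float) -> bool:
    """Keep a chunk only if its SDS score reaches the threshold."""
    return sds(drift, tau) >= SDS_THRESHOLD


def delta_r(rouge_l: float, nli: float) -> float:
    """Delta-R = 1 - ROUGE-L * NLI, with both scores assumed in [0, 1]."""
    return 1.0 - rouge_l * nli


def is_faithful(rouge_l: float, nli: float) -> bool:
    """Accept a generated answer only if Delta-R stays within the bound."""
    return delta_r(rouge_l, nli) <= DELTA_R_MAX
```

Because tanh is monotonic, SDS ≥ 0.72 is equivalent to ‖D‖ ≤ τ·artanh(0.28) ≈ 0.288τ, so under these assumptions the gate is a plain norm cutoff scaled by τ. Likewise, ΔR ≤ 0.15 requires ROUGE-L × NLI ≥ 0.85.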

Results vs baselines:

System	EM	F1	Faithfulness
VORTEXRAG	74.8	82.6	0.94
Self-RAG	68.4	77.1	0.81
Naive RAG	61.2	69.4	0.71
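
As a quick check of the headline "+13.6" figure, the gaps between rows can be recomputed from the table. This sketch only restates the numbers above; the dictionary layout and the `gains_over` helper are hypothetical, and the post does not say which benchmark the scores come from.

```python
# Scores copied from the table above: (EM, F1, Faithfulness).
RESULTS = {
    "VORTEXRAG": (74.8, 82.6, 0.94),
    "Self-RAG": (68.4, 77.1, 0.81),
    "Naive RAG": (61.2, 69.4, 0.71),
}


def gains_over(baseline: str, system: str = "VORTEXRAG") -> tuple:
    """Return (EM, F1, Faithfulness) differences, rounded to 2 decimals."""
    return tuple(
        round(s - b, 2) for s, b in zip(RESULTS[system], RESULTS[baseline])
    )


print(gains_over("Naive RAG"))  # (13.6, 13.2, 0.23)
print(gains_over("Self-RAG"))   # (6.4, 5.5, 0.13)
```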

If you use this model in a RAG pipeline, VORTEXRAG may reduce hallucinations: in the results above, its faithfulness score is 0.94, versus 0.81 for Self-RAG and 0.71 for Naive RAG.

📄 Paper: https://doi.org/10.5281/zenodo.20579702
💻 Code (229 tests, MIT): https://github.com/vignesh2027/VORTEXRAG
🚀 Demo: https://huggingface.co/spaces/vigneshwar234/VORTEXRAG

vigneshwar234 changed discussion status to closed 2 days ago
