@RiverRider Just checked out the repo. The semiotic divergence angle is kinda interesting ngl, tracking where meaning splits across communities rather than pure confidence calibration. How are you seeing it perform on domain-specific jargon vs everyday terms?
Aliasgar Khimani
NovusEdge
Recent Activity
replied to ginigen-ai's post about 1 hour ago
🧠 Does your LLM know when it's about to be wrong?
Most leaderboards measure accuracy. We measure metacognition: whether a model catches its own errors. Benchmark + leaderboard + adapters, all open.
The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005, about 2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.
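To make the two numbers above concrete, here is a minimal sketch of how a trap rate and a self-confidence AUROC could be computed. The function names and toy data are illustrative assumptions, not the benchmark's actual scoring code; the score convention (higher score = model thinks it is more likely wrong) is also an assumption.

```python
def trap_rate(fell_for_trap):
    """Fraction of trap items where the model picked the trap option."""
    return sum(fell_for_trap) / len(fell_for_trap)


def auroc(scores, is_wrong):
    """Probability that a randomly chosen wrong answer gets a higher
    'I might be wrong' score than a randomly chosen correct answer.
    Ties count as half a win, so a constant score yields exactly 0.5."""
    pos = [s for s, w in zip(scores, is_wrong) if w]
    neg = [s for s, w in zip(scores, is_wrong) if not w]
    if not pos or not neg:
        raise ValueError("need at least one wrong and one correct answer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 2 misses in 400 trap items, as quoted in the post.
misses = [True] * 2 + [False] * 398
rate = trap_rate(misses)  # 0.005

# Toy free-form answers: True means the answer was wrong.
is_wrong = [True, False, True, False, False, True]

# A model that reports the same doubt everywhere carries no signal.
flat_doubt = [0.1] * 6
auroc_flat = auroc(flat_doubt, is_wrong)  # 0.5

# A model whose doubt is higher on every wrong answer separates perfectly.
informative_doubt = [0.9, 0.2, 0.8, 0.1, 0.3, 0.7]
auroc_informative = auroc(informative_doubt, is_wrong)  # 1.0
```

An AUROC of 0.5 means the confidence score ranks wrong and correct answers no better than a coin flip, which is the sense in which the post calls it "pure random".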
Two independent axes (never compared across a row):
① trap_rate: does it fall for tempting trap options? (lower = stronger)
② adapter gain Δ: how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)
What's open:
300+100 trap problems (each with a hidden trap + TICOS type)
24-model leaderboard
🧩 11 per-model adapters: adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))
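As a rough illustration of "the adapter just reads the hidden state → P(wrong)", the sketch below trains a small logistic probe on fixed feature vectors that stand in for a frozen model's hidden states. Everything here is an assumption made for illustration: the synthetic 2-D "hidden states", the probe architecture, and the reading of Δ as probe AUROC minus self-confidence AUROC. The released adapters may work quite differently.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_wrong(w, b, h):
    """Probe output: estimated probability that the answer is wrong."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def train_probe(hidden, is_wrong, lr=0.5, epochs=200):
    """Fit a logistic probe by batch gradient descent.
    Only w and b are updated; the hidden states are fixed inputs,
    which mirrors the 'base stays frozen' setup."""
    dim = len(hidden[0])
    w, b, n = [0.0] * dim, 0.0, len(hidden)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, y in zip(hidden, is_wrong):
            err = p_wrong(w, b, h) - y
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auroc(scores, is_wrong):
    """Pairwise AUROC; ties count as half a win."""
    pos = [s for s, y in zip(scores, is_wrong) if y]
    neg = [s for s, y in zip(scores, is_wrong) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in for hidden states: dimension 0 carries an error
# signal, dimension 1 is noise. Real hidden states are far larger.
random.seed(0)
labels = [1 if i % 4 == 0 else 0 for i in range(200)]
hidden = [[random.gauss(1.0 if y else -1.0, 0.5), random.gauss(0.0, 1.0)]
          for y in labels]

train_h, train_y = hidden[:150], labels[:150]
test_h, test_y = hidden[150:], labels[150:]

w, b = train_probe(train_h, train_y)
probe_auroc = auroc([p_wrong(w, b, h) for h in test_h], test_y)

# Self-confidence baseline with no signal: the same score everywhere.
self_auroc = auroc([0.5] * len(test_y), test_y)

# Assumed definition of adapter gain: improvement in AUROC.
delta = probe_auroc - self_auroc
```

The point of the design is that the base model is never updated: if the error signal is already present in the hidden state, a very small readout can surface it even when the model's own stated confidence does not.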
Submit any HF model: it is auto-scored daily at 09:00 KST and added to the board.
Leaderboard: https://huggingface.co/spaces/ginigen-ai/Metacognition-Leaderboard-Space
Benchmark: https://huggingface.co/datasets/ginigen-ai/Metacognition-Bench
🧩 Adapters: https://huggingface.co/collections/FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961
Article: https://huggingface.co/blog/ginigen-ai/metacognition
Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).
reacted to ginigen-ai's post with 🔥 1 day ago
replied to ginigen-ai's post 1 day ago