In a Training Loop 🔄

Heting Mao

IkanRiddle

AI & ML interests

None yet

Recent Activity

reacted to kanaria007's post with ❤️ about 2 hours ago
✅ New Article: *Designing Ethics Overlays* (v0.1)

Title: 🧩 Designing Ethics Overlays: Constraints, Appeals, and Sandboxes
🔗 https://huggingface.co/blog/kanaria007/designing-ethics-overlay

---

Summary:
“ETH” isn’t a content filter, and it isn’t just prompt hygiene. This article frames *ethics as runtime governance for effectful actions*: an overlay that can *allow / modify / hard-block / escalate*, while emitting a traceable *EthicsTrace* you can audit and explain. The key move is to treat safety/rights as *hard constraints or tight ε-bounds*, not a soft “ethics score” that gets traded off against convenience.

> Safety / basic rights are never “weighted-summed” against speed.
> They’re enforced; then you optimize inside the safe set.

---

Why It Matters:
• Prevents silent trade-offs (fairness/privacy/safety “lost in weights”)
• Makes “Why did it say no?” answerable via *machine-grade traces + human-grade explanations*
• Adds *appeals + controlled exceptions (break-glass)* so ETH doesn’t become an unchallengeable authority
• Enables safe policy iteration with *ETH sandboxes* (replay/shadow/counterfactual), not blind prod tuning
• Gives operators real KPIs: block rate, appeal outcomes, false positives/negatives, fairness gaps, latency

---

What’s Inside:
• How ETH sits in the runtime loop (OBS → candidates → ETH overlay → RML)
• A layered rule model: *baseline (“never”) / context (“allowed if…”) / grey (“escalate”)*
• Concrete flows: appeal records, exception tokens, SLA-based review loops
• ETH sandbox patterns + an evaluation loop for policy changes
• Performance + failure handling (“hot path”, fail-safe) and common anti-patterns to avoid

---

📖 Structured Intelligence Engineering Series
This is the *how-to-design / how-to-operate* layer for ETH overlays that survive real-world governance.
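To make the decision flow above concrete, here is a minimal Python sketch of an ETH-style overlay. Everything in it is an assumption for illustration: the `Verdict`, `Rule`, and `EthicsTrace` names and fields are hypothetical stand-ins for the allow / modify / hard-block / escalate semantics and the layered rule model the post describes, not the article’s actual API.

```python
# Minimal sketch of an ETH-style overlay (hypothetical API; EthicsTrace,
# Verdict, and the rule layers are illustrative, not the article's code).
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional
import time
import uuid

class Verdict(Enum):
    ALLOW = "allow"
    MODIFY = "modify"
    HARD_BLOCK = "hard_block"
    ESCALATE = "escalate"

@dataclass
class EthicsTrace:
    """Machine-grade record of why the overlay decided what it decided."""
    action_id: str
    verdict: Verdict
    matched_rules: list[str] = field(default_factory=list)
    explanation: str = ""
    timestamp: float = field(default_factory=time.time)

@dataclass
class Rule:
    name: str
    layer: str                                  # "baseline" | "context" | "grey"
    matches: Callable[[dict], bool]
    rewrite: Optional[Callable[[dict], dict]] = None

def eth_overlay(action: dict, rules: list[Rule]) -> tuple[Verdict, dict, EthicsTrace]:
    """Gate an effectful action: hard constraints are enforced first,
    and any optimization happens only inside the remaining safe set."""
    trace = EthicsTrace(action_id=str(uuid.uuid4()), verdict=Verdict.ALLOW)
    # 1. Baseline layer: "never" rules are absolute, never weighted-summed.
    for rule in (r for r in rules if r.layer == "baseline"):
        if rule.matches(action):
            trace.verdict = Verdict.HARD_BLOCK
            trace.matched_rules.append(rule.name)
            trace.explanation = f"baseline rule '{rule.name}' forbids this action"
            return Verdict.HARD_BLOCK, action, trace
    # 2. Grey layer: ambiguous cases escalate to a review queue.
    for rule in (r for r in rules if r.layer == "grey"):
        if rule.matches(action):
            trace.verdict = Verdict.ESCALATE
            trace.matched_rules.append(rule.name)
            trace.explanation = f"grey rule '{rule.name}' requires review"
            return Verdict.ESCALATE, action, trace
    # 3. Context layer: "allowed if..." rules may rewrite the action.
    for rule in (r for r in rules if r.layer == "context"):
        if rule.matches(action) and rule.rewrite is not None:
            action = rule.rewrite(action)
            trace.verdict = Verdict.MODIFY
            trace.matched_rules.append(rule.name)
    trace.explanation = trace.explanation or "no blocking rule matched"
    return trace.verdict, action, trace
```

Note the ordering in the sketch: the baseline layer short-circuits before any context rewriting can run, which is one way to encode “enforced, then optimize inside the safe set.”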
reacted to kanaria007's post with ❤️ 2 days ago
✅ New Article: *Observations, Under-Observation, and Repair Loops* (v0.1)

Title: 👁️ Observations, Under-Observation, and Repair Loops: The OBS Cookbook for SI-Core
🔗 https://huggingface.co/blog/kanaria007/observations-under-observation

---

Summary:
SI-Core’s rule is simple: *no effectful Jump without PARSED observations.* This article turns that slogan into an operational design: define *observation units* (sem_type/scope/status/confidence/backing_refs), detect *under-observation* (missing / degraded / biased), and run *repair loops* instead of “jumping in the dark.”

Key clarification: under-observed conditions may still run *read / eval_pre / jump-sandbox*, but must not commit or publish (sandbox: `publish_result=false`, `memory_writes=disabled`).

---

Why It Matters:
• Prevents “we had logs, so we had context” failures: *logs ≠ observations* unless they are typed + contract-checked
• Makes safety real: even PARSED observations should be gated by *coverage/confidence minima* (declared thresholds)
• Turns OBS into something measurable: *SCover_obs + SInt* become “OBS health” metrics and safe-mode triggers
• Links semantic compression to reality: distinguish *missing raw data* from *compression loss*, and fix the right thing

---

What’s Inside:
• A practical observation-status taxonomy: `PARSED / DEGRADED / STUB / ESTIMATED / MISSING / REDACTED / INVALID` (+ mapping to core statuses)
• Per-jump *observation contracts* (required sem_types, allowed statuses, age/confidence limits) + explicit fallback actions
• Fallback patterns: *safe-mode / conservative default / sandbox-only / human-in-loop*
• Repair loops as first-class citizens: ledgered `obs.repair_request`, PLB proposals, governance review for contract changes
• Testing OBS itself: property tests, chaos drills, golden-diff for observation streams

---

📖 Structured Intelligence Engineering Series
This is the *“how to operate OBS”* layer, so the system can *know when it doesn’t know* and repair over time.
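As a companion sketch, here is what a per-jump observation contract check might look like in Python. The `Observation` and `ObservationContract` shapes are hypothetical (only the status taxonomy is quoted from the post); the point it illustrates is that a failed check routes to a fallback such as sandbox-only execution rather than committing an effectful Jump.

```python
# Minimal sketch of a per-jump observation contract (hypothetical fields;
# the statuses mirror the taxonomy quoted above).
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import time

class ObsStatus(Enum):
    PARSED = "parsed"
    DEGRADED = "degraded"
    STUB = "stub"
    ESTIMATED = "estimated"
    MISSING = "missing"
    REDACTED = "redacted"
    INVALID = "invalid"

@dataclass
class Observation:
    sem_type: str
    scope: str
    status: ObsStatus
    confidence: float                 # 0.0 .. 1.0
    observed_at: float                # unix seconds
    backing_refs: tuple[str, ...] = ()

@dataclass
class ObservationContract:
    required_sem_types: frozenset[str]
    allowed_statuses: frozenset[ObsStatus]
    max_age_s: float
    min_confidence: float

def check_contract(obs: list[Observation], contract: ObservationContract,
                   now: Optional[float] = None) -> tuple[bool, list[str]]:
    """Return (ok, violations). On failure the caller should fall back to
    safe-mode / sandbox-only instead of committing an effectful Jump."""
    now = time.time() if now is None else now
    violations: list[str] = []
    seen = {o.sem_type for o in obs}
    for missing in contract.required_sem_types - seen:
        violations.append(f"missing required sem_type: {missing}")
    for o in obs:
        if o.sem_type not in contract.required_sem_types:
            continue
        if o.status not in contract.allowed_statuses:
            violations.append(f"{o.sem_type}: status {o.status.value} not allowed")
        if now - o.observed_at > contract.max_age_s:
            violations.append(f"{o.sem_type}: observation too old")
        if o.confidence < contract.min_confidence:
            violations.append(f"{o.sem_type}: confidence below threshold")
    return (not violations, violations)

# Usage pattern: a failed check routes to a fallback, never a silent commit.
# ok, why = check_contract(current_obs, jump_contract)
# if not ok:
#     run_sandbox_only(publish_result=False, memory_writes=False)  # hypothetical fallback
```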

Organizations

None yet