Student Simulation v2

Quick reference. See runall.sh for the full pipeline.

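Key changes (v2 vs v1)

  1. Formula semantics: x_new = x - (1 - α) · P · h, where α=1 leaves x unchanged and α=0 suppresses the direction completely
  2. Sweep range: α ∈ [0, 1] (v1 swept outside this range, which caused collapse artifacts)
  3. Direction versions: only v1_raw and the new v_pca_subspace (k=3 subspace) are kept
  4. New feature: JointResidualSteerer prevents compensation across dimensions
  5. New metric: count_real_monitoring() distinguishes genuine reflection from filler words
  6. New metric: is_collapsed() uses 4-gram repetition plus a length ratio, and is more robust than the v1 check
  7. 08b: attention output diagnostic (informational only)
  8. 10_infer: added to runall as a sanity check
  9. Removed: LLM rater (11_llm_quality_rating.py)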
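
The formula in item 1 can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in this README: that P is the orthogonal projector onto the k=3 direction subspace, and that h is the residual vector x itself. The names subspace_projector and steer are hypothetical and are not the repository's API.

```python
import numpy as np

def subspace_projector(basis: np.ndarray) -> np.ndarray:
    """Return P = Q Q^T for the column space of `basis` (shape d x k)."""
    q, _ = np.linalg.qr(basis)  # orthonormalize the k basis columns
    return q @ q.T

def steer(x: np.ndarray, projector: np.ndarray, alpha: float) -> np.ndarray:
    """Apply x_new = x - (1 - alpha) * P @ h, assuming h is x itself."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return x - (1.0 - alpha) * (projector @ x)

rng = np.random.default_rng(0)
d, k = 16, 3                        # toy hidden size; k matches PCA_SUBSPACE_K
basis = rng.normal(size=(d, k))     # stand-in for a learned direction subspace
P = subspace_projector(basis)
x = rng.normal(size=d)              # stand-in for one residual vector

unchanged = steer(x, P, alpha=1.0)   # alpha=1: no intervention
suppressed = steer(x, P, alpha=0.0)  # alpha=0: subspace component removed
```

Under these assumptions, alpha=1 returns x untouched and alpha=0 leaves no component inside the subspace, which matches the two endpoints described in item 1.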

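A collapse check along the lines of item 6 might look like the sketch below. This is not the repository's is_collapsed(): the function names, the whitespace tokenization, the thresholds, and the choice to measure length against an unsteered baseline are all assumptions made for illustration.

```python
from collections import Counter

def repeated_ngram_fraction(tokens, n=4):
    """Fraction of n-grams that repeat an n-gram already seen in the text."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)

def is_collapsed_sketch(text, baseline_text,
                        max_repeat=0.5, min_ratio=0.2, max_ratio=3.0):
    """Flag text as collapsed if it loops or its length is far off baseline.

    All three thresholds are illustrative guesses, not tuned values.
    """
    tokens = text.split()  # whitespace tokens stand in for model tokens
    base_len = max(len(baseline_text.split()), 1)
    length_ratio = len(tokens) / base_len
    if repeated_ngram_fraction(tokens) > max_repeat:
        return True
    return not (min_ratio <= length_ratio <= max_ratio)

baseline = ("First factor the equation, then check both roots, "
            "and finally report x = 7 or x = -7.")
healthy = "Factor x^2 - 49 as (x - 7)(x + 7), so x = 7 or x = -7."
looping = "the answer is seven " * 50

healthy_flag = is_collapsed_sketch(healthy, baseline)
looping_flag = is_collapsed_sketch(looping, baseline)
```

Combining the two signals means a short but degenerate loop and a fluent but runaway generation can both be caught, which a single repetition score would miss.
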
Usage

# Full run on a single GPU
bash runall.sh

# Enable anti-leak joint steering
JOINT=1 bash runall.sh

# Run only selected stages
STAGES=8,8b,9,10 bash runall.sh

# Run inference on a single problem
python scripts/10_infer.py --dim planning --alphas 1.0 0.5 0.0 \
    --problem "Find x such that x^2=49"

Directory layout

data/
  models/                                # Qwen3-30B-A3B-Thinking-2507
  cots/                                  # raw + labeled CoTs
  routing/                               # router top-k dumps
  activations/                           # decision-point residuals
  checkpoints/
    planning_v1_raw.pt
    planning_v_pca_subspace.pt           # new k=3 subspace
    monitoring_v1_raw.pt
    monitoring_v_pca_subspace.pt
  results/
    sweep_log.jsonl                      # includes steered_text
    final_report.md
    attention_diagnostic.{json,png}      # new
    infer_sanity_planning.json           # new
    infer_sanity_monitoring.json         # new
  logs/

Key config (configs/model.py)

ALPHA_SWEEP        = [0.0, 0.1, 0.2, 0.3, 0.5, 0.75, 1.0]
DIRECTION_VERSIONS = ["v1_raw", "v_pca_subspace"]
PCA_SUBSPACE_K     = 3
ANTI_LEAK_BETA     = 0.3
GEN_CONFIG["max_new_tokens"]      = 12000  # previously 4096, which was too small
GEN_CONFIG_FAST["max_new_tokens"] =  8192  # previously 1024, which was too small
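
As a rough illustration of how these settings could combine into a sweep, the sketch below enumerates one run per (dimension, direction version, alpha). The DIMENSIONS list and the checkpoint path pattern are inferred from the file names in the directory listing, and sweep_grid is a hypothetical helper, not a function from this repository.

```python
from itertools import product

ALPHA_SWEEP = [0.0, 0.1, 0.2, 0.3, 0.5, 0.75, 1.0]
DIRECTION_VERSIONS = ["v1_raw", "v_pca_subspace"]
DIMENSIONS = ["planning", "monitoring"]  # assumption: from checkpoint names

def sweep_grid(dimensions, versions, alphas):
    """Enumerate one run per (dimension, direction version, alpha)."""
    runs = []
    for dim, version, alpha in product(dimensions, versions, alphas):
        if not 0.0 <= alpha <= 1.0:
            # v2 restricts alpha to [0, 1]; reject anything outside it
            raise ValueError(f"alpha {alpha} is outside [0, 1]")
        runs.append({
            "dim": dim,
            "version": version,
            "alpha": alpha,
            # assumed naming pattern, e.g. planning_v1_raw.pt
            "checkpoint": f"data/checkpoints/{dim}_{version}.pt",
        })
    return runs

runs = sweep_grid(DIMENSIONS, DIRECTION_VERSIONS, ALPHA_SWEEP)
```

With the values above this yields 2 × 2 × 7 = 28 runs, so the range check is a cheap guard against reintroducing out-of-range alphas.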