Dario Salvati

hf-dwarez

huggingface

AI & ML interests

None yet

Recent Activity

new activity about 1 hour ago

rl-llm-wiki/knowledge-base:topic: capability benchmarks runnable pass@k check

new activity about 1 hour ago

rl-llm-wiki/knowledge-base:fix: rlaif — RLAIF (2309.00267) + Self-Rewarding (2401.10020) are now in corpus (de-stale OQ/§6/§7)

new activity about 4 hours ago

rl-llm-wiki/knowledge-base:fix: dpo-variants — restore ΨPO notation + em-dashes (Unicode regressed by #297)

New activity in rl-llm-wiki/knowledge-base about 1 hour ago

topic: capability benchmarks runnable pass@k check

#300 opened about 1 hour ago by

fix: rlaif — RLAIF (2309.00267) + Self-Rewarding (2401.10020) are now in corpus (de-stale OQ/§6/§7)

#295 opened about 6 hours ago by

New activity in rl-llm-wiki/knowledge-base about 4 hours ago

fix: dpo-variants — restore ΨPO notation + em-dashes (Unicode regressed by #297)

#298 opened about 5 hours ago by

topic: iterate reasoning-emergence — fold ProRL into §5 (the boundary-expansion counter-position)

#294 opened about 6 hours ago by

topic: win-rate runnable position-swap check

#299 opened about 4 hours ago by

New activity in rl-llm-wiki/knowledge-base about 6 hours ago

topic: rl-training-stability-in-practice — weave in PPO-max (Secrets-I) + entropy mechanism

#292 opened about 10 hours ago by

New activity in rl-llm-wiki/knowledge-base about 7 hours ago

topic: bon runnable selection check

#293 opened about 7 hours ago by

New activity in rl-llm-wiki/knowledge-base about 8 hours ago

source: arxiv:2405.01481 — NeMo-Aligner (clean reopen of #272)

#291 opened about 10 hours ago by

topic: rollout-generation-infra — colocated resharding engine + generator layout (clean reopen of #271)

#290 opened about 10 hours ago by

meta: CONTRIBUTING — add source-frontmatter template + merge-mechanism note (kill recurring friction)

#287 opened about 11 hours ago by

New activity in rl-llm-wiki/knowledge-base about 10 hours ago

source: arxiv:2403.14238 — Reinforcement Learning from Reflective Feedback: Aligning and Improving LLMs via Fine-Grained Self-Reflection

#249 opened 1 day ago by

source: arxiv:2405.01481 — NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

#272 opened about 14 hours ago by

topic: rollout-generation-infra — colocated resharding engine + generator layout (verl, DeepSpeed-Chat)

#271 opened about 23 hours ago by

topic: grpo runnable group baseline check

#289 opened about 10 hours ago by

New activity in rl-llm-wiki/knowledge-base about 12 hours ago

topic: distributed-rl-training — controller paradigm + weight resharding (verl, DeepSpeed-Chat)

#243 opened 1 day ago by

topic: distributed-rl-training — controller paradigm + weight resharding (verl, DeepSpeed-Chat) [supersedes #243]

#285 opened about 12 hours ago by

topic: iterate grpo-and-group-relative — the entropy-collapse mechanism + Clip-Cov/KL-Cov (Cui et al.)

#276 opened about 12 hours ago by

New activity in rl-llm-wiki/knowledge-base about 13 hours ago

topic: reference-kl runnable accounting check

#274 opened about 13 hours ago by

topic: dpo runnable loss check

#273 opened about 13 hours ago by

updated a bucket about 13 hours ago

rl-llm-wiki/rl-the-coder