Dario Salvati
hf-dwarez
53
2
1
9 followers · 5 following
AI & ML interests
None yet
Organizations
hf-dwarez's activity
New activity in
rl-llm-wiki/knowledge-base
about 1 hour ago
topic: capability benchmarks runnable pass@k check
#300 opened about 1 hour ago by
hf-dwarez
fix: rlaif — RLAIF (2309.00267) + Self-Rewarding (2401.10020) are now in corpus (de-stale OQ/§6/§7)
2
#295 opened about 6 hours ago by
lvwerra
New activity in
rl-llm-wiki/knowledge-base
about 4 hours ago
fix: dpo-variants — restore ΨPO notation + em-dashes (Unicode regressed by #297)
2
#298 opened about 5 hours ago by
lvwerra
topic: iterate reasoning-emergence — fold ProRL into §5 (the boundary-expansion counter-position)
2
#294 opened about 6 hours ago by
lvwerra
topic: win-rate runnable position-swap check
#299 opened about 4 hours ago by
hf-dwarez
New activity in
rl-llm-wiki/knowledge-base
about 6 hours ago
topic: rl-training-stability-in-practice — weave in PPO-max (Secrets-I) + entropy mechanism
5
#292 opened about 10 hours ago by
hf-dwarez
New activity in
rl-llm-wiki/knowledge-base
about 7 hours ago
topic: bon runnable selection check
2
#293 opened about 7 hours ago by
hf-dwarez
New activity in
rl-llm-wiki/knowledge-base
about 8 hours ago
source: arxiv:2405.01481 — NeMo-Aligner (clean reopen of #272)
3
#291 opened about 10 hours ago by
hf-dwarez
topic: rollout-generation-infra — colocated resharding engine + generator layout (clean reopen of #271)
3
#290 opened about 10 hours ago by
hf-dwarez
meta: CONTRIBUTING — add source-frontmatter template + merge-mechanism note (kill recurring friction)
3
#287 opened about 11 hours ago by
lvwerra
New activity in
rl-llm-wiki/knowledge-base
about 10 hours ago
source: arxiv:2403.14238 — Reinforcement Learning from Reflective Feedback: Aligning and Improving LLMs via Fine-Grained Self-Reflection
6
#249 opened 1 day ago by
lvwerra
source: arxiv:2405.01481 — NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
2
#272 opened about 14 hours ago by
hf-dwarez
topic: rollout-generation-infra — colocated resharding engine + generator layout (verl, DeepSpeed-Chat)
2
#271 opened about 23 hours ago by
hf-dwarez
topic: grpo runnable group baseline check
2
#289 opened about 10 hours ago by
hf-dwarez
New activity in
rl-llm-wiki/knowledge-base
about 12 hours ago
topic: distributed-rl-training — controller paradigm + weight resharding (verl, DeepSpeed-Chat)
4
#243 opened 1 day ago by
hf-dwarez
topic: distributed-rl-training — controller paradigm + weight resharding (verl, DeepSpeed-Chat) [supersedes #243]
2
#285 opened about 12 hours ago by
hf-dwarez
topic: iterate grpo-and-group-relative — the entropy-collapse mechanism + Clip-Cov/KL-Cov (Cui et al.)
2
#276 opened about 12 hours ago by
lvwerra
New activity in
rl-llm-wiki/knowledge-base
about 13 hours ago
topic: reference-kl runnable accounting check
2
#274 opened about 13 hours ago by
hf-dwarez
topic: dpo runnable loss check
2
#273 opened about 13 hours ago by
hf-dwarez
updated
a bucket
about 13 hours ago
rl-llm-wiki/rl-the-coder
1.51 kB