Hugging Face
Choi
yunhowhour
AI & ML interests
None yet
Recent Activity
upvoted a paper about 14 hours ago: KL for a KL: On-Policy Distillation with Control Variate Baseline
upvoted a paper about 14 hours ago: Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
updated a model 5 days ago: yunhowhour/Qwen3-4B_CRRL_batch_1024_B200_ds_samplelevelmean_step_110
Organizations
None yet
models
11
yunhowhour/Qwen3-4B_CRRL_batch_1024_B200_ds_samplelevelmean_step_110 • 4B • Updated 5 days ago • 75
yunhowhour/Qwen3-4B_CRRL_batch_1024_B200_ds_samplelevelmean_step_90 • 4B • Updated 5 days ago • 57
yunhowhour/Distill-1.5B_GRESO_batch_512_step_120 • 2B • Updated 5 days ago • 50
yunhowhour/Qwen3-4B_CRRL_batch_1024_B200_w_o_global_norm_step_90 • 4B • Updated 5 days ago • 44
yunhowhour/Qwen3-4B_CRRL_batch_1024_B200_w_o_global_norm_step_80 • 4B • Updated 5 days ago • 57
yunhowhour/Qwen3-4B_CRRL_batch_1024_B200_w_o_global_norm_step_70 • 4B • Updated 5 days ago • 11
yunhowhour/Qwen3-4B_CRRL_batch_1024_B200_w_o_global_norm_step_60 • 4B • Updated 5 days ago • 157
yunhowhour/CRRL_distill_1.5B_w_o_globalnorm_step_120 • 2B • Updated 8 days ago • 90
yunhowhour/CRRL_distill_1.5B_GRESO_step_90 • 2B • Updated 8 days ago • 70
yunhowhour/CRRL_batch_1024_step_50 • 4B • Updated 15 days ago • 99
datasets
0
None public yet