cj453/dense_reward_trainer_final_opt__NumTrainEpochs5_SaveStrategiesno_reward_modeling_anthropic_hh 1B • Updated Sep 16, 2024 • 49
cj453/dense_reward_trainer_final_opt__NumTrainEpochs5_SaveStrategiesepoch_reward_modeling_anthropic_hh 1B • Updated Sep 16, 2024 • 44
cj453/dense_reward_trainer_final_opt__NumTrainEpochs2_SaveStrategiesno_reward_modeling_anthropic_hh 1B • Updated Sep 15, 2024 • 45
cj453/dense_reward_trainer_final_opt__NumTrainEpochs2_SaveStrategiesepoch_reward_modeling_anthropic_hh 1B • Updated Sep 14, 2024 • 44