stair-lab/code_insights_csv • 3.07M • 27 • 1
stair-lab/nonmyopia_results • 102k
stair-lab/code_insights_results • 215
stair-lab/cultural_value_understanding_wvs • 1k • 15
stair-lab/chatbot_arena_embedding • 323k • 8
stair-lab/zeroshot_evaluator • 1M • 6
stair-lab/zero_shot_evaluator_openllm_val • 11
stair-lab/zero_evaluator_agentic • 34.7k • 7
stair-lab/zero_shot_open_llm_leaderboard • 74.6M • 115
stair-lab/irsl_downstream_resmat1_fullinfo • 75
stair-lab/irsl_testtime_resmat1
stair-lab/irsl_downstream_resmat1_prob • 11
stair-lab/deprecated_2choice_irsl_downstream_resmat1
stair-lab/deprecated_2choice_irsl_downstream_resmat1_fullinfo • 15
stair-lab/irsl_testtime_resmat2
stair-lab/irsl_downstream_resmat1_binary • 63
stair-lab/information-gathering • 25
stair-lab/denoise_eval_query • 435
stair-lab/deval_helm_hyperturing1 • 601
stair-lab/fantastic_bugs_result • 405k • 15
stair-lab/platinum_detect • 282 • 178
stair-lab/fantastic_bugs_result_deprecated • 98
stair-lab/monkey_query_pre • 257
stair-lab/one_question_less_samples • 2.34k • 8