Benchmark data from "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".
Xuan Yang
TorresYang
AI & ML interests
LLM reasoning, agents
Recent Activity
Updated a collection 2 days ago: RUT-Bench
Updated a collection 3 days ago: RUT-Bench
New activity 3 days ago on Miaow-Lab/RUT-Bench: Add task categories and link to paper