ai4s-r2

community

AI & ML interests

None defined yet.

Recent Activity

amphora submitted a paper 1 day ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

amphora submitted a paper 3 months ago

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

amphora authored a paper 12 months ago

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

ai4s-r2's models

None public yet