Scale AI

company

Verified

https://scale.com/

AI & ML interests

None defined yet.

Recent Activity

X1ingang updated a dataset 9 days ago

ScaleAI/EnigmaEval

anisha2102 submitted a paper about 1 month ago

SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

feyzaakyurek published a dataset about 1 month ago

ScaleAI/DrugDiscoveryBench-Preview

Papers

SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

updated a dataset 9 days ago

ScaleAI/EnigmaEval

Viewer • Updated 9 days ago • 1.16k • 87 • 3

submitted a paper to Daily Papers about 1 month ago

SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

Paper • 2606.30573 • Published Jun 29 • 8

published 2 datasets about 1 month ago

ScaleAI/DrugDiscoveryBench-Preview

Viewer • Updated Jun 26 • 82 • 116

ScaleAI/DrugDiscoveryBench

Viewer • Updated Jun 26 • 82 • 155

updated 2 datasets about 1 month ago

ScaleAI/DrugDiscoveryBench

Viewer • Updated Jun 26 • 82 • 155

ScaleAI/DrugDiscoveryBench-Preview

Viewer • Updated Jun 26 • 82 • 116

in ScaleAI/PRBench about 1 month ago

Update to revised rubrics (data_v2): 18,692 criteria

#2 opened about 1 month ago by

updated a dataset about 2 months ago

ScaleAI/ndm_bench

Viewer • Updated Jun 17 • 955 • 42

published a dataset about 2 months ago

ScaleAI/ndm_bench

Viewer • Updated Jun 17 • 955 • 42

submitted a paper to Daily Papers 2 months ago

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

Paper • 2605.20164 • Published May 19 • 6

authored 10 papers 2 months ago

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

Paper • 2405.15683 • Published May 24, 2024

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Paper • 2406.11768 • Published Jun 17, 2024 • 24

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Paper • 2410.19168 • Published Oct 24, 2024 • 24

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

Paper • 2310.08753 • Published Oct 12, 2023

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

Paper • 2510.12712 • Published Oct 14, 2025

Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction

Paper • 2512.14865 • Published Dec 16, 2025 • 2

SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

Paper • 2604.10718 • Published Apr 12 • 4

Audio Hallucination Attacks: Probing the Reliability of Large Audio Language Models

Paper • 2603.29263 • Published Mar 31

ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

Paper • 2406.04286 • Published Jun 6, 2024

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

Paper • 2605.20164 • Published May 19 • 6