Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge Paper • 2605.08518 • Published 13 days ago • 10
MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments Paper • 2605.09131 • Published 12 days ago • 54
DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules Paper • 2605.08614 • Published 12 days ago • 6
Sleeping RL My Env Environment Server 💿 Interact with a reinforcement environment via text actions
Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs Paper • 2603.16932 • Published Mar 14 • 89
Article IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST ibm-research • Feb 18 • 19
Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality ibm-research • Jan 21 • 33
ibm-granite/granite-docling-258M Image-Text-to-Text • 0.3B • Updated Sep 23, 2025 • 723k • 1.18k
Qwen/Qwen3-30B-A3B-Instruct-2507 Text Generation • 31B • Updated Sep 17, 2025 • 1.13M • 809