NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published 25 days ago • 43
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published 24 days ago • 49
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published 25 days ago • 13
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 16 days ago • 89
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 Jul 29, 2025 • 206
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 106
WhiteRabbitNeo-V3 Collection The latest and most capable cybersecurity model we've ever created • 1 item • Updated Jun 25, 2025 • 16
Article LLMGameHub: How We Won the Gradio Agents & MCP Hackathon 2025 Jul 28, 2025 • 20
Test-Time Scaling with Reflective Generative Model Paper • 2507.01951 • Published Jul 2, 2025 • 107
Rethinking Verification for LLM Code Generation: From Generation to Testing Paper • 2507.06920 • Published Jul 9, 2025 • 28
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published Jun 26, 2025 • 28
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development Paper • 2506.05010 • Published Jun 5, 2025 • 80
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11, 2025 • 53
Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs Paper • 2506.05629 • Published Jun 5, 2025 • 37