arxiv:2603.28589

Towards a Medical AI Scientist

Published on Mar 30 · Submitted by Boyun Zheng on Mar 31

#2 Paper of the day


Authors:

Hongtao Wu, Boyun Zheng, Dingjie Song, Jianfeng Gao, Lichao Sun, Yixuan Yuan

Abstract

AI-generated summary: Medical AI Scientist represents the first autonomous research framework designed for clinical applications, enabling evidence-based hypothesis generation and manuscript drafting through clinician-engineer collaboration across three research modes.

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, which limits their applicability to clinical medicine, where research must be grounded in medical evidence and must handle specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of generated research ideas. It further facilitates evidence-grounded manuscript drafting guided by structured medical compositional conventions and ethical policies. The framework operates under three research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models and human experts demonstrate that the ideas generated by the Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs across 171 cases, 19 clinical tasks, and 6 data modalities. Meanwhile, our system achieves strong alignment between the proposed method and its implementation, while also demonstrating significantly higher success rates in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality, while consistently surpassing those from ISBI and BIBM. The proposed Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.
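To make the three research modes concrete, here is a minimal Python sketch of how they might be represented as an ordered type. Only the mode names and their ordering by autonomy come from the abstract. The `ResearchRequest` record, its field names, and the rule that reproduction requires a reference paper are assumptions made for illustration, not the authors' interface.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ResearchMode(IntEnum):
    """The three modes named in the abstract, ordered by increasing autonomy."""
    PAPER_BASED_REPRODUCTION = 1
    LITERATURE_INSPIRED_INNOVATION = 2
    TASK_DRIVEN_EXPLORATION = 3


@dataclass(frozen=True)
class ResearchRequest:
    """Hypothetical request record; the field names are illustrative only."""
    mode: ResearchMode
    clinical_task: str
    data_modality: str
    reference_paper: Optional[str] = None  # assumed to matter only for reproduction


def validate(request: ResearchRequest) -> None:
    """Reject a reproduction request with no paper to reproduce (assumed rule)."""
    if request.mode is ResearchMode.PAPER_BASED_REPRODUCTION and not request.reference_paper:
        raise ValueError("paper-based reproduction needs a reference paper")


def more_autonomous(a: ResearchMode, b: ResearchMode) -> ResearchMode:
    """Return whichever of the two modes sits higher on the autonomy scale."""
    return max(a, b)
```

Using `IntEnum` keeps the "progressively increasing autonomy" ordering explicit, so modes can be compared with `<` and `max` directly.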


Community

Byzzz0301

Paper author Paper submitter 1 day ago

Hi everyone 👋

We are excited to share "Towards a Medical AI Scientist", the first end-to-end autonomous framework tailored for clinical research.

While general AI Scientists have shown promise in math, chemistry, and ML, they fall short in medicine due to the need for strong clinical grounding, handling of heterogeneous medical data, and rigorous ethical compliance. Our system bridges this gap with:

Clinician-Engineer Co-Reasoning for traceable, evidence-based idea generation
Specialized experimental pipelines with medical toolboxes
Structured manuscript composition with built-in ethical review

Evaluated on Med-AI Bench (171 cases, 19 tasks, 6 modalities), our framework generates significantly higher-quality research ideas than commercial LLMs and produces executable experiments with much higher success rates. Remarkably, double-blind human expert reviews and Stanford Agentic Reviewer assessments indicate that the generated manuscripts reach quality levels close to MICCAI, outperforming ISBI and BIBM submissions. One manuscript was accepted at ICAIS 2025 after peer review.

This work demonstrates the exciting potential of AI to accelerate trustworthy scientific discovery in healthcare.

Project homepage: https://cuhk-aim-group.github.io/Med-AI-Scientist-Homepage/


avahal

about 18 hours ago

that clinician–engineer co-reasoning hub is the most interesting hinge here, the thing that actually grounds ideas in medical evidence rather than letting the output drift into plausible but unfounded claims. to convince me it's not just a better prompt trick, i'd love to see a clean ablation where you disable the back-and-forth reasoning and compare against a fixed policy to quantify the co-reasoning's real contribution. also curious how you handle conflicting evidence across the six modalities within the loop, is there a rule to reconcile discordant signals or a learned weighting scheme? the arxivlens breakdown helped me parse the method details, it has a nice walkthrough on the co-reasoning and evidence-assembly flow: https://arxivlens.com/PaperView/Details/towards-a-medical-ai-scientist-4249-c1b05879. overall this feels like a solid step toward end-to-end medical autonomous research, but the next move should be a transparent ablation and a test on truly adversarial or edge-case clinical scenarios.
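The ablation suggested in the comment above could be scored along these lines. This is a minimal sketch, assuming per-case quality scores exist for both the full system and a variant with co-reasoning disabled. The function name, the score scale, and the choice of summary statistics are hypothetical and do not come from the paper.

```python
from statistics import mean


def paired_ablation(full: list[float], ablated: list[float]) -> dict[str, float]:
    """Compare per-case scores from the full system and an ablated variant.

    The two lists must be aligned: index i refers to the same benchmark case
    in both. Returns the mean paired difference and the fraction of cases
    where the full system scored strictly higher.
    """
    if len(full) != len(ablated) or not full:
        raise ValueError("score lists must be non-empty and the same length")
    diffs = [f - a for f, a in zip(full, ablated)]
    return {
        "mean_difference": mean(diffs),
        "win_rate": sum(d > 0 for d in diffs) / len(diffs),
    }
```

Pairing by case matters here: because the benchmark spans heterogeneous tasks and modalities, comparing the two variants on the same cases removes per-case difficulty from the estimate.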


