MATHx Lab Evaluates LLMs for Radiology Systematic Review Screening
December 18, 2025
The MATHx Lab examined how well several large language models (LLMs), including GPT-4o and Gemini, could help screen research paper titles and abstracts for radiology systematic reviews. The researchers tested how accurately the models identified relevant studies, how confident the models were in their answers, and how the models responded when their decisions disagreed with those of human reviewers. Overall, the findings suggest that LLMs could help speed up systematic review screening but still need careful oversight from human reviewers.