
New Nature studies show medical AI rivaling doctors, but specialized scaffolding becomes redundant on newer models
Two new studies published in Nature show that specialized medical AI systems, TUD Dresden's MIRA and Google's AMIE, can diagnose diseases and draft treatment plans as well as or better than human physicians in simulated settings. MIRA reached 88.9% diagnostic accuracy across 500 emergency cases, while AMIE matched 21 primary care doctors, scoring 95% on plan appropriateness. However, independent experts such as Oxford professor Catherine Pope caution that these simulated environments are a far cry from the complex, messy reality of actual clinical care. More importantly, supplementary data from Google's paper shows that the specialized multi-agent scaffolding that boosted the older Gemini 1.5 Flash model became redundant when ported to the newer Gemini 2.5 Flash. As base models improve, they natively perform the structured reasoning that developers previously had to force through complex engineering, making such specialized frameworks obsolete. The result illustrates a repeating cycle in AI development, in which foundation model upgrades turn complex engineering into "dead weight."
AI systems rival doctors in new Nature studies, but one result suggests the tech won't age well