Can AI Solve the Skills Shortage in Antimicrobial Stewardship?

Contagion, Summer 2025 Digital Edition, Volume 10, Issue 2

A recent literature review describes the current state of artificial intelligence tools for guiding antimicrobial therapy. Are you ready for change?

The integration of artificial intelligence (AI) into health care has generated both enthusiasm and skepticism. In antimicrobial stewardship specifically, AI holds great promise but also presents challenges. On the one hand, AI algorithms can help clinicians estimate the likelihood of infection. On the other hand, AI models depend on large data sets, which can be inaccurate and can contain harmful biases that shape AI’s behavior and outputs and potentially exacerbate disparities in infection-related outcomes.1 Regardless, AI’s applications extend far beyond health care and continue to advance. Meanwhile, antimicrobial stewardship faces a crisis of expertise and leadership as clinicians leave the health care workforce and fewer pursue infectious disease–related training. Can AI tools help alleviate this skills shortage?

AI’s Role in Antimicrobial Therapy

A 2025 systematic review by AlGain et al evaluated 23 studies to determine whether AI tools, including machine learning (ML) and large language models (LLMs), can reliably guide antibiotic prescribing. The findings revealed critical insights into where AI excels, where it falls short, and what must be addressed before these tools become clinical mainstays. The review focused on 2 AI approaches: (1) ML-based clinical decision support systems (CDSS) for predicting resistance, optimizing prescriptions, and advancing stewardship, and (2) LLMs (eg, ChatGPT) for generating treatment recommendations. The studies were assessed using the PICO (population, intervention, comparison, and outcome) framework, focusing on patients with infectious diseases (P), interventions using AI-based management (I), comparisons with standard management provided by usual care providers (C), and primary outcomes (O) including the accuracy, efficacy, and limitations of AI in antimicrobial management. Characteristics of the included studies are shown in the TABLE.2

Key Findings

All 5 studies that evaluated ML applications to augment antimicrobial stewardship efforts demonstrated promising outcomes.2 The 2 included INSPIRE trials show that an ML-based CDSS does not require an active antimicrobial stewardship program to deliver significant benefit. In these randomized controlled studies, optimizing antimicrobial selection at the point of order entry reduced extended-spectrum antibiotic days by 28.4% for pneumonia and 17.4% for urinary tract infections, with similar reductions in vancomycin and antipseudomonal use. Neither trial observed significant differences in safety outcomes or in transfers to the intensive care unit. Across these studies, ML showed clinical utility in antimicrobial resistance prediction, prescription optimization, and stewardship impact.

Results from INSPIRE 3 and 4 became available after publication of AlGain et al’s review, extending the ML antimicrobial stewardship literature to skin and soft tissue infections (INSPIRE 3; NCT05423756) and abdominal infections (INSPIRE 4; NCT05423743).3,4

LLMs, by contrast, lagged in reliability. Five studies evaluated the accuracy of various LLMs in managing infectious diseases. In one, ChatGPT (model GPT-4, OpenAI) answered true-false questions about a variety of disease states, such as bacteremia, meningitis, and endocarditis, with 70% accuracy. In a study evaluating ChatGPT’s (model GPT-4) responses for managing bloodstream infections, the chatbot provided appropriate and optimal suggestions in approximately 35% of cases. Although the LLMs in these studies performed acceptably on simple recall questions, they struggled as questions increased in complexity, with high error rates in complex clinical cases.

Critical Gaps in the Evidence Base

ChatGPT dominates LLM research. Other popular chatbots, including Gemini (formerly Bard), Perplexity AI, and OpenEvidence, were evaluated less often or not at all in the included studies. This is a significant blind spot, because performance may vary across models with differences in training data and architecture. Further, only 1 of 6 LLM studies systematically tested how prompt design, often referred to as prompt engineering, influenced output quality. Without standardized prompting, LLMs risk generating inconsistent or unsafe recommendations in real-world use.

It cannot be stressed enough that AI is evolving faster than our ability to assess its applications through research, peer review, and publication. For example, ChatGPT rapidly advanced beyond GPT-4 to GPT-4.5 and GPT-4o (the “o” stands for “omni”),5 yet none of the evaluated studies assessed OpenAI models with features such as “reasoning” (ie, the model solves problems that require multistep thinking, logical connections, or intermediate steps rather than simply retrieving or generating factual information in a single step).

Where Does This Leave Us?

The review underscores AI’s dual reality in antimicrobial therapy: ML-based CDSS are poised to enhance stewardship programs, particularly in resource-limited settings that lack infectious disease specialists, whereas LLMs remain unreliable for clinical decision-making. For safe integration, institutions should prioritize ML-based CDSS for resistance prediction and stewardship. If LLMs are used for preliminary recommendations, they should be rigorously validated, guided by structured prompts, and subject to human oversight.
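What “structured prompts” and “human oversight” might look like in practice can be sketched in a few lines. The Python example below is a hypothetical illustration, not a tool described by AlGain et al or used in any included study: the field names, the template wording, and the names `CaseSummary`, `build_prompt`, and `Recommendation` are all assumptions. It shows a fixed template that forces every query to carry the same clinical context in the same order, plus a gate that refuses to release model output until a named clinician has signed off. It does not call any LLM.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical structured-prompt template: every query carries the same
# clinical fields in the same order, so outputs can be compared and audited.
TEMPLATE = (
    "Role: infectious diseases consultant supporting an antimicrobial stewardship team.\n"
    "Syndrome: {syndrome}\n"
    "Care setting: {setting}\n"
    "Allergies: {allergies}\n"
    "Culture results: {cultures}\n"
    "Renal function: {renal}\n"
    "Task: propose empiric therapy (agent, dose, route, duration), give the rationale, "
    "and list any missing information needed to decide.\n"
)


@dataclass
class CaseSummary:
    """Hypothetical minimum data set required before a prompt can be built."""
    syndrome: str
    setting: str
    allergies: List[str] = field(default_factory=list)
    cultures: str = "pending"
    renal: str = "not reported"


def build_prompt(case: CaseSummary) -> str:
    """Fill the fixed template; refuse to build a prompt with blank required fields."""
    if not case.syndrome.strip() or not case.setting.strip():
        raise ValueError("syndrome and care setting are required")
    return TEMPLATE.format(
        syndrome=case.syndrome,
        setting=case.setting,
        allergies=", ".join(case.allergies) or "none reported",
        cultures=case.cultures,
        renal=case.renal,
    )


@dataclass
class Recommendation:
    """LLM output held until a clinician signs off (the human-oversight step)."""
    text: str
    reviewed_by: Optional[str] = None

    def approve(self, clinician: str) -> None:
        # Record the name of the clinician who reviewed the output.
        self.reviewed_by = clinician

    def release(self) -> str:
        # Refuse to release the text until a named reviewer is recorded.
        if not self.reviewed_by:
            raise PermissionError("LLM output requires clinician review before use")
        return self.text


if __name__ == "__main__":
    case = CaseSummary(syndrome="community-acquired pneumonia",
                       setting="general medical ward",
                       allergies=["penicillin (rash)"])
    print(build_prompt(case))
```

Holding the template fixed also makes prompt design something that can be varied one field at a time and compared, which is the kind of systematic testing the review found in only 1 of 6 LLM studies.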

This article was written with assistance from Perplexity AI.

References
1. Marra AR, Langford BJ, Nori P, Bearman G. Revolutionizing antimicrobial stewardship, infection prevention, and public health with artificial intelligence: the middle path. Antimicrob Steward Healthc Epidemiol. 2023;3(1):e219. doi:10.1017/ash.2023.494
2. AlGain S, Marra AR, Kobayashi T, et al. Can we rely on artificial intelligence to guide antimicrobial therapy? A systematic literature review. Antimicrob Steward Healthc Epidemiol. 2025;5(1):e90. doi:10.1017/ash.2025.47
3. Gohil SK, Septimus E, Kleinman K, et al. Improving empiric antibiotic selection for patients hospitalized with skin and soft tissue infection: the INSPIRE 3 skin and soft tissue randomized clinical trial. JAMA Intern Med. Published online April 10, 2025. doi:10.1001/jamainternmed.2025.0887
4. Gohil SK, Septimus E, Kleinman K, et al. Improving empiric antibiotic selection for patients hospitalized with abdominal infection: the INSPIRE 4 cluster randomized clinical trial. JAMA Surg. Published online April 10, 2025. doi:10.1001/jamasurg.2025.1108
5. Craig L. GPT-4o vs. GPT-4: how do they compare? TechTarget. February 3, 2025. Accessed April 25, 2025. https://www.techtarget.com/searchenterpriseai/feature/GPT-4o-vs-GPT-4-How-do-they-compare

© 2025 MJH Life Sciences

All rights reserved.