The Challenges of Using Electronic Health Records to Predict HIV Acquisition in Large Populations

Douglas Krakower, MD, assistant professor of medicine at Harvard Medical School, discusses the limitations of predicting HIV acquisition in large populations using electronic health records.

Interview transcript (slightly modified for readability):

“We found that you can use the predictive models to separate people who acquire HIV from those who don’t with pretty good discrimination. Discrimination is the metric we use to identify the best model in terms of prediction, and for the best model the area under the receiver operating characteristic curve was 0.82. This is a statistic that helps you figure out how good the model is at distinguishing people who acquire HIV from those who don’t. It actually compares pretty well to other predictive models that have been used in other areas of medicine, so we think it may have good clinical utility.

There are some limitations: the algorithm may produce some false positives and some false negatives, and the predictive accuracy is by no means perfect. In terms of implementing this in a care setting, you would ideally want to use the algorithm to identify which patients within a larger population may be candidates for pre-exposure prophylaxis (PrEP). Then you could alert clinicians to conduct more comprehensive, in-person risk assessments. But no algorithm is perfect, so you’d always want to frame this as a first-stage screening test, after which clinicians could do a more detailed risk assessment with patients.

Another limitation is that electronic health record (EHR) data do not capture behavioral risks for things like HIV infection. For example, you may not have documentation in structured EHR data of someone’s relationships or the HIV status of their sexual partners.

Again, this is a great way to screen a large population very efficiently, using an automated algorithm to identify subpopulations that may be candidates for PrEP and to refer them for more intensive screening by a clinician, but it’s never going to tell you exactly who should or should not be getting PrEP.”
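
To make the discrimination statistic and the first-stage screening idea from the interview concrete, here is a minimal Python sketch. It is an illustration only, not the study's model or code: the risk scores, outcomes, threshold, and the function names `auc` and `screen` are all hypothetical. The area under the ROC curve is computed as the probability that a randomly chosen person who acquired HIV received a higher risk score than a randomly chosen person who did not, with ties counted as half.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve, computed as pairwise concordance.

    Returns the fraction of (case, non-case) pairs in which the case
    has the higher score; tied pairs count as half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c, n in product(cases, non_cases)
    )
    return concordant / (len(cases) * len(non_cases))


def screen(scores, labels, threshold):
    """First-stage screen: flag everyone at or above the threshold.

    Flagged patients would go on to a clinician's risk assessment.
    False positives are flagged non-cases; false negatives are
    unflagged cases.
    """
    flagged = [s >= threshold for s in scores]
    false_pos = sum(f and y == 0 for f, y in zip(flagged, labels))
    false_neg = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    return {
        "flagged": sum(flagged),
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Made-up risk scores and outcomes (1 = acquired HIV), for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 1, 0, 0]

print(round(auc(scores, labels), 2))  # 0.73
print(screen(scores, labels, threshold=0.5))
```

With these invented numbers, the screen flags four of eight people, two of whom are false positives, and misses one person who went on to acquire HIV. Lowering the threshold would catch that person at the cost of flagging more people for in-person assessment, which is the trade-off a screening step has to manage.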