Identifying Variables Predictive of HIV Acquisition in Electronic Health Records

Douglas Krakower, MD, assistant professor of medicine at Harvard Medical School, discusses which variables in electronic health records could be used to identify candidates for pre-exposure prophylaxis.

Interview Transcript (modified slightly for readability):

“In terms of the variables in the electronic health record that would be predictive of HIV acquisition, we initially put a large number of variables into a machine learning model: 168 different variables. These came from a number of different categories, including diagnosis codes; registration data, meaning demographics such as race, ethnicity, and age; prescription information; and laboratory patterns in terms of testing and results.

All of these different categories provided complementary information. The machine learning program that we used has an internal variable selection procedure, so it actually figures out which of those many variables are the most predictive and then retains those in a final predictive model. The final model drew from each of the different categories that I mentioned, such as prescriptions, diagnosis codes, and demographic information, to come up with a way to identify people who are candidates for pre-exposure prophylaxis (PrEP).”
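
Editor's illustration: the interview does not name the algorithm, so the sketch below uses L1-penalized (lasso) logistic regression only as one familiar example of a model with a built-in variable selection step. Everything in it is an assumption made for illustration: the four variable names are hypothetical, the records are fully synthetic, and the signal is planted arbitrarily in the first two variables. It is not the study's model or data and says nothing about which real variables are predictive.

```python
import math
from itertools import product

# Hypothetical variable names, one per EHR category mentioned in the interview.
FEATURES = ["diagnosis_code_x", "prescription_y", "lab_pattern_z", "registration_w"]


def make_synthetic_records():
    """Return (rows, labels) for a small, fully synthetic dataset.

    Signal is planted arbitrarily in the first two features only;
    the last two are balanced noise by construction.
    """
    rows, labels = [], []
    for x in product([-1.0, 1.0], repeat=len(FEATURES)):
        positives = int(4 + 2 * x[0] + x[1])  # positives out of 8 records per pattern
        for k in range(8):
            rows.append(x)
            labels.append(1.0 if k < positives else 0.0)
    return rows, labels


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if abs(value) <= threshold:
        return 0.0
    return value - threshold if value > 0 else value + threshold


def fit_l1_logistic(rows, labels, penalty=0.05, step=1.0, iterations=500):
    """Proximal gradient descent for L1-penalized logistic regression."""
    n, d = len(rows), len(rows[0])
    weights, intercept = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = intercept + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += error / n
            for j in range(d):
                grad_w[j] += error * x[j] / n
        intercept -= step * grad_b  # the intercept is not penalized
        # The L1 step: weights whose update stays within the penalty are set to zero.
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_w)]
    return weights, intercept


rows, labels = make_synthetic_records()
weights, intercept = fit_l1_logistic(rows, labels)

# Variables with a nonzero weight are the ones the model "retains".
selected = [name for name, w in zip(FEATURES, weights) if w != 0.0]
print(selected)
```

On this synthetic data, the two variables with planted signal keep nonzero weights and the two noise variables are shrunk to exactly zero, which is the sense in which such a model "figures out" which variables to retain. With real records the penalty strength would normally be tuned, for example by cross-validation, rather than fixed at an arbitrary value as it is here.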