
Diagnosing hospital bacteraemia in the framework of predictive, preventive and personalised medicine using electronic health records and machine learning classifiers




Garnica Alcázar, Oscar and Gómez, Diego and Ramos, Víctor and Hidalgo González, José Ignacio and Ruiz Giardín, José Manuel (2021) Diagnosing hospital bacteraemia in the framework of predictive, preventive and personalised medicine using electronic health records and machine learning classifiers. EPMA Journal, 12 (3). pp. 365-381. ISSN 1878-5077

Creative Commons Attribution.


Official URL: https://doi.org/10.1007/s13167-021-00252-3


Predicting bacteraemia is relevant because sepsis is one of the most important causes of morbidity and mortality, and bacteraemia prognosis depends primarily on a rapid diagnosis. Prediction could shorten the diagnosis by up to 6 days and, in conjunction with individual patient variables, should be considered when deciding on the early administration of personalised antibiotic treatment and medical services, the choice of specific diagnostic techniques, and the determination of additional treatments, such as surgery, that would prevent subsequent complications. Machine learning techniques could help physicians make these informed decisions by predicting bacteraemia from the data already available in electronic hospital records.

This study presents the application of machine learning techniques to these records to predict the blood culture’s outcome, which would reduce the lag in starting a personalised antibiotic treatment and the medical costs associated with erroneous treatments due to conservative assumptions about blood culture outcomes.

Six supervised classifiers were created using three machine learning techniques, Support Vector Machine, Random Forest and K-Nearest Neighbours, on the electronic health records of hospital patients. The best approach to handle missing data was chosen and, for each machine learning technique, two classification models were created: the first uses the features known at the time of blood extraction, whereas the second uses four extra features revealed during the blood culture.
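As a rough illustration of the two-model design described above (one model on features known at blood extraction, one with extra features revealed during the culture), here is a minimal sketch using a hand-written K-Nearest Neighbours classifier. The toy records, the column layout, the mean imputation of missing values, and all function names are assumptions made for this example; the abstract does not state which missing-data strategy or hyperparameters the study actually used.

```python
from collections import Counter
from math import dist


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Mean imputation is an assumption here, not the strategy reported by the study.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def select(rows, columns):
    """Keep only the given column indices of every row."""
    return [[r[c] for c in columns] for r in rows]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy records: columns 0-1 are known at blood extraction,
# column 2 stands in for a feature revealed during the blood culture.
records = [
    [38.9, 15.0, 1.0],
    [39.2, None, 1.0],
    [39.0, 17.0, 1.0],
    [36.8, 7.0, 0.0],
    [37.0, 8.0, None],
    [36.9, 6.5, 0.0],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive blood culture (made-up labels)

FEATURE_SETS = {"at_extraction": [0, 1], "during_culture": [0, 1, 2]}

X = impute_column_means(records)
new_patient = [39.1, 16.0, 1.0]

# One model per feature set; the same loop would be repeated per technique.
predictions = {
    name: knn_predict(select(X, cols), labels, [new_patient[c] for c in cols])
    for name, cols in FEATURE_SETS.items()
}
print(predictions)
```

Repeating the same loop for Support Vector Machine and Random Forest would give the six classifiers (three techniques times two feature sets). A real pipeline would also scale the features before computing distances.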

The six classifiers were trained and tested on a dataset of 4357 patients with 117 features per patient. In the best case, the models reach a state-of-the-art accuracy of 85.9%, a sensitivity of 87.4% and an AUC of 0.93.
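For readers unfamiliar with the three reported metrics, the following sketch computes accuracy, sensitivity and AUC on a small made-up set of labels and scores. The data and the 0.5 decision threshold are assumptions for illustration only and are unrelated to the study's results. AUC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores, purely illustrative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred))     # 0.75
print(sensitivity(y_true, y_pred))  # 0.75
print(auc(y_true, scores))          # 0.8125
```

Accuracy and sensitivity depend on the chosen threshold, whereas AUC summarises the ranking quality of the scores across all thresholds.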

Our results provide cutting-edge values for the metrics of interest in predictive medical models: they exceed the medical practice threshold and the previous results in the literature obtained with classical modelling techniques for specific types of bacteraemia. Additionally, the consistency of the results is reinforced because the feature-importance rankings of the three classifiers highlight similar features, which coincide with those that physicians use in their manual heuristics. Therefore, the efficacy of these machine learning techniques confirms their viability to assist in the aims of predictive and personalised medicine once the patient presents bacteraemia-compatible symptoms, and to help improve the healthcare economy.
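One simple way to check whether several importance rankings agree, in the spirit of the consistency argument above, is to intersect their top-k features. The sketch below does this for three hypothetical rankings; the feature names, the rankings and the choice of k are placeholders, not values from the study, and the abstract does not say how the authors compared the rankings.

```python
def top_k_overlap(rankings, k):
    """Return the features that appear in the top k of every ranking."""
    top_sets = [set(ranked[:k]) for ranked in rankings.values()]
    return set.intersection(*top_sets)


# Hypothetical rankings, most important feature first (placeholder names).
rankings = {
    "svm": ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e"],
    "random_forest": ["feat_b", "feat_a", "feat_d", "feat_c", "feat_e"],
    "knn": ["feat_a", "feat_c", "feat_b", "feat_e", "feat_d"],
}

shared = top_k_overlap(rankings, k=3)
print(sorted(shared))  # ['feat_a', 'feat_b']
```

A large shared set for small k suggests the classifiers rely on similar evidence, which can then be compared against the variables clinicians use in practice.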

Item Type: Article
Additional Information: CRUE-CSIC (Acuerdos Transformativos 2021)
Uncontrolled Keywords: Predictive, Preventive and personalised medicine (PPPM/3PM), Machine learning, Modelling, Bacteraemia diagnosis, Bacteraemia prediction, Blood culture’s outcome prediction, Individualised electronic patient record analysis, Personalised antibiotic treatment, Support vector machine, Random forest, K-Nearest neighbours, Healthcare economy, Health policy, COVID-19
Subjects: Sciences > Computer science > Bioinformatics; Sciences > Statistics; Medical sciences > Medicine > Medical telematics
ID Code: 70440
Deposited On: 16 Feb 2022 15:11
Last Modified: 18 Feb 2022 09:54
