Contact: Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “G. Caporale” brucellosis2022.izs.it brucellosis2022@izs.it
O6-3 Machine Learning for MALDI-TOF MS identification of Brucella

Keywords

Brucella
Machine Learning
MALDI-TOF MS

Abstract

MALDI-TOF mass spectrometry (MS) is a fast, reliable, and widely used method for bacterial identification. The most common databases used for this purpose lack reference profiles for Brucella species, which hampers correct species identification and reliable sub-species characterization. Here, we report the creation of peptide mass reference spectra, which were used to train a machine learning (ML) algorithm for predicting Brucella species. We selected two datasets: 107 Brucella strains for training the ML algorithm and 160 Brucella strains for validation. The strains were isolated during our diagnostic activities and included B. melitensis bv 1 and bv 3, B. abortus bv 1 and bv 3, B. suis bv 2, B. ceti and B. ovis. All strains were typed by molecular and biochemical analyses. Strains were heat-inactivated, and proteins were extracted using an ethanol-formic acid protocol. The samples were spotted on a 96-spot steel target plate and overlaid with alpha-cyano-4-hydroxy-cinnamic acid (HCCA) matrix solution (Bruker Daltonics) before MALDI-TOF MS analysis. The XGBoost ML algorithm was trained on features engineered from the mass spectra, each labelled with the corresponding Brucella species. We used 5-fold cross-validation, repeated 10 times, to evaluate 50 models whose hyper-parameters were chosen by random search, and selected the model with the highest accuracy. The selected model was then validated on 480 samples from the validation dataset. Brucella species identification reached 99.4% accuracy, with 100% diagnostic sensitivity (dSe) and specificity (dSp) for B. abortus, B. ceti and B. ovis. However, performance was slightly lower for B. melitensis, which showed 97.2% dSe, and for B. suis bv 2, which showed 99.2% dSp. Overall, the ML algorithm misidentified 3 B. melitensis samples as B. suis bv 2.
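As a worked check of the figures above, the following minimal Python sketch shows how accuracy, dSe and dSp are derived from sample counts. The totals of 480 validation samples and 3 misidentifications come from the abstract. The per-species count for B. melitensis is not reported, so `N_MELITENSIS_HYPOTHETICAL` is an assumed value used only to illustrate the calculation; the function names are likewise illustrative.

```python
def accuracy(n_correct, n_total):
    """Fraction of all samples assigned to the correct species."""
    return n_correct / n_total


def sensitivity(tp, fn):
    """Diagnostic sensitivity (dSe): true positives / all true members of the species."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Diagnostic specificity (dSp): true negatives / all samples of other species."""
    return tn / (tn + fp)


# Figures stated in the abstract.
n_validation_samples = 480
n_misidentified = 3  # B. melitensis called as B. suis bv 2

overall_accuracy = accuracy(n_validation_samples - n_misidentified,
                            n_validation_samples)
print(f"overall accuracy: {overall_accuracy:.1%}")  # 99.4%

# ASSUMPTION: the abstract does not give the number of B. melitensis samples.
# 108 is a hypothetical class size used purely to demonstrate the formula.
N_MELITENSIS_HYPOTHETICAL = 108
melitensis_dse = sensitivity(tp=N_MELITENSIS_HYPOTHETICAL - n_misidentified,
                             fn=n_misidentified)
print(f"B. melitensis dSe (hypothetical class size): {melitensis_dse:.1%}")  # 97.2%

# With no false positives, specificity is 100% whatever the class sizes are.
assert specificity(tn=100, fp=0) == 1.0
```

The same three misclassified samples lower dSe for the species they truly belong to (they are false negatives for B. melitensis) and dSp for the species they were assigned to (they are false positives for B. suis bv 2), which is why the two reduced figures refer to different species.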
Our results show that MALDI-TOF MS is reliable for identifying Brucella to the species level from culture plates. The trained ML algorithm proved to be specific and highly sensitive, and appears to be an efficient and reproducible method for the rapid detection of the genus Brucella. Since the genus comprises at least 12 Brucella species, the preliminary dataset needs to be enlarged to represent the entire genus comprehensively.
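The model-selection procedure described in the abstract (5-fold cross-validation repeated 10 times, combined with a random search over hyper-parameters) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it reads "50 models" as 50 randomly sampled hyper-parameter candidates, each scored by mean accuracy over the 50 train/test splits; it uses a toy k-nearest-neighbour scorer (`knn_accuracy`) in place of XGBoost so that the block runs with the standard library alone; and the data, parameter names and function names are all hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs: n_repeats reshuffled stratified k-fold splits."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class mix.
            for pos, i in enumerate(shuffled):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, folds[k]


def random_search(X, y, fit_and_score, param_space, n_candidates=50, seed=0):
    """Sample n_candidates settings at random; return the one with the best mean CV accuracy."""
    rng = random.Random(seed)
    best_params, best_score = None, -1.0
    for _ in range(n_candidates):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        scores = [fit_and_score(X, y, train, test, params)
                  for train, test in repeated_stratified_kfold(y, seed=seed)]
        if mean(scores) > best_score:
            best_params, best_score = params, mean(scores)
    return best_params, best_score


def knn_accuracy(X, y, train, test, params):
    """Stand-in model (NOT XGBoost): k-nearest-neighbour vote on one numeric feature."""
    hits = 0
    for i in test:
        nearest = sorted(train, key=lambda j: abs(X[j] - X[i]))[:params["k"]]
        votes = [y[j] for j in nearest]
        hits += max(set(votes), key=votes.count) == y[i]
    return hits / len(test)


# Hypothetical toy data: two well-separated classes, one feature per sample.
X = [float(v) for v in range(10)] + [float(v) for v in range(20, 30)]
y = ["A"] * 10 + ["B"] * 10

best_params, best_score = random_search(X, y, knn_accuracy, {"k": [1, 3, 5]})
print(best_params, best_score)
```

In a real setting, `fit_and_score` would fit the chosen classifier on the training indices and return its accuracy on the held-out fold; the final model would then be refitted with the selected hyper-parameters and assessed once on the independent validation set.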