Machine learning (ML), a branch of artificial intelligence (AI), is an emerging method for analyzing large microbiome data sets
Microbiome data is becoming more complex with advancing sequencing technologies
ML carries out complex analysis by learning from the data, instead of using pre-programmed rules
The best-performing ML algorithms are trained to predict or identify target variables
Predict variables from the data (e.g. a patient's age from microbiome structure)
Create models that accurately identify sample phenotypes (e.g. healthy vs. diseased)
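The idea of learning from labeled data rather than from hand-written rules can be sketched with a very small classifier. This is a minimal illustration only: the nearest-centroid method, the abundance values, and the names `fit_centroids` and `predict` are assumptions made for the example, not the approach used in any particular microbiome study.

```python
from math import dist

# Hypothetical toy data: each row is one sample's relative abundances
# for three made-up taxa; labels are the known phenotypes.
train_X = [
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
    [0.20, 0.30, 0.50],
    [0.25, 0.25, 0.50],
]
train_y = ["healthy", "healthy", "diseased", "diseased"]


def fit_centroids(X, y):
    """Learn one mean abundance profile (centroid) per phenotype."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign the phenotype whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))


# The "rules" are not written by hand; they are derived from the training data.
centroids = fit_centroids(train_X, train_y)
prediction = predict(centroids, [0.58, 0.32, 0.10])
```

In practice a more robust algorithm would likely replace the centroid rule, but the workflow stays the same: fit on labeled samples, then predict the phenotype of new ones.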
Advantages of ML
Robust to high-dimensional and noisy data
Predictive performance can be validated on cross-study samples
Microbes/pathways can be ranked by their importance to the classifier
Complements other standard microbiome analyses
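Ranking features can be sketched as follows. The score used here, the absolute difference in mean abundance between the two groups, is a deliberately simple stand-in for a classifier's own importance measure, and the taxa names and values are hypothetical.

```python
# Hypothetical taxa names and relative abundances, for illustration only.
taxa = ["taxon_A", "taxon_B", "taxon_C"]
healthy = [[0.60, 0.30, 0.10], [0.55, 0.35, 0.10]]
diseased = [[0.20, 0.30, 0.50], [0.25, 0.25, 0.50]]


def column_means(rows):
    """Mean of each feature (column) across the samples (rows)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def rank_features(names, group_a, group_b):
    """Rank features by absolute difference in group means, largest first."""
    means_a = column_means(group_a)
    means_b = column_means(group_b)
    diffs = [abs(a - b) for a, b in zip(means_a, means_b)]
    return sorted(zip(names, diffs), key=lambda pair: pair[1], reverse=True)


ranking = rank_features(taxa, healthy, diseased)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A model-based importance score would account for interactions between features, which this univariate score does not.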
Metrics used to evaluate ML models:
Area under the receiver operating characteristic (ROC) curve (AUC)
Overall accuracy: the proportion of previously unseen samples the model classifies correctly (e.g. whether a sample comes from a healthy or a diseased individual)
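Both metrics can be computed in a few lines. The sketch below assumes binary labels (1 = diseased, 0 = healthy), hypothetical model scores, and a 0.5 decision threshold. AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Count positive/negative pairs where the positive outscores the negative.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out samples: true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores separate the classes across all thresholds, which is one reason the two are often reported together.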