Machine learning services

  • Machine learning (ML), a branch of artificial intelligence (AI), is an increasingly used approach for analyzing large microbiome data sets
  • Microbiome data is becoming more complex with advancing sequencing technologies
  • ML carries out complex analysis by learning from the data, instead of using pre-programmed rules
  • The best-performing ML algorithms are trained to predict or identify target variables
  • ML can predict variables from the data (e.g. the age of a patient based on microbiome structure)
  • ML models can accurately identify sample phenotypes (e.g. healthy vs. diseased)
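
As a rough illustration of learning a prediction from data rather than from pre-programmed rules, the sketch below fits a straight line by ordinary least squares and uses it to predict age. It is only a toy: the single feature, the numbers, and the function names (`fit_line`, `predict_age`) are assumptions made for illustration, and real microbiome models use many features and more flexible algorithms.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return y_bar - slope * x_bar, slope


# Toy, made-up training data (illustration only): one microbiome-derived
# feature per sample and the known age of the sample donor.
feature = [0.10, 0.20, 0.30, 0.40]
age = [20.0, 30.0, 40.0, 50.0]

# The relationship is learned from the data, not written by hand.
intercept, slope = fit_line(feature, age)


def predict_age(x):
    """Predict age for a new sample from its feature value."""
    return intercept + slope * x


print(round(predict_age(0.25), 1))
```

The same pattern (fit on labelled samples, then predict for new ones) carries over to classification tasks such as healthy vs. diseased.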

Advantages of ML

  • Robust to high-dimensional and noisy data
  • Predictive performance can be validated on samples from independent studies (cross-study validation)
  • Microbes and pathways can be ranked by their importance to the classifier
  • Complements other standard microbiome analyses
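
Two of the points above, cross-study validation and ranking features by importance, can be sketched in a few lines. The example below is a minimal sketch under stated assumptions, not the method used by any particular service: it uses a simple nearest-centroid classifier as a stand-in for the algorithms listed further down, ranks features by permutation importance (one common, model-agnostic approach), and runs on made-up values for three hypothetical taxa (`taxon_A`, `taxon_B`, `taxon_C`).

```python
import random
from statistics import mean


def fit_centroids(samples, labels):
    """Learn one mean feature profile (centroid) per class."""
    centroids = {}
    for label in sorted(set(labels)):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(sample, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, samples, labels):
    hits = sum(predict(centroids, s) == l for s, l in zip(samples, labels))
    return hits / len(labels)


def permutation_importance(centroids, samples, labels, n_repeats=50, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, samples, labels)
    importances = []
    for j in range(len(samples[0])):
        drops = []
        for _ in range(n_repeats):
            column = [s[j] for s in samples]
            rng.shuffle(column)
            shuffled = [s[:j] + [v] + s[j + 1:] for s, v in zip(samples, column)]
            drops.append(baseline - accuracy(centroids, shuffled, labels))
        importances.append(mean(drops))
    return importances


# Toy, made-up feature values (illustration only). Only the first
# column differs between the two classes.
names = ["taxon_A", "taxon_B", "taxon_C"]
study_a_x = [[0.60, 0.30, 0.12], [0.55, 0.31, 0.10],
             [0.20, 0.29, 0.11], [0.25, 0.30, 0.13]]
study_a_y = ["healthy", "healthy", "diseased", "diseased"]
study_b_x = [[0.58, 0.28, 0.10], [0.62, 0.32, 0.12],
             [0.18, 0.31, 0.13], [0.22, 0.29, 0.09]]
study_b_y = ["healthy", "healthy", "diseased", "diseased"]

# Train on study A only, then evaluate on the independent study B.
model = fit_centroids(study_a_x, study_a_y)
cross_study_accuracy = accuracy(model, study_b_x, study_b_y)

importances = permutation_importance(model, study_b_x, study_b_y)
ranking = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)

print(cross_study_accuracy)
print(ranking)
```

Features whose shuffling does not change any prediction receive an importance of zero, so the ranking separates the informative column from the uninformative ones.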

Metrics used to evaluate ML models:

  • Area under the receiver operating characteristic (ROC) curve (AUC)
  • Overall accuracy: the proportion of previously unseen samples that the model classifies correctly (e.g. as coming from a healthy or a diseased subject)
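
Both metrics can be computed directly from a model's outputs on held-out samples. The sketch below is a minimal, dependency-free illustration: the labels and scores are made up, the 0.5 decision threshold is an assumption, and the AUC is computed through its standard interpretation as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half).

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores, positive="diseased"):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up held-out samples (illustration only): true labels and the
# model's predicted probability that each sample is "diseased".
y_true = ["diseased", "diseased", "diseased", "healthy", "healthy", "healthy"]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

# Turn scores into labels with an assumed threshold of 0.5.
y_pred = ["diseased" if s >= 0.5 else "healthy" for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(acc, 3), round(auc, 3))
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes how well the scores rank samples across all thresholds, which is why the two are usually reported together.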

Algorithms for classification in ML:

  • Random Forest
  • Adaptive Boosting
  • Extremely Randomized Trees
  • Support Vector Machine
  • Logistic Regression
  • Neural Networks
  • Gradient Boosting
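
To make one of the listed algorithms concrete, the following is a minimal sketch of logistic regression trained by batch gradient descent on a two-class problem. The data, the learning rate, the number of epochs, and the function names (`fit_logistic`, `predict_proba`) are assumptions chosen for illustration; production analyses would normally rely on an established library rather than hand-written code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Predicted probability that sample x belongs to class 1."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(xs[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(weights, bias, x) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / len(xs)
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy, made-up feature values for two hypothetical taxa (illustration only).
# Labels: 0 = healthy, 1 = diseased.
train_x = [[0.60, 0.30], [0.55, 0.35], [0.20, 0.30], [0.25, 0.25]]
train_y = [0, 0, 1, 1]

weights, bias = fit_logistic(train_x, train_y)

# Score two new samples that were not used for training.
p_healthy_like = predict_proba(weights, bias, [0.58, 0.32])
p_diseased_like = predict_proba(weights, bias, [0.22, 0.28])
print(p_healthy_like < 0.5, p_diseased_like > 0.5)
```

In practice, open-source libraries provide implementations of all the algorithms listed above. In scikit-learn, for example, they correspond to `RandomForestClassifier`, `AdaBoostClassifier`, `ExtraTreesClassifier`, `SVC`, `LogisticRegression`, `MLPClassifier`, and `GradientBoostingClassifier`.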