Within-sample (alpha) diversity of ileum samples, measured with four metrics and shown as boxplots. (A) Faith’s phylogenetic diversity, (B) Shannon index, (C) observed OTUs and (D) Pielou’s evenness.
Within-sample (alpha) diversity of rectum samples, measured with four metrics and shown as boxplots. (A) Faith’s phylogenetic diversity, (B) Shannon index, (C) observed OTUs and (D) Pielou’s evenness.
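As an illustration of three of the metrics named in the two captions above, the following is a minimal sketch computed on made-up count vectors. It does not reproduce the pipeline used for the figures. The function names, the example counts and the choice of log base 2 are assumptions for illustration only. Faith’s metric is omitted because it additionally requires a phylogenetic tree.

```python
import math

def observed_features(counts):
    """Richness: number of features (e.g. OTUs) with a non-zero count."""
    return sum(1 for c in counts if c > 0)

def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)) over non-zero features."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)

def pielou_evenness(counts, base=2):
    """Pielou's evenness J = H / log(S), with S the observed richness."""
    s = observed_features(counts)
    if s < 2:
        # J is undefined for fewer than two features; returning 0.0 is a
        # convention chosen for this sketch only.
        return 0.0
    return shannon(counts, base) / math.log(s, base)

# Hypothetical samples: same richness, very different evenness.
even_sample = [10, 10, 10, 10, 0]
skewed_sample = [37, 1, 1, 1, 0]

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, observed_features(sample), shannon(sample), pielou_evenness(sample))
```

Both hypothetical samples contain four observed features, but the skewed sample has a lower Shannon index and a lower evenness, which is the kind of difference the boxplot panels are meant to expose.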
Between-sample (beta) diversity of ileum samples, visualized in two dimensions. (A) Bray-Curtis dissimilarity, (B) Jaccard distance, (C) unweighted UniFrac distance and (D) weighted UniFrac distance.
Between-sample (beta) diversity of rectum samples, visualized in two dimensions. (A) Bray-Curtis dissimilarity, (B) Jaccard distance, (C) unweighted UniFrac distance and (D) weighted UniFrac distance.
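To make the difference between the abundance-based and presence/absence-based measures in the captions above concrete, here is a minimal sketch of Bray-Curtis dissimilarity and Jaccard distance on two made-up count vectors. The function names and the counts are assumptions for illustration. The two UniFrac variants are left out because they need a phylogenetic tree.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

def jaccard(u, v):
    """Jaccard distance on presence/absence: 1 - |shared| / |union|."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    if not union:
        return 0.0  # two empty samples are treated as identical in this sketch
    return 1 - len(present_u & present_v) / len(union)

# Hypothetical samples with the same features present but different abundances.
sample_a = [10, 0, 5, 5]
sample_b = [10, 0, 5, 45]

print(bray_curtis(sample_a, sample_b))  # 0.5: abundances differ
print(jaccard(sample_a, sample_b))      # 0.0: the same features are present
```

The two samples share exactly the same features, so the Jaccard distance is zero, while Bray-Curtis still reports a dissimilarity because one feature is far more abundant in the second sample.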
Heatmap of the classification performance of five machine learning algorithms: Random Forest (RF), Adaptive Boosting (ADA), Extremely Randomized Trees (EXTRA), Support Vector Machine (SVM) and Logistic Regression (LOGREG).
Ileum and rectum feature tables were collapsed to genus and species level to examine differences at both levels. The standard deviation is given in parentheses below each ROC AUC value.
ROC AUC interpretation: values usually range between 0.5 and 1, where 0.5 represents a random predictor and 1 represents a perfect predictor.
Receiver operating characteristic (ROC) curves showing the performance of the classifiers on the ileum genus-level collapsed table. Crohn’s disease samples form the positive class and controls the negative class. Shuffled-label curves are generated by taking the same test set and randomizing the sample metadata labels (for example, sick/healthy). Statistical testing between shuffled and real predictions can show whether the classifiers’ results differ significantly from those obtained with shuffled labels. ROC AUC interpretation: the shuffled-label curves represent a classifier’s performance on a randomized test set, which corresponds to a ROC AUC of 0.5. All of the models thus had skill in predicting the sick or healthy labels of unseen samples.
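The shuffled-label comparison described in the caption above can be sketched as follows. This is a minimal illustration on made-up labels and scores, not the analysis behind the figure. The helper names, the number of shuffles and the permutation-style p-value are assumptions, since the caption does not state which statistical test was used.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive sample scores
    higher than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def shuffled_label_aucs(labels, scores, n_shuffles=1000, seed=0):
    """AUCs obtained by keeping the predictions and shuffling the test labels."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_shuffles):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        aucs.append(roc_auc(shuffled, scores))
    return aucs

# Hypothetical test set: 1 = Crohn's disease, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

real_auc = roc_auc(labels, scores)
null_aucs = shuffled_label_aucs(labels, scores)
mean_null = sum(null_aucs) / len(null_aucs)
# Permutation-style p-value: share of shuffles that do at least as well.
p_value = (1 + sum(a >= real_auc for a in null_aucs)) / (1 + len(null_aucs))

print(real_auc, mean_null, p_value)
```

With these made-up values the real AUC is well above the shuffled-label average, which sits close to 0.5. A real test set would need far more than eight samples for the comparison to be meaningful.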
Training importance of each variable to the models for the ileum genus-level collapsed samples. Mean decrease in impurity (MDI) represents how well each variable splits the training data when the models are built. The most informative variables for a model have the highest MDI. Black bars represent the standard deviation.
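The idea behind MDI can be sketched with a single split. In tree ensembles the importance of a variable accumulates the weighted impurity decreases of all nodes that split on it, averaged over trees. The example below computes only one such decrease using Gini impurity. The function names, the threshold and the data are made up for illustration, and the impurity criterion used for the figure is not stated in the caption.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def impurity_decrease(values, labels, threshold):
    """Gini decrease obtained by splitting one node at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children

# Hypothetical data: 1 = Crohn's disease, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
informative = [9, 8, 7, 1, 2, 3]    # abundance separates the classes
uninformative = [1, 9, 2, 8, 3, 7]  # abundance is unrelated to the class

print(impurity_decrease(informative, labels, threshold=5))
print(impurity_decrease(uninformative, labels, threshold=5))
```

The informative variable removes all impurity in one split, while the uninformative one barely reduces it, so the former would receive the higher MDI.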