A Confusion Matrix is a table with the ‘true’ class in rows and the ‘predicted’ class in columns. The diagonal elements count the correctly classified compounds, while the off-diagonal elements count the misclassified compounds. The table also shows the accuracy of the classifier for each class: the number of correctly classified compounds in that class divided by the total number of compounds in that class, expressed as a percentage. The overall accuracy of the classifier (the correctly classified compounds across all classes divided by the total number of compounds) is also shown.
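The two accuracy figures described above can be written compactly. The notation here is an assumption introduced for illustration and is not part of the original article: C_{ij} is the number of compounds whose true class is i and whose predicted class is j, and K is the number of classes.

```latex
% Per-class accuracy: correct predictions in row i over the row total
\mathrm{Accuracy}_i = \frac{C_{ii}}{\sum_{j=1}^{K} C_{ij}}

% Overall accuracy: sum of the diagonal over the sum of all entries
\mathrm{Accuracy}_{\mathrm{overall}} = \frac{\sum_{i=1}^{K} C_{ii}}{\sum_{i=1}^{K}\sum_{j=1}^{K} C_{ij}}
```

Note that the overall accuracy is the average of the per-class accuracies weighted by class size, so it matches their simple average only when the classes are equally large.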
The classifier in the Confusion Matrix shown below is a perfectly accurate one. All 33 compounds that are ‘Binders’ (B) were predicted as ‘Binders’, giving 100% accuracy for the ‘Binder’ class. Similarly, all 67 compounds that are ‘Non-Binders’ (NB) were predicted as ‘Non-Binders’, again giving 100% accuracy for the ‘Non-Binder’ class. The Overall Accuracy of the classifier is therefore also 100%.
Another classifier for the same dataset is less accurate, as can be observed from the Confusion Matrix shown below. Of the 33 compounds that are ‘Binders’ (B), 27 were correctly predicted as ‘Binders’ while 6 were incorrectly predicted as ‘Non-Binders’ (NB), an accuracy of 81.8% for the ‘Binder’ class. Similarly, of the 67 compounds that are ‘Non-Binders’, 57 were correctly predicted as ‘Non-Binders’ while 10 were incorrectly predicted as ‘Binders’, an accuracy of 85.1% for the ‘Non-Binder’ class. The overall accuracy of the classifier in predicting the two classes for the given dataset is 84%, since 27 + 57 = 84 of the 100 compounds were classified correctly.
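The arithmetic for this second classifier can be checked with a short script. This is a minimal sketch in plain Python; the function names `per_class_accuracy` and `overall_accuracy` are illustrative choices, not part of any library, and the counts are taken directly from the text above.

```python
# Rows are true classes, columns are predicted classes, in the order (B, NB).
labels = ["B", "NB"]
matrix = [
    [27, 6],   # true Binders: 27 predicted as B, 6 predicted as NB
    [10, 57],  # true Non-Binders: 10 predicted as B, 57 predicted as NB
]


def per_class_accuracy(matrix):
    """Diagonal entry of each row divided by that row's total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the sum of all entries."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for label, acc in zip(labels, per_class_accuracy(matrix)):
    print(f"{label}: {acc:.1%}")      # B: 81.8%, then NB: 85.1%
print(f"Overall: {overall_accuracy(matrix):.1%}")  # Overall: 84.0%
```

The overall figure (84%) differs slightly from the simple mean of the two per-class figures (about 83.4%) because the classes are of unequal size.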
Thus, a Confusion Matrix shows how the classifier behaves for each individual class, which the Overall Accuracy alone does not reveal. This is of particular importance for imbalanced datasets, in which one class contains significantly more compounds than the other, as sometimes happens with Binder/Non-Binder datasets.
For the results shown in the Confusion Matrix below, the dataset is imbalanced, with only 10 of the 100 compounds being ‘Binders’ (B) and the remaining 90 being ‘Non-Binders’ (NB). The classifier predicts every compound as a ‘Non-Binder’ and, for that reason alone, achieves an overall accuracy of 90%. However, the Confusion Matrix exposes the inability of the classifier to predict the ‘Binder’ class (an accuracy of 0%). Given that the ‘Binder’ class is the one of actual interest, this particular classifier is of no practical use even though 90% of its predictions are correct.
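The imbalanced case can be reproduced from raw labels. The following is a sketch under stated assumptions: the label lists are synthetic stand-ins built from the counts in the text (10 Binders, 90 Non-Binders, everything predicted as Non-Binder), and the helper `confusion_matrix` is a hypothetical name written here for illustration rather than a library call.

```python
from collections import Counter

# Synthetic labels matching the counts in the text: 10 Binders, 90 Non-Binders.
true_labels = ["B"] * 10 + ["NB"] * 90
# A classifier that predicts 'Non-Binder' for every compound.
predicted_labels = ["NB"] * 100


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


classes = ["B", "NB"]
matrix = confusion_matrix(true_labels, predicted_labels, classes)

# Overall accuracy: diagonal total over all compounds.
overall = sum(matrix[i][i] for i in range(len(classes))) / len(true_labels)
# Accuracy for the 'Binder' class only: first diagonal entry over first row total.
binder_accuracy = matrix[0][0] / sum(matrix[0])

print(matrix)           # [[0, 10], [0, 90]]
print(overall)          # 0.9
print(binder_accuracy)  # 0.0
```

The 90% overall accuracy comes entirely from the second row of the matrix; the first row shows that not a single Binder was identified.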
Cite This As:
Dogra, Shaillay K., "Confusion Matrix" From QSARWorld--A Strand Life Sciences Web Resource.