Confusion Matrix

A Confusion Matrix is a table with the ‘true’ class in rows and the ‘predicted’ class in columns. The diagonal elements count the correctly classified compounds, while the off-diagonal elements count the misclassified compounds. The table also shows the accuracy of the classifier for each class: the number of correctly classified compounds in that class divided by the total number of compounds in that class, expressed as a percentage. The overall accuracy of the classifier, which is the number of correctly classified compounds across all classes divided by the total number of compounds, is also shown.
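
The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the five-compound label lists are hypothetical and are not taken from any QSAR World tool.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Return {true_class: {predicted_class: count}} (rows = true, columns = predicted)."""
    counts = Counter(zip(true_labels, predicted_labels))
    return {t: {p: counts[(t, p)] for p in classes} for t in classes}

def per_class_accuracy(matrix):
    """Percentage of each true class (row) that sits on the diagonal."""
    return {t: 100.0 * row[t] / sum(row.values()) for t, row in matrix.items()}

def overall_accuracy(matrix):
    """Percentage of all compounds that sit on the diagonal."""
    correct = sum(matrix[c][c] for c in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total

# Hypothetical toy data: 2 Binders (B) and 3 Non-Binders (NB).
true_labels = ["B", "B", "NB", "NB", "NB"]
predicted_labels = ["B", "NB", "NB", "NB", "B"]

matrix = confusion_matrix(true_labels, predicted_labels, classes=["B", "NB"])
print(matrix)
print(per_class_accuracy(matrix))
print(overall_accuracy(matrix))
```

Here one of the two Binders and two of the three Non-Binders land on the diagonal, so three of the five compounds are correctly classified overall.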

The Confusion Matrix shown below is for an accurate classifier. All 33 compounds that are ‘Binders’ (B) were predicted as ‘Binders’, giving 100% accuracy for the ‘Binder’ class. Similarly, all 67 compounds that are ‘Non-Binders’ (NB) were predicted as ‘Non-Binders’, again giving 100% accuracy for the ‘Non-Binder’ class. The Overall Accuracy of the classifier is therefore also 100%.


[Image one: Confusion Matrix of the fully accurate classifier]


Another classifier, for the same dataset, is less accurate, as can be seen from the confusion matrix shown below. Of the 33 compounds that are ‘Binders’ (B), 27 were correctly predicted as ‘Binders’ while 6 were incorrectly predicted as ‘Non-Binders’ (NB), an accuracy of 81.8%. Similarly, of the 67 compounds that are ‘Non-Binders’, 57 were correctly predicted as ‘Non-Binders’ while 10 were incorrectly predicted as ‘Binders’, an accuracy of 85.1%. The overall accuracy of the classifier across the two classes is 84%, since 84 of the 100 compounds were correctly predicted.


[Image two: Confusion Matrix of the less accurate classifier]
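
The percentages quoted for this second classifier can be checked with a short calculation. The sketch below simply enters the counts from the text into a nested dictionary; the variable names are illustrative.

```python
# Counts taken from the text: rows are true classes, columns are predicted classes.
matrix = {
    "B":  {"B": 27, "NB": 6},   # 33 true Binders
    "NB": {"B": 10, "NB": 57},  # 67 true Non-Binders
}

# Per-class accuracy: diagonal count divided by the row total.
class_accuracy = {
    true_class: 100.0 * row[true_class] / sum(row.values())
    for true_class, row in matrix.items()
}

# Overall accuracy: all diagonal counts divided by all compounds.
correct = sum(matrix[c][c] for c in matrix)
total = sum(sum(row.values()) for row in matrix.values())
overall = 100.0 * correct / total

for true_class, acc in class_accuracy.items():
    print(f"{true_class}: {acc:.1f}%")   # B: 81.8%, NB: 85.1%
print(f"Overall: {overall:.1f}%")        # Overall: 84.0%
```

Note that the overall figure is computed from the counts (84 correct out of 100). It is not the plain mean of the two per-class percentages, which would be about 83.4% here.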


Thus, a Confusion Matrix tells us how the classifier behaves for each individual class, which Overall Accuracy alone does not. This is particularly important for imbalanced datasets, in which one class has significantly more compounds than the other, as sometimes happens with Binder/Non-Binder datasets.

For the results shown in the Confusion Matrix below, the dataset is imbalanced: only 10 of the 100 compounds are ‘Binders’ (B) and the remaining 90 are ‘Non-Binders’ (NB). The classifier predicts every compound as ‘Non-Binder’ and, for that reason alone, achieves an overall accuracy of 90%. However, the Confusion Matrix exposes the inability of the classifier to predict the ‘Binder’ class (an accuracy of 0%). Given that the ‘Binder’ class would be the one of actual interest, this classifier is of no practical use even though it is 90% accurate overall.


[Image three: Confusion Matrix of the classifier on the imbalanced dataset]
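
The imbalanced case can be reproduced with a deliberately trivial classifier. The sketch below assumes a stand-in "classifier" that always predicts the majority class; this is an illustration of the effect, not a model described in the article.

```python
from collections import Counter

# Dataset from the text: 10 Binders (B) and 90 Non-Binders (NB).
true_labels = ["B"] * 10 + ["NB"] * 90

# Assumed stand-in classifier: always predict the most frequent class.
majority_class = Counter(true_labels).most_common(1)[0][0]
predicted = [majority_class] * len(true_labels)

def accuracy_for(cls):
    """Percentage of compounds whose true class is `cls` that were predicted as `cls`."""
    indices = [i for i, t in enumerate(true_labels) if t == cls]
    hits = sum(1 for i in indices if predicted[i] == cls)
    return 100.0 * hits / len(indices)

overall = 100.0 * sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)

print(f"Overall: {overall:.0f}%")             # Overall: 90%
print(f"Binder: {accuracy_for('B'):.0f}%")    # Binder: 0%
print(f"Non-Binder: {accuracy_for('NB'):.0f}%")  # Non-Binder: 100%
```

The overall figure looks respectable only because the majority class dominates the count; the per-class row for ‘Binder’ shows that not a single compound of interest was found.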


Cite This As:

Dogra, Shaillay K., "Confusion Matrix" From QSARWorld--A Strand Life Sciences Web Resource.
http://www.qsarworld.com/qsar-ml-confusion-matrix.php
