Statistics - ROC Plot and Area under the curve (AUC)
1 - About
A classifier usually assigns an example to the positive class when its predicted probability exceeds a classification threshold, and you may want to move this threshold. In spam filtering, for instance, false positives (legitimate emails wrongly predicted as spam) are likely to cause more harm than false negatives (spam emails that are not identified as spam): we might miss an important email, whereas a spam message is easy to delete. In this case, we could require a higher threshold (probability) that a message is spam before we move it into the spam folder.
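A minimal Python sketch of the trade-off described above, assuming an email is flagged as spam when its predicted spam probability is at or above the threshold. The probabilities, labels and the helper name `confusion_counts` are hypothetical, made up for illustration. Raising the threshold removes false positives at the cost of more false negatives.

```python
# Hypothetical spam probabilities from some classifier, with the true labels
# (1 = spam, 0 = legitimate). These numbers are made up for illustration.
probs = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0]


def confusion_counts(probs, labels, threshold):
    """Count false positives and false negatives when an email is flagged
    as spam iff its predicted probability is >= threshold."""
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    return fp, fn


for threshold in (0.5, 0.8):
    fp, fn = confusion_counts(probs, labels, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
# threshold=0.5: false positives=2, false negatives=1
# threshold=0.8: false positives=0, false negatives=2
```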
ROC stands for Receiver Operating Characteristic. It is a historical term from World War II, when it was used to measure the accuracy of radar operators.
The ROC curve is a single curve that captures how the classifier behaves as the classification threshold varies: each threshold gives one (false positive rate, true positive rate) pair, plotted as one point on the curve.
2 - Articles Related
3 - Problem description
4 - Curve
The best curve goes as far as possible into the top left-hand corner, the point with a true positive rate of one and a false positive rate of zero.
The 45-degree diagonal is the no-information line: a classifier that guesses at random lies on it (see the ROC curve with sensitivity).
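A minimal Python sketch of how the curve is traced: every distinct score is tried as a threshold (an example is predicted positive when its score is at or above it), and each threshold yields one (FPR, TPR) point. The function name `roc_points` and the sample scores are hypothetical; the brute-force loop is quadratic and written for readability, not speed.

```python
def roc_points(scores, labels):
    """Return the ROC curve as a list of (FPR, TPR) points, one per threshold.

    An example is predicted positive iff its score >= threshold.
    labels: 1 for a positive example, 0 for a negative one.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Start above the highest score, where nothing is predicted positive: (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")

# A perfect classifier ranks every positive above every negative,
# so its curve reaches the top left-hand corner (FPR=0, TPR=1).
print((0.0, 1.0) in roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # True
```

The lowest threshold predicts everything as positive, so every curve ends at (1, 1).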
4.1 - False positive / True positive
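The two axes of the ROC plot are the true positive rate (y-axis) and the false positive rate (x-axis). A minimal Python sketch computing both from confusion-matrix counts; the function name `rates` and the counts in the usage line are hypothetical.

```python
def rates(tp, fp, tn, fn):
    """True positive rate and false positive rate from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # fraction of actual positives predicted as positive
    fpr = fp / (fp + tn)  # fraction of actual negatives wrongly predicted as positive
    return tpr, fpr


print(rates(tp=40, fp=5, tn=45, fn=10))  # (0.8, 0.1)
```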
4.2 - Sensitivity / Specificity
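In terms of confusion-matrix counts (TP, FP, TN, FN), sensitivity is the true positive rate and specificity is the true negative rate. An ROC curve can therefore equally be drawn with sensitivity on the y-axis and 1 - specificity on the x-axis:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} = \text{TPR},
\qquad
\text{Specificity} = \frac{TN}{TN + FP} = 1 - \frac{FP}{FP + TN} = 1 - \text{FPR}
```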
5 - Metrics
5.1 - Area under the curve (AUC)
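The AUC summarises the whole ROC curve as one number: 1.0 for a curve that reaches the top left-hand corner, 0.5 for the 45-degree no-information line. A minimal Python sketch of two equivalent ways to compute it: the trapezoidal area under the (FPR, TPR) points, and the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one (ties counted as one half). The function names and sample data are hypothetical, and the ROC points are written out by hand for this data.

```python
def auc_pairwise(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative; a tie counts as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_trapezoid(points):
    """Area under a ROC curve given as (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores and labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
# ROC points of these scores, obtained by sweeping the threshold by hand.
roc = [(0, 0), (0, 1/3), (0, 2/3), (1/3, 2/3), (1/3, 1), (2/3, 1), (1, 1)]

print(auc_pairwise(scores, labels))  # about 0.889 (8/9)
print(auc_trapezoid(roc))            # about 0.889 (8/9)
```

The pairwise version makes the meaning of the AUC explicit, while the trapezoidal version follows the geometric definition; on the same data both give 8/9.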
6 - Tools
6.1 - Weka
A ROC curve for the J48 algorithm (Weka's implementation of the C4.5 decision tree).