We always aim to reduce the generalization error of a classification model. Generalization error tells us how well the model does on unseen data. But even after reaching the lowest generalization error or the highest accuracy, how confident are we that the result will hold on new, unseen data? A confidence interval tells us the range in which the true accuracy or generalization error is likely to lie. Here is a generic example taken from the PSU website-
Let’s assume we trained a decision tree on a dataset and got 85% accuracy after all the optimizations that could be done. There are a total of 200 test instances in the data. How can we convince the client that the decision tree model will give 85% accuracy on unseen data as well? Could we give them this confidence? Statistics says yes!
We can calculate the confidence interval, i.e. a lower bound and an upper bound on the accuracy.
Suppose we want to be 90% confident about the range in which the true accuracy lies. A 90% confidence interval is built so that, across repeated test samples, about 90% of the intervals constructed this way would contain the true accuracy.
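The formula for calculating the confidence interval, using the normal approximation to the binomial (which reproduces the numbers below), is-
$$p \pm z_{\alpha/2} \sqrt{\frac{p(1 - p)}{N}}$$
where p is the observed accuracy, N is the number of test instances, and z_{α/2} is the standard normal critical value for confidence level 1 − α. With p = 85%, N = 200 test instances, and 1 − α = 90% (z_{α/2} ≈ 1.64), the confidence interval is [0.809, 0.891].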
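The calculation above can be sketched in a few lines of Python. This is a minimal sketch that assumes the normal-approximation interval p ± z·sqrt(p(1 − p)/N); the function name `accuracy_confidence_interval` is made up for illustration. Because it uses the unrounded critical value (about 1.645 rather than 1.64), the result may differ from the interval quoted above in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def accuracy_confidence_interval(accuracy, n, confidence=0.90):
    """Normal-approximation confidence interval for a classifier's accuracy.

    accuracy   -- observed accuracy on the test set, as a fraction (e.g. 0.85)
    n          -- number of test instances
    confidence -- desired confidence level, 1 - alpha (e.g. 0.90)
    """
    # Two-sided critical value z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of a proportion, scaled by the critical value.
    margin = z * sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - margin, accuracy + margin


lower, upper = accuracy_confidence_interval(0.85, 200, confidence=0.90)
print(f"90% confidence interval: [{lower:.3f}, {upper:.3f}]")
```

The normal approximation is reasonable here because N is fairly large and the accuracy is not close to 0 or 1; for small test sets or extreme accuracies, other intervals for a proportion may be more appropriate.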
Hence, we can say that we are 90% confident that the actual accuracy of our decision tree model, which shows 85% on 200 test instances, lies somewhere between roughly 81% and 89% on unseen data.
As expected, increasing or decreasing the number of test instances affects the confidence interval. The more data we test on, the tighter the confidence interval becomes. If the model gives 85% accuracy on 500 test instances, the 90% confidence interval narrows to roughly 82% to 88%.
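To see how the interval tightens as the test set grows, here is a small sketch that keeps the accuracy fixed at 85% and varies the number of test instances. It again assumes the normal-approximation interval; the helper name `interval_for` and the sample sizes other than 200 and 500 are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def interval_for(accuracy, n, confidence=0.90):
    """Return (lower, upper) of the normal-approximation interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - margin, accuracy + margin


# Illustrative test-set sizes; only 200 and 500 appear in the text.
sample_sizes = [50, 200, 500, 2000]
widths = []
for n in sample_sizes:
    lower, upper = interval_for(0.85, n)
    widths.append(upper - lower)
    print(f"N = {n:5d}: [{lower:.3f}, {upper:.3f}]  width = {upper - lower:.3f}")
```

The width shrinks in proportion to 1/sqrt(N), so quadrupling the number of test instances only halves the width of the interval.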
References-