When evaluating how well a model performs, there are multiple metrics you can use. To choose the metric that best evaluates your model, it is vital that you understand what each metric calculates. You may hear that a model is extremely accurate, but depending on the business question your model is attempting to answer, another metric may be better suited to evaluate it.
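Accuracy is an evaluation metric that measures the share of all predictions a model gets right. The formula for accuracy is below:
Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)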
Accuracy answers the question: what percent of the model's predictions were correct? Accuracy counts both True Positives and True Negatives as correct predictions. As we will see, some of the evaluation metrics discussed later do not use both.
A confusion matrix displays counts of the True Positives, False Positives, True Negatives, and False Negatives produced by a model. Using a confusion matrix, we can get the values needed to compute the accuracy of a model. Looking at the confusion matrix below, we can use the formula above to calculate accuracy.
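As a rough illustration of how the four confusion-matrix counts lead to accuracy, here is a minimal Python sketch. The label lists are made up for this example and are not the matrix the article refers to; `confusion_counts` and `accuracy` are hypothetical helper names, and 1 is assumed to mean the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted positive, actually positive
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted positive, actually negative
        elif predicted == 0 and actual == 0:
            tn += 1  # predicted negative, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)            # 3 1 5 1
print(accuracy(tp, fp, tn, fn))  # 0.8
```

With these invented labels, 8 of the 10 predictions are correct, so the accuracy is 0.8.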
An accuracy of 0.45 is pretty low: the model gets fewer than half of its predictions right. However, having a high accuracy does not mean we have a good model either.
An accuracy of 0.96666 is very high. You may be tempted to choose a model that has a high accuracy, but you need to think about the business question. Perhaps you built a model to predict whether someone has a disease. If most people don't have the disease and the model predicts negative every time, as in the example above, you will achieve a high accuracy. However, this model will never recognize the disease in anyone and would therefore be useless for predicting whether someone has a disease. When accuracy is not a good metric for evaluating your model, you can look at other metrics.
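The always-negative scenario can be sketched in a few lines of Python. The counts here (29 healthy people and 1 sick person) are an assumption chosen only because they reproduce an accuracy of about 0.9667; the article's actual confusion matrix is not available.

```python
# Hypothetical data: 29 healthy people (0) and 1 person with the disease (1).
y_true = [0] * 29 + [1]

# A "model" that predicts negative for everyone.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
accuracy = correct / len(y_true)

# How many of the people who actually have the disease did the model catch?
caught = sum(1 for actual, predicted in zip(y_true, y_pred)
             if actual == 1 and predicted == 1)
actual_positives = sum(y_true)

print(round(accuracy, 4))  # 0.9667
print(caught, "of", actual_positives, "sick people detected")  # 0 of 1 sick people detected
```

The accuracy looks excellent even though the model never detects a single positive case.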
Precision evaluates how precise a model is in predicting positive labels. Precision answers the question: out of the number of times a model predicted positive, how often was it correct? Precision is the percentage of your results that are relevant. The formula for precision is below:
Precision = True Positives / (True Positives + False Positives)
The numerator is the number of positive observations that the model predicted correctly. The denominator is the total number of times the model predicted a positive label. Precision is a good evaluation metric to use when the cost of a false positive is very high and the cost of a false negative is low. For example, precision is a good choice if you are a restaurant owner who buys a wine only when a classifier predicts it to be good. You only want to sell good wine to your customers, so a false positive, telling a customer a wine is good when it is not, can upset that customer and perhaps lose them. If the restaurant owner instead says "I hear this is not a good wine," and the customer chooses to buy it anyway and it turns out to be good, that is no big deal. This would be a false negative, and the cost of this scenario is low.
The above example shows a model that produces a precision of 1. This may sound good, but if you investigate the confusion matrix, you'll see that it only predicted positive 3 times when the data was labeled positive 60 times, or 50% of the time. If this model were used to predict whether someone had a disease, it would be telling the majority of the people who had the disease that they were healthy. When the cost of false negatives is high, it is better to use recall as an evaluation metric.
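The numbers in this example can be checked with a short Python sketch. The counts are inferred from the text rather than read from the original matrix: 3 positive predictions with a precision of 1 implies 3 true positives and 0 false positives, and 60 actual positives making up 50% of the data implies 120 observations in total. The function name `precision` is a hypothetical helper.

```python
def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    predicted_positive = tp + fp
    # Guard against a model that never predicts positive.
    return tp / predicted_positive if predicted_positive else 0.0


# Counts inferred from the description above (assumptions, see lead-in).
tp = 3                    # all 3 positive predictions were correct
fp = 0                    # precision of 1 means no false positives
actual_positives = 60     # observations labeled positive
fn = actual_positives - tp  # positives the model labeled negative

print(precision(tp, fp))  # 1.0

# Share of actual positives that the model labeled negative.
missed_share = fn / actual_positives
print(fn, round(missed_share, 2))  # 57 0.95
```

Under these assumptions, the model is never wrong when it predicts positive, yet it misses 57 of the 60 actual positives.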
Recall calculates the percentage of actual positives that a model correctly identified as positive (True Positives). When the cost of a false negative is high, you should use recall. The formula for recall is below:
Recall = True Positives / (True Positives + False Negatives)
The numerator is the number of true positives, that is, the number of positives the model correctly identified. The denominator is the total number of actual positives: the positives the model correctly identified plus the positives it incorrectly predicted as negative. For example, you should use recall when predicting whether a credit card charge is fraudulent. If you have a lot of false negatives, then a lot of fraudulent charges are being labeled as not fraudulent, and customers will have money stolen from them.
However, having a high recall doesn't necessarily mean a model is good. For example, if a model predicted that everyone had a disease, the model would have a perfect recall, but it would produce a lot of false positives and tell people they were sick when they were not.
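To make the always-positive case concrete, here is a small Python sketch. The data (10 sick people out of 100) is hypothetical and chosen only for illustration.

```python
# Hypothetical data: 10 people with the disease (1) and 90 healthy people (0).
y_true = [1] * 10 + [0] * 90

# A "model" that predicts positive for everyone.
y_pred = [1] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)

recall = tp / (tp + fn)     # share of actual positives that were found
precision = tp / (tp + fp)  # share of positive predictions that were right

print(recall)     # 1.0
print(precision)  # 0.1
print(fp, "healthy people told they are sick")  # 90 healthy people told they are sick
```

Recall is perfect because no sick person is missed, but with this made-up data only 1 in 10 positive predictions is correct.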
When deciding which evaluation metric to use, it is important to consider what question you are trying to answer and whether false positives or false negatives are worse.