1. Introduction
In machine learning, we often find ourselves measuring the accuracy of our models, but are we doing it correctly?
In this tutorial, we’ll talk about the difference between Top-1 Accuracy and Top-N Accuracy, and why they’re important.
2. Top-1 Accuracy
Let’s say we have a model that classifies images of animals, and we show it an image of a cat. Top-1 Accuracy considers the prediction correct if and only if the model’s most probable prediction is “cat”.
Let’s expand our example to several predictions:
True label | Predicted label | Correct
Cat | Cat | Yes
Dog | Giraffe | No
Lion | Cat | No
Giraffe | Giraffe | Yes
Dolphin | Dolphin | Yes
In this example, our model correctly classified 3 out of 5 images, giving an accuracy of 60%. As we can see, Top-1 Accuracy is simply what we usually mean when we talk about accuracy.
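As a small illustration, the 60% figure can be reproduced by comparing each true label with the single predicted label from the table. The function name `top1_accuracy` is hypothetical and chosen only for this sketch.

```python
# (true label, predicted label) pairs taken from the example table
examples = [
    ("Cat", "Cat"),
    ("Dog", "Giraffe"),
    ("Lion", "Cat"),
    ("Giraffe", "Giraffe"),
    ("Dolphin", "Dolphin"),
]


def top1_accuracy(pairs):
    """Return the fraction of pairs whose single predicted label equals the true label."""
    correct = sum(1 for true_label, predicted in pairs if true_label == predicted)
    return correct / len(pairs)


print(top1_accuracy(examples))  # 0.6
```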
3. Top-N Accuracy
Top-N Accuracy takes the N predictions to which the model assigns the highest probability. If any of them matches the true label, the prediction is counted as correct. Top-1 Accuracy is the special case N = 1, in which only the single most probable prediction is taken into account.
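A minimal sketch of this definition, assuming the model outputs one score (for example, a probability) per class for each sample: rank the class indices by score, keep the first N, and check whether the true class is among them. The function `top_n_accuracy` and the probabilities below are hypothetical, and ties are broken in favor of the lower class index because Python’s sort is stable. Libraries offer ready-made versions of this metric, for instance scikit-learn’s `sklearn.metrics.top_k_accuracy_score`.

```python
from typing import Sequence


def top_n_accuracy(scores: Sequence[Sequence[float]], true_classes: Sequence[int], n: int) -> float:
    """Return the fraction of samples whose true class is among the n highest-scoring classes.

    scores[i][c] is the model's score (e.g. probability) for class c on sample i;
    true_classes[i] is the index of the correct class for sample i.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not true_classes:
        raise ValueError("need at least one sample")
    correct = 0
    for row, true_class in zip(scores, true_classes):
        # Rank class indices by score, highest first; sorted() is stable,
        # so tied scores keep their original (lower-index-first) order.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_class in ranked[:n]:
            correct += 1
    return correct / len(true_classes)


# Hypothetical probabilities over the classes ["Cat", "Dog", "Lion"].
scores = [
    [0.7, 0.2, 0.1],  # true class: Cat (index 0)
    [0.5, 0.1, 0.4],  # true class: Lion (index 2)
]
true_classes = [0, 2]

print(top_n_accuracy(scores, true_classes, n=1))  # 0.5
print(top_n_accuracy(scores, true_classes, n=2))  # 1.0
```

With N = 1 the second sample counts as wrong because “Cat” outscores “Lion”; with N = 2 “Lion” enters the top two and the sample becomes correct.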
Let’s use the same example as before, this time computing Top-3 Accuracy:
True label | Top-3 predicted labels | Correct
Cat | Cat, Lion, Dog | Yes
Dog | Giraffe, Lion, Cat | No
Lion | Cat, Lion, Dog | Yes
Giraffe | Giraffe, Dog, Cat | Yes
Dolphin | Dolphin, Cat, Giraffe | Yes
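Now, considering the 3 most probable predictions for each image, the model correctly classified 4 out of 5 images, giving a Top-3 Accuracy of 80%. Only the dog image is still misclassified, since “Dog” does not appear among its top 3 predictions.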
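The following sketch reproduces these numbers from the ranked lists in the table, evaluating N = 1, 2 and 3. The function name `top_n_accuracy_from_ranked` is hypothetical; since the table lists only three predictions per image, N cannot usefully go beyond 3 here.

```python
# (true label, predicted labels ranked by probability) from the example table
examples = [
    ("Cat", ["Cat", "Lion", "Dog"]),
    ("Dog", ["Giraffe", "Lion", "Cat"]),
    ("Lion", ["Cat", "Lion", "Dog"]),
    ("Giraffe", ["Giraffe", "Dog", "Cat"]),
    ("Dolphin", ["Dolphin", "Cat", "Giraffe"]),
]


def top_n_accuracy_from_ranked(examples, n):
    """Return the fraction of examples whose true label is among the first n ranked predictions."""
    correct = sum(1 for true_label, ranked in examples if true_label in ranked[:n])
    return correct / len(examples)


for n in (1, 2, 3):
    print(n, top_n_accuracy_from_ranked(examples, n))
# 1 0.6
# 2 0.8
# 3 0.8
```

For N = 1 this matches the Top-1 Accuracy of 60% from the previous section, because the first label in each ranked list is the model’s single prediction.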
Notice that, whenever $N \geq K$, Top-N Accuracy $\geq$ Top-K Accuracy. In other words, as $N$ increases, Top-N Accuracy either increases or stays the same.
This gives us insight into how our model behaves. For example, if Top-1 Accuracy is very low, we might conclude that our model has learned little about the dataset. However, if Top-N Accuracy is significantly higher than Top-1 Accuracy, the model may actually be learning, often ranking the correct class near the top, and may simply need further fine-tuning. This is especially helpful for classification problems with a large number of classes.
Depending on the problem, Top-N Accuracy may also be the more appropriate metric for evaluating the model. Take a recommendation system, for example. Whether it recommends videos, music, or products in an online shop, we value novelty and diversity: as clients, we are looking for new and varied videos, music, or products. Therefore, we do not aim to find the single most relevant recommendation, but a set of interesting recommendations. It may be more valuable to have the best prediction within a set of interesting predictions than to have just one good prediction.
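The claim that Top-N Accuracy never decreases as N grows follows from a short argument. The notation here is introduced only for illustration: $\hat{Y}_N(x)$ is the set of the N most probable classes for input $x$, $y_i$ is the true label of sample $i$, and $M$ is the number of samples. It assumes ties between scores are broken in a fixed order, so the ranking itself does not depend on N.

$$
\text{Acc}_N = \frac{1}{M} \sum_{i=1}^{M} \mathbb{1}\left[\, y_i \in \hat{Y}_N(x_i) \,\right]
$$

For $K \leq N$, the $K$ most probable classes are also among the $N$ most probable ones, so $\hat{Y}_K(x_i) \subseteq \hat{Y}_N(x_i)$ and therefore

$$
\mathbb{1}\left[\, y_i \in \hat{Y}_K(x_i) \,\right] \leq \mathbb{1}\left[\, y_i \in \hat{Y}_N(x_i) \,\right] \ \text{for every } i \quad \Longrightarrow \quad \text{Acc}_K \leq \text{Acc}_N .
$$

In the animal example, this is why the Top-1 Accuracy of 60% cannot exceed the Top-3 Accuracy of 80%.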
4. Conclusion
There are several different methods to measure how good a model is. It is really important to find the most appropriate one for the given problem. In this article, we showed how Top-N Accuracy can be used for certain problems. Also, we’ve seen the difference between Top-1 Accuracy and Top-N Accuracy, and how they can be used to get a better understanding of our model.