The symmetry between recall and precision

Recall and precision, arguably the most common metrics, are easy to understand once you see the symmetry between them.

What makes everything clear is the reference frame: what the number is measured against (the denominator). For recall, the reference frame is the ground-truth data; for precision, it is the data our algorithm found. The numerator, the number of correct matches, is the same in both, so that part is simple.

It’s basically two different normalisations. Recall normalises by the number of things we are trying to recover (the ground truth), so it tells you how much of the “truth” we reached; precision normalises by the number of things we calculated, so it tells us how likely it is that a calculated value is correct.
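As a concrete sketch of the two normalisations (the counts here are made up for illustration):

```python
def recall(n_correct, n_truth):
    # Reference frame: ground truth. How much of the "truth" did we reach?
    return n_correct / n_truth

def precision(n_correct, n_found):
    # Reference frame: our own output. How likely is a found value correct?
    return n_correct / n_found

# Same numerator, different denominators:
print(recall(8, 10))     # 0.8 -- we recovered 80% of the ground truth
print(precision(8, 16))  # 0.5 -- half of what we found is correct
```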

The ideal case is a bijection: every ground-truth value has exactly one corresponding calculated value, and vice versa. But this is not always the case! If repeated values are possible in the ground truth or in the calculated values, be careful not to double count!

The problem is clearer when we allow a certain degree of fuzziness:

[Figure: a bijection between ground-truth values (red) and calculated values (orange), matched within a threshold]

For example, in the figure above, we want to see, given the ground truth (in red), how close the calculated values (in orange) are. So we need a threshold to define how close is close enough!
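For scalar values, the “close enough” test is just a distance cut-off. A minimal sketch, assuming a hypothetical threshold (the 0.2 here is arbitrary):

```python
def is_close(truth, calc, threshold=0.2):
    # A calculated value "matches" a ground truth if it lies within
    # the chosen distance threshold.
    return abs(truth - calc) <= threshold

print(is_close(1.0, 1.1))  # True: within threshold
print(is_close(1.0, 1.5))  # False: too far
```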

It’s been easy so far because the case above is a bijection. How about cases where several calculated values crowd around one ground-truth value, or the other way around?

What should we do now? Precision and recall had better not be greater than 1, right?

One way is to “find” a one-to-one (injective, though not necessarily onto/surjective) mapping: once a value is matched, we do not count it again, even if it is also close to another ground-truth/calculated value.
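A minimal sketch of this idea, assuming scalar values and a hypothetical threshold; the greedy matching below is just one of several reasonable ways to build such a mapping:

```python
def match_count(truths, calcs, threshold):
    """Count matches under an injective mapping: each calculated value
    is used at most once, so nothing is double-counted."""
    used = set()
    matched = 0
    for t in truths:
        for j, c in enumerate(calcs):
            if j not in used and abs(t - c) <= threshold:
                used.add(j)
                matched += 1
                break
    return matched

truths = [1.0, 2.0, 3.0]
calcs = [1.05, 1.10, 2.9]  # two calculated values crowd the first truth
tp = match_count(truths, calcs, threshold=0.2)
print(tp / len(truths))  # recall: 2/3, the extra value near 1.0 is not counted
print(tp / len(calcs))   # precision: 2/3, safely <= 1
```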

Also, depending on which reference frame you are in (whether you are calculating recall or precision), you discard different values. The mapping can be different! And the two mappings are built in a way that is symmetric to each other; you can even find an injective/surjective symmetry there. Fun fun!
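A hedged sketch of that symmetry, with made-up numbers: the same greedy nearest-unused-neighbour matching, run once from each reference frame, can pair the values up differently.

```python
def injective_map(sources, targets, threshold):
    # Greedily map each source to its nearest still-unused target in range.
    used, mapping = set(), {}
    for i, s in enumerate(sources):
        best = None
        for j, t in enumerate(targets):
            if j in used or abs(s - t) > threshold:
                continue
            if best is None or abs(s - t) < abs(s - targets[best]):
                best = j
        if best is not None:
            used.add(best)
            mapping[i] = best
    return mapping

truths = [0.0, 0.3]
calcs = [0.2, 0.25]

# Recall's frame: map ground truths into calculated values...
print(injective_map(truths, calcs, 0.25))  # {0: 0, 1: 1}
# ...precision's frame is the symmetric mapping, calcs into truths:
print(injective_map(calcs, truths, 0.25))  # {0: 1, 1: 0}
```

Note the pairs are genuinely different: recall’s frame pairs 0.0 with 0.2, while precision’s frame pairs 0.2 with 0.3.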