Over the years, machine learning researchers have introduced many activation functions. With so many to choose from, aspiring practitioners can struggle to see the forest for the trees: while this variety makes it possible to train more accurate networks, it also makes it harder to know which function to use.
In this post, I’ll present a small research project I ran to see how each of these activation functions performs on the MNIST dataset. I’ll start with a quick overview of the theory behind each function. If you already know what activation functions are, feel free to skip to the last section for the benchmarks.