λmem.ac

The Softmax Diaries

One function, infinite drama

Softmax turns logits into a probability distribution:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}$$

The temperature $T$ sets the mood: low $T$ makes the model confident and spiky, high $T$ makes it hesitant and flat.
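To see the mood swing in numbers, here is a minimal sketch of temperature-scaled softmax in plain Python. The function name `softmax`, the `temperature` parameter, and the example logits are all made up for illustration. Subtracting the maximum before exponentiating is a standard stability trick, not something the formula itself requires.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of floats (illustrative sketch)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, chosen only to show the effect of T.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
```

The printed rows should get flatter as the temperature rises, with the largest logit taking the biggest share at the lowest temperature.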

Tip

Sampling with $T \approx 0.7$ is a good default when you want variety without chaos.
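Sampling at a given temperature could look roughly like the sketch below. The helper `sample_token` and its arguments are hypothetical names, and the code leans on `random.choices`, which accepts unnormalised weights, so the division by the sum can be skipped.

```python
import math
import random

def sample_token(logits, temperature=0.7, rng=random):
    """Draw one index from softmax(logits / temperature) (illustrative sketch)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # shift for numerical stability
    # random.choices normalises the weights itself, so raw exponentials suffice.
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits; a seeded generator keeps the run repeatable.
rng = random.Random(0)
draws = [sample_token([2.0, 1.0, 0.1], temperature=0.7, rng=rng) for _ in range(1000)]
print({i: draws.count(i) for i in range(3)})
```

Passing a seeded `random.Random` instance is a design choice for reproducibility. Leaving `rng` at its default uses the module-level generator instead.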

Why it’s everywhere

It’s differentiable, it normalises, and it plays nicely with cross-entropy. Of course it’s everywhere.
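One way to read "plays nicely with cross-entropy": at $T = 1$, the gradient of the cross-entropy loss with respect to the logits collapses to the softmax output minus the one-hot target. The sketch below checks that identity against a finite-difference estimate. All function names and the example logits are illustrative assumptions, not code from any particular library.

```python
import math

def softmax(logits):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the target class.
    return -math.log(softmax(logits)[target])

def analytic_grad(logits, target):
    # d(loss)/d(z_i) = softmax(z)_i - 1[i == target]
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def numeric_grad(logits, target, eps=1e-6):
    # Central finite differences, one logit at a time.
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += eps
        down = list(logits)
        down[i] -= eps
        grad.append((cross_entropy(up, target) - cross_entropy(down, target)) / (2 * eps))
    return grad

# Hypothetical logits and target class.
logits, target = [2.0, 1.0, 0.1], 0
print(analytic_grad(logits, target))
print(numeric_grad(logits, target))
```

The two printed lists should agree to several decimal places, which is the whole appeal: no separate derivative for the softmax ever has to be written out.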