>>> Softmax can be implemented as a composition of primitive TensorFlow ops (exponent, reduction, elementwise division, etc.): softmax = exp(logits) / reduce_sum(exp(logits), dim)

No, it cannot be implemented this way: it is numerically unstable and will produce NaNs if any input is greater than ~88.7, the point where exp() overflows float32. Luckily, that is also not how it's implemented in TensorFlow: https://github.com/tensorflow/tensorflow/blob/2c8d0dca978a246f54c506aae4587dbce5d3bcf0/tensorflow/core/kernels/softmax_op_functor.h#L43

For a clean (and more efficient) C version of this algorithm, take a look at the NNPACK reference implementation: https://github.com/Maratyszcza/NNPACK/blob/master/src/ref/softmax-output.c#L30
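
For illustration, here is a minimal C sketch of the stable formulation (subtract the per-row maximum before exponentiating, as both the TensorFlow kernel and the NNPACK reference do); the function and variable names are mine, not from either codebase. Subtracting a constant from every logit cancels out after normalization, so the result is mathematically identical to the naive formula.

    #include <math.h>
    #include <stddef.h>

    /* Numerically stable softmax over a single row of n logits (n > 0 assumed).
     * Subtracting the maximum keeps every exponent <= 0, so expf() never
     * overflows even when the raw logits exceed ~88.7. */
    static void softmax(const float *logits, float *out, size_t n) {
        float max = logits[0];
        for (size_t i = 1; i < n; i++) {
            if (logits[i] > max) max = logits[i];
        }

        float sum = 0.0f;
        for (size_t i = 0; i < n; i++) {
            out[i] = expf(logits[i] - max);  /* exp of a non-positive value */
            sum += out[i];
        }

        for (size_t i = 0; i < n; i++) {
            out[i] /= sum;  /* normalize so the outputs sum to 1 */
        }
    }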