"As Jeremy Howard points out, even academic papers often use softmax for multi-class classification, and I too have already seen it used incorrectly in blogs and papers during my short time studying DL."

AFAIK softmax should be used for multi-class classification (exactly one class is correct, so the probabilities should sum to 1), while sigmoid can be used for multi-label classification (each label is an independent yes/no).
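To make the distinction concrete, here is a minimal sketch in plain Python (standard library only). The function names and the logit values are made up for illustration and are not from any particular framework. Softmax normalizes all scores together into one distribution, so raising one class's probability lowers the others. Sigmoid squashes each score on its own, so several labels can be "on" at once.

```python
import math

def softmax(logits):
    """Multi-class: scores compete, outputs form one distribution summing to 1."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Multi-label: each score is mapped independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

logits = [2.0, 1.0, -1.0]  # hypothetical raw scores for three classes/labels

probs_multiclass = softmax(logits)
probs_multilabel = sigmoid(logits)

# Multi-class: pick the single most likely class.
predicted_class = max(range(len(logits)), key=lambda i: probs_multiclass[i])

# Multi-label: threshold each label on its own (0.5 is an assumed cutoff).
predicted_labels = [p > 0.5 for p in probs_multilabel]

print(sum(probs_multiclass))  # approximately 1.0
print(predicted_class)
print(predicted_labels)
```

With these scores the sigmoid outputs for the first two entries are both above 0.5, so a multi-label model would predict two labels at once, which softmax cannot express because its outputs are forced to share a total of 1.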