> <i>So you'd call the dropout function on the activations from each layer, zeroing out some at random so that they don't contribute to the "downstream" calculations. (As I understand it, this means that they are also not adjusted during back-propagation -- if nothing else, it would be terribly unfair to the poor ignored neurons to have their weights changed when they didn't contribute to the error.)</i><p>If the activations are effectively zeroed out by dropout, shouldn't the error propagated to those neurons in the backward pass be zero too, automatically?<p>(I.e., as I understand it, OP's intuitive notion of "fairness" is literally how error propagation works: neurons are adjusted in proportion to how much they contributed to the output.)
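<p>A minimal NumPy sketch of that point, assuming the usual inverted-dropout setup (the single ReLU layer, the shapes, and names like keep_prob are just illustrative): the backward pass multiplies by the same mask as the forward pass, so dropped units receive a zero gradient and the weights feeding them get no update from that example, with no extra bookkeeping needed.<p>

    import numpy as np

    rng = np.random.default_rng(0)

    # One toy example: h = relu(x @ W), then inverted dropout zeroes some activations.
    x = rng.normal(size=(1, 8))   # a single input with 8 features
    W = rng.normal(size=(8, 5))   # weights into a 5-unit hidden layer
    keep_prob = 0.5

    z = x @ W
    h = np.maximum(z, 0.0)                                 # ReLU activations
    mask = (rng.random(h.shape) < keep_prob) / keep_prob   # 0 for dropped units
    h_drop = h * mask                                      # dropped units output exactly 0

    # Pretend the loss gradient arriving from downstream is all ones.
    grad_h_drop = np.ones_like(h_drop)

    # Backward through dropout: the chain rule multiplies by the same mask,
    # so dropped units receive exactly zero gradient...
    grad_h = grad_h_drop * mask
    # ...which flows through the ReLU to the pre-activations...
    grad_z = grad_h * (z > 0)
    # ...and into the weight gradient: the columns feeding dropped units are all
    # zero, i.e. those weights are left untouched for this example, automatically.
    grad_W = x.T @ grad_z
    dropped = (mask == 0)[0]
    print(np.allclose(grad_W[:, dropped], 0.0))  # True

<p>The only extra state dropout needs is the mask saved from the forward pass; the "fairness" falls out of the chain rule for free.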