A further step is Langevin Dynamics, where the system has damped momentum and the noise is injected into the momentum. It is used in molecular dynamics simulations and also for Bayesian MCMC sampling.<p>Oddly, most mentions of Langevin Dynamics in relation to AI that I've seen omit the momentum, even though gradient descent with momentum is widely used in AI. To confuse matters further, in AI "stochastic" usually refers to approximating the gradient using a sub-sample of the data at each step, rather than to injected noise. You can apply both forms of stochasticity at once if you want to!
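<p>For concreteness, here is a rough sketch of what I mean, not a reference implementation. It assumes unit mass, a plain Euler-type discretisation, and a toy quadratic potential. The names (langevin_momentum, minibatch_grad), the step size, the friction value and the synthetic data are all made up for illustration. The first run uses the exact gradient. The second swaps in a sub-sampled gradient, so both kinds of stochasticity are active at once.

```python
import numpy as np


def langevin_momentum(grad_u, x0, n_steps, dt=0.05, gamma=1.0,
                      temperature=1.0, rng=None):
    """Euler-type discretisation of Langevin dynamics with damped momentum.

    Approximately targets the density proportional to exp(-U(x) / temperature).
    grad_u(x, rng) returns the gradient of U, or a noisy estimate of it.
    Each entry of x0 is treated as an independent one-dimensional chain.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)  # copy, so the caller's array is untouched
    v = np.zeros_like(x)
    noise_scale = np.sqrt(2.0 * gamma * temperature * dt)
    for _ in range(n_steps):
        # Friction damps the momentum, and the Gaussian noise goes into
        # the momentum as well, not directly into the position.
        v += dt * (-grad_u(x, rng) - gamma * v)
        v += noise_scale * rng.standard_normal(x.shape)
        x += dt * v
    return x


rng = np.random.default_rng(0)

# Case 1: exact gradient of the toy potential U(x) = x^2 / 2.
# The target is then a standard normal (up to discretisation error).
samples_exact = langevin_momentum(lambda x, r: x, np.zeros(2000),
                                  n_steps=1000, rng=rng)

# Case 2: U(x) is an average over synthetic data of (x - y_i)^2 / 2,
# and the gradient is estimated from a random sub-sample at every step.
data = rng.normal(loc=2.0, scale=1.0, size=500)


def minibatch_grad(x, r, batch_size=10):
    # One independent mini-batch per chain: idx has shape (n_chains, batch).
    idx = r.integers(0, data.size, size=(x.size, batch_size))
    return x - data[idx].mean(axis=1)


samples_minibatch = langevin_momentum(minibatch_grad, np.zeros(2000),
                                      n_steps=1000, rng=rng)

print("exact gradient:     mean, var =",
      samples_exact.mean(), samples_exact.var())
print("minibatch gradient: mean, var =",
      samples_minibatch.mean(), samples_minibatch.var())
print("data mean:", data.mean())
```

With the exact gradient the sample mean and variance should land near 0 and 1. With the mini-batch gradient the mean should land near the data mean, with a slightly inflated variance from the extra gradient noise. Dropping the v variable and adding the noise straight to x gives the momentum-free version that most AI write-ups call Langevin Dynamics.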