WOW! They are using BFGS! Haven't heard of that in decades! Had to think a little: Yup, the full name is Broyden–Fletcher–Goldfarb–Shanno, for iterative unconstrained non-linear optimization!<p>Some of the earlier descriptions of the optimization used in AI <i>learning</i> were about steepest descent, that is, just find the gradient of the function you are trying to minimize and move some distance in that direction. Using only the gradient was concerning since that method tends to <i>zig zag</i>: after, say, 100 iterations, the total distance moved over those 100 iterations might be several times farther than the straight-line distance from the starting point to the final one. Can visualize this <i>zig zag</i> already in just two dimensions, say, following a river that curves down a valley it cut over a million years or so, that is, a valley with steep sides. Then gradient descent may keep crossing the river and go maybe 10 feet for each foot downstream!<p>Right, if just trying to go downhill on a tilted flat plane, then the gradient will point in the direction of steepest descent on the plane, and gradient descent will go all the way downhill in just one iteration.<p>In even moderately challenging problems, BFGS can be a big improvement.
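<p>To make that concrete, here is a minimal sketch, my own illustration and not anything from the article, comparing fixed-step steepest descent against BFGS on the Rosenbrock function, a classic curved, steep-sided valley; it uses SciPy's actual rosen, rosen_der, and minimize:<p>

    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der

    x0 = np.array([-1.2, 1.0])  # standard Rosenbrock start, up on the valley wall

    # Fixed-step steepest descent: tends to bounce across the curved valley.
    x = x0.copy()
    for _ in range(10_000):
        x = x - 1e-3 * rosen_der(x)
    print("steepest descent:", x, "f =", rosen(x))

    # BFGS builds an approximate inverse Hessian from the same gradient
    # information, so it follows the valley's curvature instead of crossing it.
    res = minimize(rosen, x0, jac=rosen_der, method="BFGS")
    print("BFGS:", res.x, "f =", res.fun, "in", res.nit, "iterations")

<p>The minimum is at (1, 1); typically the fixed-step descent is still creeping along the valley floor after all 10,000 steps, while BFGS gets there in a few dozen iterations.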