In all machine learning, the "top layer" is optimized by GDGS (gradient descent by grad student). All that changes is that underneath it there are more and more layers of calculations, features, parameters, and hyperparameters, each of which is in turn searched or optimized by some algorithm.
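To make the layering concrete, here is a toy sketch of what I mean. Everything in it is made up for illustration: the loss `(w - 3)^2`, the function names `train` and `search`, and the grid `LR_GRID` are all assumptions, not anything from a real system. The inner layer is plain gradient descent on a parameter, the middle layer is a grid search over a hyperparameter, and the top layer is whoever typed in the grid.

```python
def train(lr, steps=50):
    """Inner layer: gradient descent on w for the toy loss (w - 3)^2."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2 with respect to w
        w -= lr * grad
    return w, (w - 3.0) ** 2


def search(lr_grid):
    """Middle layer: grid search over the learning rate (a hyperparameter)."""
    best = None
    for lr in lr_grid:
        w, loss = train(lr)
        if best is None or loss < best[2]:
            best = (lr, w, loss)
    return best


# Top layer: a human (the grad student) picked this grid by hand.
# Hypothetical values, chosen only for the example.
LR_GRID = [0.001, 0.01, 0.1, 1.1]

best_lr, best_w, best_loss = search(LR_GRID)
print(best_lr, best_w, best_loss)
```

You could wrap `search` in yet another algorithm that picks the grid, but then somebody has to pick that algorithm's settings, and the hand-tuned layer just moves up one level.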