Interesting paper, thanks for bringing this up! I have been working on trajectory optimization methods using both analytic gradient computations and black-box stochastic gradient approximations (proximal policy optimization).

I have always wondered about a question the paper touches on: even though the analytic gradient is intuitively more efficient and mathematically exact, it is much harder to learn a policy with it than with the “brute force trial-and-error” black-box methods.

This paper brings many new perspectives on why.
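
For concreteness, here is a toy sketch of the two kinds of estimators I mean (my own example, not the paper's setup, and PPO's estimator is more involved than this REINFORCE/ES-style one): the analytic gradient comes from differentiating through the rollout itself, while the black-box estimate only looks at sampled returns under parameter perturbations.

    import jax
    import jax.numpy as jnp

    def rollout_cost(theta, x0, horizon=20):
        """Deterministic rollout of toy scalar dynamics with a linear policy u = theta * x."""
        def step(x, _):
            u = theta * x
            x_next = 0.9 * x + u               # simple stable scalar dynamics
            cost = x_next ** 2 + 0.1 * u ** 2  # quadratic state + control cost
            return x_next, cost
        _, costs = jax.lax.scan(step, x0, None, length=horizon)
        return jnp.sum(costs)

    # Analytic gradient: backpropagate through the dynamics.
    analytic_grad = jax.grad(rollout_cost)(0.1, 1.0)

    # Black-box score-function estimate: perturb the parameter with Gaussian
    # noise, weight the sampled costs by the perturbation (with a baseline).
    def score_function_grad(key, theta, sigma=0.05, n_samples=256):
        eps = sigma * jax.random.normal(key, (n_samples,))
        costs = jax.vmap(lambda e: rollout_cost(theta + e, 1.0))(eps)
        baseline = costs.mean()
        return jnp.mean((costs - baseline) * eps) / sigma ** 2

    stochastic_grad = score_function_grad(jax.random.PRNGKey(0), 0.1)

    print(analytic_grad, stochastic_grad)  # agree in expectation, differ in variance

In a smooth toy problem like this the two estimates agree, which is exactly why the gap on harder problems is so puzzling to me.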