Am I wrong in thinking that falling off isn't punished enough in this approach? Going by the numbers provided, falling off still seems to add some distance and carries no penalty; the episode just ends. I'd be tempted to subtract 1 for each step after the car falls off, otherwise the RL agent will accept the distance gain as progress.<p>Can anyone spot better opportunities for improvement?
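<p>To make the penalty idea concrete, here's a rough sketch of what I mean. The names (`shaped_reward`, `distance_delta`, `off_track`, `off_track_penalty`) are made up for illustration and are not from the original code; I'm assuming the per-step reward is just the distance gained on that step.

```python
def shaped_reward(distance_delta, off_track, off_track_penalty=1.0):
    """Per-step reward with an explicit penalty for being off the track.

    distance_delta: distance gained on this step (assumed reward signal).
    off_track: True once the car has left the track.
    off_track_penalty: hypothetical penalty size; 1.0 matches "subtract 1".
    """
    if off_track:
        # Discard any distance gained while off the track and charge the penalty,
        # so falling off can never look like progress.
        return -off_track_penalty
    return distance_delta


def episode_return(steps):
    """Sum shaped rewards over a list of (distance_delta, off_track) pairs."""
    return sum(shaped_reward(d, off) for d, off in steps)


# Same distances, but the second episode falls off on its last step.
stays_on = [(0.5, False), (0.5, False), (0.5, False)]
falls_off = [(0.5, False), (0.5, False), (0.5, True)]
print(episode_return(stays_on), episode_return(falls_off))
```

Without the penalty both episodes would score the same; with it, the one that falls off scores strictly lower. Whether 1 is the right size relative to the per-step distance is something you'd have to tune.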