I don't know anything about reinforcement learning, but

> We can think of this as follows: the high-entropy parts of the sentence mark the starts of high-level actions, while the low-entropy parts represent the execution of those high-level actions.

seems like a wonderful insight.