Interesting. I'm not a deep learning guy, but from what I can gather, the new auxiliary tasks are rewarded for "pixel changes" and "network features".

I haven't nearly finished reading the paper, but is it safe to say this is similar, at a very high level, to a kind of "novelty search"? That is, the agent isn't only searching for a policy that directly accomplishes the task at hand; it's also seeking novel stimuli (in the case of pixel changes) and novel internal states (features, or maximally activated hidden units in the language of the paper). The benefit would be that the agent more easily finds features relevant to the "big picture" task, and is less likely to get stuck in a suboptimal policy.

(I may be understanding this completely wrong... just an embedded guy looking to get more into this world, haha)
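
Edit: to make my reading concrete, here's roughly what I imagine the "pixel change" auxiliary reward to be. This is purely my guess at the idea (absolute frame difference averaged over a coarse spatial grid), not the paper's exact formulation, and the function name and grid size are made up:

```python
import numpy as np

def pixel_change_reward(prev_frame, frame, cell=4):
    """Sketch (my guess) of a pixel-change auxiliary reward:
    mean absolute pixel difference within coarse grid cells."""
    # Absolute per-pixel change, averaged over color channels
    diff = np.abs(frame.astype(float) - prev_frame.astype(float)).mean(axis=-1)
    h, w = diff.shape
    # Crop to a multiple of the cell size, then average each cell x cell block
    blocks = diff[: h - h % cell, : w - w % cell]
    blocks = blocks.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))
    return blocks  # one auxiliary reward per spatial cell

# Two random 8x8 RGB frames -> a 2x2 grid of auxiliary rewards
rng = np.random.default_rng(0)
a, b = rng.integers(0, 256, (2, 8, 8, 3))
print(pixel_change_reward(a, b, cell=4).shape)  # (2, 2)
```

If that's roughly right, the agent is being paid to make things happen on screen, which is where the "novelty search" flavor comes from.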