Maybe this?: <a href="https://www.youtube.com/watch?v=Rfhxy2w3Qi4" rel="nofollow">https://www.youtube.com/watch?v=Rfhxy2w3Qi4</a><p>It uses Gram matrices computed across several layers to capture the style,
and more conventional activation matching for the content, with both taken from a pretrained VGG network. It combines the style and content losses into one weighted objective and optimises a white-noise image towards the output image. The amazing thing is that content and style can be separated mathematically, much as the brain appears to do. See: <a href="https://arxiv.org/abs/1508.06576" rel="nofollow">https://arxiv.org/abs/1508.06576</a><p>Fwiw, I'm part of a group trying a similar thing on very deep residual nets combined with recurrent nets to improve upon the temporal sweeps.
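<p>For anyone curious how the style/content split in that paper works, here is a rough NumPy sketch of just the loss side. It assumes you already have feature maps reshaped to (channels, positions) from some layers of a pretrained network; the random arrays below are hypothetical stand-ins for real VGG activations, and the alpha/beta values are illustrative only. The actual method backpropagates this loss through the network into the image pixels, which is not shown here.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (N channels, M positions) feature map: G = F F^T.

    Summing over positions throws away *where* a feature fired and keeps
    only which features co-occur, which is why it captures style, not layout.
    """
    return features @ features.T

def content_loss(gen_feats, content_feats):
    # Half the squared error between activations at one layer.
    return 0.5 * np.sum((gen_feats - content_feats) ** 2)

def style_layer_loss(gen_feats, style_feats):
    # Squared Gram-matrix difference, normalised by layer size.
    n, m = gen_feats.shape
    g = gram_matrix(gen_feats)
    a = gram_matrix(style_feats)
    return np.sum((g - a) ** 2) / (4.0 * n**2 * m**2)

def style_loss(gen_layers, style_layers, weights):
    # Weighted sum of per-layer style losses over several layers.
    return sum(w * style_layer_loss(g, s)
               for g, s, w in zip(gen_layers, style_layers, weights))

def total_loss(gen_content, content_target, gen_style_layers, style_targets,
               weights, alpha=1.0, beta=1e3):
    # alpha/beta trades off content fidelity against style (illustrative values).
    return (alpha * content_loss(gen_content, content_target)
            + beta * style_loss(gen_style_layers, style_targets, weights))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature maps standing in for real VGG activations.
    gen_content = rng.standard_normal((64, 256))
    content_target = rng.standard_normal((64, 256))
    gen_style = [rng.standard_normal((64, 256)), rng.standard_normal((128, 64))]
    style_targets = [rng.standard_normal((64, 256)), rng.standard_normal((128, 64))]
    weights = [0.5, 0.5]
    print(total_loss(gen_content, content_target, gen_style, style_targets, weights))

    # Shuffling spatial positions leaves the Gram matrix unchanged.
    f = rng.standard_normal((8, 16))
    perm = rng.permutation(16)
    print(np.allclose(gram_matrix(f), gram_matrix(f[:, perm])))  # True
```

The last check is the neat part: the Gram matrix ignores spatial arrangement entirely, so matching it transfers texture and colour statistics without dragging along the style image's layout, while the content term pins down the layout.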