It might be fun to exercise this method across an information-theoretically well-bounded set of shapes or object domains, to try to quantify its limits in generating useful, independent forms of novelty.

For example, you might use it to formulate a set of wavelets that, combined judiciously, would effectively span a well-defined distribution of shapes generated from a small grammar. In doing so, you could quantify the shape variance and identify which augmentation transformations added the most value for training (by minimally modeling that variance) and which added the least.

Maybe you could also combine this with t-SNE to build intuition about which 'wavelet' manifested where in the trained net, which resonated most strongly, and in concert with which other wavelets. You could explore this across different CNN sizes and designs, looking for evidence of wavelet ensembles or hierarchies.

With some careful engineering, you could try to force emergent autoencoders to reveal themselves and then explore their interactions.
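As a rough illustration of the "small grammar plus quantified variance" idea, here is a toy sketch (all names and the grammar itself are invented for illustration): generate simple parametric shapes, apply a few augmentation transforms, and use an SVD spectrum as a crude stand-in for "how much variance the augmented ensemble actually spans."

```python
import numpy as np

rng = np.random.default_rng(0)

def grammar_shape(n_sides, radius):
    """A tiny 'grammar': regular polygons parameterized by (n_sides, radius)."""
    theta = np.linspace(0, 2 * np.pi, n_sides, endpoint=False)
    return np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)

def augment(pts, angle, scale, jitter):
    """Augmentation transforms: rotation, scaling, and Gaussian jitter."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return scale * pts @ R.T + rng.normal(0.0, jitter, pts.shape)

def descriptor(pts, k=32):
    """Resample the boundary to a fixed-length vector so shapes are comparable."""
    idx = np.linspace(0, len(pts) - 1, k).round().astype(int)
    return pts[idx].ravel()

# Build an augmented dataset from the grammar.
shapes = [grammar_shape(n, r) for n in (3, 4, 5, 6) for r in (0.5, 1.0)]
X = np.array([descriptor(augment(s, a, sc, 0.01))
              for s in shapes
              for a in (0.0, 0.4, 0.8)
              for sc in (0.9, 1.1)])

# SVD spectrum of the centered dataset: how many principal axes are needed
# to capture the ensemble's variance. Comparing spectra with an augmentation
# family switched off would hint at how much variance that family contributes.
Xc = X - X.mean(axis=0)
sv = np.linalg.svd(Xc, compute_uv=False)
var_ratio = sv**2 / np.sum(sv**2)
print(np.round(np.cumsum(var_ratio)[:5], 3))
```

This only measures raw geometric variance, not training value; the comment's stronger version would compare a trained net's performance with each augmentation family ablated.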