I wonder whether the split into high-frequency black-and-white and low-frequency color is an artifact of training on images that were compressed using chroma subsampling, which discards high-frequency color variations. That's a pretty common trick for getting better compression ratios without visibly affecting quality, because humans are less sensitive to high-frequency changes in color than to high-frequency changes in lightness.
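
For anyone who hasn't seen it spelled out, here's a rough pure-Python sketch of what 4:2:0 chroma subsampling does. The function names are mine, the conversion uses the full-range BT.601 coefficients from JFIF, and the 2x2 averaging plus nearest-neighbour upsampling is just one simple choice; real encoders differ in how they filter and reconstruct, so treat this as an illustration rather than what any particular codec does.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr (the JFIF convention), without clamping."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Average each 2x2 block; assumes even width and height."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def upsample_2x2(plane):
    """Nearest-neighbour upsampling back to the original resolution."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def chroma_subsample_420(image):
    """Round-trip an image (rows of (r, g, b) tuples) through 4:2:0.

    Luma is kept at full resolution; only Cb and Cr are downsampled.
    """
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in image]
    y_plane = [[p[0] for p in row] for row in ycc]
    cb_plane = [[p[1] for p in row] for row in ycc]
    cr_plane = [[p[2] for p in row] for row in ycc]
    return (
        y_plane,
        upsample_2x2(subsample_2x2(cb_plane)),
        upsample_2x2(subsample_2x2(cr_plane)),
    )


# Toy input: 4x4 image with alternating red and blue columns,
# i.e. the highest-frequency horizontal color pattern possible.
RED, BLUE = (255, 0, 0), (0, 0, 255)
image = [[RED, BLUE, RED, BLUE] for _ in range(4)]

y_plane, cb, cr = chroma_subsample_420(image)

# The luma plane still alternates column to column,
# while the chroma planes have been flattened to one value.
print("distinct Y values: ", len({v for row in y_plane for v in row}))
print("distinct Cb values:", len({v for row in cb for v in row}))
print("distinct Cr values:", len({v for row in cr for v in row}))
```

On this toy input the pixel-level stripes survive only in the luma plane, while the color planes end up flat. That's the same high-frequency-lightness, low-frequency-color split in miniature.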