I feel like to really get to the level of being hypothetically useful, it should be able to take its samples from an input source.

I wonder if you could do it on the full 28*28 by never holding the whole image in memory at once, and just treating it as an input stream.
Say, a 1D convolution on each line as it comes in, to turn a [1,28] into a [3,7].
Buffer two lines of the [3,7] output (2*3*7 = 42 values). Then, once the first three results of the third line's convolution have been produced ([3,3] = 9 values), start performing a 2D convolution using those together with the matching [2,3,:3] slice of the first two lines, replacing the data at the start of the buffer as you go (as it has already been processed).
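
Here is a rough sketch of that scheme in plain Python, just to check that the bookkeeping works out. Everything specific in it is my own assumption rather than anything from the original project: the 1D step is taken to be a width-4, stride-4 kernel (so 28 pixels become 7 columns of 3 channels), the 2D step is a 3x3 kernel with stride 1 and a single output channel, and the names `stream_rows`, `conv1d_column` and `conv2d_at` are made up for illustration. The only state carried across columns is the 42-value two-line buffer plus a window of at most 9 values from the current line.

```python
K1 = 4  # assumed 1D kernel width and stride: 28 / 4 = 7 columns
C = 3   # channels produced by the 1D stage
W = 7   # columns per line after the 1D stage


def conv1d_column(pixels, w1):
    """Turn K1 input pixels into C channel values (one column of the [3,7] line)."""
    return [sum(w1[c][k] * pixels[k] for k in range(K1)) for c in range(C)]


def conv2d_at(top, mid, win, col, w2):
    """One 3x3 output; win holds the newest line's 3 columns as [dx][channel]."""
    acc = 0.0
    for c in range(C):
        for dx in range(3):
            acc += w2[0][c][dx] * top[c][col + dx]
            acc += w2[1][c][dx] * mid[c][col + dx]
            acc += w2[2][c][dx] * win[dx][c]
    return acc


def stream_rows(rows, w1, w2):
    """Consume image lines one at a time; yield 5 outputs per line from the third on."""
    buf = [[[0.0] * W for _ in range(C)] for _ in range(2)]  # 2*3*7 = 42 values
    oldest = 0  # which of the two buffered lines is the older one
    for n, row in enumerate(rows):
        win = []  # most recent columns of the current line, at most 3 (9 values)
        out = []
        for j in range(W):
            col = conv1d_column(row[j * K1:(j + 1) * K1], w1)
            if n < 2:
                # The first two lines just fill the buffer.
                for c in range(C):
                    buf[n][c][j] = col[c]
                continue
            win.append(col)
            if len(win) == 3:
                out.append(conv2d_at(buf[oldest], buf[1 - oldest], win, j - 2, w2))
                # Column j-2 of the oldest line is no longer needed, so reuse it.
                done = win.pop(0)
                for c in range(C):
                    buf[oldest][c][j - 2] = done[c]
        if n >= 2:
            # Store the last two columns, then swap which buffered line is oldest.
            for i, col in enumerate(win):
                for c in range(C):
                    buf[oldest][c][W - 2 + i] = col[c]
            oldest = 1 - oldest
            yield out
```

Feeding it the 28 lines of an image one by one gives 26 lines of 5 outputs each, the same as running both convolutions on the whole image at once. On a real device you would presumably want fixed-size arrays and integer arithmetic rather than Python lists, but the buffer layout would be the same.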