Nick (the author) reflects on this:
"The data needs to be merged into a format which can be used to train a neural network. Solving this leads to the second, much bigger issue: many of the resulting label to image mappings are inappropriate...".<p>If one instead use a RNN (recurrent neural network) particularly with LSTM, then it can take as input the sequence of all photos from a business and the output of the model would then be the sequence of labels, similar to how translation models work.
Of course then another problem could perhaps be the ordering of labels is unrelated to order of photos for a business, but there is probably some way to handle this by either data synthesis (multiple permutations of the training data to ignore ordering of labels) or by sorting the labels in a certain fashion that the model can learn.
I'd love to see more articles from fast.ai students covering topics other than image (multi) classification. I've been gradually going through the course, and attempting to apply what I'm learning as I go, but I rarely see good results, especially with structured data.