To me, this basically says "LLMs aren't pre-trained on enough 1D time series data" - there's a classic technique in time series analysis where you just take a wavelet transform or FFT of the series and feed it into a convnet as an image, leveraging the massive pre-training on, e.g., ImageNet. This "shouldn't" be the best way to do it, since a giant network should learn a better internal representation than something static like an FFT or wavelet transform. But there's no 1D equivalent of ImageNet, so it still often works better than a 1D convnet trained from scratch.

The same applies here. An LLM trained on tons of time series should be able to build its own internal representation that's far more effective than looking at a static plot, since a plot can't represent patterns at all scales (indeed, a human plotting to explore data will zoom in, zoom out, transform the series, etc.). But since LLMs don't have enough 1D time series pretraining, the plot-as-image technique leverages the massive amount of image pre-training instead.
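
For concreteness, here's roughly what that spectrogram-as-image trick looks like: STFT the series, treat the log-spectrogram as an image, and fine-tune an ImageNet-pretrained backbone. This is just a minimal sketch - the model choice, window size, and preprocessing are illustrative assumptions on my part, not anything from the article.

    import numpy as np
    import torch
    from scipy import signal
    from torchvision import models, transforms

    def series_to_spectrogram_image(x, fs=1.0):
        # 1D series -> log STFT spectrogram -> fake 3-channel "image"
        f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=64)
        Sxx = np.log1p(Sxx)                                   # compress dynamic range
        Sxx = (Sxx - Sxx.min()) / (Sxx.max() - Sxx.min() + 1e-8)
        img = np.stack([Sxx] * 3)                             # repeat to RGB for an ImageNet model
        return torch.tensor(img, dtype=torch.float32)

    # ImageNet-pretrained backbone; only the final layer is swapped for the new task
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = torch.nn.Linear(model.fc.in_features, 2)       # e.g. one binary label per series

    x = np.sin(np.linspace(0, 100, 4096)) + 0.1 * np.random.randn(4096)   # toy series
    img = series_to_spectrogram_image(x)
    img = transforms.Resize((224, 224), antialias=True)(img).unsqueeze(0)
    logits = model(img)   # (ImageNet mean/std normalization omitted for brevity)

The point isn't that this particular pipeline is optimal - it's that a static transform plus a heavily pre-trained image backbone tends to beat a 1D convnet trained from scratch, purely because of where the pre-training data lives.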