Our cloud service stores a ton of JPGs uploaded by users. Once uploaded, the images are viewed occasionally by auditors and sometimes processed by automated OCR, so each one is written once and read roughly 4 times. We want to reduce our storage footprint, and I'd like to downsample the JPGs while keeping enough quality to keep both users and the OCR systems happy.

I'm hoping someone has studied this problem. Any pointers would be greatly appreciated.
For text, an 8-bit indexed PNG will be hard to beat, or, if your scans are high quality, a black-and-white indexed PNG.

After downsampling, run the files through pngquant or advpng. The latter supports zopfli via its insane mode; it will likely take the longest to compress but should give the smallest result.
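To see why a 2-colour palette beats a 256-colour one on clean scans, here is a minimal stdlib-only Python sketch that writes both kinds of indexed PNG from the same greyscale "page". The helper names (`encode_indexed_png`, `to_black_and_white`, `synthetic_scan`), the fixed threshold of 128 and the synthetic image are all hypothetical illustrations. It compresses with plain zlib at level 9 rather than zopfli and does no resizing, so it is not a substitute for pngquant or advpng; it only shows where the savings come from (1 bit per pixel instead of 8, plus a 2-entry palette instead of 256).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Wrap data in a PNG chunk: length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_indexed_png(rows, bit_depth, palette):
    """Encode rows of palette indices as an indexed-colour (type 3) PNG.

    rows: list of equal-length lists of ints, each < len(palette)
    bit_depth: 1, 2, 4 or 8, with 2**bit_depth >= len(palette)
    palette: list of (r, g, b) tuples
    """
    height, width = len(rows), len(rows[0])
    per_byte = 8 // bit_depth
    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (None) on every scanline
        for start in range(0, width, per_byte):
            byte = 0
            for i, index in enumerate(row[start:start + per_byte]):
                # PNG packs sub-byte pixels most significant bits first;
                # a short final group leaves the low bits as zero padding.
                byte |= index << (8 - bit_depth * (i + 1))
            raw.append(byte)
    # width, height, bit depth, colour type 3, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, bit_depth, 3, 0, 0, 0)
    plte = b"".join(struct.pack("BBB", *rgb) for rgb in palette)
    return (PNG_SIGNATURE
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"PLTE", plte)
            + _chunk(b"IDAT", zlib.compress(bytes(raw), 9))  # zlib, not zopfli
            + _chunk(b"IEND", b""))


def to_black_and_white(gray_rows, threshold=128):
    """Map 8-bit grey levels to palette index 0 (black) or 1 (white)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray_rows]


def synthetic_scan(width=400, height=300):
    """Hypothetical stand-in for a scanned page: noisy dark 'text' bands on a
    noisy light background, as 8-bit grey levels."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            in_text = (y // 8) % 3 == 0 and x % 6 < 4
            noise = (x * 7 + y * 13) % 10
            row.append(20 + noise if in_text else 240 + noise)
        rows.append(row)
    return rows


if __name__ == "__main__":
    scan = synthetic_scan()
    gray_png = encode_indexed_png(scan, 8, [(v, v, v) for v in range(256)])
    bw_png = encode_indexed_png(to_black_and_white(scan), 1,
                                [(0, 0, 0), (255, 255, 255)])
    print("8-bit indexed:", len(gray_png), "bytes")
    print("1-bit indexed:", len(bw_png), "bytes")
```

On real uploads you would decode and downsample the JPG with an image library first. A single fixed threshold is an assumption that only holds up on clean, high-contrast scans; on noisy or faint ones it can wipe out detail OCR needs, which is where the 8-bit indexed option is the safer choice.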