Image resizing is a much deeper rabbit hole than this. Some important talking points:<p>1. The form of interpolation (this article).<p>2. The colorspace used for the interpolation arithmetic. You most likely want a linear colorspace here.<p>3. Clipping. Resizing is typically done in two passes, first in the x direction and then in y (not necessarily in that order). If the kernel has negative lobes (like Lanczos), intermediate results can fall outside the range [0, 1]; if the intermediate image can only hold [0, 1], those values get clipped, which can cause artifacts.<p>4. Quantization and dithering.<p>5. If you have an alpha channel, using pre-multiplied alpha for the interpolation arithmetic.<p>I'm not trying to be exhaustive here. ImageWorsener's page has a nice reading list[1].<p>[1] <a href="https://entropymine.com/imageworsener/" rel="nofollow">https://entropymine.com/imageworsener/</a>
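A minimal sketch of points 2 and 3 combined, assuming OpenCV and NumPy (the function names here are my own, not from any library): decode sRGB to linear light, resample in float so Lanczos overshoot isn't clamped between passes, and only clip and re-encode at the very end.

```python
import numpy as np
import cv2

def srgb_to_linear(x):
    # Inverse sRGB transfer function, x in [0, 1]
    return np.where(x <= 0.04045, x / 12.92, ((x + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(x):
    # Forward sRGB transfer function, x in [0, 1]
    return np.where(x <= 0.0031308, x * 12.92, 1.055 * x ** (1 / 2.4) - 0.055)

def resize_linear_light(img_u8, size):
    """Resize in linear light. `size` is (width, height), OpenCV convention."""
    x = srgb_to_linear(img_u8.astype(np.float32) / 255.0)
    # Float intermediate can hold the out-of-range overshoot from the
    # negative lobes of Lanczos; nothing is clipped between passes.
    y = cv2.resize(x, size, interpolation=cv2.INTER_LANCZOS4)
    y = linear_to_srgb(np.clip(y, 0.0, 1.0))  # clip once, at the end
    return np.round(y * 255.0).astype(np.uint8)
```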
I'd argue that if your ML model is sensitive to the anti-aliasing filter used in image resizing, you've got bigger problems than that, unless it's actually making a visible change that spoils whatever the model is supposed to be looking for. To use the standard cat/dog example, the choice of filter or resampler is not going to change what you've got a picture of, and if your model is classifying based on features that change with resampling, it's not trustworthy.<p>If one is concerned about this, one could intentionally vary the resampling or deliberately add different blurring filters during training to make the model robust to these variations.
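A rough sketch of that augmentation idea, assuming OpenCV; the function name, the filter list, and the 0.3 blur probability are arbitrary illustrative choices:

```python
import random
import cv2

# Randomize the resampling filter (and occasionally add a mild blur) so the
# model cannot latch onto the artifacts of any single library or filter.
INTERPOLATIONS = [cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_AREA,
                  cv2.INTER_CUBIC, cv2.INTER_LANCZOS4]

def random_resize(img, size):
    out = cv2.resize(img, size, interpolation=random.choice(INTERPOLATIONS))
    if random.random() < 0.3:
        k = random.choice([3, 5])               # random odd kernel size
        out = cv2.GaussianBlur(out, (k, k), 0)  # sigma derived from k
    return out
```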
For those going down this rabbit hole, perceptual downscaling is state of the art, and the closest thing we have to a Python implementation is here (with a citation of the original paper): <a href="https://github.com/WolframRhodium/muvsfunc/blob/master/muvsfunc.py#L3671">https://github.com/WolframRhodium/muvsfunc/blob/master/muvsf...</a><p>Other supposedly better CUDA/ML filters give me strange results.
> The definition of scaling function is mathematical and should never be a function of the library being used.<p>Horseshit. Image resizing, like any other kind of resampling, is essentially always about filling in missing information. There is no mathematical model that will tell you for certain what the missing information is.
Now that's an interesting topic for photographers who like to experiment with anamorphic lenses for panoramas.<p>An anamorphic lens optically "squeezes" the image onto the sensor, and afterwards the digital image has to be "desqueezed" (i.e. upscaled along one axis) to give you the final image, which in turn is downscaled to be viewed on a monitor or in a printout.<p>Still, the resulting images I've seen so far look good. I think that's because natural images don't contain that much pixel-level detail, and we mostly see downscaled images anyway, on the web or in YouTube videos ...
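For concreteness, a toy desqueeze step, assuming OpenCV and a 2x anamorphic squeeze (real raw converters do this internally, with their own filters):

```python
import cv2

def desqueeze(img, squeeze_factor=2.0):
    # Upscale the horizontal axis only, leaving vertical resolution untouched.
    h, w = img.shape[:2]
    return cv2.resize(img, (int(round(w * squeeze_factor)), h),
                      interpolation=cv2.INTER_LANCZOS4)
```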
I'm shocked. I didn't even know this was a thing.<p>By that I mean, I know what the bilinear/bicubic/Lanczos resizing algorithms are, and I know they should give at least acceptable results (compared to nearest-neighbour).<p>But I didn't know that famous libraries (especially OpenCV, which is a computer vision library!) could produce such poor results.<p>Also, a side note: IIRC bicubic has a free constant in its kernel. So technically, when comparing different implementations, you need to make sure this parameter is the same. But that shouldn't excuse the extremely poor results from some of them.
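To make that concrete: bicubic is usually Keys' cubic convolution kernel, which has a free parameter a; I believe OpenCV uses a = -0.75 while Pillow uses a = -0.5, so two "bicubic" resizes can legitimately differ. A sketch of the kernel:

```python
import numpy as np

def keys_cubic(x, a=-0.5):
    """Keys' cubic convolution kernel; `a` is the free constant referred to
    above (commonly -0.5 or -0.75, depending on the library)."""
    x = np.abs(x)
    return np.where(
        x < 1, (a + 2) * x**3 - (a + 3) * x**2 + 1,
        np.where(x < 2, a * (x**3 - 5 * x**2 + 8 * x - 4), 0.0),
    )
```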
If their worry is differences between algorithms across libraries in different execution environments, shouldn't they either find a library they like that can be called from all of those environments, or, if no single library covers them all, just write their own using their favorite algorithm? Why make all libraries do this the same way? Which one is undeniably correct?
Hmmm. With respect to feeding an ML system, are visual glitches and artifacts important? Wouldn't the most important thing be to use a transformation that preserves as much information as possible and captures the relevant structure? If the intermediate picture doesn't look great, who cares, as long as the result is good.<p>Oops. Just thought about generative systems. Never mind.
So, what are the dangers? (What's the point of the article?) That you'll get a different model when the same originals are processed by different algorithms?<p>Comparing resizing algorithms is nothing new, the importance of adequate input data is obvious, and differences in which image-processing algorithms are available are also understandable. Clickbaity.
I was sort of expecting them to describe this danger of resizing: you can feed a piece of an image into one of these new massive ML models and get back the full image, with things you didn't want to share, like the ex you cropped out.<p>Is ML sort of like a universal hologram in that respect?
If you upscale (with interpolation) some sensitive image (think security-camera footage), could that be dismissed in court on the grounds that it "creates" new information that wasn't there in the original image?
The bigger problem is that the pixel domain is not a very good domain to be operating in. How many hours of training and thousands of images are spent essentially relearning Gabor filters?
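For reference, a hand-built Gabor filter of the sort being "relearned", assuming OpenCV; the parameter values are arbitrary, and the input is a stand-in image:

```python
import cv2
import numpy as np

# Oriented band-pass (Gabor) kernel: (ksize, sigma, theta, lambda, gamma, psi)
kernel = cv2.getGaborKernel((31, 31), 4.0, 0.0, 10.0, 0.5, 0.0)

image = np.random.rand(128, 128).astype(np.float32)  # stand-in grayscale image
filtered = cv2.filter2D(image, -1, kernel)            # edge/texture response
```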
This article throws a red flag on proving a negative, which is impossible with maths alone. The void gets filled by human subjectivity: in a graphical sense, "visual taste."
What are some good image-upscaling libraries? I'm assuming the high-quality ones would need to use some AI model to fill in missing detail.
Image resizing is one of those things that most companies seem to build in-house over and over. There are several hosted services, but obviously sending your users' photos to a 3rd party is pretty weak. For those of us looking for a middle ground: I've had great success with imgproxy (<a href="https://github.com/imgproxy/imgproxy">https://github.com/imgproxy/imgproxy</a>), which wraps libvips and is well maintained.
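If you'd rather call libvips directly than run imgproxy, a minimal sketch with pyvips (filenames, width, and JPEG quality are placeholder values):

```python
import pyvips

# Width-constrained thumbnail, roughly the resize operation imgproxy wraps.
image = pyvips.Image.thumbnail("input.jpg", 800)
image.write_to_file("output.jpg", Q=85)
```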