Pillow-SIMD – Fast, production-ready image resize for x86

182 points, posted by igordebatur almost 8 years ago

18 comments

pedrocr, almost 8 years ago

Is doing Lanczos really faster and/or better quality for scaling down than just doing simple pixel mixing?

http://entropymine.com/imageworsener/pixelmixing/

I implemented this for my image pipeline:

https://github.com/pedrocr/rawloader/blob/230432a403a9febb5e5c004780211f7e765130b9/src/imageops/demosaic.rs#L155-L190

It makes for simple enough code, and even before any serious effort at optimization or SIMD it can convert a 3680x2456x4 image in 32-bit float (the source article is 3x8-bit) to 320x200x4, also in 32-bit float, in about 60 ms (across 4 threads on 2 cores of an i5-6200U).
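
For readers unfamiliar with pixel mixing, here is a minimal pure-Python sketch of the idea: each output pixel is the area-weighted average of the source pixels its footprint covers. This is not pedrocr's Rust implementation; the function names `pixel_mix_1d` and `pixel_mix_2d` are made up for this example, and it handles a single channel stored as nested lists.

```python
def pixel_mix_1d(src, dst_len):
    """Downscale a 1-D list of samples by area-weighted averaging."""
    scale = len(src) / dst_len  # source pixels covered by one output pixel
    out = []
    for i in range(dst_len):
        left, right = i * scale, (i + 1) * scale  # footprint in source coords
        total = 0.0
        j = int(left)
        while j < right and j < len(src):
            # Overlap between source pixel [j, j+1) and the footprint.
            overlap = min(j + 1, right) - max(j, left)
            total += src[j] * overlap
            j += 1
        out.append(total / scale)
    return out


def pixel_mix_2d(rows, dst_w, dst_h):
    """Run the 1-D pass over every row, then over every column."""
    horiz = [pixel_mix_1d(row, dst_w) for row in rows]
    cols = [pixel_mix_1d(list(col), dst_h) for col in zip(*horiz)]
    return [list(row) for row in zip(*cols)]
```

With a non-integer ratio such as 3 samples to 2, the middle source pixel is split between both outputs, which is what separates pixel mixing from a plain block average.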

Asooka, almost 8 years ago

Let me insert my personal pet peeve: have you thought of making it colour-space-aware? Most (all?) images you'll encounter are stored in the sRGB colour space, which isn't linear, so you can't do the convolution by just multiplying and adding (the result will be slightly off). The easiest way would be to convert it to a 16-bit-per-channel linear colour space using a lookup table, do the convolution in linear 16-bit space, then convert back to 8-bit sRGB.
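
The lookup-table approach described in this comment can be sketched as follows. This is an illustration only, not code from Pillow-SIMD: the helper names are hypothetical, the transfer functions are the standard sRGB piecewise curves, and a plain average stands in for the real convolution.

```python
def srgb_to_linear(c):
    """sRGB transfer function: encoded value in [0, 1] -> linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(v):
    """Inverse transfer function: linear light in [0, 1] -> encoded value."""
    return v * 12.92 if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055


# 256-entry lookup table: 8-bit sRGB code -> 16-bit linear value.
TO_LINEAR16 = [round(srgb_to_linear(i / 255) * 65535) for i in range(256)]


def to_srgb8(linear16):
    """Convert one 16-bit linear value back to an 8-bit sRGB code."""
    return round(linear_to_srgb(linear16 / 65535) * 255)


def average_linear(codes):
    """Average 8-bit sRGB codes in linear light (stand-in for a filter)."""
    mean16 = sum(TO_LINEAR16[c] for c in codes) / len(codes)
    return to_srgb8(mean16)


def average_naive(codes):
    """Average the encoded values directly, ignoring the transfer curve."""
    return round(sum(codes) / len(codes))
```

Averaging a black and a white pixel shows the difference: `average_naive([0, 255])` gives 128, while `average_linear([0, 255])` lands noticeably higher, which is why naively downscaled fine detail tends to look too dark.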

dahart, almost 8 years ago

I'd be very interested in an optional Pillow-SIMD downsampling resize that produces 16-bit output internally and then uses a dither to convert from 16-bit to 8-bit. Photoshop does this by default and it produces superior downsampling. Without keeping the color resolution higher, you can end up with visible color banding in resized 8-bit images that wasn't visible in the source image.

I am curious whether the reason Pillow-SIMD is more than 4x faster than IPP is features that IPP supports, like higher internal resolution, and that Pillow-SIMD doesn't. The reported speeds here are amazing, and I'm definitely going to check this project out and probably use it, but I'd love a little clarity on what the tradeoffs are against IPP or others. I assume there are some.
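
To make the 16-bit-then-dither idea concrete, here is a small sketch under stated assumptions: it uses simple uniform random noise (Photoshop's actual dither is not documented in this thread), a plain 8-bit shift as the 16-to-8 mapping, and hypothetical function names.

```python
import random


def quantize_truncate(values16):
    """Drop the low 8 bits of each 16-bit sample (no dither)."""
    return [v >> 8 for v in values16]


def quantize_dithered(values16, rng):
    """Add noise spanning one 8-bit step before dropping the low bits."""
    out = []
    for v in values16:
        noisy = v + rng.randrange(256)    # noise in [0, 255] 16-bit units
        out.append(min(noisy >> 8, 255))  # clamp in case the sum overflows
    return out


# A very shallow 16-bit gradient that spans only two 8-bit levels.
ramp = [100 * 256 + i for i in range(0, 512, 2)]  # 256 samples
rng = random.Random(0)                            # fixed seed, repeatable
plain = quantize_truncate(ramp)
dithered = quantize_dithered(ramp, rng)
```

Here `plain` is a single hard step from level 100 to level 101, the kind of edge that shows up as banding. In `dithered`, neighbouring levels are mixed so that the local average follows the 16-bit ramp, trading the visible step for fine noise.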

FrozenVoid, almost 8 years ago

Obligatory: http://johncostella.webs.com/magic/

gioele, almost 8 years ago

Unrelated: since when has this picture of Bologna become a new lenna.jpg?

I think I have already seen it in a couple of recent posts about image compression. (It perfectly fits the definition of the Baader-Meinhof phenomenon [1].)

[1] https://en.wikipedia.org/wiki/List_of_cognitive_biases#Frequency_illusion

stephencanon, almost 8 years ago

FWIW the Accelerate framework [1] gives roughly comparable performance [2] for Lanczos resizing. Apple platforms only, but all Apple platforms, not limited to x86.

[1] vImageScale_ARGB8888().

[2] I don't have identical hardware available to time on, and it's doing an alpha channel as well, so this is slightly hand-wavy.

nostrademons, almost 8 years ago

Curious how this would compare vs. running it on the GPU? This is literally what GPUs are made for, and they often have levels of parallelism 500+ times greater than SIMD.

mark-r, almost 8 years ago

I'm really happy to see this. The one time I tried looking at the PIL sources for resizing, I was appalled at what I saw. Simply seeing that you're expanding the filter size as the output-to-input ratio shrinks is a huge deal.

When I wrote my own resizing code, I found it helpful to debug using a nearest-neighbor kernel: 1 from -0.5 to 0.5 and 0 everywhere else. It shook out some off-by-one errors.
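
Both points in this comment, widening the filter when shrinking and debugging with a box kernel, can be shown in a short sketch. This is a generic 1-D resampler written for illustration, not the Pillow-SIMD code; `resample_1d`, `box`, and `lanczos3` are hypothetical names.

```python
import math


def box(x):
    """Nearest-neighbour debug kernel: 1 on [-0.5, 0.5), 0 elsewhere."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0


def lanczos3(x):
    """Lanczos kernel with a = 3."""
    if x == 0:
        return 1.0
    if abs(x) >= 3:
        return 0.0
    px = math.pi * x
    return 3 * math.sin(px) * math.sin(px / 3) / (px * px)


def resample_1d(src, dst_len, kernel, support):
    """Resize a 1-D signal; the kernel is widened when downsizing."""
    scale = len(src) / dst_len
    stretch = max(scale, 1.0)       # widen only when shrinking
    radius = support * stretch
    out = []
    for i in range(dst_len):
        center = (i + 0.5) * scale  # output pixel centre in source coords
        lo = max(math.floor(center - radius), 0)
        hi = min(math.ceil(center + radius), len(src))
        idx = range(lo, hi)
        weights = [kernel((j + 0.5 - center) / stretch) for j in idx]
        total = sum(weights)        # normalise so weights sum to 1
        out.append(sum(w * src[j] for w, j in zip(weights, idx)) / total)
    return out
```

With `box` and a support of 0.5 the expected output is easy to work out by hand: a same-size resize must return the input unchanged, and a 2x reduction must return pairwise means. An off-by-one error in `center`, `lo`, or `hi` breaks those checks immediately, whereas with `lanczos3` the same bug would only show up as a subtle shift.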

cvwright, almost 8 years ago

> No tricks like decoding a smaller image from a JPEG

Given that most cameras are producing JPEG now, I'm curious why you don't make use of the compressed / frequency-domain representation. To a novice in this area (read: me), it seems like a quick shortcut to an 8x or 4x or 2x downsample.

Or is the required iDCT operation just that much more expensive than the convolution approach?
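
As background to the shortcut mentioned here: with the normalisation JPEG uses for its 8x8 DCT, the DC coefficient of a block is 8 times the block mean, so an 8x reduction needs no inverse transform at all. The sketch below demonstrates only that relationship. It skips JPEG's level shift and quantisation, and the function names are made up. (Pillow itself exposes reduced-size JPEG decoding through `Image.draft()`.)

```python
import math


def dct_coefficient(block, u, v):
    """One coefficient of the 8x8 forward DCT-II, JPEG normalisation."""
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    cv = 1 / math.sqrt(2) if v == 0 else 1.0
    total = 0.0
    for y in range(8):
        for x in range(8):
            total += (block[y][x]
                      * math.cos((2 * x + 1) * u * math.pi / 16)
                      * math.cos((2 * y + 1) * v * math.pi / 16))
    return 0.25 * cu * cv * total


def block_mean_from_dc(dc):
    """With this normalisation the DC term is 8x the block mean."""
    return dc / 8
```

Taking `block_mean_from_dc` of every block's DC term yields a 1/8-scale image that is equivalent to box-filtering each 8x8 block. The 4x and 2x cases use a few more low-frequency coefficients per block and a correspondingly smaller inverse transform.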

Veratyr, almost 8 years ago

Looks really nice!

I'd love to see vips in the benchmark comparison, and perhaps a Halide-based resizer too, as those are the fastest I've found so far. Perhaps GraphicsMagick as well, as I believe it's meant to be faster than ImageMagick in many cases.

ashishuthama, almost 8 years ago

Another data point: MATLAB, glnxa64 AVX2, 12 core

  >> maxNumCompThreads(1);
  >> im = randi(255, [2560, 1600, 3],'uint8');
  >> timeit(@()imresize(im,[320,200],'bilinear','Antialiasing',false))
  ans =
      0.0083
  >> timeit(@()imresize(im,[320,200],'bilinear'))
  ans =
      0.0301
  >> maxNumCompThreads(6);
  >> timeit(@()imresize(im,[320,200],'bilinear','Antialiasing',false))
  ans =
      0.0062
  >> timeit(@()imresize(im,[320,200],'bilinear'))
  ans =
      0.0113

Oh, missed that lanczos2 part:

  >> maxNumCompThreads(1);
  >> timeit(@()imresize(im,[320,200],'lanczos2','Antialiasing',false))
  ans =
      0.0146
  >> maxNumCompThreads(6);
  >> timeit(@()imresize(im,[320,200],'lanczos2','Antialiasing',false))
  ans =
      0.0049

Since MATLAB tries to do most of the computation in double precision, it's harder to extract much from SIMD.

ttoinou, almost 8 years ago

Have you tried using a fast blur (like StackBlur, for example http://www.quasimondo.com/BoxBlurForCanvas/FastBlur2Demo.html , with the radius computed from the ratio between the original size and the target size) as a first step before the classic nearest neighbor? Also, an algorithm that resizes to multiple resolutions at the same time could improve speed.

> I take an image of 2560x1600 pixels in size and resize it to the following resolutions: 320x200, 2048x1280, and 5478x3424

So you are also upscaling?
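
The blur-then-nearest-neighbor idea can be sketched in one dimension as below. Two simplifications are assumed: a plain box blur stands in for StackBlur, and the size ratio is an integer. The function names are hypothetical.

```python
def box_blur(src, width):
    """Mean of a `width`-sample window around each position (edges clamp)."""
    half = width // 2
    n = len(src)
    out = []
    for i in range(n):
        window = [src[min(max(j, 0), n - 1)]
                  for j in range(i - half, i - half + width)]
        out.append(sum(window) / width)
    return out


def blur_then_decimate(src, dst_len):
    """Blur with a window equal to the ratio, then pick one sample per output."""
    ratio = len(src) // dst_len  # assumes an integer ratio
    blurred = box_blur(src, ratio)
    # Nearest neighbour: sample where the window covers the output footprint.
    return [blurred[i * ratio + ratio // 2] for i in range(dst_len)]
```

In this setup the result equals a plain block average, so the blur pass computes a value for every source pixel and then discards all but one per output pixel. A filter evaluated only at the output positions avoids that wasted work.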

vadiml, almost 8 years ago

Great work and great article. One question though: did you consider replacing the convolution-based filters with FFT-based ones?
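
For context on this question, the convolution theorem says that multiplying two spectra is equivalent to circularly convolving the signals. The sketch below checks that equivalence with a naive DFT, which an FFT would compute faster but with the same values. It says nothing about which approach is quicker for image resizing, and all names are illustrative.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve_direct(signal, kernel):
    """Circular convolution computed tap by tap."""
    n = len(signal)
    return [sum(kernel[m] * signal[(i - m) % n] for m in range(len(kernel)))
            for i in range(n)]


def circular_convolve_dft(signal, kernel):
    """Same result via pointwise multiplication of the two spectra."""
    n = len(signal)
    padded = list(kernel) + [0.0] * (n - len(kernel))
    spectrum = [a * b for a, b in zip(dft(signal), dft(padded))]
    return [value.real for value in dft(spectrum, inverse=True)]
```

The direct form costs one multiply per kernel tap per output sample, while the transform route costs the same regardless of kernel length. The frequency-domain route therefore tends to pay off only for large kernels.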

gfody, almost 8 years ago

> I wasn’t building it for fun: I work for Uploadcare and resizing images has always been a practical issue with on-the-fly image processing.

Did you ever consider pushing the work entirely to the client, with a resize implemented in JavaScript? That would cut down on bandwidth as well.

techdragon, almost 8 years ago

Any reason this had to be a fork?

I would much rather this feature be in Pillow so ALL of the Python ecosystem could get 6 times faster image resizing.

vortico, almost 8 years ago

Looks fantastic! Are there other bottlenecks, such as JPEG encoding and decoding, that can be ported to SIMD code in Pillow?

legulere, almost 8 years ago

Can the speed be increased even further using GPGPU?

MuffinFlavored, almost 8 years ago

> With optimizations, Uploadcare now needs six times fewer servers to handle its load than before.

Playing devil's advocate here: did you have a concrete need for this optimization? You now need six times fewer servers, but was that a crippling problem, or is it a cool statistic for the future when you get more users?
