Real-Time Adaptive Image Compression

72 points by cardigan almost 8 years ago

17 comments

trevyn · almost 8 years ago
Nice work, but disingenuous to not include a BPG (HEVC) image for comparison -- BPG is close to state-of-the-art, not WebP -- even their own SSIM charts show this.

Interesting that decoding is slower than encoding. Also curious about performance on CPU.

This approach may also be susceptible to "hallucinating" inaccurate detail; you can see a little bit of this on the upper-right of the girl's circled eyelid compared to the original Kodak image. See also: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning
vladdanilov · almost 8 years ago
As someone interested in the field, this does not look much different from state-of-the-art video codecs, e.g. variable-size blocks, wavelets, predictors, arithmetic coding. Only here the predictor is trained on real data. But symbol dictionaries are already used in modern compressors like brotli and zstd.

Most codecs have been tuned for a mean opinion score (MOS). MS-SSIM is not particularly good to rely on fully [1] [2]. In my experiments it performed poorly.

I think the Google team's effort will have a much bigger impact [3] just by combining all the recent practical improvements in image compression.

Meanwhile, they could have optimized the images on the website a little better. Saved ~15% with my soon-to-be-obsolete tool [4].

[1] https://encode.ru/threads/2738-jpegraw?p=52583&viewfull=1#post52583

[2] https://medium.com/netflix-techblog/toward-a-practical-perceptual-video-quality-metric-653f208b9652

[3] https://encode.ru/threads/2628-Guetzli-a-new-more-psychovisual-JPEG-encoder?p=52198&viewfull=1#post52198

[4] http://getoptimage.com
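For context on the metric being criticized: MS-SSIM averages the basic SSIM formula over local windows at several scales. As a rough, stdlib-only illustration of just the core single-window formula (not the windowed, multi-scale metric codecs are actually tuned on):

```python
from statistics import fmean

def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window (global) SSIM between two equal-length
    grayscale pixel sequences. Real MS-SSIM averages this over
    local windows at multiple scales; this is only the core formula."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    # Population variance and covariance.
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )
```

Identical inputs score 1.0, and any distortion pulls the score below 1 — but, as the links above argue, a high score does not guarantee the image *looks* right, which is exactly the problem with relying on it.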
amelius · almost 8 years ago
From their website:

> Lubomir holds a Ph.D. from UC Berkeley, 20 years of professional experience, 50+ issued patents and 5000+ citations.

I just hope this type of research isn't going to end in patent encumbrance, like it did with JPEG and MPEG.

These techniques are right around the corner, no matter who invents the file formats.

So if their idea is to lock these general ideas down with more patents, I'd want them to stop their research and let people with more open intentions research this further.
tomaskafka · almost 8 years ago
A rarely discussed danger of all machine learning models: if they don't know the answer, they'd rather make something up.

Here's a Google Translate example: https://twitter.com/keff85/status/862690920805916672

I wouldn't like to lose part of a parcel in a lawsuit because an adaptive algorithm made up some details in an aerial photograph so that it compresses better...
maaark · almost 8 years ago
So is this only good for ridiculously low target file sizes? No one in their right mind is going to compress a "480x480 image to a file size of 2.3kB".

What I want to see is an acceptable-looking JPG next to a WaveOne image of the same size. Or an acceptable-looking WaveOne next to a JPG of the same size.

How small is good enough? How good is small enough?
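For scale, the "480x480 at 2.3kB" figure quoted above works out to well under 0.1 bits per pixel, which is the regime these neural codecs are being compared in:

```python
def bits_per_pixel(file_bytes: int, width: int, height: int) -> float:
    """Bits-per-pixel for a compressed image file."""
    return file_bytes * 8 / (width * height)

# 2.3 kB (taken here as 2300 bytes) for a 480x480 image:
bpp = bits_per_pixel(2300, 480, 480)
print(f"{bpp:.3f} bpp")  # ~0.080 bpp, deep in "ridiculously low" territory
```

A typical web JPEG sits somewhere around 1-2 bpp, an order of magnitude higher, which is the commenter's point.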
SimplyUnknown · almost 8 years ago
I wonder how this compares to FLIF. I also tried to compress images based on shape and structure, but by approximating these using skeletons.

I'm struggling a bit with their performance comparison. The graphs they present are very pretty and promising, but for the presented images we're quite left in the dark. They dump some images, theirs looks prettier, and the authors give us *some* indication of quality, but it's not conclusive evidence that their method produces better images. Typically, when different compressed images are presented, two things can vary: quality and file size. In the presented images both seem to vary without telling us which is which. Also, there is no baseline to compare against, either in terms of file size or what the image should look like. Sure, we humans can make a very educated guess, but it is just sloppy not to include the uncompressed original image.

I will be fully convinced when I can try it for myself on my own image set.
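The fair comparison asked for here — two codecs held to the same file size — usually means searching each encoder's quality knob for a target size. A generic sketch (the `encode` callback is a stand-in for any real encoder, e.g. a JPEG writer; size is assumed to grow with quality):

```python
from typing import Callable

def quality_for_size(encode: Callable[[int], bytes],
                     target_bytes: int,
                     q_min: int = 1, q_max: int = 100) -> int:
    """Binary-search the largest quality setting whose output
    still fits in target_bytes. Assumes size grows with quality."""
    best = q_min
    while q_min <= q_max:
        q = (q_min + q_max) // 2
        if len(encode(q)) <= target_bytes:
            best, q_min = q, q + 1
        else:
            q_max = q - 1
    return best

# Toy stand-in encoder: output grows 50 bytes per quality step.
fake_encode = lambda q: bytes(50 * q)
print(quality_for_size(fake_encode, 2300))  # -> 46
```

Running both codecs through this against the same byte budget, then comparing the decoded images side by side with the original, is the experiment the comment says is missing.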
svantana · almost 8 years ago
Very impressive work, though it seems like a mistake to focus on compression, which gets less valuable as storage and bandwidth get cheaper. You need only look at the staying power of JPEG, which is so far from the state of the art, yet it's not going anywhere. Why? The demand for replacing it is not strong enough.

They obviously have some good image priors here; if I were them, I would consider applying this tech to other image-related things, like image manipulation or image search. Although competition is heating up quickly in these fields...
bhouston · almost 8 years ago
Deep learning will be a great way to do compression for sure, for audio, video, and images alike. I could see downloading "knowledge sets" for these decompressors. Looking at Google Earth? Download the supplemental "knowledge set" for overhead shots of cities and countryside. Looking at people? Download the supplemental "knowledge set" for faces and clothing, etc.

Basically, for each domain you want to do well in, you need a knowledge set trained on that data. Then you need a discriminator on the compression side to classify an image, or subregions of an image, into those categories.

If you can make the knowledge sets downloadable on demand and then cached, you can be incredibly efficient over the long term while maintaining very small initial downloads. I also think evolvable knowledge sets ensure that the codec stays flexible enough to handle currently unforeseen situations. Nobody wants a future where our DL-based image/video compression tools only know a few pre-determined sets and are mediocre on everything else.
discreditable · almost 8 years ago
This encoder seems to have some weird distortions that are most visible in the aerial shots. Compare [1] and [2]. The lines on the basketball court are distorted and curved. If you look closely, there is also curvature added to the sidewalks where there isn't any. In case those links break, I'm referring to the top row of aerial images.

[1] https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b040/590bade05016e1ca29d9d74f/5910f88203596eba4ac0e444/1494284425679/bloomington28_3_crop_aerial_bpp0.1_disc_reconst_WO.png

[2] https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b040/590bade05016e1ca29d9d74f/5910f88086e6c0368e048208/1494284423491/bloomington28_3_crop_aerial_bpp0.1_disc_reconst_JP2.png
espadrine · almost 8 years ago
> *While we are slightly faster than JPEG (libjpeg) and significantly faster than JPEG 2000, WebP and BPG, our codec runs on a GPU and traditional codecs do not — so we do not show this comparison.*

This is great news!

I'd actually like to see the plot, though (both for encoding and decoding). It stands to reason that a neural network can optimize image compression, as it can encode high-level information like "this is a face". But encoding/decoding speed is the sticking point, so I feel successes there should be emphasized.

The necessity of having a GPU doesn't seem problematic nowadays; everything has one. Testing it with a mobile-grade GPU would be interesting.
boromi · almost 8 years ago
I'm going to need to see this code in practice to believe it.
rothron · almost 8 years ago
Seems like a slightly unfair comparison. Training the compressor moves data from the images into the compressor itself, making the bits-per-pixel evaluation slightly iffy.
mcraiha · almost 8 years ago
One big reason for "hardcoded" encoders and decoders is that they are much easier to implement in hardware.

One can improve e.g. H.265 somewhat easily if a software-only solution is an option. But if you need a cheap hardware-only solution, then the ML-required way seems a bit too expensive.
CyberDildonics · almost 8 years ago
Does no one realize this is a joke / marketing?

Directly from the paper's PDF:

"Finally, Pied Piper has recently claimed to employ ML techniques in its Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded in mystery."
bhouston · almost 8 years ago
Is this open source or something that you are aiming to license?
creo · almost 8 years ago
Where is PNG?
hojijoji · almost 8 years ago
For some reason they do not show the uncompressed image for comparison.