
Generative Adversarial Networks for Extreme Learned Image Compression

105 points by relate about 7 years ago

11 comments

rasz about 7 years ago
The picture is not compressed; it's hallucinated from a vague memory of the real thing, a mere dream. Cars vanish, buildings change wall structure, even the license plate receives fake text absent from the source material.

It's a giant guesswork of what was there originally. Reminds me of Xerox scanners lying about scanned-in numbers: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning
iTokio about 7 years ago
The trick is not so much about compression, but rather about image generation. The trade-off is completely different from usual lossy algorithms: a highly compressed image might still retain high visual quality, but with completely different details and textures.

Kinda what would happen if you used a perfect painter with a blurry memory.
bcheung about 7 years ago
I'd be curious to see how different levels of quantization affect the image. From the paper it looks like the quantization is applied in the latent feature space. I wonder if it has similar effects to the celebrity GANs we have seen, where interpolating in the latent space results in morphing from one face to another. It could be funny if compression didn't result in something blocky or distorted, but instead replaced objects with other objects that look similar to them.

This seems to be for static images, but it gets me wondering whether an RNN could be used to get better motion prediction than other current "hard-coded" solutions.

Also, the more specific the domain, the better the compression, since it can specialize. I'm wondering about the practical applications of this. Do we have different baselines that can be used for different use cases?
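The latent-space quantization being discussed can be sketched in a few lines. This is only an illustration of hard nearest-center quantization; the centers and values below are made up, not the paper's learned quantities:

```python
import numpy as np

def quantize(z, centers):
    """Snap each entry of latent z to its nearest center (hard quantization)."""
    z = np.asarray(z, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Broadcast to a (latent, center) distance table, pick the closest center.
    idx = np.argmin(np.abs(z[..., None] - centers), axis=-1)
    return centers[idx]

# Fewer centers -> fewer bits per latent symbol -> coarser reconstruction.
z = np.array([0.12, -0.83, 0.47, 0.95])
coarse = quantize(z, centers=[-1.0, 0.0, 1.0])          # ~1.58 bits/symbol
fine = quantize(z, centers=np.linspace(-1.0, 1.0, 9))   # ~3.17 bits/symbol
```

Interpolating between two quantized latents and decoding each step would be one way to probe the morphing effect described above.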
return1 about 7 years ago
I wonder how Pied Piper will respond to that. This could be a good idea for video compression that is "monothematic". Hmmm, I can think of a video industry that is monothematic...
mmastrac about 7 years ago
Just waiting for this to show up in a video compression standard. With the right network it could be just as fast to decompress, though probably insanely slow to compress.
stochastic_monk about 7 years ago
I think the title of the paper should state that it is for *lossy* image compression, which clearly states how it works and what task it performs.

I would be surprised if there weren't a way to provide a learned, lossless method of compression, but that would be a very different paper and result.
tmpmov about 7 years ago
Take the following as coming from a dilettante... I'm still trying to understand the remainder of the paper, but felt like writing on the basics of the encoder/decoder/quantizer setup they mention.

I found this particularly interesting: "To compress an image x ∈ X, we follow the formulation of [20, 8] where one learns an encoder E, a decoder G, and a finite quantizer q."

I feel like this is related to some of the standard human memorization/learning techniques. Example: I'm learning the guitar fretboard note placement in E standard. It's difficult for me to visualize the first 4 frets on a 6-string guitar with notes on each fret.

To help me memorize the note placement, I develop various mnemonic devices (both lossy and lossless). I know I've memorized the fretboard sufficiently when I can visualize it.

Attempting to translate my reading of the paper, I believe the following analogy is apt. My "encoder" operates on a short-term image when I close my eyes after looking at a fret diagram. It produces semantic objects, i.e. an ordered sequence of "letters" or pairs of letters (letters that are horizontally, vertically, or diagonally aligned). The quantizer takes these objects and looks at their order and distribution, placing more importance on some of the semantic objects than others (the fourth fret has 4 natural notes before an accidental). My decoder interprets the stored/compressed note information to try to reproduce the image. It may be off substantially, so I correct and repeat the process.

The process of optimizing what the semantic objects are, the weight each gets, and how I use them to derive the original image seems like a fairly good representation of what I do (though at least some of that appears to be fixed in the learning algorithm, typically). Of course, analogies are just that, and mine doesn't take into account the discriminator or the remaining "heart" of the paper.

I think the heart of the paper is that they're trying to determine, through GANs, a good way to both store the image and recover it while reducing bits per pixel and increasing the quality of reproduction. Using some classical terms, the GAN training thus tweaks the compressor, the data storage format, and the decompressor to optimize what should be "hard-coded" in the compressing/decompressing program versus what will be stored as the output of the compression program.

Very handwavey, but I think the general idea is right?
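The E/q/G formulation quoted in this comment can be written out as a toy pipeline: reconstruct x̂ = G(q(E(x))), and store only the quantized symbols. The weights, dimensions, and symbol alphabet below are invented stand-ins for the learned networks, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the learned maps (assumed shapes, purely illustrative):
W_e = rng.standard_normal((4, 16)) * 0.1   # "encoder": 16-dim image -> 4-dim latent
W_g = rng.standard_normal((16, 4)) * 0.1   # "decoder": 4-dim latent -> 16-dim image
centers = np.array([-1.0, 0.0, 1.0])       # finite symbol alphabet for q

E = lambda x: np.tanh(W_e @ x)                                        # encoder
q = lambda z: centers[np.argmin(np.abs(z[:, None] - centers), axis=1)]  # quantizer
G = lambda z: W_g @ z                                                 # decoder

x = rng.standard_normal(16)   # a toy "image"
x_hat = G(q(E(x)))            # reconstruction: x_hat = G(q(E(x)))

# Only the quantized latent symbols need storing:
# len(E(x)) symbols * log2(len(centers)) bits each.
bits = len(E(x)) * np.log2(len(centers))
```

In the actual paper, E, G, and q are trained jointly with a GAN discriminator; here everything is random, so x_hat is a poor reconstruction, but the flow of data (and what counts as the stored "file") is the same shape.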
FrozenVoid about 7 years ago
For photo-quality material this will mean detail loss, but some media (animation/clip art/compressed video) could benefit greatly if the reconstruction algorithm is fast enough. They should compare it with the AV1/x265 codecs.
pornel about 7 years ago
It seems that some form of neural network based compression is the future, but how to go from academic one-off implementation to a widely deployable codec?
ttoinou about 7 years ago
I'd been thinking about this use case of neural networks for months... Glad to read a paper about it. I wonder how to adapt it to video.
fredguth about 7 years ago
Loved the site. Great way to present research; it would be even better with source code or a notebook.