Among other things, I have worked with and developed technology in the uncompressed professional imaging domain for decades. One of the things I always watch out for is precisely the terminology and language used in this release:

"for equal perceptual quality"

Put a different way: we can fool your eyes/brain into thinking you are looking at the same images.

For most consumer use cases where the objective is to view images --rather than process them-- this is fine. The human vision system (HVS, eyes + brain processing) is tolerant of and can handle lots of missing or distorted data. However, the minute you get into having to process the images in hardware or software, things can change radically.

Take, as an example, color sub-sampling. You start with a camera with three distinct sensors. Each sensor has a full-frame color filter. They are optically coupled to see the same image through a prism. This means you sample the red, green and blue portions of the visible spectrum at full spatial resolution. If we are talking about a 1K x 1K image, you are capturing one million pixels each of red, green and blue.

BTW, I am using "1K" to mean one thousand, not 1024.

Such a camera is very expensive and impractical for consumer applications. Enter the Bayer filter [0].

You can now use a single sensor to capture all three color components. However, instead of having one million samples for each component you have 250K red, 500K green and 250K blue. Still a million samples total (that's the resolution of the sensor), yet you've sliced it up into three components.

This can be reconstructed into a full one million samples per color component through various techniques, one of them being the use of polyphase FIR (Finite Impulse Response) filters looking across a range of samples. Generally speaking, the wider the filter the better the results; however, you'll always have issues around the edges of the image. There are also more sophisticated solutions that apply FIR filters diagonally as well as temporally (using multiple frames).

You are essentially trying to reconstruct the original image by guessing or calculating the missing samples. By doing so you introduce spatial (and even temporal) frequency domain issues that would not have been present in the case of a fully sampled (3 sensor) capture system. (A sketch of the simplest version of this reconstruction appears a bit further down.)

In a typical transmission chain the reconstructed RGB data is eventually encoded into the YCbCr color space [1]. I think of this as the first step in the perceptual "let's see what we can get away with" encoding process. YCbCr is about what the HVS sees. "Y" is the "luma", or intensity component. "Cb" and "Cr" are color difference samples for blue and red.

However, it doesn't stop there. The next step is to, again, subsample some of it in order to reduce data for encoding, compression, storage and transmission. This is where you get into the concept of chroma subsampling [2] and terminology such as 4:4:4, 4:2:2, etc.

Here, again, we reduce data by throwing away (or at least heavily reducing) color information. It turns out your brain can deal with irregularities in color far more easily than in the luma, or intensity, portion of an image. And so, "4:4:4" means we take every sample of the YCbCr encoded image, while "4:2:2" means we cut Cb and Cr down to half their horizontal resolution.
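To make the demosaicing step above concrete, here is a minimal sketch of the simplest (bilinear) reconstruction of an RGGB Bayer mosaic using tiny FIR kernels. The layout, kernels and edge handling here are my own illustrative assumptions; real pipelines use much wider, edge-aware polyphase filters as described above.

```python
import numpy as np
from scipy.signal import convolve2d

def demosaic_bilinear(mosaic: np.ndarray) -> np.ndarray:
    """Reconstruct full RGB from an RGGB Bayer mosaic with small FIR kernels.

    mosaic: 2-D float array of raw sensor samples. Returns an (H, W, 3) array.
    """
    h, w = mosaic.shape

    # Binary masks describing which photosite carries which color (RGGB layout assumed).
    r_mask = np.zeros((h, w))
    r_mask[0::2, 0::2] = 1.0
    b_mask = np.zeros((h, w))
    b_mask[1::2, 1::2] = 1.0
    g_mask = 1.0 - r_mask - b_mask

    # Classic bilinear interpolation kernels for the green and red/blue lattices.
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], dtype=float) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 4.0

    def interp(sparse, kernel):
        # Convolving the sparse samples with the kernel fills in the missing sites
        # from their neighbors; the mirrored boundary is already a compromise.
        return convolve2d(sparse, kernel, mode="same", boundary="symm")

    r = interp(mosaic * r_mask, k_rb)
    g = interp(mosaic * g_mask, k_g)
    b = interp(mosaic * b_mask, k_rb)
    return np.dstack([r, g, b])
```

Even in this trivial version you can see the trade-off: every missing sample is a guess computed from its neighbors, and the edges of the image are already being handled by a workaround.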
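Continuing down the chain, here is a small sketch of the YCbCr conversion and 4:2:2 chroma subsampling just described. I am assuming BT.601 luma weights and full-range values in [0, 1]; BT.709/BT.2020 use different constants, and a real chain feeds gamma-encoded R'G'B' into this matrix rather than linear RGB.

```python
import numpy as np

def rgb_to_ycbcr_422(rgb: np.ndarray):
    """Convert an (H, W, 3) RGB image in [0, 1] to Y, Cb, Cr and
    discard half of the chroma horizontally (4:2:2). Assumes even width."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    y  = 0.299 * r + 0.587 * g + 0.114 * b   # luma (BT.601 weights)
    cb = (b - y) / 1.772 + 0.5               # scaled blue difference
    cr = (r - y) / 1.402 + 0.5               # scaled red difference

    # 4:2:2 -- keep every luma sample, average chroma in horizontal pairs.
    # A real encoder would use a proper low-pass filter, not a 2-tap box.
    cb_422 = 0.5 * (cb[:, 0::2] + cb[:, 1::2])
    cr_422 = 0.5 * (cr[:, 0::2] + cr[:, 1::2])
    return y, cb_422, cr_422
```

Note that every luma sample survives while half of the chroma samples are already gone, and nothing we would normally call "compression" has even started yet.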
There's an additional step which encodes the image in a nonlinear fashion, which, again, is a perceptual trick. This introduces the distinction between gamma-encoded "luma" (Y', Y prime) and linear "luminance" (Y). It turns out that your HVS is far more sensitive to minute detail in the low-lights (the darker portions of the image, say, from 50% down to black) than in the highlights. You can have massive errors in the highlights and your HVS just won't see them, particularly if things are blended through wide FIR filters during display. [3]

Throughout this chain of optical and mathematical wrangling you are highly dependent on the accuracy of each step in the process. How much distortion is introduced depends on a range of factors, not the least of which is the way math is done in the software or chips that touch every single sample's data. With so much math in the processing chain you have to be extremely careful not to introduce errors through truncation or rounding. (A small numerical example of this appears further down.)

We then introduce compression algorithms. In the case of motion video they will typically compress a reference frame as a still and then encode the difference with respect to that frame for subsequent frames. They divide an image into blocks of pixels and then spatially process these blocks to develop a dictionary of blocks to store, transmit, etc.

The key technology in compression is the Discrete Cosine Transform (DCT) [4]. This bit of math transforms the image from the spatial domain to the frequency domain. Once again, we are trying to trick the eye: reduce information the HVS might not perceive. We are not as sensitive to fine, high-frequency detail, which means it's relatively safe to remove some of it. That's what the DCT, followed by quantization of its coefficients, is about.
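To make that step concrete, here is a minimal sketch of a JPEG-style transform-and-quantize of a single 8x8 luma block. The quantization table is made up purely for illustration (real codecs use carefully tuned tables, and H.266 is far more sophisticated than this); the point is simply that the coarse quantization steps land on the high-frequency coefficients.

```python
import numpy as np
from scipy.fft import dctn, idctn

# A made-up quantization table: the step size grows with spatial frequency,
# i.e. the detail the HVS is least sensitive to gets quantized hardest.
QUANT = 16.0 + 8.0 * np.add.outer(np.arange(8), np.arange(8))

def compress_block(block: np.ndarray) -> np.ndarray:
    """8x8 spatial-domain luma samples (0..255) -> quantized DCT coefficients."""
    coeffs = dctn(block - 128.0, norm="ortho")   # spatial -> frequency domain
    return np.round(coeffs / QUANT)              # the rounding is where data is lost

def decompress_block(q: np.ndarray) -> np.ndarray:
    """Quantized coefficients -> approximate spatial-domain block."""
    return idctn(q * QUANT, norm="ortho") + 128.0
```

Round-tripping a block through compress_block and decompress_block gives you back something that looks the same to the eye but is not numerically the block you started with; the error sits precisely in the detail the table decided you would not miss.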
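The np.round() above is also a good place to come back to the earlier point about truncation and rounding: even without any DCT or subsampling, simply bouncing samples between representations at 8-bit precision leaves a residue. A small sketch, again assuming the same BT.601-style constants as before:

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((64, 64, 3))   # a synthetic test image, values in [0, 1]

def to_ycbcr(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return np.dstack([y, (b - y) / 1.772 + 0.5, (r - y) / 1.402 + 0.5])

def to_rgb(ycc):
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 0.5, ycc[..., 2] - 0.5
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return np.dstack([r, g, b])

# Round-trip in floating point vs. rounding to 8-bit integers in between.
float_rt = to_rgb(to_ycbcr(rgb))
eight_bit_rt = to_rgb(np.round(to_ycbcr(rgb) * 255.0) / 255.0)

print("float round-trip max error:", np.abs(float_rt - rgb).max())
print("8-bit round-trip max error:", np.abs(eight_bit_rt - rgb).max())
```

One round trip like this is harmless for viewing; a long chain of such steps, each implemented slightly differently in some chip or library, is where the accumulated damage described above comes from.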
So, we started with a 3 sensor full-sampling camera, reduced it to a single sensor and threw away 75% of the red samples, 50% of the green samples and 75% of the blue samples. We then reconstruct the full RGB data mathematically, perceptually encode it to YCbCr, apply gamma encoding if necessary, apply the DCT to reduce high frequency information based on agreed-upon perceptual thresholds, and then store and transmit the final result. For display on an RGB display we reverse the process. Errors are introduced every step of the way, the hope and objective being to trick the HVS into seeing an acceptable image.

All of this is great for watching a movie or a TikTok video. However, when you work in machine vision or any domain that requires high quality image data, the issues with the processing chain presented above can introduce problems with consequences ranging from the introduction of errors (Was that a truck in front of our self-driving car or something else?) to making it impossible to make valid use of the images (Is that a tumor or healthy tissue?).

While H.266 sounds fantastic for TikTok or Netflix, I fear that the constant effort to find creative ways to trick the HVS might introduce issues in machine vision, machine learning and AI that most in the field will not realize. Unless someone has a reasonable depth of expertise in imaging they might very well assume the technology they are using is perfectly adequate for the task. Imagine developing a training data set consisting of millions of images without understanding that the images have "processing damage" because of the way they were acquired and processed before they even saw their first learning algorithm.

Having worked in this field for quite some time --not many people take a 20x magnifying lens to pixels on a display to see what the processing is doing to the image-- I am concerned about the divergence between HVS trickery, which, again, is fine for TikTok and Netflix, and the needs of MV/ML/AI. A while ago there was a discussion on HN about ML misclassification of people of color. While I haven't looked into this in detail, I am convinced, based on experience, that the numerical HVS trickery I describe above has something to do with this problem. If you train models with distorted data you have to expect errors in classification. As they say, garbage in, garbage out.

Nothing wrong with H.266, it sounds fantastic. However, I think MV/ML/AI practitioners need to be deeply aware of what data they are working with and how it got to their neural network. It is for this reason that we've avoided using off-the-shelf image processing chips to the extent possible. When you use an FPGA to process images with your own processing chain you are in control of what happens to every single pixel's data and, more importantly, you can qualify and quantify any errors that might be introduced in the chain.

[0] https://en.wikipedia.org/wiki/Bayer_filter

[1] https://en.wikipedia.org/wiki/YCbCr

[2] https://en.wikipedia.org/wiki/Chroma_subsampling

[3] https://en.wikipedia.org/wiki/Gamma_correction

[4] https://www.youtube.com/watch?v=P7abyWT4dss