See also H.265:<p>* <a href="https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding" rel="nofollow">https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding</a><p>And now even H.266:<p>* <a href="https://en.wikipedia.org/wiki/Versatile_Video_Coding" rel="nofollow">https://en.wikipedia.org/wiki/Versatile_Video_Coding</a><p>Also, at what point will AV1 become "mainstream"? How prevalent is it? It still seems that hardware decoding (never mind encoding) support is only so-so.
For those interested in this topic, I highly recommend the approachable but more extensive technical introduction at <a href="https://github.com/leandromoreira/digital_video_introduction" rel="nofollow">https://github.com/leandromoreira/digital_video_introduction</a>
This is an awesome introduction to video coding, though it’s thin on motion compensation: <a href="https://users.cs.cf.ac.uk/Dave.Marshall/Multimedia/node259.html" rel="nofollow">https://users.cs.cf.ac.uk/Dave.Marshall/Multimedia/node259.h...</a><p>An elucidating exercise is to use Python to do a DCT or wavelet transform on an image and then quantize it to look at the results. It’s a few hundred lines of code, if that, and it gives you a solid idea of the significance of working in the frequency domain and how that makes compression much easier.
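A minimal sketch of the DCT version of that exercise, assuming numpy, scipy, Pillow, and an 8-bit grayscale image (the filenames are made up):

```python
# 2D DCT of an image, coarse uniform quantization, inverse DCT.
# Bigger `step` -> more coefficients collapse to zero -> blockier image.
import numpy as np
from scipy.fft import dctn, idctn
from PIL import Image

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)

coeffs = dctn(img, norm="ortho")            # to the frequency domain
step = 32                                   # quantization step size
quantized = np.round(coeffs / step) * step  # most high frequencies -> 0
kept = np.count_nonzero(quantized) / quantized.size
restored = idctn(quantized, norm="ortho")   # back to pixels

print(f"nonzero coefficients kept: {kept:.1%}")
Image.fromarray(np.uint8(np.clip(restored, 0, 255))).save("restored.png")
```

The punchline is the `kept` ratio: natural images concentrate their energy in low frequencies, so even a crude uniform quantizer zeroes most of the plane, and long runs of zeros are exactly what the entropy coder feasts on. (JPEG does this per 8x8 block rather than on the whole image, but the idea is the same.)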
PSA: For images, there's finally a successor to JPEG: <i>JPEG XL</i> (.jxl). It has lossy and lossless modes; it's progressive (you can download just the first part of the bitstream to get a lower-resolution image); and it has other benefits!<p><a href="https://jpegxl.info/" rel="nofollow">https://jpegxl.info/</a>
H.264 patents are not expiring yet, if anyone was wondering. 2027 seems to be when that happens. On the other hand, I believe H.263 patents already expired, and MPEG-4 ASP (DivX etc.) is expiring this year.
I worked on a project where I extracted the motion vectors from the camera's H.264-encoded stream to detect motion. It's like getting a basic motion detector for free.
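If anyone wants to try it, here's roughly how it can look with PyAV. The `-flags2 +export_mvs` option is standard FFmpeg, but treat the exact side-data accessors and attribute names as assumptions to check against your PyAV version:

```python
# Rough sketch: use H.264 motion vectors as a free motion detector.
# "+export_mvs" asks FFmpeg's decoder to attach the motion vectors
# it already decoded as per-frame side data.
import av

container = av.open("camera_stream.mp4")
stream = container.streams.video[0]
stream.codec_context.options = {"flags2": "+export_mvs"}

for frame in container.decode(stream):
    mvs = frame.side_data.get("MOTION_VECTORS")  # assumed accessor name
    if mvs is None:
        continue  # e.g. I-frames carry no motion vectors
    # Crude detector: sum of vector magnitudes across all macroblocks.
    energy = sum(abs(v.motion_x) + abs(v.motion_y) for v in mvs)
    if energy > 5000:  # threshold tuned per camera and scene
        print(f"motion at pts={frame.pts}, energy={energy}")
```

The nice part is that there's no extra motion estimation to pay for: the encoder already did the work.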
Does anything interesting happen if you take the frequency domain representation of an image, represent the frequency domain as an image itself, and compress that with some sort of image compression?<p>For example, encode the frequency domain representation as a low quality JPEG, and then undo the steps to turn it back into the "original". How do the JPEG artifacts on the frequency domain manifest in the resulting image?
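You can run the experiment yourself in a few lines. A sketch assuming numpy and Pillow, with magnitude and phase JPEG'd separately since JPEG can't store complex values (filenames hypothetical):

```python
# FFT an image, round-trip log-magnitude and phase through lossy
# JPEGs, invert, and see how the artifacts manifest spatially.
import io
import numpy as np
from PIL import Image

def through_jpeg(arr, quality=30):
    # Normalize to 8-bit, round-trip through an in-memory JPEG.
    lo, hi = arr.min(), arr.max()
    buf = io.BytesIO()
    Image.fromarray(np.uint8(255 * (arr - lo) / (hi - lo))).save(
        buf, "JPEG", quality=quality)
    buf.seek(0)
    decoded = np.asarray(Image.open(buf), dtype=float)
    return decoded / 255 * (hi - lo) + lo

img = np.asarray(Image.open("input.png").convert("L"), dtype=float)
F = np.fft.fftshift(np.fft.fft2(img))

mag = through_jpeg(np.log1p(np.abs(F)))   # log tames the dynamic range
phase = through_jpeg(np.angle(F))
F2 = np.expm1(mag) * np.exp(1j * phase)
out = np.real(np.fft.ifft2(np.fft.ifftshift(F2)))
Image.fromarray(np.uint8(np.clip(out, 0, 255))).save("roundtrip.png")
```

Intuition for what to expect: an error at a single point in the frequency domain is a sinusoid spread across the whole spatial image, so instead of JPEG's local blocking you should see global ripple patterns, and the damage from quantizing the phase tends to dwarf the damage from quantizing the magnitude.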
Almost all of the encoding concepts mentioned were introduced not with H.264 but much earlier, with MPEG-2 in the early '90s: <a href="https://en.wikipedia.org/wiki/MPEG-2" rel="nofollow">https://en.wikipedia.org/wiki/MPEG-2</a>
There was once a blog titled "Diary Of An x264 Developer" that gave some interesting detail on how H.264 worked and on the x264 implementation. It's still available via the Internet Archive.
Whatever it is, AFAIK, most of its significant patents will expire this decade: <a href="https://www.osnews.com/story/24954/us-patent-expiration-for-mp3-mpeg-2-h264/" rel="nofollow">https://www.osnews.com/story/24954/us-patent-expiration-for-...</a>
One thing I was curious about is how the PNG would compare after being run through an optimizer; since the H.264 encoder is doing its own optimization, that would be a fairer comparison. Even so, I bet the H.264 would fare well. I did an experiment with using single-frame H.264 instead of images on a site: I think it's a viable technique but didn't have time to flesh it out in full. If you have some kind of asset pipeline for the site, it's not really more work to encode, and the HTML is then just a video tag with no player controls, so it isn't a ton of work client-side either. Would love to explore that more at some point.
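For anyone curious what that pipeline step might look like, a hypothetical sketch (the ffmpeg flags are real; the helper names and file paths are made up):

```python
# Hypothetical asset-pipeline step: encode a still as a single-frame
# H.264 "image" and emit the matching player-less HTML.
import subprocess

def encode_still(src_png: str, dst_mp4: str) -> None:
    # One frame, x264, 4:2:0 pixel format for broad browser support.
    subprocess.run([
        "ffmpeg", "-y", "-i", src_png, "-frames:v", "1",
        "-c:v", "libx264", "-crf", "28", "-pix_fmt", "yuv420p", dst_mp4,
    ], check=True)

def video_img_tag(src: str, width: int, height: int) -> str:
    # No controls attribute; muted + playsinline keep browsers from
    # treating the element as a player that needs user interaction.
    return (f'<video muted playsinline preload="auto" '
            f'width="{width}" height="{height}">'
            f'<source src="{src}" type="video/mp4"></video>')

encode_still("hero.png", "hero.mp4")
print(video_img_tag("hero.mp4", 800, 600))
```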
For those interested, Ronald Bultje (who co-developed VP9) wrote up some information on VP9 that has more concrete details about video coding than this article:
<a href="https://blogs.gnome.org/rbultje/2016/12/13/overview-of-the-vp9-video-codec/" rel="nofollow">https://blogs.gnome.org/rbultje/2016/12/13/overview-of-the-v...</a><p>It's not H.264 but the coding techniques will be similar.
> Suppose you have some strange coin - you've tossed it 10 times, and every time it lands on heads. How would you describe this information to someone? You wouldn't say HHHHHHHHH. You would just say "10 tosses, all heads" - bam! You've just compressed some data! Easy. I saved you hours of mindfuck lectures. This is obviously an oversimplification, but you've transformed some data into another shorter representation of the same information.<p>Future generations will disagree it's the "same information."<p>I predict there will be a small but vocal cabal who seize on your example with nine H's to argue that it's not the case.<p>On a more serious note, if you lose the decoder ring does it cease to be a "representation of the same information?"
The article starts off by talking about "raw" video containing 3 bytes for every pixel.<p>However, most video is YUV with 4:2:0 chroma subsampling, which is typically 1.5 bytes/pixel.<p><a href="https://en.wikipedia.org/wiki/YUV" rel="nofollow">https://en.wikipedia.org/wiki/YUV</a>
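The arithmetic, as a quick sketch:

```python
# 4:2:0 subsampling: Y at full resolution, U and V each at quarter resolution.
w, h = 1920, 1080
rgb_bytes = w * h * 3                           # 3 bytes per pixel
yuv420_bytes = w * h + 2 * (w // 2) * (h // 2)  # 1.5 bytes per pixel
print(rgb_bytes, yuv420_bytes, yuv420_bytes / (w * h))  # 6220800 3110400 1.5
```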
I met one of the creators of H.264 years ago while photographing a party in Palo Alto. He and his wife were genuinely nice and interesting people. At the time, H.265 had not been released, but it sounded like they were actively working on it, and even on something beyond it.
Just got an Air 2S drone, and it records in H.265. I tried to play the footage on a 2013 retina MacBook, and it plays a frame every minute or so, while 4K H.264 plays no problem. Am I missing something about why it won't play?
Is there a (good) college-level textbook on the topic of image and video encoders? I never took a digital image processing course, so I'm curious if someone has a favorite book on this topic.
Back in college, when I was taking a Digital Image Processing class, we discussed H.264. Truly impressive technology, no doubt. This article goes into much more detail about it; job well done.
I stopped reading when the author started to do comparisons with PNG. PNG or alternatives should only be used when the errors created by lossy compression are too visible.<p>That said, the improvements in audio and video compression over the last 25 years are very impressive and have changed how the world works in several areas.
Two questions for the compression gurus here.<p>Suppose you have a bunch of raw video. You take extracts of it and put them together to make a movie, M1. You make an H.264-encoded copy of that. Let's call it C1.<p>You then make a new cut of your movie, M2, which is mostly the same footage as M1 except that you've shortened a few scenes and lengthened others. You make an H.264-encoded copy of that. Call this C2.<p>When making C1 and C2, your H.264 encoders have to decide which frames to turn into I-frames and which to turn into P-frames.<p>If they just do something simple like make every Nth frame an I-frame, then after the first difference between M1 and M2 it is unlikely that C1 and C2 will have many I-frames in common, and therefore they also won't have many P-frames in common.<p>If they look for scene changes and make new I-frames on scene changes, then we might expect that, at least for the scenes that start identically in M1 and M2, they will get identical I-frames and P-frames up to their first edit, if any.<p>Scenes that are edited at the front would still end up encoded totally differently in C1 and C2.<p>Question: are there any encoders that, when encoding M2 to produce C2, can be given M1 and C1 as references, using them to adjust I-frame spacing so as to make as many C2 I-frames as possible match C1 I-frames?<p>That would allow C2 to be stored efficiently as a binary diff from C1. This could be handy if C1 and C2 needed to be checked into a version control system, or if you needed to distribute C2 over a low-bandwidth or expensive link to someone who already had C1.<p>The second question concerns recompressing after decompression. I actually thought of this question in terms of audio, so I will ask in those terms, but I guess it applies to video too.<p>Suppose someone has an uncompressed source S. They compress it with a lossy compressor, producing C, and distribute C to you. You decompress C, producing S'.<p>You then compress S' with a lossy compressor (the same type the original producer used; e.g., if C is an MP3 you use an MP3 compressor), producing C'. I don't know about video, but for audio (at least back in the days when MP3 was starting to get big) C' would be lower quality than C.<p>Are there any compressors that can figure out that they are dealing with something that has already undergone the "throw out imperceptible parts to make it more compressible" step and just skip to the next stage, so they produce a C' that is a lossless representation of S'?
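One way I could imagine approximating the first part with today's tools is to pin I-frame placement from a shared cut list, so unedited scenes are at least likely to line up. A rough sketch (ffmpeg's `-force_key_frames` flag is real; the cut list and filenames are hypothetical, and rate-control state can still make otherwise identical scenes diverge):

```python
# Force both encodes to place I-frames at the same scene boundaries,
# improving the odds that unedited scenes encode identically and that
# a binary diff between C1 and C2 stays small.
import subprocess

scene_starts = [0.0, 12.4, 47.9, 63.0]  # hypothetical cut list, in seconds
keyframes = ",".join(f"{t:.3f}" for t in scene_starts)

subprocess.run([
    "ffmpeg", "-i", "m2_master.mov",
    "-force_key_frames", keyframes,
    "-c:v", "libx264", "-preset", "medium", "-crf", "18",
    "c2.mp4",
], check=True)
```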
Fabrice Bellard's BPG image format uses H.265's keyframe compressor: <a href="https://bellard.org/bpg/" rel="nofollow">https://bellard.org/bpg/</a><p>Here is a side-by-side visual comparison: <a href="http://xooyoozoo.github.io/yolo-octo-bugfixes/#ballet-exercise&jpg=s&bpg=s" rel="nofollow">http://xooyoozoo.github.io/yolo-octo-bugfixes/#ballet-exerci...</a><p>Amazing.<p>I ported his BPG decoder to Android ARM for a pornographic app. See my comment history for details. It reduced data transfer by more than 60%.
<i>> "a technical walkthrough"</i><p><i>> "Suppose you have some strange coin - you've tossed it 10 times, and every time it lands on heads. How would you describe this information to someone? You wouldn't say HHHHHHHHH. You would just say "10 tosses, all heads" - bam! You've just compressed some data! Easy. I saved you hours of mindfuck lectures."</i><p>Ahh, my favorite kind of technical walkthrough. Love it