See also H.265:<p>* <a href="https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding" rel="nofollow">https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding</a><p>And now even H.266:<p>* <a href="https://en.wikipedia.org/wiki/Versatile_Video_Coding" rel="nofollow">https://en.wikipedia.org/wiki/Versatile_Video_Coding</a><p>Also, at what point will AV1 become "mainstream"? How prevalent is it? It still seems that hardware decoding (never mind encoding) support is only so-so.
For those interested in this topic, I highly recommend the approachable but more extensive technical introduction at <a href="https://github.com/leandromoreira/digital_video_introduction" rel="nofollow">https://github.com/leandromoreira/digital_video_introduction</a>
This is an awesome introduction to video coding, though it’s thin on motion compensation: <a href="https://users.cs.cf.ac.uk/Dave.Marshall/Multimedia/node259.html" rel="nofollow">https://users.cs.cf.ac.uk/Dave.Marshall/Multimedia/node259.h...</a><p>An elucidating exercise is to use Python to do a DCT or wavelet transform on an image and then quantize it to look at the results. It’s a few hundred lines of code, if that, and it gives you a solid idea of the significance of working in the frequency domain and how that makes compression much easier.
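A minimal sketch of the DCT version of that exercise, assuming numpy, scipy, Pillow, and an 8-bit grayscale image (the filenames are made up):

```python
# 2D DCT of an image, coarse uniform quantization, inverse DCT.
# Bigger `step` -> more coefficients collapse to zero -> blockier image.
import numpy as np
from scipy.fft import dctn, idctn
from PIL import Image

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)

coeffs = dctn(img, norm="ortho")            # to the frequency domain
step = 32                                   # quantization step size
quantized = np.round(coeffs / step) * step  # most high frequencies -> 0
kept = np.count_nonzero(quantized) / quantized.size
restored = idctn(quantized, norm="ortho")   # back to pixels

print(f"nonzero coefficients kept: {kept:.1%}")
Image.fromarray(np.uint8(np.clip(restored, 0, 255))).save("restored.png")
```

The punchline is the `kept` ratio: natural images concentrate their energy in low frequencies, so even a crude uniform quantizer zeroes most of the plane, and long runs of zeros are exactly what the entropy coder feasts on. (JPEG does this per 8x8 block rather than on the whole image, but the idea is the same.)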
PSA: For images, there's finally a successor to JPEG: <i>JPEG XL</i> (.jxl). It has lossy and lossless modes; it's progressive (you can download just the first part of the bitstream to get a lower-resolution image); and it has other benefits!<p><a href="https://jpegxl.info/" rel="nofollow">https://jpegxl.info/</a>
H.264 patents are not expiring yet, if anyone was wondering. 2027 seems to be when that happens. On the other hand, I believe H.263 patents already expired, and MPEG-4 ASP (DivX etc.) is expiring this year.
I worked on a project where I extracted the motion vectors from the camera's H.264-encoded stream to detect motion. It's like getting a basic motion detector for free.
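If anyone wants to try it, here's roughly how it can look with PyAV. The `-flags2 +export_mvs` option is standard FFmpeg, but treat the exact side-data accessors and attribute names as assumptions to check against your PyAV version:

```python
# Rough sketch: use H.264 motion vectors as a free motion detector.
# "+export_mvs" asks FFmpeg's decoder to attach the motion vectors
# it already decoded as per-frame side data.
import av

container = av.open("camera_stream.mp4")
stream = container.streams.video[0]
stream.codec_context.options = {"flags2": "+export_mvs"}

for frame in container.decode(stream):
    mvs = frame.side_data.get("MOTION_VECTORS")  # assumed accessor name
    if mvs is None:
        continue  # e.g. I-frames carry no motion vectors
    # Crude detector: sum of vector magnitudes across all macroblocks.
    energy = sum(abs(v.motion_x) + abs(v.motion_y) for v in mvs)
    if energy > 5000:  # threshold tuned per camera and scene
        print(f"motion at pts={frame.pts}, energy={energy}")
```

The nice part is that there's no extra motion estimation to pay for: the encoder already did the work.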
Does anything interesting happen if you take the frequency domain representation of an image, represent the frequency domain as an image itself, and compress that with some sort of image compression?<p>For example, encode the frequency domain representation as a low quality JPEG, and then undo the steps to turn it back into the "original". How do the JPEG artifacts on the frequency domain manifest in the resulting image?
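You can run the experiment yourself in a few lines. A sketch assuming numpy and Pillow, with magnitude and phase JPEG'd separately since JPEG can't store complex values (filenames hypothetical):

```python
# FFT an image, round-trip log-magnitude and phase through lossy
# JPEGs, invert, and see how the artifacts manifest spatially.
import io
import numpy as np
from PIL import Image

def through_jpeg(arr, quality=30):
    # Normalize to 8-bit, round-trip through an in-memory JPEG.
    lo, hi = arr.min(), arr.max()
    buf = io.BytesIO()
    Image.fromarray(np.uint8(255 * (arr - lo) / (hi - lo))).save(
        buf, "JPEG", quality=quality)
    buf.seek(0)
    decoded = np.asarray(Image.open(buf), dtype=float)
    return decoded / 255 * (hi - lo) + lo

img = np.asarray(Image.open("input.png").convert("L"), dtype=float)
F = np.fft.fftshift(np.fft.fft2(img))

mag = through_jpeg(np.log1p(np.abs(F)))   # log tames the dynamic range
phase = through_jpeg(np.angle(F))
F2 = np.expm1(mag) * np.exp(1j * phase)
out = np.real(np.fft.ifft2(np.fft.ifftshift(F2)))
Image.fromarray(np.uint8(np.clip(out, 0, 255))).save("roundtrip.png")
```

Intuition for what to expect: an error at a single point in the frequency domain is a sinusoid spread across the whole spatial image, so instead of JPEG's local blocking you should see global ripple patterns, and the damage from quantizing the phase tends to dwarf the damage from quantizing the magnitude.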
Almost all of the encoding concepts mentioned were introduced not with H.264 but much earlier, with MPEG-2 in the early '90s: <a href="https://en.wikipedia.org/wiki/MPEG-2" rel="nofollow">https://en.wikipedia.org/wiki/MPEG-2</a>
There was once a blog titled "Diary Of An x264 Developer" that gave some interesting detail on how H.264 worked and on the x264 implementation. It's still available via the Internet Archive.
Whatever it is, AFAIK, most of its significant patents will expire this decade: <a href="https://www.osnews.com/story/24954/us-patent-expiration-for-mp3-mpeg-2-h264/" rel="nofollow">https://www.osnews.com/story/24954/us-patent-expiration-for-...</a>
One thing I was curious about is how the PNG would compare after being run through an optimizer; since the H.264 encoder is doing its own optimization, that would be a fairer comparison. Even so, I bet the H.264 would fare well. I did an experiment with using single-frame H.264 instead of images on a site: I think it's a viable technique but didn't have time to flesh it out in full. If you have some kind of asset pipeline for the site, it's not really more work to encode, and the HTML is then just a video tag with no player controls, so it isn't a ton of work client-side either. Would love to explore that more at some point.
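For anyone curious what that pipeline step might look like, a hypothetical sketch (the ffmpeg flags are real; the helper names and file paths are made up):

```python
# Hypothetical asset-pipeline step: encode a still as a single-frame
# H.264 "image" and emit the matching player-less HTML.
import subprocess

def encode_still(src_png: str, dst_mp4: str) -> None:
    # One frame, x264, 4:2:0 pixel format for broad browser support.
    subprocess.run([
        "ffmpeg", "-y", "-i", src_png, "-frames:v", "1",
        "-c:v", "libx264", "-crf", "28", "-pix_fmt", "yuv420p", dst_mp4,
    ], check=True)

def video_img_tag(src: str, width: int, height: int) -> str:
    # No controls attribute; muted + playsinline keep browsers from
    # treating the element as a player that needs user interaction.
    return (f'<video muted playsinline preload="auto" '
            f'width="{width}" height="{height}">'
            f'<source src="{src}" type="video/mp4"></video>')

encode_still("hero.png", "hero.mp4")
print(video_img_tag("hero.mp4", 800, 600))
```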
For those interested, Ronald Bultje (who co-developed VP9) wrote up some information on VP9 that has more concrete details about video coding than this article:
<a href="https://blogs.gnome.org/rbultje/2016/12/13/overview-of-the-vp9-video-codec/" rel="nofollow">https://blogs.gnome.org/rbultje/2016/12/13/overview-of-the-v...</a><p>It's not H.264 but the coding techniques will be similar.
> Suppose you have some strange coin - you've tossed it 10 times, and every time it lands on heads. How would you describe this information to someone? You wouldn't say HHHHHHHHH. You would just say "10 tosses, all heads" - bam! You've just compressed some data! Easy. I saved you hours of mindfuck lectures. This is obviously an oversimplification, but you've transformed some data into another shorter representation of the same information.<p>Future generations will disagree it's the "same information."<p>I predict there will be a small but vocal cabal who seize on your example with nine H's to argue that it's not the case.<p>On a more serious note, if you lose the decoder ring does it cease to be a "representation of the same information?"
The article starts off by talking about "raw" video containing 3 bytes for every pixel.<p>However, most video is YUV with 4:2:0 chroma subsampling, which is typically 1.5 bytes/pixel.<p><a href="https://en.wikipedia.org/wiki/YUV" rel="nofollow">https://en.wikipedia.org/wiki/YUV</a>
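The arithmetic, as a quick sketch:

```python
# 4:2:0 subsampling: Y at full resolution, U and V each at quarter resolution.
w, h = 1920, 1080
rgb_bytes = w * h * 3                           # 3 bytes per pixel
yuv420_bytes = w * h + 2 * (w // 2) * (h // 2)  # 1.5 bytes per pixel
print(rgb_bytes, yuv420_bytes, yuv420_bytes / (w * h))  # 6220800 3110400 1.5
```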
I met one of the creators of H.264 years ago while photographing a party in Palo Alto. He and his wife were genuinely nice and interesting people. At the time, H.265 had not been released, but it sounded like they were actively working on it, and even on something beyond it.
Just got an Air 2S drone, and it records in H.265. I tried to play the footage on a 2013 retina MacBook, and it plays a frame every minute or so, while 4K H.264 plays no problem. Am I missing something about why it won't play?
Is there a (good) college-level textbook on the topic of image and video encoders? I never took a digital image processing course, so I'm curious if someone has a favorite book on this topic.
Back in college, when I was taking a Digital Image Processing class, we discussed H.264. Truly impressive technology, no doubt. This article goes into much more detail about it; job well done.
I stopped reading when the author started to do comparisons with PNG. PNG or alternatives should only be used when the errors created by lossy compression are too visible.<p>That said, the improvements in audio and video compression over the last 25 years are very impressive and have changed how the world works in several areas.
Two questions for the compression gurus here.<p>Suppose you have a bunch of raw video. You take extracts of it and put them together to make a movie, M1. You make an H.264-encoded copy of that. Let's call it C1.<p>You then make a new cut of your movie, M2, which is mostly the same footage as M1 except that you've shortened a few scenes and lengthened others. You make an H.264-encoded copy of that. Call this C2.<p>When making C1 and C2, your H.264 encoders have to decide which frames to turn into I-frames and which to turn into P-frames.<p>If they just do something simple like make every Nth frame an I-frame, then after the first difference between M1 and M2 it is unlikely that C1 and C2 will have many I-frames in common, and therefore they also won't have many P-frames in common.<p>If they look for scene changes and make new I-frames on scene changes, then we might expect that, at least for the scenes that start identically in M1 and M2, they will get identical I-frames and P-frames up to their first edit, if any.<p>Scenes that are edited at the front would still end up encoded totally differently in C1 and C2.<p>Question: are there any encoders that, when encoding M2 to produce C2, can be given M1 and C1 as references, using them to adjust I-frame spacing so as to make as many C2 I-frames as possible match C1 I-frames?<p>That would allow C2 to be stored efficiently as a binary diff from C1. This could be handy if C1 and C2 needed to be checked into a version control system, or if you needed to distribute C2 over a low-bandwidth or expensive link to someone who already had C1.<p>The second question concerns recompressing after decompression. I actually thought of this question in terms of audio, so I will ask in those terms, but I guess it applies to video too.<p>Suppose someone has an uncompressed source S. They compress it with a lossy compressor, producing C, and distribute C to you. You decompress C, producing S'.<p>You then compress S' with a lossy compressor (the same type the original producer used; e.g., if C is an MP3 you use an MP3 compressor), producing C'. I don't know about video, but for audio (at least back in the days when MP3 was starting to get big) C' would be lower quality than C.<p>Are there any compressors that can figure out that they are dealing with something that has already undergone the "throw out imperceptible parts to make it more compressible" step and just skip to the next stage, so they produce a C' that is a lossless representation of S'?
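One way I could imagine approximating the first part with today's tools is to pin I-frame placement from a shared cut list, so unedited scenes are at least likely to line up. A rough sketch (ffmpeg's `-force_key_frames` flag is real; the cut list and filenames are hypothetical, and rate-control state can still make otherwise identical scenes diverge):

```python
# Force both encodes to place I-frames at the same scene boundaries,
# improving the odds that unedited scenes encode identically and that
# a binary diff between C1 and C2 stays small.
import subprocess

scene_starts = [0.0, 12.4, 47.9, 63.0]  # hypothetical cut list, in seconds
keyframes = ",".join(f"{t:.3f}" for t in scene_starts)

subprocess.run([
    "ffmpeg", "-i", "m2_master.mov",
    "-force_key_frames", keyframes,
    "-c:v", "libx264", "-preset", "medium", "-crf", "18",
    "c2.mp4",
], check=True)
```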
Fabrice Bellard's BPG image format uses H.265's keyframe compressor: <a href="https://bellard.org/bpg/" rel="nofollow">https://bellard.org/bpg/</a><p>Here is a side-by-side visual comparison: <a href="http://xooyoozoo.github.io/yolo-octo-bugfixes/#ballet-exercise&jpg=s&bpg=s" rel="nofollow">http://xooyoozoo.github.io/yolo-octo-bugfixes/#ballet-exerci...</a><p>Amazing.<p>I ported his BPG decoder to Android ARM for a pornographic app. See my comment history for details. It reduced data transfer by more than 60%.
<i>> "a technical walkthrough"</i><p><i>> "Suppose you have some strange coin - you've tossed it 10 times, and every time it lands on heads. How would you describe this information to someone? You wouldn't say HHHHHHHHH. You would just say "10 tosses, all heads" - bam! You've just compressed some data! Easy. I saved you hours of mindfuck lectures."</i><p>Ahh, my favorite kind of technical walkthrough. Love it