FFmpeg School of Assembly Language

869 points by davikr 3 months ago

29 comments

Another resource on the same topic: <a href="https://blogs.gnome.org/rbultje/2017/07/14/writing-x86-simd-using-x86inc-asm/" rel="nofollow">https://blogs.gnome.org/rbultje/2017/07/14/writing-x86-simd-...</a>

As I'm seeing in the comments here, the usefulness of handwritten SIMD ranges from "totally unclear" to "mission critical". I'm seeing a lot on the "totally unclear" side, but not as much on the "mission critical" side, so I'll talk a bit about that.

FFmpeg is a pretty clear use case because of how often it is used, but I think it is easier to quantify the impact of handwriting SIMD with something like dav1d, the universal production AV1 video decoder.

dav1d is used pretty much everywhere, from major browsers to the Android operating system (superseding libgav1). A massive element of dav1d's success is its incredible speed, which is largely due to how much of the codebase is handwritten SIMD.

While I think it is a good thing that languages like Zig have built-in SIMD support, there are some use cases where it becomes necessary to do things by hand because even a potential performance delta is important to investigate. There are lines of code in dav1d that will be run trillions of times in a single day, and they need to be as fast as possible. The difference between handwritten and compiler-generated SIMD can be up to 50% in some cases, so it is important.

I happen to be somewhat involved in similar use cases, where things I write will run a lot of times. To make sure these skills stay alive, resources like the FFmpeg school of assembly language are pretty important, in my opinion.

buserror 3 months ago

I used to write quite a few SIMD versions of critical functions, but now I rarely do -- one thing to try is to isolate that code and run it in the Most Excellent Compiler Explorer [0].

And stare at the generated code!

More often than not, the auto-vectorisation now generates a pretty excellent SIMD version of your function, and all you have to do is 'hint' the compiler -- for example, explicitly state alignment, provide your own vector source/destination type -- you can do a lot by 'styling' your C code while thinking about what the compiler might be able to do with it -- for example, use extra intermediary variables, really break down all the operations you want, etc.

Worst case, if the compiler REALLY isn't clever enough, this gives you a good base of generated assembly to adapt and tweak, without having to actually write the boilerplate bits.

In most cases, the resulting C function will be vectorized as well as, or better than, the hand-coded one I'd write -- and in many other cases, it's "close enough" not to matter that much. The other good news is that that code will probably vectorize fine for WASM and NEON etc. without having to have explicit versions.

[0] <a href="https://godbolt.org/" rel="nofollow">https://godbolt.org/</a>
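
A minimal sketch of the kind of "hinting" described in this comment, not code from FFmpeg or from the commenter. The function name `add_sat_u8` and the choice of a saturating byte add are assumptions made for illustration. It uses `restrict` to promise the compiler that the buffers do not overlap, and an explicit wider intermediate variable so each operation is spelled out. Alignment hints are compiler-specific and are left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: saturating add of two byte buffers.
 * `restrict` tells the compiler dst, a and b never alias, which removes
 * one common obstacle to auto-vectorisation. */
void add_sat_u8(uint8_t *restrict dst,
                const uint8_t *restrict a,
                const uint8_t *restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Explicit wider intermediate: the sum cannot wrap at 8 bits. */
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        /* Clamp to the 0..255 range of a byte. */
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Pasting a function like this into Compiler Explorer with optimisation enabled is one way to check what a given compiler does with it. Whether it vectorises, and how well, depends on the compiler, its version, and the flags used.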

kierank 3 months ago

I am the author of these lessons.

Ask me anything.

Daniel_Van_Zant 3 months ago

I'm curious to hear from anyone who has done it. Is there any "pleasure" to be had in learning or implementing assembly (like there is for LISP or RISC-V), or is it something you learn and implement because you want to do something else (like learning COBOL if you need to work with certain kinds of systems)? It has always piqued my interest, but I don't have a good reason in my day-to-day job to get into it. Wondering if it is worth committing some time to for the fun of it.

jupp0r 3 months ago

I personally don't think there's much value in writing assembly (vs using intrinsics), but it's been really helpful to read it. I have often used Compiler Explorer (<a href="https://godbolt.org/" rel="nofollow">https://godbolt.org/</a>) to look at the assembly generated and understand optimizations that compilers perform when optimizing for performance.
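
For readers unfamiliar with the distinction drawn here, a small sketch of what "using intrinsics" can look like in C. This is an illustration, not code from FFmpeg; the function name `add_sat_u8_simd` is hypothetical. It assumes an x86 target with SSE2 for the vector path and falls back to plain C elsewhere. The intrinsics used (`_mm_loadu_si128`, `_mm_adds_epu8`, `_mm_storeu_si128`) come from `<emmintrin.h>`.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical example: saturating add of two byte buffers. */
void add_sat_u8_simd(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Process 16 bytes per iteration using unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Unsigned 8-bit add with saturation at 255. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is not available. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

With intrinsics the compiler still handles register allocation and instruction scheduling, which is the main practical difference from handwritten assembly.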

slicktux 3 months ago

Kudos for the K&R reference! That was the book I bought to learn C and programming in general. I had initially tried C++ as my first language but I found it too abstract to learn because I kept asking what was going on underneath the hood.

lukaslalinsky 3 months ago

This is perfect. I used to know x86 assembly back in the 386 era, but for the more advanced processors, it became too complex. I'd definitely like to learn more about SIMD on recent CPUs, so this seems like a great resource.

foresto 3 months ago

> Note that the “q” suffix refers to the size of the pointer *(*i.e in C it represents *sizeof(*src) == 8 on 64-bit systems, and x86asm is smart enough to use 32-bit on 32-bit systems) but the underlying load is 128-bit.

I find that sentence confusing.

I assume that i.e is supposed to be i.e., but what is *(* supposed to mean? Shouldn't that be just an open parenthesis?

In what context would *sizeof(*src) be considered valid? As far as I know, sizeof never yields a pointer.

I get the impression that someone sprinkled random asterisks in that sentence, or maybe tried to mix asterisks-denoting-italics with C syntax.
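
As a point of reference for the C side of this question, a small sketch of the difference between the size of a pointer and the size of what it points to. The parameter type `const uint8_t *` is an assumption; the quoted sentence does not state the type of `src`. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the pointer itself: typically 8 on 64-bit systems and
 * 4 on 32-bit systems. */
size_t pointer_size(const uint8_t *src)
{
    return sizeof(src);
}

/* Size of the pointed-to object: always 1 for uint8_t.
 * sizeof does not evaluate its operand, so src is never dereferenced. */
size_t pointee_size(const uint8_t *src)
{
    return sizeof(*src);
}
```

Under that assumption, the pointer width the quoted sentence describes corresponds to `sizeof(src)`, while `sizeof(*src)` is the size of a single element.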

wruza 3 months ago

I don’t care about the split, just wanted to say that this guide is so good. I wish I had this back when I was interested in low-low-level.

imglorp 3 months ago

Asm is 10x faster than C? That was definitely true at some point but is it still true today? Have compilers really stagnated so badly they can't come close to hand coded asm?

xuhu 3 months ago

"Assembly language of FFmpeg" leads me to think of -filter_complex. It's not for human consumption even once you know many of its gotchas (-ss and keyframes, PTS, labeling and using chain outputs, fading, mixing resolutions etc).

But then again no-one is adjusting timestamps manually in batch scripts, so a high-level script on top of filter_complex doesn't have much purpose.

agumonkey 3 months ago

I remember kempf saying most of recent development on codecs is in raw asm. Only logical that they can write some tutorials :)

fracus 3 months ago

I'm halfway through this tutorial and I'm really enjoying it. I haven't touched assembly since back in university decades ago. I've always had an urge to optimize processes for some reason. This scratches that itch. I was also more curious about SIMD since hearing about it on Digital Foundry.

thayne 3 months ago

It doesn't mention the downsides of using assembly. The biggest of which is that your code is architecture specific, so for example you have to write different code for x86 and arm, and possibly even different code for x86_64. Unfortunately, for SIMD, there isn't really a great way to write portable code for it, at least in C. Rust is working on stabilizing a portable simd API, and zig has simd support, but I suspect ffmpeg would still complain they aren't quite as fast as they would like.

One thing that confuses me is the opposition to inline asm. It seems like inline asm would be more efficient than having to make a function call to an asm function.

eachro 3 months ago

This looks great! Are there going to be exercises or a project-based component as well?

fulafel 3 months ago

SIMD was introduced in the 80s but became ubiquitous when Intel got in on it in the 90s. It's interesting that (for x86), PLT is still stuck at hand-writing assembly 40 years later.

neallindsay 3 months ago

Things that you would expect every software developer to know today will one day become niche, low-level knowledge.

krick 3 months ago

Huh, I didn't even know ffmpeg still actively employs assembly in its source code.

Charon77 3 months ago

This is very approachable and beginner friendly. Kudos to the authors.

jancsika 3 months ago

What's the cost of shuttling data in and out of SIMD land?

henning 3 months ago

This is what Hacker News should be about. Awesome. Thank you.

sylware 3 months ago

A gigantic mistake was made in much of the ffmpeg assembly: they are abusing the nasm macro preprocessor to obscene levels...

belter 3 months ago

Uhmmm... Lots of praise, but these are just three small lessons covering the basics. The exercises are not uploaded yet. Looks like a work in progress, or just the beginning?

ej1 3 months ago

This is a great article!

mkoubaa 3 months ago

I'm shocked there still isn't a hardware accelerator for video decoding.

toisanji 3 months ago

Just wondering, would it make sense to use LLMs to translate higher level languages to assembly or to directly write in assembly?

beebaween 3 months ago

I'm kind of stunned we haven't gotten something better / more Rust-based than ffmpeg.

Especially curious given the advent of Apple Metal etc.

Does anyone have recommendations?

imchaz 3 months ago

I'll be honest, I didn't read through much. FFmpeg gives me severe PTSD. My first task out of college was to write a procedurally generated video using ffmpeg, conform to DASH, and get it under 150kb/s while being readable. Docs were unusable. DASH was only a few months old. And Stack Overflow was devoid of help. I kid you not, the only way to get any insight was some sketchy IRC channel. (2016 btw, well past IRC's prime)

netr0ute 3 months ago

The only thing I don't like about this is the focus on x86 assembly, which is a sinking ship because RISC-V is coming to eat its lunch, FAST.
