The Fastest FizzBuzz Implementation

261 点作者 Croftengea超过 3 年前

26 条评论

atorodius超过 3 年前

Relevant: <a href="https://news.ycombinator.com/item?id=29031488" rel="nofollow">https://news.ycombinator.com/item?id=29031488</a>

评论 #29420249 未加载

lifthrasiir超过 3 年前

> The developer behind the Assembler version goes by the handle "ais523".Alex Smith [1] is also known for winning the Wolfram 2,3 Turing Machine Research Prize [2].[1] <a href="https://www.cs.bham.ac.uk/~ais523/" rel="nofollow">https://www.cs.bham.ac.uk/~ais523/</a>[2] <a href="https://writings.stephenwolfram.com/2007/10/the-prize-is-won-the-simplest-universal-turing-machine-is-proved/" rel="nofollow">https://writings.stephenwolfram.com/2007/10/the-prize-is-won...</a>

buro9超过 3 年前

The simplest change for performance even in Python, JavaScript, etc is to avoid the modulo function.All of the top contenders on <a href="https://codegolf.stackexchange.com/questions/215216/high-throughput-fizz-buzz/" rel="nofollow">https://codegolf.stackexchange.com/questions/215216/high-thr...</a> do exactly this.Yes there is a lot of optimization specific to languages, OS, CPU... but the takeaway for me isn't that you have to go that extent to build an order of magnitude improvement, you can still achieve many order of magnitudes of improvement by understanding what is happening when you use something like the modulo function and that if you have requirements where you can avoid using it then that's the win.For FizzBuzz you are told it "iterates from 1 to <some number>"... and modulo would be perfect for "take a single number and produce Fizz, Buzz or FizzBuzz if it's divisable by 3, 5, or both 3 and 5"... but as we're iterating, counters are better suited to the task.I love seeing epic code golf (and the solution outlined in the article is it), but the takeaway for the majority of engineers is that significant improvements can be made with a readable and maintainable solution just by understanding the requirements and comp sci fundamentals.

评论 #29414553 未加载

richardwhiuk超过 3 年前

> Before running the code, make sure your CPU does support AVX2. ... You should see "sse2" at a minimum.AVX2 is not SSE2 - SSE2 is much, much older.You want to check for the avx2 flag itself.

parhamn超过 3 年前

When I saw this a few weeks ago it set off a lot of thinking about compilers and ML.Even the unoptimized C version is 10x slower than the one that is painstakingly optimized.Some of this performance gain will certain be achievable by ML/AI assisted compilers and translators. In that case, even a 2x speedup would be game changing. Is anyone using ML in compilers yet? The part where you verify the code still does what you wanted it to do seems trickiest.

评论 #29415884 未加载

DeathArrow超过 3 年前

It seems there's a huge gap between what machine code compilers can generate and hand written optimized assembler. There's a lot of room left for compilers to be improved.

评论 #29414577 未加载

评论 #29415366 未加载

评论 #29414249 未加载

评论 #29414864 未加载

评论 #29414300 未加载

solmag超过 3 年前

I think the original author said that it was harder to make this than his masters thesis.

DeathArrow超过 3 年前

The most interesting thing is the C version is almost as fast as assembler version.

评论 #29414307 未加载

javier10e6超过 3 年前

Asm 56 GB/s (you are reading Sanskrit) C 41 GB/s (you are reading US Tax Code) Rust 30 GB/s (you are reading English) Python (why bother) (you are speed reading ) So many tools, so many languages, life is great.

opentokix超过 3 年前

Imagine writing this asm-implementation to the 22 y/o google recruiter and fail the interview. :D

pxeger1超过 3 年前

This looks like a pretty rubbish blogspam regurgitation of the original SE answer.

w0mbat超过 3 年前

I played with FizzBuzz when this was mentioned here a month ago, focusing on the algorithm not optimizations.I stopped once I could do it in simple code with no math except addition, 15 numbers at a time.<pre><code> for (int x = 0 ; x < 1000000 ; x+= 15) { printf("%d\n%d\nfizz\n%d\nbuzz\nfizz\n%d\n%d\nfizz\nbuzz\n%d\nfizz\n%d\n%d\nfizzbuzz\n", 1 + x, 2 + x, 4 + x, 7 + x, 8 + x, 11 + x, 13 + x, 14 + x); } </code></pre> That is simple enough and fast enough for me.

评论 #29420293 未加载

andi999超过 3 年前

I dont understand the FizzBuzz thingy. I mean isnt it just about who has the fastest print method? Correct me please, but this problem has period 15, so if you just write:for (i=1;i<n;i+=15) { print i; print i+1; print 'fizz' print i+3 ... } You have the fastest , dont you? (if the ending n is important then one can just write 15 of such loops/endgame depending on the modulus). Since there is no 'if' the branch predictor should be on your side.

评论 #29415400 未加载

评论 #29415268 未加载

Terretta超过 3 年前

YAWFB: Yet another wrong fizzbuzz.“fizz” % 3, “buzz” % 5, what is the % 15 for?You can blow a interviewee’s mind by saying, great, now “bang” % 7… and (hopefully!) watch the light dawn.// why so serious? :-)

评论 #29416714 未加载

trevyn超过 3 年前

I wonder how an M1 Max assembly version would fare.

评论 #29415649 未加载

评论 #29414689 未加载

评论 #29414386 未加载

jprupp超过 3 年前

Hold my beer. Now were is that OpenCL programming book when I need it?

sshb超过 3 年前

I wonder about FPGA and io_uring implementations performances

hackater超过 3 年前

How do you learn about this? Does anyone have any good recommendation on books, where you learn how to do these kind of very low level optimizations?

6510超过 3 年前

I would like to propose (or remind) that for this type of problem the fastest implementation is paper.

bXVsbGVy超过 3 年前

How did they tested it? Is the data actually being delivered, and read, in order?

Maursault超过 3 年前

No machine code implementation? I wonder how much faster than assembler it could be.

评论 #29419331 未加载

GnarfGnarf超过 3 年前

581 lines of AssemblerInteresting, but not very practical.

评论 #29415310 未加载

sillysaurusx超过 3 年前

> As of this writing, the 3rd fastest submission is written in Rust and produces out at a rate of 3 GB/s, 2nd is written in C and produces at a rate of 41 GB/s and the fastest is written in Assembler and produces at a rate of 56 GB/s.This makes me happy. The order is exactly as it should be, much to the disappointment of the Rustaceans.It's a nice reminder that for the ultimate performance, there is no substitute for knowing exactly what the assembler is doing.It's also a nice reminder that most of us will never want to do that.EDIT: Whoa. It's late, and I thought Rust was at 30 GB/s instead of 3, i.e. roughly neck and neck with C. I meant to make a good-natured joke. There must be something wrong with the Rust implementation to be 10x slower; I wonder what it is.As I mentioned further down in the comments, I'll donate $500 to Watsi if someone manages to displace the C implementation in Rust or any other language besides C/C++/ASM. Be sure to DM me on Twitter (<a href="https://twitter.com/theshawwn" rel="nofollow">https://twitter.com/theshawwn</a>) if you do, both so I'll see it and so that I can congratulate you. I'm sure it's possible, but it's sort of interesting that the activation energy is so high.

评论 #29414378 未加载

评论 #29414277 未加载

评论 #29413960 未加载

评论 #29413926 未加载

评论 #29414438 未加载

评论 #29415351 未加载

评论 #29414707 未加载

评论 #29414350 未加载

评论 #29414168 未加载

评论 #29416513 未加载

评论 #29417032 未加载

评论 #29414202 未加载

评论 #29420208 未加载

recentdarkness超过 3 年前

interesting for me was the fact, that the author talks about themselves in the third person. Very unusual for some content like this :D

评论 #29413942 未加载

评论 #29413948 未加载

isodev超过 3 年前

Come on now, are we really using fizzbuzz as a performance benchmark? Seriously, don't base a real-world decisions on this.PS. The Rust implementation is also not very optimal, for example, the stdout is not locked so... what are we even trying to show :-).

评论 #29415435 未加载

gumby超过 3 年前

Pfft. No machine learning was used?

26 条评论

atorodius超过 3 年前

Relevant: <a href="https://news.ycombinator.com/item?id=29031488" rel="nofollow">https://news.ycombinator.com/item?id=29031488</a>

评论 #29420249 未加载

lifthrasiir超过 3 年前

buro9超过 3 年前

评论 #29414553 未加载

richardwhiuk超过 3 年前

> Before running the code, make sure your CPU does support AVX2. ... You should see "sse2" at a minimum.AVX2 is not SSE2 - SSE2 is much, much older.You want to check for the avx2 flag itself.

parhamn超过 3 年前

评论 #29415884 未加载

DeathArrow超过 3 年前

It seems there's a huge gap between what machine code compilers can generate and hand written optimized assembler. There's a lot of room left for compilers to be improved.

评论 #29414577 未加载

评论 #29415366 未加载

评论 #29414249 未加载

评论 #29414864 未加载

评论 #29414300 未加载

solmag超过 3 年前

I think the original author said that it was harder to make this than his masters thesis.

DeathArrow超过 3 年前

The most interesting thing is the C version is almost as fast as assembler version.

评论 #29414307 未加载

javier10e6超过 3 年前

opentokix超过 3 年前

Imagine writing this asm-implementation to the 22 y/o google recruiter and fail the interview. :D

pxeger1超过 3 年前

This looks like a pretty rubbish blogspam regurgitation of the original SE answer.

w0mbat超过 3 年前

评论 #29420293 未加载

andi999超过 3 年前

评论 #29415400 未加载

评论 #29415268 未加载

Terretta超过 3 年前

评论 #29416714 未加载

trevyn超过 3 年前

I wonder how an M1 Max assembly version would fare.

评论 #29415649 未加载

评论 #29414689 未加载

评论 #29414386 未加载

jprupp超过 3 年前

Hold my beer. Now were is that OpenCL programming book when I need it?

sshb超过 3 年前

I wonder about FPGA and io_uring implementations performances

hackater超过 3 年前

How do you learn about this? Does anyone have any good recommendation on books, where you learn how to do these kind of very low level optimizations?

6510超过 3 年前

I would like to propose (or remind) that for this type of problem the fastest implementation is paper.

bXVsbGVy超过 3 年前

How did they tested it? Is the data actually being delivered, and read, in order?

Maursault超过 3 年前

No machine code implementation? I wonder how much faster than assembler it could be.

评论 #29419331 未加载

GnarfGnarf超过 3 年前

581 lines of AssemblerInteresting, but not very practical.

评论 #29415310 未加载

sillysaurusx超过 3 年前

评论 #29414378 未加载

评论 #29414277 未加载

评论 #29413960 未加载

评论 #29413926 未加载

评论 #29414438 未加载

评论 #29415351 未加载

评论 #29414707 未加载

评论 #29414350 未加载

评论 #29414168 未加载

评论 #29416513 未加载

评论 #29417032 未加载

评论 #29414202 未加载

评论 #29420208 未加载

recentdarkness超过 3 年前

interesting for me was the fact, that the author talks about themselves in the third person. Very unusual for some content like this :D