This doesn't seem to be comparing anything like the same thing. Does the Haskell version <i>really</i> do the same thing as the C version? Does it handle all of the same error cases, providing the same quality of error messages if they occur? Does it handle localization? If not, that makes the comparison <i>very</i> skewed, as unhammer already pointed out. Sure, if you strip out all of the things that the people who wrote wc actually spent all that time on then you can go faster, but failing to note the differences is simply dishonest.
So I just wrote the most naive version of wc I could think of in C, matching the capability of this Haskell version, and I <i>smoked</i> the system <i>wc</i> ... my code, unoptimised, was over twice as fast.<p>$ time wc Backups/Tera2/files.txt<p><pre><code> 1123699 2283439 161361844 Backups/Tera2/files.txt
real 0m2.010s
user 0m1.964s
sys 0m0.020s
</code></pre>
$ time naive Backups/Tera2/files.txt<p><pre><code> L: 1123699
W: 2283439
C: 161361844
real 0m0.864s
user 0m0.835s
sys 0m0.028s
</code></pre>
Smoked !!<p>(See my comment elsewhere[0] for a more sensible comment)<p>[0] <a href="https://news.ycombinator.com/item?id=22234673" rel="nofollow">https://news.ycombinator.com/item?id=22234673</a><p>--------<p>Update: I just compiled with -O3 and got user time of 0.24 secs. This version is 8 times faster than the system wc.
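For anyone wondering what a "naive" counter of this kind might look like, here is a minimal sketch in C. It is not the code that produced the timings above, which was not posted. The names `wc_totals`, `count_buffer` and `count_stream` are made up for illustration, and the whitespace set (space plus ASCII tab through carriage return) is an assumption. It counts bytes rather than characters and ignores the locale entirely, which is exactly the kind of shortcut being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Running totals; hypothetical struct, not taken from any real wc. */
struct wc_totals {
    unsigned long long lines, words, bytes;
};

/* Assumption: only ASCII whitespace separates words (no locale support). */
static bool is_space(unsigned char c) {
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Add one buffer to the totals. in_word carries the "inside a word" state
   across calls so a word split over two buffers is counted once. */
void count_buffer(const unsigned char *buf, size_t n,
                  struct wc_totals *t, bool *in_word) {
    assert(t != NULL && in_word != NULL);
    t->bytes += n;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\n')
            t->lines++;
        bool sp = is_space(buf[i]);
        if (!sp && !*in_word)
            t->words++;          /* a word starts on a non-space after a space */
        *in_word = !sp;
    }
}

/* Read a stream in 64 KiB blocks and count it. */
struct wc_totals count_stream(FILE *fp) {
    assert(fp != NULL);
    struct wc_totals t = {0, 0, 0};
    unsigned char buf[65536];
    bool in_word = false;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        count_buffer(buf, n, &t, &in_word);
    return t;
}
```

A `main` that opens the file, calls `count_stream` and prints the three fields would complete it. Everything wc adds on top of this (multibyte characters, option parsing, error reporting, longest-line tracking) is missing here.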
Not being great at reading Haskell, I have some questions I was hoping people here could answer:<p>* Does this cope with different whitespace, such as tabs?<p>* Does this cope with different settings of locale?<p>* Does this include the option of the "longest line"?<p>* Does this perform the character counts?<p>I'm pretty sure wc does all these, and that stripping them out would make it faster. If this Haskell version doesn't do that, and yet still compares against a fully-featured version of wc, the comparison hardly seems fair.
But how would the Haskell version of wc compare with the C version of wc running with the LC_ALL=C environment variable? It's well known that coreutils is much slower in a UTF-8 locale than in the C locale, and their Haskell version of wc already assumes fixed 8-bit characters.
Write me a Haskell program that will init DRAM on a PinePhone, set up the PMIC and eMMC, load 25MiB of Linux kernel and initramfs from eMMC to DRAM, fit in a hard limit of 32kB, and do all of this in 300ms max.<p>Then I'll consider C destroyed.
Destroying Haskell with 1 line of C: <a href="https://www.ioccc.org/2019/burton/prog.c" rel="nofollow">https://www.ioccc.org/2019/burton/prog.c</a>
I don't think it is very "hard" to "destroy" most of these programs. They were written a long time ago and evolved with backwards compatibility and portability in mind, so they can run on pretty much any system. It does seem a bit unfair to compare them to a quickly hacked-together program that you test against one use case.<p>As others have said, the second article will probably be more interesting.
I would really like to see fewer of these clickbait post titles. The author honestly admits here (props!) <a href="https://news.ycombinator.com/item?id=22235536" rel="nofollow">https://news.ycombinator.com/item?id=22235536</a> that they only used the title because it encourages discussion.<p>In that case, I would say that it's up to the community to not take clickbait titles like this seriously if we want to encourage reasoned, detailed content rather than borderline flaming with words like "destroy" and "smash". I personally don't enjoy this new Buzzfeed style of technical post at all.<p>Therefore, I have some comments about the actual code here, but I'm going to keep them to myself so I don't encourage more people to follow OP's example.
Ah, the weekly "I beat C using X language".<p>I don't understand why LOC was even brought up. You can put all your code on a single line in most languages. The GitHub code linked even has 26 lines of Haskell, which makes this even more nonsensical.
Not unimpressive, but this is a bit of a hot take. If I'm reading it right, a single-file benchmark (Big O who?) against a fully fledged production C program is not a complete measurement.
Related: I have an F# snippet I run from LINQPad to keep or remove lines (based on keywords) from huge log files, and it takes just a few seconds to breeze through a log.
He is comparing a multi-threaded Haskell version against a single-threaded C version?<p>"...There’s also a parallel version that relies on the monoidal structure of the problem a lot, and that one actually beats C"<p>coreutils wc is single-threaded, just checked.
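For readers unsure what "monoidal structure" buys you here, this is a rough C sketch of the idea, not the article's Haskell. Each chunk of input is summarised independently, and two adjacent summaries can be merged with an associative `combine`, which is what lets the chunks be counted on separate threads. The names `chunk`, `summarize` and `combine` are invented for this example, and the whitespace set is an ASCII-only assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Summary of one contiguous chunk of input (hypothetical layout). */
struct chunk {
    size_t lines, words, bytes;
    bool empty;           /* chunk had no bytes: the identity element */
    bool starts_in_word;  /* first byte is not whitespace */
    bool ends_in_word;    /* last byte is not whitespace */
};

/* Assumption: ASCII whitespace only, no locale handling. */
static bool is_space(unsigned char c) {
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Count a single chunk without knowing anything about its neighbours. */
struct chunk summarize(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    struct chunk s = {0, 0, len, len == 0, false, false};
    bool in_word = false;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            s.lines++;
        bool sp = is_space(buf[i]);
        if (!sp && !in_word)
            s.words++;
        in_word = !sp;
    }
    if (len > 0) {
        s.starts_in_word = !is_space(buf[0]);
        s.ends_in_word = in_word;
    }
    return s;
}

/* Merge the summaries of two adjacent chunks (a directly before b). */
struct chunk combine(struct chunk a, struct chunk b) {
    if (a.empty) return b;
    if (b.empty) return a;
    struct chunk r;
    r.lines = a.lines + b.lines;
    r.bytes = a.bytes + b.bytes;
    /* A word straddling the boundary was counted once on each side. */
    r.words = a.words + b.words
            - ((a.ends_in_word && b.starts_in_word) ? 1 : 0);
    r.empty = false;
    r.starts_in_word = a.starts_in_word;
    r.ends_in_word = b.ends_in_word;
    return r;
}
```

Because `combine` is associative and the empty chunk acts as an identity, the per-chunk results can be merged in any grouping as long as the order is kept. Nothing stops a C program from doing the same with threads, which is the point about comparing like with like.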
> So we’ve managed to just smash a C program that was looked at by thousands of eyes of quite hardcore low-level Unix hackers over a few decades. We did this with a handful of lines of pure, mutation-less, idiomatic Haskell, achieving about 4 to 5 times of throughput of the C version and spending less than an hour on all the optimizations.<p>I've done many very arrogant things in my life, because I've been a strange guy with lack of self-esteem who doesn't have a clue how many things he doesn't know. I hope I've never been <i>this</i> arrogant, though.
The post is very insightful and well written. The title is provocative, which is very common nowadays. However, there should be a "serious" section that puts things into perspective. As others have pointed out, it leaves a bit of a bad taste otherwise.