Performance comparison: counting words in Python, C/C++, Awk, Rust, and more

368 points by goranmoomin almost 3 years ago

53 comments

mustache_kimono almost 3 years ago
Re: the Rust optimized implementation, I was able to get ~20-25% better performance by rewriting the for loops as iterators, using Rust's byte range pattern matching, and a buffered writer, which seems crazy, but it's true. I chalked it up to some crazy ILP/SIMD tricks the compiler is doing.

I even submitted a PR[0], but Ben was tired of maintaining the project and decided to archive it (which, fair enough!).

[0]: https://github.com/benhoyt/countwords/pull/115

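For illustration, a minimal sketch of those three techniques (hypothetical code, not the actual PR, and it reads stdin in one go rather than in chunks as the article requires): byte-range patterns classify each byte, iterator traversal replaces the index loop, and a BufWriter batches output.

    use std::collections::HashMap;
    use std::io::{self, BufWriter, Read, Write};

    fn main() -> io::Result<()> {
        let mut input = Vec::new();
        io::stdin().read_to_end(&mut input)?;

        let mut counts: HashMap<Vec<u8>, u64> = HashMap::new();
        let mut word: Vec<u8> = Vec::new();
        for &b in input.iter() {
            match b {
                // Byte range patterns classify each byte without a lookup table.
                b'a'..=b'z' => word.push(b),
                b'A'..=b'Z' => word.push(b | 0x20), // ASCII lowercase
                _ => {
                    if !word.is_empty() {
                        *counts.entry(std::mem::take(&mut word)).or_insert(0) += 1;
                    }
                }
            }
        }
        if !word.is_empty() {
            *counts.entry(word).or_insert(0) += 1;
        }

        let mut pairs: Vec<(Vec<u8>, u64)> = counts.into_iter().collect();
        pairs.sort_unstable_by(|a, b| b.1.cmp(&a.1));

        // Buffered writing avoids one write syscall per output line.
        let stdout = io::stdout();
        let mut out = BufWriter::new(stdout.lock());
        for (w, n) in pairs {
            out.write_all(&w)?;
            writeln!(out, " {}", n)?;
        }
        Ok(())
    }
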
codeflo almost 3 years ago
For those curious about Rust optimization, the "optimized" version makes three changes compared to the "idiomatic" version:

* It uses byte strings instead of UTF-8 strings. In my opinion, that's not an optimization, that's changing the problem. Depending on the question you're asking, only one of the two can be correct.

* It uses a faster hash algorithm. It's not the first time this has come up in a benchmark article. Rust's decision to use a DoS-safe hash by default (and not to provide a fast algorithm in the std, like other languages do) really seems to hurt it in this kind of microbenchmark.

* It uses get_mut+insert instead of the more convenient HashMap::entry method, because the latter would require redundantly allocating the key even in the repeat case. I've hit this problem in the past as well. Maybe the upcoming HashMap::raw_entry_mut will make this kind of optimization cleaner. (A sketch of this pattern follows below.)

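A minimal sketch of that third point (hypothetical, with a byte-string key for concreteness): on a repeat word, get_mut does a single lookup with no allocation; only a genuinely new word pays for the to_vec copy.

    use std::collections::HashMap;

    fn count(counts: &mut HashMap<Vec<u8>, u64>, word: &[u8]) {
        if let Some(n) = counts.get_mut(word) {
            *n += 1; // repeat case: no key allocation
        } else {
            counts.insert(word.to_vec(), 1); // first sight: allocate once
        }
    }

The trade-off is that the miss path hashes the key twice, which is usually far cheaper than the per-call allocation that entry(word.to_vec()) would force.
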
raffraffraff almost 3 years ago
I love using the UNIX shell (pipes and programs that do "one thing well") to the point where it hurts my ability to improve at other languages. But sometimes all you need is a throwaway command to do something once, not a beautifully optimized function that will be written, improved, compiled, profiled, reoptimized and then executed a billion times. The utter terseness of a bash one-liner just tickles me.

compumike almost 3 years ago
How about extending this to compare programming-language efficiency from a developer's standpoint, looking at bytes of code versus execution time?

I took the source code file size from the repository, and the runtime from the blog post. Then I made an arbitrary overall "PAIN SCORE" (lower is better) by multiplying code size * runtime. I suggest this is a worthwhile metric simply because lower is better on both axes, though of course in the "real world" there will be different economic costs for CPU time and developer time depending on the use case. Here are the sorted results, from least "pain" to most:

    LANGUAGE    FILENAME       CODE SIZE   RUNTIME   PAIN SCORE (LOWER IS BETTER)
    Shell       optimized.sh     75 bytes   1.83 s    137.25
    Crystal     simple.cr       240 bytes   1.29 s    309.6
    Nim         simple.nim      424 bytes   0.77 s    326.48
    Python      simple.py       208 bytes   2.21 s    459.68
    Ruby        simple.rb       175 bytes   3.17 s    554.75
    Go          optimized.go   1514 bytes   0.40 s    605.6
    Python      optimized.py    464 bytes   1.33 s    617.12
    Zig         optimized.zig  2688 bytes   0.24 s    645.12
    Zig         simple.zig     1394 bytes   0.55 s    766.7
    Go          simple.go       688 bytes   1.12 s    770.56
    Nim         optimized.nim  1683 bytes   0.49 s    824.67
    Shell       simple.sh        60 bytes  14.81 s    888.6
    Ruby        optimized.rb    401 bytes   2.47 s    990.47
    JavaScript  simple.js       532 bytes   1.88 s   1000.16
    C           optimized.c    4360 bytes   0.23 s   1002.80
    Rust        optimized.rs   3065 bytes   0.43 s   1317.95
    Swift       simple.swift    317 bytes   4.23 s   1340.91
    JavaScript  optimized.js   1501 bytes   1.10 s   1651.1
    C           simple.c       2735 bytes   0.96 s   2625.6
    Rust        simple.rs      2239 bytes   1.38 s   3089.82

Sorting only by code size, the most concise implementations are: Shell, Ruby, Python, Crystal. Nobody was aiming to play code golf (i.e. minimize source code size), so these are fairly straightforward, idiomatic, readable implementations.

I am definitely a Crystal fan; in fact, this afternoon I'm continuing to implement suggestions from the comments on my recent Show HN (https://news.ycombinator.com/item?id=32081943). :)

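To make the metric concrete, a toy Rust sketch (hypothetical; only a few of the rows above are inlined) that recomputes and sorts the pain scores:

    fn main() {
        // (language, file, code size in bytes, runtime in seconds)
        let mut rows = vec![
            ("Shell", "optimized.sh", 75u32, 1.83f64),
            ("Python", "simple.py", 208, 2.21),
            ("Rust", "simple.rs", 2239, 1.38),
            // ...remaining rows from the table above
        ];
        // PAIN SCORE = code size * runtime; lower is better on both axes.
        rows.sort_by(|a, b| {
            (a.2 as f64 * a.3).partial_cmp(&(b.2 as f64 * b.3)).unwrap()
        });
        for (lang, file, size, secs) in rows {
            println!("{lang:<12}{file:<16}{:>10.2}", size as f64 * secs);
        }
    }
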
HarHarVeryFunny almost 3 years ago
This is a rather meaningless comparison, since the differences are going to be dominated by:

1) The choice of libraries/datatypes used for strings and the word->count map

2) How the source file is split into words - probably a library function again, although in C/C++ one *could* choose to implement a super-optimized low-level version that would blow the others away

IMO a performance comparison between languages is only meaningful if the runtime is not dominated by library functions, or if one admits it's really a standard-library performance comparison, not a language comparison.

anonymoushn almost 3 years ago
Recently there was a contest to optimize the performance of a similar program[0], and a Zoom discussion of the optimizations[1].

The programs are not comparable in the following ways:

- Case: TFA requires (at least) ASCII lowercasing, but the contest problem required no lowercasing.

- Ordering: TFA does not require sorting, but the contest problem required sorting.

- Memory: TFA imposes a requirement phrased as "don't read the whole file into memory", and this sounds like it's a resource-saving constraint, but it's usually a constraint that requires the program to spend additional resources. You could just mmap the file and store pointers into the mapped region. It costs extra to do copies instead of no copies.

- Text: TFA is unclear on what assumptions may be made about the lengths of words. For the contest problem, the Hungarian Wikipedia input's longest word is around 80k.

- Safe, Hashing, Stdlib: TFA imposes some restrictions on what constructs may be used that are not imposed in the contest problem.

For the contest version of this problem, it seems like you can tokenize, hash, and count strings at around 1GB/s. Adapting a solution to solve TFA's problem (but not to conform to its Safe/Hashing/Stdlib requirements) would probably not carry too large a penalty, since it's like 3 instructions to ASCII-lowercase 32 bytes, and 1 string copy per unique string should take negligible time compared to the hash table lookups. So there is some room for the optimized solutions to go a little faster, if more optimizations are permitted.

[0]: https://easyperf.net/blog/2022/05/28/Performance-analysis-and-tuning-contest-6

[1]: https://www.youtube.com/watch?v=R_yX0XjdSBY

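The lowercasing estimate is plausible: a scalar, branchless ASCII-lowercase like the hypothetical sketch below is the kind of loop compilers auto-vectorize into a few wide SIMD instructions per 32-byte block.

    // Branchless ASCII lowercasing over a byte buffer.
    fn ascii_lowercase_in_place(buf: &mut [u8]) {
        for b in buf.iter_mut() {
            let is_upper = (*b >= b'A') & (*b <= b'Z');
            *b |= (is_upper as u8) << 5; // sets the 0x20 bit only for A-Z
        }
    }

    fn main() {
        let mut s = *b"Hello, WORLD";
        ascii_lowercase_in_place(&mut s);
        assert_eq!(&s, b"hello, world");
    }
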
jeffbee almost 3 years ago
Funny how you can take the "optimized" C++ and make it significantly faster (vs. gnu/linux standard libraries) with little effort. A lookup table for to_lower, instead of bit math, helps slightly. Building with the LLVM-libc string functions helps a lot, because the GNU std::string_view::operator== is calling an AVX-optimized memcmp via the PLT, which is a terrible strategy for short strings. LLVM-libc has an inlined bcmp (note: bcmp can be significantly faster than memcmp, but on GNU they are aliases) that is resolved for the target platform at build time instead of run time.

Edit: Even if you ignore LLVM-libc, just slapping this into optimized.cpp and replacing the critical `==` with `our_bcmp` makes it 10% faster. IFUNC calls to micro-optimized SIMD functions are counterproductive. It is far, far better that the compiler can see all the code at build time.

    int our_bcmp(const char* a, const char* b, size_t sz) {
        for (size_t i = 0; i < sz; i++) {
            if (a[i] != b[i]) return 1;
        }
        return 0;
    }

djhworld almost 3 years ago
EDIT: I think the use of the term "allocation" might be overloaded here from the original article; after thinking about this a bit more, I think the author is referring to a different way of accessing the map. If you look in godbolt (https://godbolt.org/z/Yf64rodW6) you can see that the pointer version uses

    CALL runtime.mapaccess2_fast64(SB)

whereas the 'raw' version uses

    CALL runtime.mapassign_fast64(SB)

When reading the Go solution, this bit stood out to me:

> To reduce the allocations, we'll use a map[string]*int instead of map[string]int so we only have to allocate once per unique word, instead of for every increment

I just tried benchmarking this with a simple setup and I get zero allocations for both approaches, although the "pointer" approach is *slightly* faster:

    package main

    import (
        "testing"
    )

    func BenchmarkIncrementMapRawInt(b *testing.B) {
        var data = make(map[int]int)
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            key := i % 1000
            data[key]++
        }
    }

    func BenchmarkIncrementMapPointerInt(b *testing.B) {
        var data = make(map[int]*int)
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            key := i % 1000
            increment(data, key)
        }
    }

    func increment(counts map[int]*int, value int) {
        if p, ok := counts[value]; ok {
            *p++
            return
        }
        n := 1
        counts[value] = &n
    }

    $ go test -bench=. -benchmem .
    goos: darwin
    goarch: arm64
    BenchmarkIncrementMapRawInt-8       112288002    10.57 ns/op    0 B/op    0 allocs/op
    BenchmarkIncrementMapPointerInt-8   139728302    8.586 ns/op    0 B/op    0 allocs/op

yakubin almost 3 years ago
I like the OCaml version. It's both very easy to read and comes out really well in the benchmark.

I also looked at the "simple" Zig version, which came out really well in the benchmark, and to me it didn't look simple at all. It seems you need to make a lot of low-level details explicit.

But IMHO AWK takes the crown here. :)

gary17the almost 3 years ago
Wow. Swift, touted as "safe by design and (...) runs lightning-fast"[1], is more of a screw-up than I thought. Almost twice as slow as Lua and behind even Pascal and Forth.

[1] https://developer.apple.com/swift/

zX41ZdbW almost 3 years ago
I've checked with ClickHouse and the result is better than I expected... it runs in 0.043 sec. on my machine, which is faster than any other result.

The code:

    SELECT arrayJoin(splitByChar(' ', lower(line))) AS word, count() AS c
    FROM file('kjvbible.txt', LineAsString)
    WHERE notEmpty(word)
    GROUP BY word
    ORDER BY c DESC
    FORMAT Null

or:

    clickhouse-local --query "SELECT arrayJoin(splitByChar(' ', lower(line))) AS word, count() AS c FROM file('kjvbible.txt', LineAsString) WHERE notEmpty(word) GROUP BY word ORDER BY c DESC" > /dev/null

It is using only a single thread.

BiteCode_dev almost 3 years ago
In this type of blog post with comparisons including many languages, you usually see some unnatural approach for your favorite language. In this case, however, the Python example is idiomatic.

Since the article calls for better approaches, I would suggest taking the opportunity to use the walrus operator and rsplit() for the optimized version. Something like this:

    remaining = ""
    c = Counter()
    while (chunk := sys.stdin.read(64 * 1024)):
        pre, post = chunk.lower().rsplit('\n', 1)
        c.update((remaining + pre).split())
        remaining = post
    c.update(remaining.split())  # count whatever follows the last newline

It should not affect performance too much, and the code gets more expressive.

However, the performance will differ quite a bit depending on the Python version you use. Interestingly, Python 3.11 beta, which comes with a lot of performance tweaks, is slower for this exercise, while being reported to be faster on real-life tasks.

chrismorgan almost 3 years ago
> Incidentally, this problem set the scene for a wizard duel between two computer scientists several decades ago. In 1986, Jon Bentley asked Donald Knuth to show off "literate programming" with a solution to this problem, and he came up with an exquisite, ten-page Knuthian masterpiece. Then Doug McIlroy (the inventor of Unix pipelines) replied with a one-liner Unix shell version using tr, sort, and uniq.

Since this writing and other linked resources present only one side of the affair, I will mention: the presentation in *Pearls* was quite unfair to Knuth, and the conclusions commonly made of it somewhere between moderately and entirely unsound. (That is, as conclusions *from the paper*. I speak of unsoundness of logic only: these conclusions may be supportable from other sources.)

Knuth was challenged to demonstrate literate programming, and so *that was what he demonstrated*. He showed building something from the ground up assuming a very minimal environment, with problem analysis and all: of course that ended up more verbose than a Unix pipeline that glosses over problem analysis and starts with most of the tools you need to make a probably-acceptable solution!

McIlroy himself admitted a few years later that he had been "a little unfair" to criticise on engineering grounds what had only been an illustration of technique. Even in the paper, Bentley did admit a degree of culpability for this mismatch in his "criticism of programs" paragraph. ("He admires the execution of the solution, but faults the problem on engineering grounds. (That is, of course, my responsibility as problem assigner; Knuth solved the problem he was given on grounds that are important to most engineers: the paychecks provided by their problem assigners.)")

But I do wish that Knuth had been given the opportunity to write a review of McIlroy's review, for simultaneous publication.

mustache_kimono almost 3 years ago
... (2021). Great read and fun project. See the discussion from when it was originally posted by the author.[0]

[0]: https://news.ycombinator.com/item?id=26463967

svat almost 3 years ago
See also a different comparison (on ~the same problem) at StackExchange: https://codegolf.stackexchange.com/questions/188133/bentleys-coding-challenge-k-most-frequent-words

(Copying from my comment the last time this was posted: https://news.ycombinator.com/item?id=26467684)

There's also a nice book, "Exercises in Programming Style", about just this problem.

ppeetteerr almost 3 years ago
Curious if anyone knows why the Swift version is slow? Is it the casting from a subsequence (Substring) to a String?

Twirrim almost 3 years ago
As they call out in the article, accurately counting words is a bit more complicated than splitting by space. You need to account for all sorts of punctuation, etc.

When I've been asked this in programming interviews, I've almost always been expected to produce something like the code in the article (and do), but usually I'd point to something like the NLTK library as a better approach. It's polished and highly capable, and handles probably just about every edge case there is.

staticassertion almost 3 years ago
It's a bit of a shame that Rust's stdlib doesn't have a "fast but less safe" hasher. It's one of the easiest performance optimizations in my experience, because it's very rare that an attacker controls input to your hashmap in a way that can actually lead to a DoS.

Not that it doesn't happen; it's just not something I've run into.

Just today I was testing out the performance difference hashing some Scylla queries and got nearly 2x faster hashing moving to fxhash.

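For reference, a minimal sketch of that kind of swap, assuming the third-party fxhash crate as a dependency (the names here are illustrative; rustc-hash's FxHashMap works the same way):

    use fxhash::FxHashMap; // fast, non-DoS-resistant hasher

    fn count_words(text: &str) -> FxHashMap<&str, u64> {
        // Same HashMap API as std; only the BuildHasher differs.
        let mut counts: FxHashMap<&str, u64> = FxHashMap::default();
        for w in text.split_ascii_whitespace() {
            *counts.entry(w).or_insert(0) += 1;
        }
        counts
    }
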
ribit almost 3 years ago
These days I tend to use R for these types of problems. It's slow, but vectors as first-class data types simplify things a lot:

    readLines(file) |> strsplit(" +") |> unlist() |> table() |> sort(desc = TRUE)

jcelerier almost 3 years ago
Got a bit better C++ version here which uses a couple of libraries instead of std:: stuff - https://gist.github.com/jcelerier/74dfd473bccec8f1bd5d78be5aada569 ; boost, fmt and https://github.com/martinus/robin-hood-hashing

    $ g++ -I robin-hood-hashing/src/include -O2 -flto -std=c++20 -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -lfmt
    $ time ./a.out < kjvbible_x10.txt > /dev/null
    0,19s user 0,01s system 99% cpu 0,197 total

With the same build flags, optimize.cpp gives me:

    0,22s user 0,01s system 99% cpu 0,233 total

osigurdson almost 3 years ago
C# really needs a GetOrAdd method on the regular Dictionary class (as exists on ConcurrentDictionary). This common pattern (from the optimized version) requires the hash function on "word" to be evaluated twice (once for TryGetValue and once for Add). Then, if there is a collision on the 32-bit hash, the equality check needs to be run twice as well. Not good, as all of these are O(N) operations (where N is the size of "word"). I suspect the optimized version would be materially faster if they implemented their own dictionary that includes this.

    if (!counts.TryGetValue(word, out var wordCountRef))
    {
        counts.Add(word, new Ref<int>(1));
    }

Karellen almost 3 years ago
Doesn't the Unix shell implementation break the "threading" constraint, in that each program in the pipe will have a main thread each, and all the programs will run concurrently?

Or should the "threading" constraint really have been stated as "you can use multiple threads, but only if they all have their own address space"?

Alternatively, given the multi-core capabilities of even budget/mobile/small systems these days (even the Raspberry Pi 4 has a quad-core CPU), isn't restricting the implementation to a single thread a weird artificial limitation in 2022? Also, a GPU-based implementation would be interesting, and GPUs get most of their power from their massively parallel core architecture.

kramerger almost 3 years ago
> When optimizing this, the first thing to do is compile with optimizations enabled (g++ -O2). I kind of like the fact that with Go you don't have to worry about this – optimizations are always on.

I think it is more fair to say that in Go optimizations are always OFF.

josefx almost 3 years ago
No need to decode the C++ names when you use valgrind/callgrind/kcachegrind for C; that works just as well for C++.

gibsonf1 almost 3 years ago
What code was used for Common Lisp? I'm guessing a well-written version would be in the top contenders for speed.

yb303 almost 3 years ago
I had a one-minute look at the "optimized" C code - it's calling malloc for every word.

Mlller almost 3 years ago
Taking it in a slightly different direction – because in my work chunking is nearly never necessary (too little text or too much memory), whereas Unicode characters and punctuation nearly always are – I would tend to something like:

    from collections import Counter
    from re import finditer

    word_counts = Counter()
    with open(PATH, encoding='utf-8') as file:
        doc = file.read()
    word_counts.update(match.group().lower() for match in finditer(r'\w+', doc))

In Python, this could be a fairly performant way of taking Unicode into account, because it doesn't use Python's for-loop, and regular expressions are rather optimized compared with writing low-level-style Python. (Maybe it should use casefold instead of lower.)

gcanyon almost 3 years ago
My language of choice, LiveCode, is a superset of HyperTalk. The concept of "words" has been built in (not a library) since the '80s. Hash tables since the '90s.

A solution to every aspect of this problem but the line-by-line bit took just over a minute to write. Adding that took another minute.

The trade-off is that LC is slower than almost any of the languages listed, and optimization is almost impossible - the idiomatic way *is* the fastest (almost). And if you don't like the way LC parses words, there is almost no way to change it without going into the source code (a big pain).

unhammer almost 3 years ago
Hm, I tried comparing simple.hs vs simple.py (with GHC 8.10.7, which I happened to have here; their stack.yaml uses 8.10.4) and it's 5.7s vs 1.9s, not quite as stark a difference. Maybe because I'm on Linux?

anonu almost 3 years ago
q solution in kdb, runs in about 2.5s on my machine vs simple.py which takes 4.7s. Rough interpolation with the results table puts this on par with Rust/Go/OCaml and other compiled languages.

    \t desc count each group `$lower " " vs raze read0 `:kjvbible_x10.txt

It's also one line!

I am sure q pros or k purists can optimize this even more...

EDIT: Moving the lower earlier brings this down to 1.9s:

    desc count each group `$ " " vs lower raze read0 `:kjvbible_x10.txt

samuell almost 3 years ago
Fun stuff! I ran a similar thing with a simple bioinformatics problem before (calculating the ratio of G and C against A+G+C+T), also with a whole bunch of contributors:

https://github.com/samuell/gccontent-benchmark#readme

Really hard - or impossible - to arrive at a definitive single number for one language, but the whole exercise is a lot of fun and quite informative IMO :)

lrem almost 3 years ago
This site makes Chrome (Android, current) stop responding.

benreesman almost 3 years ago
This was always my approach when I had to do "leetcode whiteboard interviews".

Start with something not too far north of FizzBuzz, but with a ton of scope to progressively enrich the problem until the clock runs out.

At $BIG_CO du jour, people tend to talk about how much "signal" an interview produces, and starting somewhere that a meaningful number of candidates can't do anything with offers very little "signal", likewise very little Shannon information.

Great article!

thayne almost 3 years ago
I'm curious why Go is faster than Rust in these benchmarks. My best theory is that the program doesn't run long enough to trigger a GC, so Rust is paying the cost of freeing memory but Go isn't. (Using an arena allocator in Rust would probably produce similar behavior; see the sketch below.)

Or maybe the difference is just noise? It's hard to tell without more details on how the benchmarks were run.

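As an illustration of the arena idea, a hypothetical sketch using the third-party bumpalo crate: the map borrows its keys from a bump arena, so the hot loop never touches the allocator's free path, and everything is released at once when the arena is dropped, roughly what a deferred GC lets the Go version skip.

    use std::collections::HashMap;
    use bumpalo::Bump; // assumed dependency: bumpalo = "3"

    fn count<'a>(arena: &'a Bump, text: &str) -> HashMap<&'a str, u64> {
        let mut counts: HashMap<&'a str, u64> = HashMap::new();
        for w in text.split_ascii_whitespace() {
            if let Some(n) = counts.get_mut(w) {
                *n += 1;
            } else {
                // Copy the word into the arena once; all arena allocations
                // are freed together when the Bump is dropped.
                counts.insert(arena.alloc_str(w), 1);
            }
        }
        counts
    }
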
zmmmmm almost 3 years ago
It seems like this is just measuring a single pass at reading a file with about 100k words in it. I ran the benchmark and it finished almost instantly. So languages with startup overhead and warmup time (e.g. JIT) will totally underperform here. Makes sense, then, that Java is almost identical between the optimised and non-optimised versions.

ijidak almost 3 years ago
> We usually think of I/O as expensive, but I/O isn't the bottleneck here... The tokenization and hash table operations are the bottleneck by a long shot

Interesting. This is something I'll have to keep in mind.

Best practice has always been to consider I/O the slowest. But it's true... times have changed. Maybe we shouldn't make this assumption anymore.

nr2x almost 3 years ago
Some of these efficiency gains are remarkable. Most of the code I write is just the quickest way to implement a mundane function. I'm really curious what the energy savings would be if you went into a huge tech co and devoted ~10% of SWE time solely to optimizing the code base.

kramerger almost 3 years ago
For performance comparison between languages, this article is not very useful. Nothing against the author; this task is simply impossible :)

On the other hand, the article is a gold mine for learning performance analysis in different languages.

RantyDave almost 3 years ago
Interesting. It confirms what I've suspected for a while: that by using Python and not C++ for "general text mashing" tasks I'm not losing that much performance. Like, 2x, 5x, perhaps 10x. But not 100x.

gxt almost 3 years ago
> a simple interview problem

I've been asking different versions of that question since I started interviewing, and it is a surprisingly good filter.

But with articles like this, now I need new questions...

bsaul almost 3 years ago
Surprised to see Swift four times as slow as non-optimized Go. Anyone have an explanation?

I know the Swift String type is really complex, but I always assumed it was at least performing well...

habibur almost 3 years ago
You need to measure memory usage along with it, and take the product of the two (memory and speed) to get a good single number on performance for comparison.

fallingfrog almost 3 years ago
Your Perl implementation is almost as fast as C++? Something weird is going on here. Maybe you are bottlenecked on disk read speeds or something.

yb303 almost 3 years ago
Another one-minute look in... multiplication on every byte... really? If this is supposed to be fast: get one big buffer for the words, and get another if you run out. The hash function can be the word itself for words up to length 8; longer words would need a mult per 8 bytes. You can do a branchless strchr. Roll your own memmove: just always copy 24 bytes and terminate at the length you already know. The whole point of using C or C++ is that you can do non-idiomatic things when you have to. (A sketch of the word-as-hash trick is below.)

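To make the word-as-its-own-hash idea concrete, a hypothetical sketch (in Rust here rather than the C the comment has in mind) of packing a short word into a 64-bit key. Note that for the full effect you would pair this with an identity hasher, since std's default HashMap would still SipHash the u64.

    use std::collections::HashMap;

    // For words of up to 8 bytes, the bytes themselves form a unique u64
    // key, so "hashing" is just a copy plus zero padding (no collisions,
    // since words never contain NUL bytes). Longer words would fold in
    // additional 8-byte chunks, e.g. with one multiply each.
    fn short_word_key(word: &[u8]) -> Option<u64> {
        if word.len() > 8 {
            return None;
        }
        let mut buf = [0u8; 8];
        buf[..word.len()].copy_from_slice(word);
        Some(u64::from_le_bytes(buf))
    }

    fn main() {
        let words: [&[u8]; 3] = [b"the", b"cat", b"the"];
        let mut counts: HashMap<u64, u64> = HashMap::new();
        for w in words {
            if let Some(k) = short_word_key(w) {
                *counts.entry(k).or_insert(0) += 1;
            }
        }
        assert_eq!(counts[&short_word_key(b"the").unwrap()], 2);
    }
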
lxe almost 3 years ago
> I tried running it using the PyPy optimizing compiler, but it's significantly slower for some reason

I wonder why that is?

githubholobeat almost 3 years ago
Nice 3rd place for Zig implementation. But is it missing `defer arena.deinit();`?

claytongulick almost 3 years ago
My takeaway is that unoptimized C++ is pretty close to the same performance as JavaScript.

Wild.

ncmncm almost 3 years ago
You can tell it will be BS just from the title treating C and C++ as if they were one language, not two very different languages.

Other comments here reveal how, exactly.

manv1 almost 3 years ago
This is the executive version of this test: "I gave this problem to my team and it took them two weeks to figure out a solution."

svnpenn almost 3 years ago
You included Awk in the title, but omitted Go, really?

fractalb almost 3 years ago
grep is indeed fast!

jerryjerryjerry almost 3 years ago
Our distributed NewSQL database project actually uses three of these languages, Go, Rust, and C++, for different layers. There are pros and cons to each of them, but it is still a challenge to maintain three languages in one system's development. Performance-wise, C++ and Rust are good; coding-wise, Go is the easiest, but Rust kind of represents best practice.

FpUser almost 3 years ago
I would not call it a performance comparison at all. When Python calls functions written in C, it is not Python's performance. Write those functions using plain Python and then see the results. Sure, for the basic example in the article it does not matter from a practical standpoint. But when you need to step away from canned cases, suddenly Python's performance sucks big time.