Re: the Rust optimized implementation, I was able to get ~20-25% better performance by rewriting the for loops as iterators, using Rust's byte range pattern matching, and a buffered writer, which seems crazy, but it's true. I chalked it up to ILP/SIMD tricks the compiler is doing.<p>I even submitted a PR[0], but Ben was tired of maintaining the project and decided to archive it (which, fair enough!).<p>[0]: <a href="https://github.com/benhoyt/countwords/pull/115" rel="nofollow">https://github.com/benhoyt/countwords/pull/115</a>
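For those curious, a minimal sketch of the three techniques combined (an illustration, not the actual PR diff): an iterator chain over raw bytes, byte-range patterns for classification, and a BufWriter so output isn't one syscall per write.<p><pre><code> use std::io::{self, BufWriter, Read, Write};

 fn main() -> io::Result<()> {
     let mut input = Vec::new();
     io::stdin().read_to_end(&mut input)?;

     // Byte-range patterns classify bytes with no UTF-8 decoding step.
     let normalized: Vec&lt;u8&gt; = input
         .iter()
         .map(|&b| match b {
             b'A'..=b'Z' => b + 32, // ASCII lowercase
             b'a'..=b'z' => b,
             _ => b' ', // treat everything else as a separator
         })
         .collect();

     // Buffered output amortizes syscalls across many small writes.
     let stdout = io::stdout();
     let mut out = BufWriter::new(stdout.lock());
     out.write_all(&normalized)?;
     Ok(())
 }</code></pre>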
For those curious about Rust optimization, the “optimized” version makes three changes compared to the “idiomatic” version:<p>* It uses byte strings instead of UTF-8 strings. In my opinion, that’s not an optimization, that’s changing the problem. Depending on the question you’re asking, only one of the two can be correct.<p>* It uses a faster hash algorithm. It’s not the first time this came up in a benchmark article. Rust’s decision to use a DOS-safe hash by default (and not provide a fast algorithm in the std, like other languages do) really seems to hurt it in that kind of microbenchmark.<p>* It uses get_mut+insert instead of the more convenient HashMap::entry method, because the latter would require redundantly allocating the key even in the repeat case. I’ve hit this problem in the past as well. Maybe the upcoming HashMap::raw_entry_mut will make this kind of optimization cleaner.
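To make that last point concrete, here is a sketch of the two access patterns (types simplified; not the repo's actual code). The entry API needs an owned String up front, even when the word is already in the map, while get_mut+insert only allocates on a miss:<p><pre><code> use std::collections::HashMap;

 fn add_word(counts: &mut HashMap&lt;String, u64&gt;, word: &str) {
     // Convenient, but allocates a String on every call, hit or miss:
     //     *counts.entry(word.to_string()).or_insert(0) += 1;

     // Fast path: the word is already present; no allocation.
     if let Some(n) = counts.get_mut(word) {
         *n += 1;
         return;
     }
     // Slow path: first sighting; allocate the owned key exactly once.
     counts.insert(word.to_string(), 1);
 }</code></pre>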
I love using the UNIX shell (pipes and programs that do "one thing well") to the point where it hurts my ability to improve at other languages. But sometimes all you need is a throwaway command to do something once, not a beautifully optimized function that will be written, improved, compiled, profiled, reoptimized and then executed a billion times. The utter terseness of a bash one-liner just tickles me.
How about extending this to comparing programming language efficiency from a developer's standpoint, looking at bytes of code, versus the execution time?<p>I took the source code file size from the repository, and the runtime from the blog post.<p>Then I made an arbitrary overall "PAIN SCORE" (lower is better) by multiplying code size * runtime. I suggest this is a worthwhile metric simply because lower is better on both axes, but of course, in the "real world" there will be different economic costs to CPU time and developer time depending on the use case. Here are the sorted results, from least "pain" to most:<p><pre><code> LANGUAGE    FILENAME        CODE SIZE   RUNTIME   PAIN SCORE (LOWER IS BETTER)
 Shell       optimized.sh     75 bytes    1.83 s    137.25
 Crystal     simple.cr       240 bytes    1.29 s    309.6
 Nim         simple.nim      424 bytes    0.77 s    326.48
 Python      simple.py       208 bytes    2.21 s    459.68
 Ruby        simple.rb       175 bytes    3.17 s    554.75
 Go          optimized.go   1514 bytes    0.40 s    605.6
 Python      optimized.py    464 bytes    1.33 s    617.12
 Zig         optimized.zig  2688 bytes    0.24 s    645.12
 Zig         simple.zig     1394 bytes    0.55 s    766.7
 Go          simple.go       688 bytes    1.12 s    770.56
 Nim         optimized.nim  1683 bytes    0.49 s    824.67
 Shell       simple.sh        60 bytes   14.81 s    888.6
 Ruby        optimized.rb    401 bytes    2.47 s    990.47
 JavaScript  simple.js       532 bytes    1.88 s   1000.16
 C           optimized.c    4360 bytes    0.23 s   1002.80
 Rust        optimized.rs   3065 bytes    0.43 s   1317.95
 Swift       simple.swift    317 bytes    4.23 s   1340.91
 JavaScript  optimized.js   1501 bytes    1.10 s   1651.1
 C           simple.c       2735 bytes    0.96 s   2625.6
 Rust        simple.rs      2239 bytes    1.38 s   3089.82
</code></pre>
Sorting only by code size, the most concise implementations are: Shell, Ruby, Python, Crystal. Nobody was aiming to play code golf (i.e. minimize source code size), so these are fairly straightforward, idiomatic, readable implementations.<p>I am definitely a Crystal fan, in fact this afternoon I'm continuing to implement suggestions from my recent Show HN <a href="https://news.ycombinator.com/item?id=32081943" rel="nofollow">https://news.ycombinator.com/item?id=32081943</a> comments. :)
This is a rather meaningless comparison, since the differences are going to be dominated by:<p>1) The choice of libraries/datatypes used for strings and word->count map<p>2) How the source file is split into words - probably library function again, although in C/C++ one <i>could</i> choose to implement a super-optimized low level version that would blow the others away<p>IMO a performance comparison between languages is only meaningful if the runtime is not dominated by library functions, or if one admits it's really a standard library performance comparison, not a language comparison.
Recently there was a contest to optimize the performance of a similar program[0] and a Zoom discussion of the optimizations[1].<p>The programs are not comparable in the following ways:<p>- Case: TFA requires (at least) ASCII lowercasing but the contest problem required no lowercasing.<p>- Ordering: TFA does not require sorting, but the contest problem required sorting.<p>- Memory: TFA imposes a requirement phrased as "don't read the whole file into memory" and this sounds like it's a resource-saving constraint, but it's usually a constraint that requires the program to spend additional resources. You could just mmap the file and store pointers into the mapped region. It costs extra to do copies instead of no copies.<p>- Text: TFA is unclear on what assumptions may be made about the lengths of words. For the contest problem, the Hungarian Wikipedia input's longest word is around 80k.<p>- Safe, Hashing, Stdlib: TFA imposes some restrictions on what constructs may be used that are not imposed in the contest problem.<p>For the contest version of this problem, it seems like you can tokenize, hash, and count strings at around 1GB/s. Adapting a solution to solve TFA's problem (but not to conform to its Safe/Hashing/Stdlib requirements) would probably not carry too large of a penalty, since it's like 3 instructions to ASCII-lowercase 32 bytes and 1 string copy per unique string should take negligible time compared to the hash table lookups. So there is some room for the optimized solutions to go a little faster, if more optimizations are permitted.<p>[0]: <a href="https://easyperf.net/blog/2022/05/28/Performance-analysis-and-tuning-contest-6" rel="nofollow">https://easyperf.net/blog/2022/05/28/Performance-analysis-an...</a><p>[1]: <a href="https://www.youtube.com/watch?v=R_yX0XjdSBY" rel="nofollow">https://www.youtube.com/watch?v=R_yX0XjdSBY</a>
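On the cheapness of lowercasing: the scalar form is one compare and one OR per byte, which compilers readily autovectorize over a slice. A sketch (assuming ASCII-only input, as in the contest):<p><pre><code> // Set bit 0x20 on bytes in 'A'..='Z', leaving everything else untouched.
 // Over a slice, LLVM typically autovectorizes this loop to wide SIMD ops.
 fn ascii_lowercase_in_place(buf: &mut [u8]) {
     for b in buf.iter_mut() {
         *b |= ((*b >= b'A') as u8 & (*b <= b'Z') as u8) << 5;
     }
 }</code></pre>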
Funny how you can take the "optimized" C++ and make it significantly faster (vs. gnu/linux standard libraries) with little effort. A lookup table for to_lower, instead of bit math, helps slightly. Building with the LLVM-libc string functions helps a lot, because the GNU std::string_view::operator== is calling an AVX-optimized memcmp via the PLT, which is a terrible strategy for short strings. LLVM-libc has an inlined bcmp (note: bcmp can be significantly faster than memcmp, but on GNU they are aliases) that is resolved for the target platform at build time instead of run time.<p>Edit:<p>Even if you ignore LLVM-libc, just slapping this into the optimized.cpp and replacing the critical `==` with `our_bcmp` makes it 10% faster. IFUNC calls to micro-optimized SIMD functions are counterproductive. It is far, far better that the compiler can see all the code at build time.<p><pre><code> int our_bcmp (const char* a, const char* b, size_t sz) {
for (size_t i = 0; i < sz; i++) {
if (a[i] != b[i]) return 1;
}
return 0;
}</code></pre>
EDIT: I think the use of the term "allocation" might be overloaded here from the original article; after thinking about this a bit more, I think the author is referring to a different way of accessing the map. If you look in godbolt <a href="https://godbolt.org/z/Yf64rodW6" rel="nofollow">https://godbolt.org/z/Yf64rodW6</a> you can see that the pointer version uses<p><pre><code> CALL runtime.mapaccess2_fast64(SB)
</code></pre>
whereas the 'raw' version uses<p><pre><code> CALL runtime.mapassign_fast64(SB)
</code></pre>
When reading the Go solution this bit stood out to me<p>> To reduce the allocations, we’ll use a map[string]*int instead of map[string]int so we only have to allocate once per unique word, instead of for every increment<p>I just tried benchmarking this with a simple setup and I get zero allocations for both approaches, although the "pointer" approach is <i>slightly</i> faster<p><pre><code> package main
import (
"testing"
)
func BenchmarkIncrementMapRawInt(b *testing.B) {
var data = make(map[int]int)
b.ResetTimer()
for i := 0; i < b.N; i++ {
key := i % 1000
data[key]++
}
}
func BenchmarkIncrementMapPointerInt(b *testing.B) {
var data = make(map[int]*int)
b.ResetTimer()
for i := 0; i < b.N; i++ {
key := i % 1000
increment(data, key)
}
}
func increment(counts map[int]*int, value int) {
if p, ok := counts[value]; ok {
*p++
return
}
n := 1
counts[value] = &n
}
$ go test -bench=. -benchmem .
goos: darwin
goarch: arm64
BenchmarkIncrementMapRawInt-8 112288002 10.57 ns/op 0 B/op 0 allocs/op
BenchmarkIncrementMapPointerInt-8 139728302 8.586 ns/op 0 B/op 0 allocs/op</code></pre>
I like the OCaml version. It's both very easy to read and comes out really well in the benchmark.<p>I also looked at the "simple" Zig version, which came out really well in the benchmark, and to me it didn't look simple at all. It seems you need to make a lot of low level details explicit.<p>But IMHO AWK takes the crown here. :)
Wow. Swift, touted as "safe by design and (...) runs lightning-fast"[1] is more of a screw-up than I thought. Almost twice as slow as Lua and behind even Pascal and Forth.<p>[1] <a href="https://developer.apple.com/swift/" rel="nofollow">https://developer.apple.com/swift/</a>
I've checked with ClickHouse and the result is better than I expected... it runs in 0.043 sec. on my machine, which is faster than any other result.<p>The code:<p>SELECT arrayJoin(splitByChar(' ', lower(line))) AS word, count() AS c FROM file('kjvbible.txt', LineAsString) WHERE notEmpty(word) GROUP BY word ORDER BY c DESC FORMAT Null<p>or:<p>clickhouse-local --query "SELECT arrayJoin(splitByChar(' ', lower(line))) AS word, count() AS c FROM file('kjvbible.txt', LineAsString) WHERE notEmpty(word) GROUP BY word ORDER BY c DESC" > /dev/null<p>It is using only a single thread.
In this type of blog post with comparisons including many languages, you usually see some unnatural approach for your favorite language.<p>However, in this case, the Python example is idiomatic.<p>Since the article calls for better approaches, I would take the opportunity to use the walrus operator and rpartition() for the optimized version (rpartition, unlike rsplit, also handles a chunk that contains no newline). Something like this:<p><pre><code> import sys
 from collections import Counter

 remaining = ""
 counts = Counter()
 while (chunk := sys.stdin.read(64 * 1024)):
     pre, _, post = chunk.lower().rpartition('\n')
     counts.update((remaining + pre).split())
     remaining = post
 counts.update(remaining.split())  # flush the last partial line
 </code></pre>
It should not affect performance too much, and the code gets more expressive.<p>However, performance will differ quite a bit depending on the Python version you use. Interestingly, the Python 3.11 beta, which comes with a lot of performance tweaks, is slower for this exercise, while being reported to be faster on real-life tasks.
> <i>Incidentally, this problem set the scene for a wizard duel between two computer scientists several decades ago. In 1986, Jon Bentley asked Donald Knuth to show off “literate programming” with a solution to this problem, and he came up with an exquisite, ten-page Knuthian masterpiece. Then Doug McIlroy (the inventor of Unix pipelines) replied with a one-liner Unix shell version using tr, sort, and uniq.</i><p>Since this writing and other linked resources present only one side of the affair, I will mention: the presentation in <i>Pearls</i> was quite unfair to Knuth, and the conclusions commonly made of it somewhere between moderately and entirely unsound. (That is, as conclusions <i>from the paper</i>. I speak of unsoundness of logic only: these conclusions may be supportable from other sources.)<p>Knuth was challenged to demonstrate literate programming, and so <i>that was what he demonstrated</i>. He showed building something from the ground up assuming a very minimal environment, with problem analysis and all: of course that ended up more verbose than a Unix pipeline that glosses over problem analysis and starts with most of the tools you need to make a probably-acceptable solution!<p>McIlroy himself admitted a few years later that he had been “a little unfair” to criticise on engineering grounds what had only been an illustration of technique. Even in the paper, Bentley did admit a degree of culpability for this mismatch in his “criticism of programs” paragraph. (“He admires the execution of the solution, but faults the problem on engineering grounds. (That is, of course, my responsibility as problem assigner; Knuth solved the problem he was given on grounds that are important to most engineers-the paychecks provided by their problem assigners.)”)<p>But I do wish that Knuth had been given the opportunity to write a review of McIlroy’s review, for simultaneous publication.
... (2021). Great read and fun project. See the discussion from when it was originally posted by the author.[0]<p>[0]: <a href="https://news.ycombinator.com/item?id=26463967" rel="nofollow">https://news.ycombinator.com/item?id=26463967</a>
See also a different comparison (on ~the same problem) at StackExchange: <a href="https://codegolf.stackexchange.com/questions/188133/bentleys-coding-challenge-k-most-frequent-words" rel="nofollow">https://codegolf.stackexchange.com/questions/188133/bentleys...</a><p>(Copying from my comment the last time this was posted: <a href="https://news.ycombinator.com/item?id=26467684" rel="nofollow">https://news.ycombinator.com/item?id=26467684</a>)<p>There's also a nice book "Exercises in Programming Style" about just this problem.
As they call out in the article, accurately counting words is a bit more complicated than splitting by spaces. You need to account for all sorts of punctuation etc.<p>When I've been asked this in programming interviews, I've almost always been expected to produce something like the code in the article (and I do), but usually I'd point to something like the NLTK library as a better approach. It's polished and highly capable, and handles probably just about every edge case there is.
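For a flavor of the same idea outside Python, a sketch assuming the unicode-segmentation crate as a dependency: UAX #29 word boundaries already handle apostrophes and similar edge cases that naive space-splitting gets wrong.<p><pre><code> use unicode_segmentation::UnicodeSegmentation;

 fn main() {
     let text = "It's complicated, isn't it?";
     // unicode_words() yields words per UAX #29 and skips punctuation,
     // so "It's" and "isn't" each come out as a single word.
     for word in text.unicode_words() {
         println!("{word}");
     }
 }</code></pre>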
It's a bit of a shame that Rust's stdlib doesn't have a "fast but less safe" hasher. It's one of the easiest performance optimizations in my experience because it's very rare that an attacker controls input to your hashmap in a way that can actually lead to a DOS.<p>Not that it doesn't happen, it's just not something I've run into.<p>Just today I was testing out the performance difference hashing some Scylla queries and got nearly 2x faster hashing moving to fxhash.
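For anyone who wants to try the swap, a sketch (assuming the fxhash crate as a dependency; FxHashMap is just std's HashMap with the Fx hasher plugged in):<p><pre><code> use fxhash::FxHashMap;

 fn count_words(text: &str) -> FxHashMap&lt;&str, u64&gt; {
     // Same API as the std HashMap, different (non-DoS-resistant) hasher.
     let mut counts: FxHashMap&lt;&str, u64&gt; = FxHashMap::default();
     for word in text.split_ascii_whitespace() {
         *counts.entry(word).or_insert(0) += 1;
     }
     counts
 }</code></pre>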
These days I tend to use R for this type of problem. It’s slow, but vectors as first-class data types simplify things a lot:<p><pre><code> readLines(file) |>
   strsplit(" +") |>
   unlist() |>
   table() |>
   sort(decreasing = TRUE)</code></pre>
Got a bit better C++ version here which uses a couple libraries instead of std:: stuff - <a href="https://gist.github.com/jcelerier/74dfd473bccec8f1bd5d78be5aada569" rel="nofollow">https://gist.github.com/jcelerier/74dfd473bccec8f1bd5d78be5a...</a> ; boost, fmt and <a href="https://github.com/martinus/robin-hood-hashing" rel="nofollow">https://github.com/martinus/robin-hood-hashing</a><p><pre><code> $ g++ -I robin-hood-hashing/src/include -O2 -flto -std=c++20 -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -lfmt
$ time ./a.out < kjvbible_x10.txt > /dev/null
0,19s user 0,01s system 99% cpu 0,197 total
</code></pre>
with the same build flags, optimized.cpp gives me<p><pre><code> 0,22s user 0,01s system 99% cpu 0,233 total</code></pre>
C# really needs a GetOrAdd method on the regular Dictionary class (as exists on ConcurrentDictionary). This common pattern (from the optimized version) requires the hash function on "word" to be evaluated twice (once for TryGetValue and once for Add). Then, if there is a collision on the 32 bit hash then the equality check needs to be run twice as well. Not good as all of these are O(N) operations (where N is the size of "word"). I suspect the optimized version would be materially faster if they implemented their own dictionary that includes this.<p><pre><code> if (!counts.TryGetValue(word, out var wordCountRef))
{
counts.Add(word, new Ref<int>(1));
}</code></pre>
Doesn't the Unix Shell implementation break the "threading" constraint, in that each program in the pipe has its own main thread, and all the programs run concurrently?<p>Or should the "threading" constraint really have been stated as "you can use multiple threads, but only if they all have their own address space"?<p>Alternatively, given the multi-core capabilities of even budget/mobile/small systems these days (even the Raspberry Pi 4 has a quad-core CPU), isn't restricting the implementation to a single thread a weird artificial limitation in 2022? Also, a GPU-based implementation would be interesting, and GPUs get most of their power from their massively parallel core architecture.
> When optimizing this, the first thing to do is compile with optimizations enabled (g++ -O2). I kind of like the fact that with Go you don’t have to worry about this – optimizations are always on.<p>I think it is more fair to say that in Go optimizations are always OFF.
Taking it in a slightly different direction – because in my work chunking is nearly never necessary (too little text or too much memory) whereas Unicode characters and punctuation nearly always are – I would tend to something like:<p><pre><code> from collections import Counter
 from re import finditer

 word_counts = Counter()
 with open(PATH, encoding='utf-8') as file:
     doc = file.read()
 word_counts.update(match.group().lower() for match in finditer(r'\w+', doc))
 </code></pre>
In Python, this could be a fairly performant way of taking Unicode into account, because it avoids an explicit Python-level for loop, and regular expressions are rather optimized compared with writing low-level-style Python. (Maybe it should use casefold instead of lower.)
My language of choice, LiveCode, is a superset of HyperTalk. The concept of "words" has been built-in (not a library) since the '80s. Hash tables since the '90s.<p>A solution to every aspect of this problem but the line-by-line bit took just over a minute to write. Adding that took another minute.<p>The trade-off is that LC is slower than almost any of the languages listed, and optimization is almost impossible -- the idiomatic way <i>is</i> the fastest (almost). And if you don't like the way LC parses words, there is almost no way to change it without going into the source code (a big pain).
Hm, I tried comparing the simple.hs vs simple.py (with ghc 8.10.7 which I happened to have here, their stack.yaml uses 8.10.4) and it's 5.7s vs 1.9s, not quite as stark a difference. Maybe because I'm on Linux?
q solution in kdb, runs in about 2.5s on my machine vs simple.py which takes 4.7s. Rough interpolation with the results table puts this on par with Rust/Go/OCaml or other compiled languages.<p><pre><code> \t desc count each group `$lower " " vs raze read0 `:kjvbible_x10.txt
</code></pre>
It's also 1 line!<p>I am sure q pros or k purists can optimize this even more...<p>EDIT: Moving the lower earlier brings this down to 1.9s<p><pre><code> desc count each group `$ " " vs lower raze read0 `:kjvbible_x10.txt</code></pre>
Fun stuff! Did run a similar thing with a simple bioinformatics problem before (calculating the ratio of G and Cs against A+G+C+T), also with a whole bunch of contributors:<p><a href="https://github.com/samuell/gccontent-benchmark#readme" rel="nofollow">https://github.com/samuell/gccontent-benchmark#readme</a><p>Really hard - or impossible - to arrive at a definitive single number for one language, but the whole exercise is a lot of fun and quite informative IMO :)
This was always my approach when I had do “leetcode whiteboard interviews”.<p>Start with something not too far north of FizzBuzz but with a ton of scope to progressively enrich the problem until the clock runs out.<p>At $BIG_CO du jour people tend to talk about how much “signal” an interview produces, and starting somewhere that a meaningful number of candidates can’t do anything with offers very little “signal”, likewise very little Shannon information.<p>Great article!
I'm curious why go is faster than rust in the benchmarks. My best theory is that the program doesn't run long enough to trigger a GC, so rust is paying the cost of freeing memory, but go isn't. (Using an arena allocator in rust would probably produce similar behavior).<p>Or maybe the difference is just noise? It's hard to tell without more details on how the benchmarks were run.
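One cheap way to test the "paying for frees" theory, short of a real arena: leak the map so its destructor never runs, and see whether the gap closes. A sketch, for measurement only:<p><pre><code> use std::collections::HashMap;

 fn main() {
     let mut counts: HashMap&lt;String, u64&gt; = HashMap::new();
     // ... build up the counts as usual ...
     counts.insert("example".to_string(), 1);

     // Skip the per-String frees at teardown; the OS reclaims the whole
     // address space at exit anyway, which is roughly what a GC'd program
     // that never triggers a collection ends up paying.
     std::mem::forget(counts);
 }</code></pre>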
It seems like this is just measuring a single pass at reading a file with about 100k words in it. I ran the benchmark and it finished almost instantly. So languages with startup overhead and warmup time (e.g., JIT compilation) will totally underperform here. Makes sense then why Java is almost identical between optimised and non-optimised versions.
> <i>We usually think of I/O as expensive, but I/O isn’t the bottleneck here...The tokenization and hash table operations are the bottleneck by a long shot</i><p>Interesting. This is something I'll have to keep in mind.<p>Best practice has always been to treat I/O as the slowest part.<p>But it's true... times have changed.<p>Maybe we shouldn't make this assumption anymore.
Some of these efficiency gains are remarkable. Most of the code I write is just the quickest way to implement a mundane function. I'm really curious what the energy savings would be if you went into a huge tech co and devoted ~10% of SWE time solely to optimizing the code base.
For performance comparison between languages, this article is not very useful. Nothing against the author, this task is simply impossible :)<p>On the other hand, the article is a gold mine for learning performance analysis in different languages.
Interesting, it confirms what I've suspected for a while: that by using Python and not C++ for "general text mashing" tasks I'm not losing that much performance. Like, 2x, 5x, perhaps 10x. But not 100x.
> <i>a simple interview problem</i><p>I've been asking different versions of that question since I started interviewing, and it is a surprisingly good filter.<p>But with articles like this, now I need new questions...
Surprised to see Swift 4 times as slow as non-optimized Go.<p>Anyone have an explanation?<p>I know the Swift string type is really complex, but I always assumed it was at least performing well.
You'd need to measure memory usage as well, then take the product of the two (memory and speed) to get a much better single number on performance for comparison.
Another one-minute look: a multiplication on every byte... really?
If this is supposed to be fast, get one big buffer for the words. Get another if you run out.
The hash function can be the word itself for words up to length 8 (see the sketch below).
Longer words would need a mult per 8 bytes.
You can do branchless strchr.
Roll your own memmove - just always copy 24 bytes and terminate at the length you already know.
The whole point of using C or C++ is that you can do non idiomatic things when you have to.
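For the hash trick above, a sketch of the idea (in Rust rather than C, but the shape is the same): a word of 8 bytes or fewer hashes to its own zero-padded value, and longer words cost one multiply per extra 8-byte chunk.<p><pre><code> // A word of &lt;= 8 bytes is its own hash (zero-padded, little-endian);
 // longer words fold in one multiply per additional 8-byte chunk.
 fn word_hash(word: &[u8]) -> u64 {
     let mut h: u64 = 0;
     for chunk in word.chunks(8) {
         let mut buf = [0u8; 8];
         buf[..chunk.len()].copy_from_slice(chunk);
         h = h.wrapping_mul(0x9E37_79B9_7F4A_7C15)
              .wrapping_add(u64::from_le_bytes(buf));
     }
     h
 }</code></pre>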
You can tell it will be BS just from the title treating C and C++ as if they were one language, not two very different languages.<p>Other comments here reveal how, exactly.
Our distributed NewSQL database project actually uses three of them, Go, Rust, and C++, for different layers. There are pros and cons for each of them, but it is still a challenge to maintain three languages in one system's development. Performance-wise, C++ and Rust are good; coding-wise, Go is the easiest, while Rust arguably encourages the best practices.
I would not call it a performance comparison at all. When Python calls functions written in C, it is not Python's performance. Write those functions in plain Python and then see the results. Sure, for the basic example in the article it does not matter from a practical standpoint. But when you need to step away from canned cases, suddenly Python's performance sucks big time.