
The One Billion Row Challenge in Go: from 1m45s to 4s in nine solutions

499 points by nalgeon about 1 year ago

30 comments

bbkane about 1 year ago
I found this super interesting - especially as all the data I've written code to manipulate has been small enough that I haven't needed to optimize my code, so I've never had to think in this direction.

I think my favorite part was the very first section, where he got baseline measurements with `cat`, `wc`, and friends. I wouldn't have thought to do that and it's such an easy way to get a perspective on what's "reasonable".
faizshah about 1 year ago
I was curious how long it would take with Polars (for scale); apparently 33s: https://github.com/Butch78/1BillionRowChallenge/tree/main

I'm kind of interested in the opposite problem: what is the simplest solution using a well-known library/db that approaches the fastest hand-optimized solution to this problem?
michae2 about 1 year ago
For anyone looking for more examples of 1BRC in Go, we had a friendly competition at work and collected the results here: https://github.com/dhartunian/1brcgo/

In addition to the loop-unrolling and bit-twiddling tricks that also show up in the fastest Java and C++ versions, some Go-specific things I learned were:

- unsafe.Pointer can be used to read memory without bounds checks
- many functions in the bytes and bits packages in the standard library are written in assembly
- debug.SetGCPercent and SetMemoryLimit to turn off GC
- runtime.LockOSThread to lock a goroutine to a thread
- print is slightly faster than fmt.Printf (but writes to stderr)
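A minimal sketch of three of those tricks together, assuming a 4 GiB memory budget chosen purely for illustration:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"unsafe"
)

// readAt fetches buf[i] through unsafe.Pointer arithmetic, skipping the
// bounds check the compiler would emit for a plain buf[i] index.
func readAt(buf []byte, i int) byte {
	return *(*byte)(unsafe.Add(unsafe.Pointer(&buf[0]), i))
}

func main() {
	// Trade safety for speed: turn the collector off, but keep a memory
	// ceiling so allocation can't run away (4 GiB here is illustrative).
	debug.SetGCPercent(-1)
	debug.SetMemoryLimit(4 << 30)

	// Pin this goroutine to its OS thread for more stable scheduling.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	fmt.Printf("%c\n", readAt([]byte("Hamburg"), 2)) // prints "m"
}
```

The unsafe read only pays off in a hot inner loop; the caller becomes responsible for guaranteeing `i` is in range.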
hoten about 1 year ago
> I find this report confusing. Why does `if items[hashIndex].key == nil` show as taking 5.01s, but the call to bytes.Equal shows as only 390ms. Surely a slice lookup is much cheaper than a function call? If you are a Go performance expert and can help me interpret it, I'm all ears!

These two lines are both conditionals, so the time reported is sensitive to branch mispredictions. If the timings are not intuitive based on the complexity of the associated lines, it may be explained by the data being not very predictable and the branch predictor having a bad time.
camgunz about 1 year ago
I love the nerdery around 1BRC. My axe to grind is that unless you do dangerous stuff, DBs are just as fast, less complicated, and more resilient to data updates than application code [0]. Do more in the database!

[0]: https://geraldonit.com/2024/01/31/1-billion-row-challenge-in-sql-and-oracle-database/
thangalin about 1 year ago
Back in 2010, I used PostgreSQL for a web app that queried 270 million rows of climate data from Environment Canada:

https://www.youtube.com/watch?v=10KEr3sEG80

I wanted to see how the temperature was changing over time for specific regions using a map-based interface. The following chart was particularly eye-opening:

https://www.youtube.com/watch?v=iEtvf9xzRB4&t=164s

The software won a couple of awards and was heavily optimized to produce reports in under a minute. Kudos to the author for getting the parse time of a billion records down to mere seconds.
fizx about 1 year ago
It's worth noting that if you're messing around with large text files from the CLI, awk, grep, etc. will be an order of magnitude faster if you opt out of Unicode parsing.

I'm pretty confident adding LC_ALL=C to the awk solution would get it easily under a minute.
verytrivial about 1 year ago
I just want to go on record that, given the simplistic (i.e. fun) problem, a shell developer would have had the answer for a first, specific set of a billion rows done while all the other languages were still putting on their shoes.
jonahx about 1 year ago
Nice post. Interesting that the fastest Java beats the fastest Go, though they are close:

    AY  fastest Go version    2.90s   36.2
    TW  fastest Java version  0.953s  110

I would have expected Go to win. That JVM works pretty well...
michalsustr about 1 year ago
> the fastest, heavily-optimised Java solution runs in just under a second on my machine

I don't understand how this is possible. The file in question is 13GB, while the fastest commonly available SSDs are 12,400 MB/s. Am I missing something?
WesolyKubeczek about 1 year ago
I love the author's step-by-step approach, since very often a hyper-optimized solution turns out to be overfitted to the exact dataset it's operating on. At each step, the tradeoffs are explained: what we gain, but also what we lose by stripping functionality away.
JensRantil about 1 year ago
Second article I'm reading on implementing this in Go. Since the temperatures are in the range [-99.9, 99.9] with a tenth of precision (~2k values), I'm surprised that no one has implemented parsing of the numbers using a prepopulated lookup table. It should speed things up.

I submitted a GitHub issue on this for the other implementation I looked at here [1].

[1]: https://github.com/shraddhaag/1brc/issues/2
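The lookup-table idea the commenter describes can be sketched like this: since there are only ~2,000 possible temperature strings, precompute every one and turn parsing into a single map lookup. This is an illustrative sketch of the suggestion, not code from either article; the fastest variants would index a flat array by the raw bytes rather than a Go map, and the "-0.0" edge case is ignored here.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildTable precomputes the value in tenths of a degree for every
// string in [-99.9, 99.9], so parsing becomes one map lookup.
func buildTable() map[string]int16 {
	t := make(map[string]int16, 2000)
	for v := int16(-999); v <= 999; v++ {
		s := strconv.FormatFloat(float64(v)/10, 'f', 1, 64)
		t[s] = v
	}
	return t
}

var table = buildTable()

// parseTemp returns the temperature in tenths of a degree
// (e.g. "-12.3" -> -123), matching the integer representation
// the fast solutions aggregate with.
func parseTemp(b []byte) int16 { return table[string(b)] }

func main() {
	fmt.Println(parseTemp([]byte("-12.3"))) // -123
}
```

Whether this beats hand-rolled byte arithmetic depends on cache behavior and on avoiding the `string(b)` allocation in the real hot path.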
nottorp about 1 year ago
I have a feeling that a naive implementation in Java would be a lot worse than a naive implementation in Go, so optimizing matters more there.

Had to parse CSVs in Java on a very memory-constrained system once... we ended up cutting a feature because it wasn't worth it.
avinassh about 1 year ago
> I'm in the same ballpark as Alexander Yastrebov's Go version. His solution looks similar to mine: break the file into chunks, use a custom hash table (he even uses FNV hashing), and parse temperatures as integers. However, he uses memory-mapped files, which I'd ruled out for portability reasons - I'm guessing that's why his is a bit faster.

I am curious: can it be made even faster than this?
Alifatisk about 1 year ago
Is there a collection of all the languages that have attempted this challenge? I know comparing and competing languages is somewhat pointless, but it is still interesting to me.
blue_pants about 1 year ago
There's a Node.js version which takes 23s:

https://github.com/1brc/nodejs
sireat about 1 year ago
Those super-optimized solutions are fun to read about.

However, in real life I would never assume a million rows of text all have valid data in a specific format, much less a billion rows.

Thus a slower but more robust solution would be more realistic.
worldwidelies about 1 year ago
I’d like to see a 1 trillion row challenge.
afiodorov about 1 year ago
My first instinct would be to spin up a local Postgres and keep station data there. A lot of the solutions assume we have enough RAM to keep the stats per station; however, that's a bad assumption when dealing with a lot of data.
Beltalowda about 1 year ago
How did you generate the timings on https://benhoyt.com/images/go-1brc-profile-r9-source.png?
nicois about 1 year ago
It would be interesting to know how effective Profile-Guided Optimisation is here.
llmblockchain about 1 year ago
One "trick" not tried in the article that I have used is to gzip the data (so you have a .csv.gz) and stream that instead of the raw file. I find it cuts out a large amount of the disk read (as you have 6-10x less to read).
mohamedattahri about 1 year ago
There’s something cool about the fact that parallel code in Go is still idiomatic Go.
CyanLite2 about 1 year ago
Meanwhile the C# (.net) implementation is 4x faster.
fullstackchris about 1 year ago
Performance is great, but I would imagine the parallelized version requires a significantly higher minimum amount of RAM than the non-parallelized ones... he claims that each more performant solution costs less compute than the one before it, but in the case of parallelization it's the same amount of compute in a shorter amount of time, right?
satvikpendem about 1 year ago
Nice. I wonder how Rust would fare, given that it has no garbage collector.
1vuio0pswjnm7 about 1 year ago
How about kdb+ or shakti?
tonymet about 1 year ago
I saw the custom hash table, but why was the built-in map slow?
m3kw9 about 1 year ago
I'd just pay my way to 4s by upgrading hardware
neonsunset about 1 year ago
The effort and long time it took Go to get to something that is 3-6x slower than other, better languages should be an important reminder to everyone assuming it belongs in the same weight class as Rust, C#, or Java.