
Thread-Per-Core Buffer Management for a modern storage system

118 points by arjunnarayan over 4 years ago

7 comments

bob1029 over 4 years ago
More threads (i.e. shared state) are a huge mistake if you are trying to maintain a storage subsystem with synchronous access semantics.

I am starting to think you can handle all storage requests for a single logical node on just one core/thread. I have been pushing 5~10 million JSON-serialized entities to disk per second with a single managed thread in .NET Core (using a Samsung 970 Pro for testing). This *includes* indexing and sequential integer key assignment. This testing will completely saturate the drive (over 1 gigabyte per second steady-state). Just getting an increment of a 64-bit integer over a million times per second across an arbitrary number of threads is a big ask. This is the difference you can see when you double down on a single-threaded ideology for this type of problem domain.

The technical trick to my success is to run all of the database operations in micro-batches (10~1000 microseconds each). I use the LMAX Disruptor, so the batches form naturally based on throughput conditions. Selecting data structures and algorithms that work well in this kind of setup is critical. Append-only is a must with flash and makes an orders-of-magnitude difference in performance. Everything else (B-tree algorithms, etc.) follows from this realization.

Put another way, if you find yourself reaching for Task or async/await primitives when talking to something as fast as NVMe flash, you need to rethink your approach. The overhead of multiple threads, task-parallel abstractions, et al. will cripple any notion of high throughput in a synchronous storage domain.
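[A minimal sketch of the micro-batching idea described above, written in Rust with a std channel rather than the commenter's .NET/Disruptor setup. A single writer thread drains whatever has queued up into a batch, assigns sequential keys, appends framed records, and flushes once per batch. The file name, batch cap, and record framing are illustrative assumptions, not anything from the article or the comment.]

    use std::fs::OpenOptions;
    use std::io::{BufWriter, Write};
    use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
    use std::thread;

    // Single writer thread: forms micro-batches from whatever has arrived,
    // assigns sequential keys, appends framed records, flushes once per batch.
    fn writer_loop(rx: Receiver<Vec<u8>>) -> std::io::Result<()> {
        let file = OpenOptions::new().create(true).append(true).open("entities.log")?;
        let mut out = BufWriter::new(file);
        let mut next_key: u64 = 0;

        while let Ok(first) = rx.recv() {            // block for the first record
            let mut batch = vec![first];
            while batch.len() < 1024 {
                match rx.try_recv() {                // then drain what is already queued
                    Ok(more) => batch.push(more),
                    Err(_) => break,
                }
            }
            for payload in &batch {
                out.write_all(&next_key.to_le_bytes())?;
                out.write_all(&(payload.len() as u32).to_le_bytes())?;
                out.write_all(payload)?;
                next_key += 1;                       // key assignment needs no atomics
            }
            out.flush()?;                            // one flush per micro-batch, not per record
        }
        Ok(())
    }

    fn main() {
        let (tx, rx): (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) = sync_channel(65_536);
        let writer = thread::spawn(move || writer_loop(rx));
        for i in 0..10_000 {
            tx.send(format!("{{\"id\":{i}}}").into_bytes()).unwrap();
        }
        drop(tx);                                    // closing the channel ends the writer loop
        writer.join().unwrap().unwrap();
    }

[Because only the writer thread touches the file and the key counter, the hot path needs no locks or atomics; the batch size adapts to load the same way a Disruptor event handler sees larger batches as throughput rises.]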
zinclozenge over 4 years ago
If anybody's interested, there's a Seastar-inspired library for Rust that is being developed: https://github.com/DataDog/glommio
lrossi over 4 years ago
Disappointed to see that you spent 25% of the article describing in detail all the ways in which computer hardware got faster, then promised to show how your project takes advantage of this, but never show any performance measurements at all. Just a very fancy architecture.

Correct me if I'm wrong, but the only number I can find is a guarantee that you do not exceed 500 µs of latency when handling a request. And it's not clear whether this is a guarantee at all, since you say only that the system will throw a traceback in case of latency spikes.

I would have liked to see how latency varies under load, how much throughput you can achieve, what the latency long tail looks like on a long-running production load, and comparisons with off-the-shelf systems tuned reasonably.
dotnwat over 4 years ago
Noah here, developer at Vectorized. Happy to answer any questions.
eis over 4 years ago
I'd be interested in the write amplification, since Redpanda went pretty low-level in the IO layer. How do you guarantee atomic writes when virtually no disk provides guarantees beyond the page level? A failed write to a page could, at least in theory, destroy data that was already written there, so one has to resort to writing data multiple times.
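[The thread does not say how Redpanda actually addresses this. One common approach is to checksum each record so that a torn page-level write is detected at recovery time and the log is truncated back to the last complete record. Below is a minimal Rust sketch of that idea; the frame layout, the helper names (encode_record, valid_prefix_len), and the FNV-1a hash standing in for a real CRC are all illustrative assumptions.]

    // Frame: [len: u32][checksum: u64][payload]. A torn page-level write shows up
    // as a length/checksum mismatch on recovery instead of silently corrupt data.
    fn fnv1a(data: &[u8]) -> u64 {
        let mut h: u64 = 0xcbf2_9ce4_8422_2325;        // FNV-1a 64-bit offset basis
        for &b in data {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
        h
    }

    fn encode_record(payload: &[u8]) -> Vec<u8> {
        let mut rec = Vec::with_capacity(12 + payload.len());
        rec.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        rec.extend_from_slice(&fnv1a(payload).to_le_bytes());
        rec.extend_from_slice(payload);
        rec
    }

    // Return the length of the valid prefix of the log; everything after the
    // first incomplete or checksum-failing record is discarded on recovery.
    fn valid_prefix_len(log: &[u8]) -> usize {
        let mut off = 0;
        while log.len() - off >= 12 {
            let len = u32::from_le_bytes(log[off..off + 4].try_into().unwrap()) as usize;
            let sum = u64::from_le_bytes(log[off + 4..off + 12].try_into().unwrap());
            let end = off + 12 + len;
            if end > log.len() || fnv1a(&log[off + 12..end]) != sum {
                break;                               // torn or partially written record
            }
            off = end;
        }
        off
    }

    fn main() {
        let mut log = Vec::new();
        log.extend_from_slice(&encode_record(b"first"));
        log.extend_from_slice(&encode_record(b"second"));
        let torn = encode_record(b"third");
        log.extend_from_slice(&torn[..10]);          // simulate a write cut off mid-record
        println!("recoverable: {} of {} bytes", valid_prefix_len(&log), log.len());
    }

[Writing data twice, e.g. a doublewrite buffer as in InnoDB, is the usual alternative when records are updated in place rather than appended.]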
mirekrusin over 4 years ago
What is the point of talking about performance from thread-per-core if Raft sits in front of it, i.e. only one core will do the work at any time anyway?
matthewtovbin over 4 years ago
@arjunnarayan Have you evaluated the performance against vanilla Kafka / Confluent Cloud? Where can I see the results?