We tried mimalloc in ClickHouse and it is two times slower than jemalloc in our common use case
<a href="https://github.com/microsoft/mimalloc/issues/11" rel="nofollow">https://github.com/microsoft/mimalloc/issues/11</a>
Are there functions available with which I can query at run time how much OS memory is used, how much is handed out in allocations, how many mmap()ed pools are in use, and so on?<p>I find that to be one of the most important features of a malloc library for debugging memory usage.<p>glibc has these functions (like malloc_info()) -- they are quite buggy in that they return wrong results, but after patching them to be correct, they are super useful.
Looks like the same idea as Konstantin Knizhnik's thread_alloc:<p><a href="http://www.garret.ru/threadalloc/readme.html" rel="nofollow">http://www.garret.ru/threadalloc/readme.html</a><p>At least the architecture of allocated-chunk management is the same.
I always find comparisons with tcmalloc hard to parse, since it has a million knobs and the defaults are terrible. If they are running with 16 threads, I would normally advise increasing the thread cache size far above the default 3MiB. Also interesting would be jemalloc in per-CPU mode.<p>As always, the thing to do is build and run your own workload and see the results.
The tricky part with allocators is always the multi-threaded setup.<p>Even something as simple as a bunch of threads doing malloc-free in a loop will drop the performance of a lot of allocators to the floor, due to some sort of central locking or excessive cache thrashing. This is typically solved by adding per-thread block pools, free lists, or some such.<p>If you go further down the rabbit hole, there's the case where blocks are allocated in one thread and freed in another -- your typical producer-consumer setup. This further complicates the pool/freelist setup and requires periodic rebalancing of the freelists and pools.<p>Once all this is accommodated, a well-tuned allocator inevitably converges to a model with central slabs/pools/freelists and per-thread caches of the same, which are periodically flushed into the former. Then it all comes down to routine code optimization to make fast paths fast, through lock-free data structures, clever tricks, and whatnot.<p>In other words, it's always nice to read through someone's allocator code, but in the end this is a very well-explored area, and there's basically a single stable point once all common scenarios are considered.
The benchmarks are very impressive! I am excited to read through this code and think on it.<p>Edit: They do mention they're all from AMD's EPYC chip, which is a little idiosyncratic. Speculation: perhaps page locality is more important on this architecture.
Just a general question in regards to using memory allocators, in the context of a C-only application.<p>The problems I encounter with allocators and heap managers are almost never solved by these types of frameworks. These problems include:<p>1. Improper usage of the returned memory that contradicts the implementation's assumptions.
2. Pool allocators that don't have separation between individual blocks (performance reasons).
3. Specifying the lifetime of the memory to a thread or until specific events happen.
4. Corruption that is difficult to diagnose with any available tool.<p>Here's a specific scenario I deal with very often:
There are N persistent worker threads. These worker threads have their own pool of memory, and prior to getting work we know this pool is clean. After the work is finished, and before more work is received, the memory is cleaned. Any excess requested memory is returned to the global pool, and any memory that is "unmanaged" is dealt with properly.<p>This means that people can use whatever heap-management call you provide (void * obtainMemory(size_t);) in the scope of business logic without having to worry about infrastructure concerns.<p>Having a faster malloc/calloc doesn't benefit me as much as making the usage of memory easier, and the understanding of what happens easier.
The important thing about all this is to measure perf on realistic workloads before and after. I don't really believe in allocators that have "excellent performance" on everything.
The devs at Discourse also tried it with Ruby; the results aren't as good as with jemalloc. [1]<p>[1] <a href="https://twitter.com/samsaffron/status/1143048590555697152" rel="nofollow">https://twitter.com/samsaffron/status/1143048590555697152</a>