科技回声

9 comments

WestCoastJustin, over 11 years ago

You can also optimize memory and CPU management through Linux control groups. Oracle published a pretty good description (see: example 1: NUMA Pinning) of how to assign dedicated CPUs and memory to a process or group of processes [1], but you can also read about the supporting cpuset & memory cgroups subsystems too [2, 3].

P.S. I recently created a screencast about control groups (cgroups) for anyone interested @ http://sysadmincasts.com/episodes/14-introduction-to-linux-control-groups-cgroups

[1] http://www.oracle.com/technetwork/articles/servers-storage-admin/resource-controllers-linux-1506602.html
[2] https://www.kernel.org/doc/Documentation/cgroups/memory.txt
[3] https://www.kernel.org/doc/Documentation/cgroups/cpusets.txt

MichaelGG, over 11 years ago

Very interesting note about the reclaiming. Yet another warning when transparently using a NUMA system.

NUMA can be a real pain. You can get a 40% hit on direct memory access, and far worse if you're modifying a cacheline in another processor. On one of our VoIP workloads, we noticed a major (250%+) increase in performance and CPU stability after splitting a very thread-intensive process into multiple processes, each set with affinity to a particular core.

OSes try to help you, but it seems like they're primarily concerned with multiple processes, not huge processes like databases. Such processes should become NUMA aware and handle things themselves for best performance.

It might even make sense to ask if you can split the machine on NUMA boundaries and just act like they're separate systems. RAM's getting very cheap, and RAM/core is going up faster than CPU power is (it seems to me, anyways).

Also, is there a reason not to use large pages directly for the mmap'd sets if you know you're going to have them hot at all times? (I assume they read the entire file on start?)
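
For anyone wanting to try the "one process per core" idea, here is a minimal sketch using Python's `os.sched_setaffinity` on Linux. The helper name `pin_to_cpu` is made up for illustration, and this is not the VoIP setup described above. It only restricts where the process is scheduled; memory placement is a separate matter (numactl or the cpuset cgroup would handle that).

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict `pid` (0 means the calling process) to one CPU.

    Returns the affinity set reported by the kernel afterwards.
    Linux-only: os.sched_setaffinity is not available on every platform.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Pick a CPU the process is already allowed to run on, so the call
    # also works inside containers or cpusets with a restricted CPU list.
    allowed = os.sched_getaffinity(0)
    target = min(allowed)
    print("pinned to:", pin_to_cpu(target))
```

In a multi-process design, each worker would call something like this once at startup with its own CPU number.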

introspectif, over 11 years ago

"after rolling out our optimizations, we saw our error rates (ie. the proportion of slow or timed out queries) drop by up to 400%"

There is some good shared knowledge in the post (unlike this comment, to be fair), but what does drop by 400% mean?

If a rate drops by 100% it becomes zero. I get that.

If it increases by 400%, the outcome is slightly ambiguous (do we add 400% for 500% total or do we multiply up to 400% of the original value).

But a rate decreasing by 400% - am I the only person who finds that (not uncommon) expression hard to conceptualize?

caf, over 11 years ago

In regard to conclusion 2, there is another approach here - when you're finished with an old segment, posix_fadvise(..., POSIX_FADV_DONTNEED) can be used to drop it from the page cache.
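
A minimal sketch of that call from Python on Linux, via `os.posix_fadvise`. The helper name `drop_from_page_cache` and the idea of passing a segment path are assumptions for illustration. The advice is only a hint to the kernel, and dirty pages are generally not dropped, hence the `fsync` first.

```python
import os


def drop_from_page_cache(path):
    """Ask the kernel to evict the cached pages of `path`.

    offset=0 and length=0 mean "the whole file". This is advisory:
    the kernel may keep pages that are dirty or mapped elsewhere.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Flush any dirty pages so they become eligible for eviction.
        os.fsync(fd)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

The file contents are untouched; a later read simply goes back to disk.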

Erwin, over 11 years ago

I was hit by the transparent huge pages on RHEL 6.2 in my workload. If you find your ordinary processes randomly taking up huge amounts of CPU time -- system CPU time -- when doing apparently ordinary tasks, you might be affected too. That was a real pain to diagnose when you're used to trusting the kernel not to do anything that weird. Running "perf top" helped to narrow down what the system was REALLY doing.

I didn't have LI-size databases -- just a dozen Python processes allocating perhaps 300MB each and all restarting at the same time were enough to trigger it, taking 10 minutes rather than 2 seconds to start up.
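
A quick way to check whether transparent huge pages are active is to read the sysfs `enabled` file, where the current mode is shown in square brackets (for example `always madvise [never]`). The sketch below assumes the mainline path `/sys/kernel/mm/transparent_hugepage/enabled`; RHEL 6 kernels may expose it under a `redhat_transparent_hugepage` directory instead, so treat the path as an assumption. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed mainline location; adjust for distro-specific kernels.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs text, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path=THP_ENABLED):
    """Read the active THP mode, or None if the file is missing or unreadable."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("THP mode:", read_thp_mode())
```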

krakensden, over 11 years ago

According to LWN, this is probably going to be automatic in the future: http://lwn.net/Articles/568870/ (subscriber-only now, will be free in a week)

dllthomas, over 11 years ago

"we saw our error rates (ie. the proportion of slow or timed out queries) drop by up to 400%."

Should that be 80%?

Edited to add: Apparently it should be 75%, per comments elsewhere.
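
For the arithmetic behind the two guesses, a small sketch (the sample rates are made-up numbers, not figures from the article). One reading of "400%" is that the old rate was 4x the new one, which is a 75% drop. Reading it as "the old rate was 400% higher than the new one" makes the old rate 5x the new one, which is an 80% drop.

```python
def percent_drop(old, new):
    """Percentage decrease going from `old` to `new`."""
    return 100.0 * (old - new) / old


# Old rate is 4x the new rate.
print(percent_drop(4.0, 1.0))  # 75.0

# Old rate is 400% higher than (5x) the new rate.
print(percent_drop(5.0, 1.0))  # 80.0
```

Either way, a drop can never exceed 100%, since that already takes the rate to zero.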

vosper, over 11 years ago

Does the information in this article apply to VMs (specifically AWS) or is it only relevant when you're running directly on hardware?

dhruvbird, over 11 years ago

> On small setting for Linux, one dramatic performance improvement for LinkedIn!

should be...you know what it should be ;)

Optimizing Linux Memory Management for Low-latency, High-throughput Databases
