Non-uniform memory access meets the OOM killer

109 points by r4um about 7 years ago

13 comments

adrianmonk about 7 years ago
The OOM killer uses a heuristic to figure out what to kill. If the primary purpose of your system is to run some process that hangs on to a lot of RAM, that heuristic is exactly the opposite of what you need, so it would be a good idea to disable it or exempt that process.

Also, while I'm talking prophylactics, if you have monitoring and alerting in your production environment (which you should), it seems like there should be an alert for whenever the OOM killer activates. Assuming you are allocating resources carefully enough that you expect everything to fit, if it fires, it's almost always a sign that things are not going according to plan and need to be investigated sooner rather than later.
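For concreteness, a minimal sketch of what "exempt that process" can look like on a modern Linux kernel: writing -1000 to /proc/<pid>/oom_score_adj tells the OOM killer never to pick that process (this needs root or CAP_SYS_RESOURCE; the surrounding program is hypothetical).

    /* Sketch: exempt the current process from the OOM killer.
     * -1000 means "never kill"; positive values up to +1000 instead
     * make a process a preferred victim. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (!f) {
            perror("fopen /proc/self/oom_score_adj");
            return EXIT_FAILURE;
        }
        if (fprintf(f, "-1000\n") < 0)
            perror("fprintf");
        fclose(f);
        /* ... continue with the memory-hungry work ... */
        return EXIT_SUCCESS;
    }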
saagarjha about 7 years ago
A lot of time seems to go into tricking the watchdogs on single-purpose machines. I heard a story once of a guy who wanted to get some computation done, but the process was being deprioritized by the scheduler because it looked like a hung process that kept asking for CPU time. The solution he came up with was voluntarily relinquishing compute access right before anyone would check up on it, making it appear as if the process was great at sharing time with others. By doing this, he could get that one process's instructions running something like 99% of the time.
cthalupa about 7 years ago
> This new version also had this wacky little "feature" where it tried to bind itself to a single NUMA node.

This is 100% a feature. If you care at all about memory access latency, you want to remain local to the NUMA node. Foreign memory access is significantly slower. If you have NUMA enabled and your applications are not NUMA aware, and there are shared pages being accessed by applications running on both nodes, the NUMA rebalancing can actually cause even worse performance as it constantly moves the pages from one node to the other.

Any application that cares about memory access latency should 100% be written to be NUMA aware, and if it is not, you should be using numactl to bind the application to the proper node.

This also goes for PCI-E devices (including NVMe drives!) as they are going to be bound to a NUMA node as well. If you have an application that is accessing an NVMe volume, or using a GPU, you should 100% make sure that it is running on the same node as the PCI-E bus for that device.
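Besides the external numactl route (e.g. binding CPU and memory to one node from the command line), the same binding can be done from inside the program with libnuma. A rough sketch, assuming libnuma is available (link with -lnuma) and that node 0 happens to be the node local to the device you care about:

    /* Sketch: pin this process's CPUs and memory to NUMA node 0 via libnuma. */
    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not supported on this system\n");
            return EXIT_FAILURE;
        }
        struct bitmask *node0 = numa_allocate_nodemask();
        numa_bitmask_setbit(node0, 0);

        numa_run_on_node(0);       /* schedule only on node 0's CPUs   */
        numa_set_membind(node0);   /* allocate only from node 0 memory */

        /* ... open the NVMe volume / talk to the GPU from here ... */
        numa_free_nodemask(node0);
        return EXIT_SUCCESS;
    }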
smarks about 7 years ago
Time to re-up this classic from Andries Brouwer: https://lwn.net/Articles/104185/
speedplane about 7 years ago
It's now commonplace for even medium-size companies to run dozens of servers. Memory resources (as well as disk and CPU) are always being stretched. The OOM killer may have been sufficient for single-server environments, where you could always provision an extra 40%, but it's far too blunt a tool.

Most environments I've worked with have to define an instance size (in memory and CPU) and determine how many parallel threads/processes will run on it. Plus you need to determine when and how to scale up to more instances. To reduce costs, the goal is 100% utilization, but also with the capability to deal with spikes in traffic and workload, and all with an acceptable error rate.

Unfortunately, doing this type of sizing/scaling analysis is incredibly difficult. The opaque effects of the OOM killer make it even more difficult. I'm sure the OOM killer uses a deterministic algorithm, but it's complex enough that most don't know it, or handle for it. In a server environment, if the OOM killer kills a service, your app and all other services are likely hosed. It would be far more preferable if the OOM killer had a straightforward, consistent, and deterministic method of dealing with low memory. That way programmers would know to look out for it, and could handle it more consistently.
ParrotyError about 7 years ago
The OOM killer was a misfeature when it was designed. Why is it still in the kernel? Solaris solved this problem 20 years ago.
n_t about 7 years ago
That's why one needs to be aware of the memory and other load characteristics of the system, particularly if it is an enterprise system. Various processes should be put in different cgroups with defined resources. cgroups also provide memory pressure notification and other goodies too. If it is an embedded system, it is probably best to turn off overcommit. Finally, for critical processes, use oom_score_adj so that the process can be excluded from being killed.
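A rough sketch of the cgroup side of this, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and sufficient privilege; the group name "myservice" and the 512 MiB cap are made-up example values:

    /* Sketch: place the current process in a memory-limited cgroup (v2).
     * Roughly: mkdir /sys/fs/cgroup/myservice;
     *          echo 536870912 > memory.max; echo $$ > cgroup.procs */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void write_file(const char *path, const char *val) {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return; }
        fprintf(f, "%s", val);
        fclose(f);
    }

    int main(void) {
        mkdir("/sys/fs/cgroup/myservice", 0755);
        write_file("/sys/fs/cgroup/myservice/memory.max", "536870912\n"); /* 512 MiB */

        char pid[32];
        snprintf(pid, sizeof pid, "%d\n", getpid());
        write_file("/sys/fs/cgroup/myservice/cgroup.procs", pid);

        /* From here on, this process and its children are capped at 512 MiB. */
        return 0;
    }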
StreamBright about 7 years ago
This is the reason I am a big fan of running any software under separate users and setting ulimit to a low value so that something stupid like this cannot impact the production service. I would be super keen to try to replicate this scenario on my test cluster and see if my settings catch it. Does anybody know if the software in question is an open-source tool?
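The ulimit approach maps to setrlimit() in code; a minimal sketch (the 2 GiB cap is an arbitrary example value, not anything from the article):

    /* Sketch: cap this process's virtual address space before it does real work,
     * so a runaway allocation gets a clean failure instead of taking the box down. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit lim = {
            .rlim_cur = 2UL * 1024 * 1024 * 1024,  /* soft limit: 2 GiB */
            .rlim_max = 2UL * 1024 * 1024 * 1024,  /* hard limit: 2 GiB */
        };
        if (setrlimit(RLIMIT_AS, &lim) != 0) {
            perror("setrlimit");
            return EXIT_FAILURE;
        }
        /* Allocations past the cap now fail with NULL/ENOMEM. */
        void *p = malloc(4UL * 1024 * 1024 * 1024);  /* 4 GiB: expected to fail */
        printf("4 GiB malloc %s\n", p ? "succeeded" : "failed as expected");
        free(p);
        return EXIT_SUCCESS;
    }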
jschwartzi about 7 years ago
At my last job I wrote a build system that built maybe 30 or 40 executables from several hundred source files. Sometimes when I'd run make -j with no constraint my desktop environment would crash.

It turned out that the OOM killer was triggering because I was filling up memory with compiler invocations.

I was really proud of that bug.
amelius about 7 years ago
Reminds me of: https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/
ben_bai about 7 years ago
What happened to good old returning NULL when no memory is available?

No, let's do overcommit (malloc always works) and OOM-kill some random process when under memory pressure!
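To make the complaint concrete: under Linux's default overcommit policy (vm.overcommit_memory=0), an oversized allocation typically "succeeds" and only blows up when the pages are actually touched, whereas with strict accounting (vm.overcommit_memory=2) malloc itself returns NULL. A demonstration sketch, where 64 GiB is just assumed to be more RAM than the test box has:

    /* Sketch: show that malloc "succeeding" says little under overcommit.
     * With the default policy, malloc usually returns non-NULL and the process
     * is OOM-killed while touching the pages; with strict overcommit the
     * malloc itself fails and can be checked the old-fashioned way. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        size_t huge = 64UL * 1024 * 1024 * 1024;  /* 64 GiB, assumed > physical RAM */
        char *p = malloc(huge);
        if (!p) {
            puts("malloc returned NULL -- the checkable failure mode");
            return EXIT_FAILURE;
        }
        puts("malloc succeeded; now touching every page...");
        memset(p, 1, huge);  /* this is typically where the OOM killer strikes */
        puts("survived (the machine really had that much memory)");
        free(p);
        return EXIT_SUCCESS;
    }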
dis-sys about 7 years ago
Being able to write NUMA-aware applications like the one described in the article is a luxury denied to ALL Go users. The current Go runtime doesn't have any NUMA awareness.

As of today, you can get a two-NUMA-node processor (AMD Threadripper 1900X) for as little as $449.
BrainInAJar about 7 years ago
Memory overcommit is the most hostile, idiotic misfeature to ever ship in any mainstream operating system. It's such a great example of why one should pay absolutely no heed to Linus.