Message-ID: <5ADF498C.1@windriver.com>
Date: Tue, 24 Apr 2018 11:13:16 -0400
From: Chris Friesen
To: Michal Hocko
Cc: "cgroups@vger.kernel.org", linux-mm@kvack.org, Johannes Weiner, Vladimir Davydov
Subject: Re: per-NUMA memory limits in mem cgroup?
References: <5ADA26AB.6080209@windriver.com> <20180422124648.GD17484@dhcp22.suse.cz>
 <5ADDFBD1.7010009@windriver.com> <20180424132721.GF17484@dhcp22.suse.cz>
In-Reply-To: <20180424132721.GF17484@dhcp22.suse.cz>

On 04/24/2018 09:27 AM, Michal Hocko wrote:
> On Mon 23-04-18 11:29:21, Chris Friesen wrote:
>> On 04/22/2018 08:46 AM, Michal Hocko wrote:
>>> On Fri 20-04-18 11:43:07, Chris Friesen wrote:
>>
>>>> The specific scenario I'm considering is that of a hypervisor host. I have
>>>> system management stuff running on the host that may need more than one
>>>> core, and currently these host tasks might be affined to cores from multiple
>>>> NUMA nodes. I'd like to put a cap on how much memory the host tasks can
>>>> allocate from each NUMA node in order to ensure that there is a guaranteed
>>>> amount of memory available for VMs on each NUMA node.
>>>>
>>>> Is this possible, or are the knobs just not there?
>>>
>>> Not possible right now. What would be the policy when you reach the
>>> limit on one node? Fallback to other nodes? What if those hit the limit
>>> as well? OOM killer or an allocation failure?
>>
>> I'd envision it working exactly the same as the current memory cgroup, but
>> with the ability to specify optional per-NUMA-node limits in addition to
>> the system-wide one.
>
> OK, so you would have a per-NUMA percentage of the hard limit?

I think it'd make more sense as a hard limit per NUMA node.

> But more importantly, note that the page allocation is done way before the
> charge, so we do not have any control over where the memory gets allocated
> from, so we would have to play nasty tricks in the reclaim to somehow
> balance NUMA charge pools.

Reading the docs on the memory controller, it does seem a bit tricky. I had
envisioned some sort of "is there memory left in this group" check before
"approving" the memory allocation, but it seems it doesn't really work that
way.

Chris
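
P.S. For completeness: the closest thing I've found today is that the v1
memory controller's memory.numa_stat file lets me *observe* per-node charging
for a group, even though there's no knob to enforce a limit per node. Below
is a minimal sketch of the sort of thing I mean; the mount point
/sys/fs/cgroup/memory and the group name "host-services" are just
placeholders, and if I'm reading Documentation/cgroup-v1/memory.txt right,
the counts in that file are in pages.

/*
 * Sketch only: print per-node memory charged to one cgroup-v1 memcg by
 * parsing the first line of memory.numa_stat, which looks like
 *   total=<pages> N0=<pages> N1=<pages> ...
 * The cgroup path below is a placeholder for illustration.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path =
		"/sys/fs/cgroup/memory/host-services/memory.numa_stat";
	char line[4096];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror("fopen");
		return 1;
	}

	/* First line: "total=<pages> N0=<pages> N1=<pages> ..." */
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return 1;
	}
	fclose(f);

	long page_kib = sysconf(_SC_PAGESIZE) / 1024;

	for (char *tok = strtok(line, " \n"); tok; tok = strtok(NULL, " \n")) {
		int node;
		unsigned long pages;

		/* Skip the "total=<pages>" token; keep each "N<id>=<pages>". */
		if (sscanf(tok, "N%d=%lu", &node, &pages) == 2)
			printf("node %d: %lu MiB charged\n",
			       node, pages * page_kib / 1024);
	}
	return 0;
}

So the accounting side is already per-node; what's missing is a way to have
the charge path (or reclaim) react when one node's share of the group goes
over a configured cap.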