From: Michal Hocko <mhocko@suse.com>
To: Gang Li <ligang.bdlg@bytedance.com>
Cc: Waiman Long <longman@redhat.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org, rientjes@google.com,
Zefan Li <lizefan.x@bytedance.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] mm: oom: introduce cpuset oom
Date: Tue, 11 Apr 2023 16:36:26 +0200 [thread overview]
Message-ID: <ZDVwaqzOBNTpuR1w@dhcp22.suse.cz> (raw)
In-Reply-To: <20230411065816.9798-1-ligang.bdlg@bytedance.com>
On Tue 11-04-23 14:58:15, Gang Li wrote:
> Cpusets constrain the CPU and Memory placement of tasks. The
> `CONSTRAINT_CPUSET` OOM type has existed for a long time but has
> never been utilized.
>
> When a process in a cpuset that constrains memory placement triggers
> an OOM, the OOM killer may kill a completely irrelevant process on
> other NUMA nodes, which will not release any memory for this cpuset.
>
> We can easily achieve node-aware OOM by using `CONSTRAINT_CPUSET` and
> selecting the victim from cpusets with the same mems_allowed as the
> current one.
I believe it still wouldn't hurt to be more specific here.
CONSTRAINT_CPUSET is rather obscure. Looking at this just makes my head
spin.
	/* Check this allocation failure is caused by cpuset's wall function */
	for_each_zone_zonelist_nodemask(zone, z, oc->zonelist,
			highest_zoneidx, oc->nodemask)
		if (!cpuset_zone_allowed(zone, oc->gfp_mask))
			cpuset_limited = true;
Does this even work properly and why? prepare_alloc_pages sets
oc->nodemask to current->mems_allowed but the above gives us
cpuset_limited only if there is at least one zone/node that is not
oc->nodemask compatible. So it seems like this wouldn't ever get set
unless oc->nodemask got reset somewhere. This is a maze indeed. Is there
any reason why we cannot rely on __GFP_HARDWALL here? Or should we
instead rely on the fact that the nodemask should be the same as
current->mems_allowed?
I do realize that this is not directly related to your patch, but
considering this has been mostly doing nothing, maybe we want to
document it better or even rework it on this occasion.
> Example:
>
> Create two processes named mem_on_node0 and mem_on_node1, each
> constrained by its own cpuset. These two processes allocate memory
> on their own nodes. When node0 runs out of memory, the OOM killer
> will be invoked by mem_on_node0.
Don't you have an actual real-life example of a properly partitioned
system that clearly misbehaves and that this patch addresses?
--
Michal Hocko
SUSE Labs
Thread overview: 10+ messages
2023-04-11 6:58 Gang Li
2023-04-11 12:23 ` Michal Koutný
2023-04-11 13:04 ` Gang Li
2023-04-11 13:12 ` Michal Hocko
2023-04-11 13:17 ` Gang Li
2023-04-11 15:08 ` Michal Koutný
2023-04-11 14:36 ` Michal Hocko [this message]
2023-08-17 8:40 ` Gang Li
2023-08-17 16:45 ` Waiman Long
2023-08-22 6:31 ` Gang Li