From: Gang Li <ligang.bdlg@bytedance.com>
To: David Rientjes <rientjes@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>, Tejun Heo <tj@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC PATCH v1] mm: oom: introduce cpuset oom
Date: Mon, 26 Sep 2022 11:38:10 +0800
Message-ID: <b3ff8456-fe0e-95a0-cccd-e94025a82560@bytedance.com>
In-Reply-To: <18621b07-256b-7da1-885a-c96dfc8244b6@google.com>
On 2022/9/23 03:18, David Rientjes wrote:
> On Wed, 21 Sep 2022, Gang Li wrote:
>
>> cpuset confines processes to processor and memory node subsets.
>> When a process in a cpuset triggers an oom, the oom killer may kill a
>> completely irrelevant process on another NUMA node, which will not
>> release any memory for this cpuset.
>>
>> It seems that `CONSTRAINT_CPUSET` is not really doing much these
>> days. Using CONSTRAINT_CPUSET, we can easily achieve NUMA-aware oom
>> killing by selecting the victim from the cpuset that triggered the oom.
>>
>> Suggested-by: Michal Hocko <mhocko@suse.com>
>> Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
>
> Hmm, is this the right approach?
>
> If a cpuset results in an oom condition, is there a reason why we'd
> need to find a process from within that cpuset to kill? I think the
> idea is to free memory on the oom set of nodes (cpuset.mems) and that
> can happen by killing a process that is not a member of this cpuset.
>
Hi,
My last patch implemented this idea[1][2], but it needs to inc/dec a
per-mm_struct counter on every page allocation/release/migration. As the
Unixbench results show, this costs 0%-3% in performance, depending on the
workload[2]. So Michal Hocko suggested using cpuset instead[3]. (A rough
sketch of the counting scheme follows the links below.)
[1]. https://lore.kernel.org/all/20220512044634.63586-1-ligang.bdlg@bytedance.com/
[2]. https://lore.kernel.org/all/20220708082129.80115-1-ligang.bdlg@bytedance.com/
[3]. https://lore.kernel.org/all/YoJ%2FioXwGTdCywUE@dhcp22.suse.cz/
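
For reference, the core of that counting approach was roughly the
following (a simplified sketch with hypothetical field and helper names,
not the actual patch; see [1][2] for the real thing):

/*
 * Approach 1 (sketch): track per-NUMA-node RSS in every mm_struct,
 * updated on each page allocation, release and migration, which is
 * exactly where the 0%-3% overhead comes from.
 */
struct mm_struct {
	/* ... existing fields ... */
	atomic_long_t rss_numa[MAX_NUMNODES];	/* pages charged per node */
};

static inline void mm_numa_rss_add(struct mm_struct *mm, int nid, long nr)
{
	atomic_long_add(nr, &mm->rss_numa[nid]);
}

/* oom badness would then count only pages on the constrained nodes. */
static unsigned long mm_numa_rss(struct mm_struct *mm,
				 const nodemask_t *nodes)
{
	unsigned long pages = 0;
	int nid;

	for_each_node_mask(nid, *nodes)
		pages += atomic_long_read(&mm->rss_numa[nid]);
	return pages;
}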
> I understand the challenges of creating a NUMA-aware oom killer to
> target memory that is actually resident on an oom node, but this
> approach doesn't seem right and could actually lead to pathological
> cases where a small process trying to fork in an otherwise empty cpuset
> repeatedly triggers oom kills when we'd actually prefer to kill a
> single large process.
>
I think there are three ways to achieve a NUMA-aware oom killer:
1. Count every page operation, which causes a performance loss[2].
2. Iterate over the pages of every process (like show_numa_map does),
   which may stall the oom path.
3. Select the victim from within the cpuset, which may lead to the
   pathological kills you describe (this patch; see the sketch below).
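
To make option 3 concrete, the eligibility check boils down to roughly
this (a sketch of the idea, not the exact diff; the function name is
illustrative):

/*
 * Approach 3 (sketch): under CONSTRAINT_CPUSET, skip candidates whose
 * mems_allowed does not intersect the allocating task's, since killing
 * them cannot free memory on the nodes current may allocate from.
 */
static bool cpuset_oom_eligible(struct task_struct *tsk,
				struct oom_control *oc)
{
	if (oc->constraint != CONSTRAINT_CPUSET)
		return true;	/* not a cpuset-constrained oom */

	return cpuset_mems_allowed_intersects(current, tsk);
}

The victim selection loop itself stays unchanged; only the set of
candidate tasks shrinks.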
None of them is perfect and I'm getting stuck. Do you have any ideas?