linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@suse.com>
To: Jinjiang Tu <tujinjiang@huawei.com>
Cc: rientjes@google.com, shakeel.butt@linux.dev,
	akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
	matthew.brost@intel.com, joshua.hahnjy@gmail.com,
	rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
	ying.huang@linux.alibaba.com, apopple@nvidia.com,
	linux-mm@kvack.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH] mm/oom_kill: kill current in OOM when binding to cpu-less nodes
Date: Mon, 8 Sep 2025 13:26:07 +0200
Message-ID: <aL69T2Dbdw-l2hUS@tiehlicka>
In-Reply-To: <ed3823fb-01fb-498e-8248-f4628d258276@huawei.com>

On Mon 08-09-25 19:13:52, Jinjiang Tu wrote:
> 
> On 2025/9/8 17:11, Michal Hocko wrote:
> > On Mon 08-09-25 16:16:38, Jinjiang Tu wrote:
> > > On 2025/9/8 15:46, Michal Hocko wrote:
> > > > On Sat 06-09-25 09:56:16, Jinjiang Tu wrote:
> > > > > In our use case, movable nodes are in all cpusets, so that movable nodes can be
> > > > > used by all tasks. Even though we move tasks into cpusets that only allow allocation
> > > > > from movable nodes, oom_cpuset_eligible()->cpuset_mems_allowed_intersects() returns true for
> > > > > all tasks.
> > > > Right but this is because you allowed _all_ tasks to allocate from those
> > > > movable nodes so why would that be an unexpected behavior?
> > > > 
> > > > > Maybe when oc->nodemask == movable nodes, only select tasks whose mempolicy intersects with oc->nodemask.
> > > > > Like the following:
> > > > > 
> > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > > > index eb83cff7db8c..e56b6de836a6 100644
> > > > > --- a/mm/mempolicy.c
> > > > > +++ b/mm/mempolicy.c
> > > > > @@ -2328,6 +2328,9 @@ bool mempolicy_in_oom_domain(struct task_struct *tsk,
> > > > >           if (!mask)
> > > > >                   return ret;
> > > > > +       if (!nodes_intersects(*oc->nodemask, node_states[N_CPU]))
> > > > > +               ret = false;
> > > > > +
> > > > Nope, this doesn't really make much sense TBH. I believe you should stop
> > > > special casing cpuless nodes and look into the actual configuration and
> > > > check how to make cpuset-based OOM task selection work. Your underlying
> > > > problem is not about no CPUs assigned to a numa node but an allocation
> > > > constraint based on movability of allocations so you need to find a
> > > > solution that is dealing with that constraint.
> > > Many tasks are in the root cpuset, systemd for example. The root cpuset
> > > contains all nodes, we couldn't exclude cpu-less nodes.
> > > 
> > > If we rely on cpuset-based OOM task selection, tasks in the root cpuset may
> > > still be selected.
> > If you start by killing tasks from the cpuset of the currently
> > allocating task then this shouldn't really happen, right?
> 
> Do you mean we should put the tasks into the same cpuset, and then limit the max usage
> of the memcg, make it only trigger memcg OOM, to select tasks from the same memcg?

No, I mean that you should partition your system by cpusets and if there
is a mempolicy OOM situation then you select the OOM victim from the cpuset
the current task is allocating from. You can employ the memcg cgroup
controller as well but this is an orthogonal thing.
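
To make the difference between the two selection policies concrete, here is
a minimal userspace sketch. It is not kernel code and not a proposed patch:
struct toy_task, pick_by_intersection() and pick_in_cpuset() are hypothetical
names, node masks are plain bitmasks, and rss_pages is only a stand-in for
the real OOM badness score. The first policy is loosely modeled on the
intersection check discussed above, the second on picking only from the
allocating task's cpuset.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these types or helpers exist in the kernel. */
struct toy_task {
	const char *name;
	int cpuset_id;              /* hypothetical cpuset identifier */
	unsigned long mems_allowed; /* bit n set: NUMA node n is allowed */
	unsigned long rss_pages;    /* stand-in for the OOM badness score */
};

/* Userspace analogue of a nodemask intersection test. */
static int mems_intersect(unsigned long a, unsigned long b)
{
	return (a & b) != 0;
}

/*
 * Policy A: every task whose allowed nodes intersect the OOM nodemask is
 * eligible; the largest eligible task is picked. If a node is present in
 * all cpusets, every task is eligible.
 */
static const struct toy_task *pick_by_intersection(const struct toy_task *tasks,
						   size_t n,
						   unsigned long oom_nodes)
{
	const struct toy_task *victim = NULL;

	assert(tasks != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!mems_intersect(tasks[i].mems_allowed, oom_nodes))
			continue;
		if (!victim || tasks[i].rss_pages > victim->rss_pages)
			victim = &tasks[i];
	}
	return victim;
}

/*
 * Policy B: only tasks sharing the allocating task's cpuset are eligible;
 * the largest of those is picked.
 */
static const struct toy_task *pick_in_cpuset(const struct toy_task *tasks,
					     size_t n,
					     const struct toy_task *allocating)
{
	const struct toy_task *victim = NULL;

	assert(tasks != NULL && allocating != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].cpuset_id != allocating->cpuset_id)
			continue;
		if (!victim || tasks[i].rss_pages > victim->rss_pages)
			victim = &tasks[i];
	}
	return victim;
}
```

With a root-cpuset task allowed on nodes 0-2 and two workers confined to
movable node 2, policy A can still pick the root-cpuset task for an OOM on
node 2, while policy B stays within the workers' cpuset.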

-- 
Michal Hocko
SUSE Labs

