On 2025/9/8 17:11, Michal Hocko wrote:
> On Mon 08-09-25 16:16:38, Jinjiang Tu wrote:
>> On 2025/9/8 15:46, Michal Hocko wrote:
>>> On Sat 06-09-25 09:56:16, Jinjiang Tu wrote:
>>>> In our use case, movable nodes are in all cpusets, so that movable nodes can be
>>>> used by all tasks. Even though we move tasks into cpusets that only allow to allocate
>>>> from movable nodes, oom_cpuset_eligible()->cpuset_mems_allowed_intersects() returns true for
>>>> all tasks.
>>> Right but this is because you allowed _all_ tasks to allocate from those
>>> movable nodes so why would that be an unexpected behavior?
>>>
>>>> Maybe when oc->nodemask == movable nodes, only select tasks whose mempolicy intersects with oc->nodemask.
>>>> Like the following:
>>>>
>>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>>> index eb83cff7db8c..e56b6de836a6 100644
>>>> --- a/mm/mempolicy.c
>>>> +++ b/mm/mempolicy.c
>>>> @@ -2328,6 +2328,9 @@ bool mempolicy_in_oom_domain(struct task_struct *tsk,
>>>>  	if (!mask)
>>>>  		return ret;
>>>> +	if (!nodes_intersects(*oc->nodemask, node_states[N_CPU]))
>>>> +		ret = false;
>>>> +
>>> Nope, this doesn't really make much sense TBH. I believe you should stop
>>> special casing cpuless nodes and look into the actual configuration and
>>> check how to make cpuset based OOM tasks selection. Your underlying
>>> problem is not about no CPUs assigned to a numa node but an allocation
>>> constraint based on movability of allocations so you need to find a
>>> solution that is dealing with that constraint.
>> Many tasks are in the root cpuset, systemd for example. The root cpuset
>> contains all nodes, we couldn't exclude cpu-less nodes.
>>
>> If we rely on cpuset based OOM tasks selection, tasks in root cpuset may
>> still be selected.
> If you start by killing tasks from the cpuset of the currently
> allocating task then this shouldn't really happen, right?

Yes, indeed.
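
For illustration only, here is a userspace toy model of the two selection policies discussed above. It is not kernel code and not a proposed patch: toy_nodemask_t, struct toy_task and toy_select_candidates() are hypothetical names, nodemasks are plain bitmasks, and the fallback pass is my assumption about what would happen when the allocating task's cpuset has no candidates. It only shows why an intersection test alone keeps root cpuset tasks eligible, while starting from the allocating task's cpuset does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy nodemask: bit n set means NUMA node n is allowed. Not the kernel's nodemask_t. */
typedef uint32_t toy_nodemask_t;

/* Hypothetical stand-in for the few per-task fields that matter here. */
struct toy_task {
	const char *comm;
	int cpuset_id;			/* hypothetical cpuset identifier, 0 = root */
	toy_nodemask_t mems_allowed;
};

/* Models the idea behind cpuset_mems_allowed_intersects(): any shared node counts. */
static bool toy_mems_intersect(toy_nodemask_t a, toy_nodemask_t b)
{
	return (a & b) != 0;
}

/*
 * Mark OOM candidates in eligible[] and return how many were marked.
 *
 * own_cpuset_first == false: a task is eligible if its mems_allowed
 * intersects the allocating task's mems_allowed.
 *
 * own_cpuset_first == true: only tasks in the allocating task's cpuset are
 * eligible; the intersection test is used as a fallback (an assumption of
 * this sketch) when that cpuset yields no candidate.
 */
static size_t toy_select_candidates(const struct toy_task *tasks, size_t n,
				    const struct toy_task *current,
				    bool own_cpuset_first, bool *eligible)
{
	size_t count = 0;
	size_t i;

	assert(tasks && current && eligible);

	if (own_cpuset_first) {
		for (i = 0; i < n; i++) {
			eligible[i] = tasks[i].cpuset_id == current->cpuset_id;
			count += eligible[i];
		}
		if (count > 0)
			return count;
	}

	/* count is still 0 here, either by policy or because pass one found nothing. */
	for (i = 0; i < n; i++) {
		eligible[i] = toy_mems_intersect(tasks[i].mems_allowed,
						 current->mems_allowed);
		count += eligible[i];
	}
	return count;
}
```

With systemd in the root cpuset (all nodes allowed) and two workers in a cpuset restricted to the movable node, the intersection-only policy marks all three tasks, while the own-cpuset-first policy marks only the two workers.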