On Thu, 4 Sep 2025 16:36:28 +0200 Michal Hocko <mhocko@suse.com> wrote:

> On Thu 04-09-25 07:26:25, Joshua Hahn wrote:
> > On Thu, 4 Sep 2025 21:44:31 +0800 Jinjiang Tu <tujinjiang@huawei.com> wrote:
> >
> > Hello Jinjiang,
> > I hope you are doing well, thank you for this patchset!
> >
> > > out_of_memory() selects tasks without considering mempolicy. Assuming a cpu-less NUMA Node, ordinary processes that don't set a mempolicy don't allocate memory from this cpu-less Node, unless other NUMA Nodes are below the low watermark. If a task binds to this cpu-less Node and triggers OOM, many tasks that don't occupy memory from this Node may be killed wrongly.
> >
> > I am wondering whether you have seen this happen in practice, or if this is just based on inspecting the code. I have a feeling that the case you are concerned about may already be covered in select_bad_process.
> >
> > out_of_memory(oc)
> >   select_bad_process(oc)
> >     oom_evaluate_task(p, oc)
> >       oom_cpuset_eligible(task, oc)
> >
> > [...snip...]
> >
> > for_each_thread(start, tsk) {
> > 	if (mask) {
> > 		ret = mempolicy_in_oom_domain(tsk, mask);
> > 	} else {
> > 		ret = cpuset_mems_allowed_intersects(current, tsk);
> > 	}
> > }
> >
> > While iterating through the list of candidate processes, we check whether oc->nodemask exists, and if not, we check if the nodemasks intersect. It seems like these are the two checks that you add in the helper function.
> >
> > With that said, I might be missing something obvious -- please feel free to correct me if I am misunderstanding your patch or if I'm missing something in the existing oom target selection :-)
>
> The thing with mempolicy_in_oom_domain is that it doesn't really do what you might be thinking it is doing ;) as it will return true also for tasks without any NUMA affinity, because those intersect with the given mask by definition as they can allocate from any node. So they are eligible, and that is what Jinjiang Tu is concerned about, I believe.

Hello Michal!
Thank you for your insights :-) Looking back, I made the mistake of thinking that we cared about the !oc->nodemask case, whereas Jinjiang's patch cares about the case where oc->nodemask is set. So I was checking that cpuset_mems_allowed_intersects was the same as nodes_intersects, when I should have been checking whether mempolicy_in_oom_domain is correct.
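To make the point about mempolicy_in_oom_domain concrete, here is a minimal userspace sketch of the eligibility rule as Michal describes it. This is not the kernel code: struct toy_task, toy_in_oom_domain() and toy_count_eligible() are hypothetical names, and node masks are modelled as plain bitmasks instead of nodemask_t.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task and its memory policy (not a kernel type). */
struct toy_task {
	const char *name;
	bool has_bind_policy;     /* models "task has a bind-style mempolicy" */
	unsigned long bind_nodes; /* bit n set => node n is in the policy */
};

/*
 * Models the rule discussed above: a task with no binding policy is
 * treated as intersecting any OOM nodemask, because it may allocate
 * from every node. Only bound tasks are actually checked for overlap.
 */
static bool toy_in_oom_domain(const struct toy_task *t, unsigned long oom_mask)
{
	if (!t->has_bind_policy)
		return true;
	return (t->bind_nodes & oom_mask) != 0;
}

/* Counts how many tasks would be considered eligible OOM victims. */
static size_t toy_count_eligible(const struct toy_task *tasks, size_t n,
				 unsigned long oom_mask)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_in_oom_domain(&tasks[i], oom_mask))
			count++;
	return count;
}
```

With an OOM nodemask containing only node 2 (say, the cpu-less node), a task bound to node 2 and a task with no policy at all both come out eligible, while a task bound to node 0 does not. The unbound task is the one that may be killed even though it never allocated from node 2.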
Most tasks don't mbind to specific nodes. In our use case, as described in the reply to Michal, ordinary tasks are unlikely to allocate from these cpu-less NUMA Nodes.
Looking into it, everything you said is correct and I think I definitely overlooked what the patch was trying to do. Thank you for clarifying these points for me! I hope you have a great day,
Joshua

> --
> Michal Hocko
> SUSE Labs