On 2025/9/4 22:43, Joshua Hahn wrote:
> On Thu, 4 Sep 2025 16:36:28 +0200 Michal Hocko wrote:
>
>> On Thu 04-09-25 07:26:25, Joshua Hahn wrote:
>>> On Thu, 4 Sep 2025 21:44:31 +0800 Jinjiang Tu wrote:
>>>
>>> Hello Jinjiang,
>>>
>>> I hope you are doing well, thank you for this patchset!
>>>
>>>> out_of_memory() selects tasks without considering mempolicy. Assuming a
>>>> cpu-less NUMA Node, ordinary processes that don't set a mempolicy don't
>>>> allocate memory from this cpu-less Node, unless other NUMA Nodes are
>>>> below the low watermark. If a task binds to this cpu-less Node and
>>>> triggers OOM, many tasks that don't occupy memory from this Node may be
>>>> killed wrongly.
>>> I am wondering whether you have seen this happen in practice, or if this is
>>> just based on inspecting the code. I have a feeling that the case you are
>>> concerned about may already be covered in select_bad_process.
>>>
>>> out_of_memory(oc)
>>>     select_bad_process(oc)
>>>         oom_evaluate_task(p, oc)
>>>             oom_cpuset_eligible(task, oc)
>>>
>>> [...snip...]
>>>
>>> for_each_thread(start, tsk) {
>>>     if (mask) {
>>>         ret = mempolicy_in_oom_domain(tsk, mask);
>>>     } else {
>>>         ret = cpuset_mems_allowed_intersects(current, tsk);
>>>     }
>>> }
>>>
>>> While iterating through the list of candidate processes, we check whether
>>> oc->nodemask exists, and if not, we check if the nodemasks intersect. It
>>> seems like these are the two checks that you add in the helper function.
>>>
>>> With that said, I might be missing something obvious -- please feel free to
>>> correct me if I am misunderstanding your patch or if I'm missing something
>>> in the existing oom target selection :-)
>> The thing with mempolicy_in_oom_domain is that it doesn't really do what
>> you might be thinking it is doing ;) as it will return true also for tasks
>> without any NUMA affinity because those intersect with the given mask by
>> definition as they can allocate from any node. So they are eligible and
>> that is what Jinjiang Tu is concerned about I believe.
> Hello Michal! Thank you for your insights :-)
>
> Looking back, I made the mistake of thinking that we cared about the
> !oc->nodemask case, where Jinjiang's patch cares about the
> oc->nodemask == True case. So I was checking that
> cpuset_mems_allowed_intersects was the same as nodes_intersects, whereas I
> should have been checking if mempolicy_in_oom_domain is correct.

Most tasks don't mbind to specific nodes. In our use case, as described in the
reply to Michal, ordinary tasks are unlikely to allocate from these cpu-less
NUMA Nodes.

>
> Looking into it, everything you said is correct and I think I definitely
> overlooked what the patch was trying to do. Thank you for clarifying these
> points for me!
>
> I hope you have a great day,
> Joshua
>
>> --
>> Michal Hocko
>> SUSE Labs
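
For illustration, here is a minimal userspace sketch of the behaviour Michal
describes above: a task with no bind policy is treated as intersecting any OOM
nodemask, so it stays eligible even when the OOM was triggered for a cpu-less
node it never allocates from. This is a simplified model, not the kernel
implementation. All names (model_nodemask_t, model_task, model_in_oom_domain,
model_count_eligible) are hypothetical, and a plain 64-bit mask stands in for
nodemask_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef uint64_t model_nodemask_t;

/* Hypothetical, reduced set of policy modes for this model. */
enum model_mpol_mode { MODEL_MPOL_NONE, MODEL_MPOL_BIND };

/* Hypothetical, reduced view of a task: name, policy mode, bound nodes. */
struct model_task {
    const char *comm;
    enum model_mpol_mode mode;
    model_nodemask_t nodes; /* only meaningful for MODEL_MPOL_BIND */
};

/*
 * Models the check discussed in the thread: only an explicit bind policy
 * can make a task ineligible. A task without any NUMA affinity may
 * allocate from any node, so it is reported as being in the OOM domain.
 */
bool model_in_oom_domain(const struct model_task *tsk,
                         const model_nodemask_t *mask)
{
    assert(tsk != NULL);

    if (mask == NULL)
        return true; /* no nodemask: every task is a candidate */

    if (tsk->mode == MODEL_MPOL_BIND)
        return (tsk->nodes & *mask) != 0;

    return true; /* no bind policy: intersects "by definition" */
}

/* Counts how many tasks this model would consider as OOM candidates. */
size_t model_count_eligible(const struct model_task *tasks, size_t n,
                            const model_nodemask_t *mask)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (model_in_oom_domain(&tasks[i], mask))
            count++;

    return count;
}
```

With an OOM nodemask covering only the cpu-less node, a task bound to a
different node is filtered out, while an unbound task remains a candidate.
That is the case the patchset is concerned about.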