On 2025/12/15 8:04, Andrew Morton wrote:
> On Sat, 13 Dec 2025 16:29:11 +0800 Jinjiang Tu wrote:
>
>> When mempolicy is rebound due to the process moves to a different cpuset
>> context, or the set of nodes allowed by current cpuset context changes,
>> mpol_rebind_nodemask() remaps the nodemask according to the old and new
>> cpuset_mems_allowed by default. So, use mempolicy.w.cpuset_mems_allowed
>> to store the old nodemask allowed by cpuset.
>>
>> MPOL_F_STATIC_NODES suppresses the node remap and intersects the user's
>> passed nodemask and nodes allowed by new cpuset context.
>> For MPOL_F_RELATIVE_NODES, the user's passed nodemask means node IDs that
>> are relative to the set of node IDs allowed by the process's current
>> cpuset. So, use mempolicy.w.user_nodemask to store the user's passed
>> nodemask.
>>
>> commit bda420b98505 ("numa balancing: migrate on fault among multiple
>> bound nodes") adds new flag MPOL_F_NUMA_BALANCING to enable NUMA balancing
>> for MPOL_BIND, the behaviour of rebinding should be same with default
>> behaviour. However, mpol_store_user_nodemask() returns true for
>> MPOL_F_NUMA_BALANCING, leading to mempolicy.w.cpuset_mems_allowed stores
>> the user's passed nodemask instead of cpuset_current_mems_allowed, and
>> mpol_rebind_nodemask() remaps wrongly.
> Thanks.
>
> I find the changelog hard to follow, unfortunately. It's odd that the
> problem description comes in the final paragraph!
>
> I cheekily changed that and then fed the text into Gemini, which
> I think helped. What do you think of the below?
>

Thanks, it describes the problem more clearly. I will update the commit log
in the next version.

> I won't merge the patch at this time - I'll await reviewer input.
>
>
> ## Bug Fix: Corrected `MPOL_BIND` Rebinding with `MPOL_F_NUMA_BALANCING`
>
> ### Problem
>
> The commit `bda420b98505` ("numa balancing: migrate on fault among
> multiple bound nodes") introduced the new flag
> **`MPOL_F_NUMA_BALANCING`** to enable NUMA balancing for the
> **`MPOL_BIND`** memory policy.
>
> The intended behavior was for the rebinding logic to remain the same as
> the default `MPOL_BIND` behavior. However, the function
> `mpol_store_user_nodemask()` was incorrectly returning `true` for
> policies containing `MPOL_F_NUMA_BALANCING`.
>
> This led to a bug where:
>
> 1. `mempolicy.w.cpuset_mems_allowed` stored the **user's passed
>    nodemask** instead of the actual nodemask allowed by the current
>    cpuset context (`cpuset_current_mems_allowed`).
>
> 2. Consequently, **`mpol_rebind_nodemask()` performed incorrect
>    remapping** when the mempolicy was rebound.
>
> ### Analysis of Correct Rebinding Logic
>
> When a memory policy is rebound (e.g., because the process moves to a
> different cpuset context, or the allowed nodes within the current
> cpuset change), `mpol_rebind_nodemask()`, by default, remaps the
> policy's nodemask based on the transition between the **old** and
> **new** `cpuset_mems_allowed` sets.
>
> To support this mechanism correctly, `mempolicy.w.cpuset_mems_allowed`
> **must store the old nodemask allowed by the cpuset** before the
> transition.
>
> ### Context for Other Flags
>
> * **`MPOL_F_STATIC_NODES`**: This flag suppresses the node remap and
>   simply intersects the user's passed nodemask with the nodes allowed
>   by the new cpuset context.
>
> * **`MPOL_F_RELATIVE_NODES`**: For this policy, the user's passed
>   nodemask represents node IDs **relative** to the set of node IDs
>   allowed by the process's current cpuset. Therefore,
>   `mempolicy.w.user_nodemask` is correctly used to store the user's
>   original relative nodemask.
>
> ### Proposed Fix
>
> Ensure that `mpol_store_user_nodemask()` handles
> `MPOL_F_NUMA_BALANCING` correctly so that
> `mempolicy.w.cpuset_mems_allowed` stores the correct cpuset-allowed
> nodemask, thereby restoring the proper remapping behavior in
> `mpol_rebind_nodemask()`.
>