* [PATCH] mm/mempolicy: fix lock contention on mems_allowed
@ 2022-08-09 10:49 Abel Wu
2022-08-09 12:11 ` Michal Hocko
From: Abel Wu @ 2022-08-09 10:49 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Michal Hocko, Mel Gorman, Muchun Song
Cc: linux-mm, linux-kernel, Abel Wu
The mems_allowed field can be modified by other tasks, so it
isn't safe to access it with alloc_lock unlocked even in the
current process context.
Fixes: 78b132e9bae9 ("mm/mempolicy: remove or narrow the lock on current")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
mm/mempolicy.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d39b01fd52fe..ae422e44affb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -855,12 +855,14 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags,
 		goto out;
 	}
 
+	task_lock(current);
 	ret = mpol_set_nodemask(new, nodes, scratch);
 	if (ret) {
+		task_unlock(current);
 		mpol_put(new);
 		goto out;
 	}
-	task_lock(current);
+
 	old = current->mempolicy;
 	current->mempolicy = new;
 	if (new && new->mode == MPOL_INTERLEAVE)
@@ -1295,7 +1297,9 @@ static long do_mbind(unsigned long start, unsigned long len,
 		NODEMASK_SCRATCH(scratch);
 		if (scratch) {
 			mmap_write_lock(mm);
+			task_lock(current);
 			err = mpol_set_nodemask(new, nmask, scratch);
+			task_unlock(current);
 			if (err)
 				mmap_write_unlock(mm);
 		} else
--
2.31.1
* Re: [PATCH] mm/mempolicy: fix lock contention on mems_allowed
2022-08-09 10:49 [PATCH] mm/mempolicy: fix lock contention on mems_allowed Abel Wu
@ 2022-08-09 12:11 ` Michal Hocko
2022-08-11 8:43 ` Abel Wu
From: Michal Hocko @ 2022-08-09 12:11 UTC (permalink / raw)
To: Abel Wu
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Muchun Song,
linux-mm, linux-kernel
On Tue 09-08-22 18:49:27, Abel Wu wrote:
> The mems_allowed field can be modified by other tasks, so it
> isn't safe to access it with alloc_lock unlocked even in the
> current process context.
It would be useful to describe the racing scenario and the effect it
would have. 78b132e9bae9 didn't really explain the thinking behind it or
why it was considered safe to drop the lock. I assume it was based on
the fact that the operation happens on the current task, but this is
hard to tell.
> Fixes: 78b132e9bae9 ("mm/mempolicy: remove or narrow the lock on current")
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> ---
> mm/mempolicy.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index d39b01fd52fe..ae422e44affb 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -855,12 +855,14 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags,
>  		goto out;
>  	}
>
> +	task_lock(current);
>  	ret = mpol_set_nodemask(new, nodes, scratch);
>  	if (ret) {
> +		task_unlock(current);
>  		mpol_put(new);
>  		goto out;
>  	}
> -	task_lock(current);
> +
>  	old = current->mempolicy;
>  	current->mempolicy = new;
>  	if (new && new->mode == MPOL_INTERLEAVE)
> @@ -1295,7 +1297,9 @@ static long do_mbind(unsigned long start, unsigned long len,
>  		NODEMASK_SCRATCH(scratch);
>  		if (scratch) {
>  			mmap_write_lock(mm);
> +			task_lock(current);
>  			err = mpol_set_nodemask(new, nmask, scratch);
> +			task_unlock(current);
>  			if (err)
>  				mmap_write_unlock(mm);
>  		} else
> --
> 2.31.1
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/mempolicy: fix lock contention on mems_allowed
2022-08-09 12:11 ` Michal Hocko
@ 2022-08-11 8:43 ` Abel Wu
2022-08-11 9:09 ` Michal Hocko
From: Abel Wu @ 2022-08-11 8:43 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Muchun Song,
linux-mm, linux-kernel, Abel Wu
On 8/9/22 8:11 PM, Michal Hocko Wrote:
> On Tue 09-08-22 18:49:27, Abel Wu wrote:
>> The mems_allowed field can be modified by other tasks, so it
>> isn't safe to access it with alloc_lock unlocked even in the
>> current process context.
>
> It would be useful to describe the racing scenario and the effect it
> would have. 78b132e9bae9 didn't really explain the thinking behind it or
> why it was considered safe to drop the lock. I assume it was based on
> the fact that the operation happens on the current task, but this is
> hard to tell.
>
Sorry for my poor description. Say there are two tasks: A from cpusetA
is performing set_mempolicy(2), and B is changing cpusetA's cpuset.mems.
  A (set_mempolicy)                         B (echo xx > cpuset.mems)
  pol = mpol_new();
                                            update_tasks_nodemask(cpusetA) {
                                              foreach t in cpusetA {
                                                cpuset_change_task_nodemask(t) {
                                                  task_lock(t); // t could be A
  mpol_set_nodemask(pol) {
    new = f(A->mems_allowed);
                                                  update t->mems_allowed;
    pol.create(pol, new);
  }
                                                  task_unlock(t);
  task_lock(A);
  A->mempolicy = pol;
  task_unlock(A);
                                                }
                                              }
                                            }
In this case A's pol->nodes is computed from the old mems_allowed, and
could be inconsistent with A's new mems_allowed.
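
The interleaving above can be replayed in a rough userspace model. This is only a sketch, not kernel code: every name in it (toy_task, toy_f, toy_cpuset_update and so on) is made up for illustration, the two tasks are replayed sequentially rather than run as threads, and a rebind is approximated by a plain intersection, which is simpler than what the kernel does. It also assumes that B rebinds whatever policy is installed at the time it updates the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: a nodemask is a bitmap with one bit per NUMA node. */
typedef uint64_t toy_nodemask_t;

struct toy_task {
	toy_nodemask_t mems_allowed;	/* updated by the cpuset writer (B) */
	toy_nodemask_t policy_nodes;	/* stands in for mempolicy->nodes */
};

/* Stand-in for f(): restrict the requested nodes to the allowed ones. */
static toy_nodemask_t toy_f(toy_nodemask_t requested, toy_nodemask_t allowed)
{
	return requested & allowed;
}

/* B's step under task_lock(): rebind the installed policy, update the mask. */
static void toy_cpuset_update(struct toy_task *t, toy_nodemask_t new_allowed)
{
	t->policy_nodes &= new_allowed;	/* crude stand-in for a rebind */
	t->mems_allowed = new_allowed;
}

/* Unlocked ordering from the diagram: B's update lands between A's steps. */
static void toy_set_mempolicy_racy(struct toy_task *a, toy_nodemask_t requested,
				   toy_nodemask_t new_allowed)
{
	toy_nodemask_t nodes = toy_f(requested, a->mems_allowed); /* old mask */

	toy_cpuset_update(a, new_allowed);	/* B runs here */
	a->policy_nodes = nodes;		/* A installs a stale policy */
}

/* Locked ordering: A's two steps cannot be split by B. */
static void toy_set_mempolicy_locked(struct toy_task *a, toy_nodemask_t requested,
				     toy_nodemask_t new_allowed)
{
	a->policy_nodes = toy_f(requested, a->mems_allowed);
	toy_cpuset_update(a, new_allowed);	/* B waits, then rebinds */
}

/* A policy is consistent if it names no node outside mems_allowed. */
static bool toy_consistent(const struct toy_task *t)
{
	return (t->policy_nodes & ~t->mems_allowed) == 0;
}

/* Replay both orderings: nodes {0,1} allowed at first, then {1,2}. */
static void toy_replay(void)
{
	struct toy_task racy = { .mems_allowed = 0x3 };
	struct toy_task locked = { .mems_allowed = 0x3 };

	toy_set_mempolicy_racy(&racy, 0x3, 0x6);
	toy_set_mempolicy_locked(&locked, 0x3, 0x6);
	assert(!toy_consistent(&racy));		/* still names node 0 */
	assert(toy_consistent(&locked));	/* rebound to node 1 only */
}
```

Calling toy_replay() from a test program exercises both orderings. In the racy one, the policy keeps node 0 even though mems_allowed no longer contains it, because B's rebind ran before the new policy was installed.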
The situation is different when replacing the vmas' policy: pol->nodes
can go stale only while current_cpuset_is_being_rebound():
  A (mbind)                                 B (echo xx > cpuset.mems)
                                            cpuset_being_rebound = cpusetA;
                                            update_tasks_nodemask(cpusetA) {
                                              foreach t in cpusetA {
                                                cpuset_change_task_nodemask(t) {
                                                  task_lock(t); // t could be A
  pol = mpol_new();
  mmap_write_lock(A->mm);
  mpol_set_nodemask(pol) {
    mask = f(A->mems_allowed);
                                                  update t->mems_allowed;
    pol.create(pol, mask);
  }
                                                  task_unlock(t);
                                                }
  foreach v in A->mm {
    if (current_cpuset_is_being_rebound())
      pol.rebind(pol, cpuset.mems);
    v->vma_policy = pol;
  }
  mmap_write_unlock(A->mm);
                                                mmap_write_lock(t->mm);
                                                mpol_rebind_mm(t->mm);
                                                mmap_write_unlock(t->mm);
                                              }
                                            }
                                            cpuset_being_rebound = NULL;
In this case pol->nodes is finally calculated from cpuset.mems, which
has already been updated, rather than from A->mems_allowed. So it is OK
to call mpol_set_nodemask() without holding alloc_lock when doing
mbind(2).
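
The mbind(2) argument can be sketched the same way. Again this is only an illustrative userspace model with hypothetical names (toy_cpuset, toy_mbind_nodes); it treats the rebind as an intersection with cpuset.mems, which is a simplification, and it assumes cpuset.mems already holds the new value whenever the being-rebound flag is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: a nodemask is a bitmap with one bit per NUMA node. */
typedef uint64_t toy_mask_t;

struct toy_cpuset {
	toy_mask_t mems;	/* already holds the new value during a rebind */
	bool being_rebound;	/* stand-in for current_cpuset_is_being_rebound() */
};

/*
 * A's mbind path in the toy model: compute the nodes from a possibly stale
 * per-task mask, then, if the cpuset is mid-rebind, restrict them again
 * against cpuset.mems before the policy reaches any vma.
 */
static toy_mask_t toy_mbind_nodes(toy_mask_t requested,
				  toy_mask_t task_mems_allowed,
				  const struct toy_cpuset *cs)
{
	toy_mask_t nodes = requested & task_mems_allowed;

	if (cs->being_rebound)
		nodes &= cs->mems;	/* crude stand-in for pol.rebind() */
	return nodes;
}
```

With a stale task mask of nodes {0,1} and cpuset.mems already updated to {1,2}, the model returns node 1 only while the rebind flag is set, so the stale value never ends up in a vma policy.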
Best Regards,
Abel
* Re: [PATCH] mm/mempolicy: fix lock contention on mems_allowed
2022-08-11 8:43 ` Abel Wu
@ 2022-08-11 9:09 ` Michal Hocko
From: Michal Hocko @ 2022-08-11 9:09 UTC (permalink / raw)
To: Abel Wu
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Muchun Song,
linux-mm, linux-kernel
On Thu 11-08-22 16:43:28, Abel Wu wrote:
> On 8/9/22 8:11 PM, Michal Hocko Wrote:
> > On Tue 09-08-22 18:49:27, Abel Wu wrote:
> > > The mems_allowed field can be modified by other tasks, so it
> > > isn't safe to access it with alloc_lock unlocked even in the
> > > current process context.
> >
> > It would be useful to describe the racing scenario and the effect it
> > would have. 78b132e9bae9 didn't really explain the thinking behind it or
> > why it was considered safe to drop the lock. I assume it was based on
> > the fact that the operation happens on the current task, but this is
> > hard to tell.
> >
>
> Sorry for my poor description. Say there are two tasks: A from cpusetA
> is performing set_mempolicy(2), and B is changing cpusetA's cpuset.mems.
>
>   A (set_mempolicy)                         B (echo xx > cpuset.mems)
>
>   pol = mpol_new();
>                                             update_tasks_nodemask(cpusetA) {
>                                               foreach t in cpusetA {
>                                                 cpuset_change_task_nodemask(t) {
>                                                   task_lock(t); // t could be A
>   mpol_set_nodemask(pol) {
>     new = f(A->mems_allowed);
>                                                   update t->mems_allowed;
>     pol.create(pol, new);
>   }
>                                                   task_unlock(t);
>   task_lock(A);
>   A->mempolicy = pol;
>   task_unlock(A);
>                                                 }
>                                               }
>                                             }
>
> In this case A's pol->nodes is computed from the old mems_allowed, and
> could be inconsistent with A's new mems_allowed.
>
> The situation is different when replacing the vmas' policy: pol->nodes
> can go stale only while current_cpuset_is_being_rebound():
>
>   A (mbind)                                 B (echo xx > cpuset.mems)
>
>                                             cpuset_being_rebound = cpusetA;
>                                             update_tasks_nodemask(cpusetA) {
>                                               foreach t in cpusetA {
>                                                 cpuset_change_task_nodemask(t) {
>                                                   task_lock(t); // t could be A
>   pol = mpol_new();
>   mmap_write_lock(A->mm);
>   mpol_set_nodemask(pol) {
>     mask = f(A->mems_allowed);
>                                                   update t->mems_allowed;
>     pol.create(pol, mask);
>   }
>                                                   task_unlock(t);
>                                                 }
>   foreach v in A->mm {
>     if (current_cpuset_is_being_rebound())
>       pol.rebind(pol, cpuset.mems);
>     v->vma_policy = pol;
>   }
>   mmap_write_unlock(A->mm);
>                                                 mmap_write_lock(t->mm);
>                                                 mpol_rebind_mm(t->mm);
>                                                 mmap_write_unlock(t->mm);
>                                               }
>                                             }
>                                             cpuset_being_rebound = NULL;
>
> In this case pol->nodes is finally calculated from cpuset.mems, which
> has already been updated, rather than from A->mems_allowed. So it is OK
> to call mpol_set_nodemask() without holding alloc_lock when doing
> mbind(2).
Please add this to the patch changelog.
Thanks!
--
Michal Hocko
SUSE Labs