linux-mm.kvack.org archive mirror
* [PATCH] kernel/fork: Be more careful about dup_mmap() failures
@ 2025-01-23 20:58 Liam R. Howlett
  2025-01-24 11:15 ` Lorenzo Stoakes
  0 siblings, 1 reply; 4+ messages in thread
From: Liam R. Howlett @ 2025-01-23 20:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: maple-tree, linux-mm, linux-kernel, Liam R. Howlett, Jann Horn,
	Lorenzo Stoakes, Peter Zijlstra, Michal Hocko, Peng Zhang

From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

In the event that there is a failure during dup_mmap(), the maple tree
can be left in an unsafe state for other iterators besides the exit
path.

The unsafe state is created after the tree is cloned, but before the
vmas are replaced; if a vma allocation fails (for instance), then the
tree will have a marker (XA_ZERO_ENTRY) to denote where to stop
destroying vmas on the exit path.  This marker replaces a vma in the
tree and may be treated as a pointer to a vma in iterators besides the
special case exit_mmap() iterator.

All the locks are dropped before the exit_mmap() call, but the
incomplete mm_struct can be reached through (at least) the rmap finding
the vmas which have a pointer back to the mm_struct.

Up to this point, there have been no issues with being able to find an
mm_struct that was only partially initialised.  Syzbot was able to make
the incomplete mm_struct fail with recent forking changes, so it has
been proven unsafe to use an mm_struct that hasn't been initialised, as
referenced in the link below.

Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
invalid mm") fixed the uprobe access, it does not completely remove the
race.

This patch sets MMF_OOM_SKIP to avoid iterating the vmas on the oom
side (even though the mm is extremely unlikely to be selected as an oom
victim in the race window), and sets MMF_UNSTABLE to stop other
potential users from using a partially initialised mm_struct.

Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
Fixes: d240629148377 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
---
 kernel/fork.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index ded49f18cd95c..20b2120f019ca 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -760,7 +760,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		mt_set_in_rcu(vmi.mas.tree);
 		ksm_fork(mm, oldmm);
 		khugepaged_fork(mm, oldmm);
-	} else if (mpnt) {
+	} else {
+
 		/*
 		 * The entire maple tree has already been duplicated. If the
 		 * mmap duplication fails, mark the failure point with
@@ -768,8 +769,18 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		 * stop releasing VMAs that have not been duplicated after this
 		 * point.
 		 */
-		mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
-		mas_store(&vmi.mas, XA_ZERO_ENTRY);
+		if (mpnt) {
+			mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
+			mas_store(&vmi.mas, XA_ZERO_ENTRY);
+			/* Avoid OOM iterating a broken tree */
+			set_bit(MMF_OOM_SKIP, &mm->flags);
+		}
+		/*
+		 * The mm_struct is going to exit, but the locks will be dropped
+		 * first.  Marking the mm_struct as unstable is advisable as
+		 * it is not fully initialised.
+		 */
+		set_bit(MMF_UNSTABLE, &mm->flags);
 	}
 out:
 	mmap_write_unlock(mm);
-- 
2.43.0




* Re: [PATCH] kernel/fork: Be more careful about dup_mmap() failures
  2025-01-23 20:58 [PATCH] kernel/fork: Be more careful about dup_mmap() failures Liam R. Howlett
@ 2025-01-24 11:15 ` Lorenzo Stoakes
  2025-01-24 18:19   ` Liam R. Howlett
  0 siblings, 1 reply; 4+ messages in thread
From: Lorenzo Stoakes @ 2025-01-24 11:15 UTC (permalink / raw)
  To: Liam R. Howlett
  Cc: Andrew Morton, maple-tree, linux-mm, linux-kernel, Jann Horn,
	Peter Zijlstra, Michal Hocko, Peng Zhang

On Thu, Jan 23, 2025 at 03:58:49PM -0500, Liam R. Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
>
> In the event that there is a failure during dup_mmap(), the maple tree
> can be left in an unsafe state for other iterators besides the exit
> path.
>
> The unsafe state is created after the tree is cloned, but before the
> vmas are replaced; if a vma allocation fails (for instance), then the
> tree will have a marker (XA_ZERO_ENTRY) to denote where to stop
> destroying vmas on the exit path.  This marker replaces a vma in the
> tree and may be treated as a pointer to a vma in iterators besides the
> special case exit_mmap() iterator.

Thanks for addressing this.

In the original issue, the problem arose in vm_area_dup() which I tracked
down to vma_iter_load() -> mas_walk() (see [0]).

I'm not sure if your change would actually have addressed this? Would the
MMF_UNSTABLE flag prevent this code path from being taken (if so where?),
if not don't we need to add a check for this somewhere?

[0]:https://lore.kernel.org/all/aa2c1930-becc-4bc5-adfb-96e88290acc7@lucifer.local/

>
> All the locks are dropped before the exit_mmap() call, but the
> incomplete mm_struct can be reached through (at least) the rmap finding
> the vmas which have a pointer back to the mm_struct.
>
> Up to this point, there have been no issues with being able to find an
> mm_struct that was only partially initialised.  Syzbot was able to make
> the incomplete mm_struct fail with recent forking changes, so it has
> been proven unsafe to use an mm_struct that hasn't been initialised, as
> referenced in the link below.
>
> Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
> invalid mm") fixed the uprobe access, it does not completely remove the
> race.
>
> This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on
> the oom side (even though this is extremely unlikely to be selected as
> an oom victim in the race window), and sets MMF_UNSTABLE to avoid other
> potential users from using a partially initialised mm_struct.

Good to have it set also I think.

We were concerned that _somehow_ MMF_UNSTABLE might cause some issue in
OOM, which was why, when I first thought of this, we decided maybe not
to, just to be safe.  But I am confident this will not be an issue:
indicating that something is unstable and about to be torn down when it
is in fact unstable and about to be torn down is correct, plus I checked
all the paths this impacts and found no issues.

And in any case, setting this flag avoids the problem...!
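As an aside, the failure-marking scheme can be modelled in a few lines
of plain C.  This is a userspace sketch for illustration only, not
kernel code: the fixed array stands in for the maple tree, STOP_MARKER
for XA_ZERO_ENTRY, and the flag values and names are made up.

```c
#include <stddef.h>

#define MMF_OOM_SKIP  (1UL << 0)    /* illustrative values, not the real bits */
#define MMF_UNSTABLE  (1UL << 1)
#define STOP_MARKER   ((void *)1)   /* plays the role of XA_ZERO_ENTRY */

struct mm_model {
	unsigned long flags;
	void *vmas[8];              /* stand-in for the maple tree */
	size_t nr;
};

/* Copy src vmas into mm, "failing" after fail_at entries (-1 = no failure). */
static int dup_mmap_model(struct mm_model *mm, void **src, size_t nr,
			  int fail_at)
{
	for (size_t i = 0; i < nr; i++) {
		if ((int)i == fail_at) {
			/* Mark where the exit path must stop freeing ... */
			mm->vmas[i] = STOP_MARKER;
			mm->nr = i + 1;
			/* ... and fence off every remote walker. */
			mm->flags |= MMF_OOM_SKIP | MMF_UNSTABLE;
			return -1;
		}
		mm->vmas[i] = src[i];
		mm->nr = i + 1;
	}
	return 0;
}

/* Exit path: only entries before the sentinel are real and freeable. */
static size_t exit_mmap_model(struct mm_model *mm)
{
	size_t freed = 0;

	for (size_t i = 0; i < mm->nr && mm->vmas[i] != STOP_MARKER; i++)
		freed++;
	return freed;
}
```

The point the model makes is the one above: once the sentinel is stored,
only the special-cased exit path knows how to interpret it, so every
other walker has to be fenced off by the flags.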

>
> Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
> Fixes: d240629148377 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
> Cc: Jann Horn <jannh@google.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Peng Zhang <zhangpeng.00@bytedance.com>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> ---
>  kernel/fork.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index ded49f18cd95c..20b2120f019ca 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -760,7 +760,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		mt_set_in_rcu(vmi.mas.tree);
>  		ksm_fork(mm, oldmm);
>  		khugepaged_fork(mm, oldmm);
> -	} else if (mpnt) {
> +	} else {
> +
>  		/*
>  		 * The entire maple tree has already been duplicated. If the
>  		 * mmap duplication fails, mark the failure point with
> @@ -768,8 +769,18 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		 * stop releasing VMAs that have not been duplicated after this
>  		 * point.
>  		 */
> -		mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> -		mas_store(&vmi.mas, XA_ZERO_ENTRY);
> +		if (mpnt) {
> +			mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> +			mas_store(&vmi.mas, XA_ZERO_ENTRY);
> +			/* Avoid OOM iterating a broken tree */
> +			set_bit(MMF_OOM_SKIP, &mm->flags);
> +		}
> +		/*
> +		 * The mm_struct is going to exit, but the locks will be dropped
> +		 * first.  Set the mm_struct as unstable is advisable as it is
> +		 * not fully initialised.
> +		 */
> +		set_bit(MMF_UNSTABLE, &mm->flags);

Do we still need to do this if !mpnt?

Also 'mpnt' is in the running for the worst variable name in mm...

>  	}
>  out:
>  	mmap_write_unlock(mm);
> --
> 2.43.0
>



* Re: [PATCH] kernel/fork: Be more careful about dup_mmap() failures
  2025-01-24 11:15 ` Lorenzo Stoakes
@ 2025-01-24 18:19   ` Liam R. Howlett
  2025-01-24 19:09     ` Lorenzo Stoakes
  0 siblings, 1 reply; 4+ messages in thread
From: Liam R. Howlett @ 2025-01-24 18:19 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, maple-tree, linux-mm, linux-kernel, Jann Horn,
	Peter Zijlstra, Michal Hocko, Peng Zhang, Oleg Nesterov

Adding Oleg, who I forgot on the start of this patch..


* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [250124 06:15]:
> On Thu, Jan 23, 2025 at 03:58:49PM -0500, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> >
> > In the event that there is a failure during dup_mmap(), the maple tree
> > can be left in an unsafe state for other iterators besides the exit
> > path.
> >
> > The unsafe state is created after the tree is cloned, but before the
> > vmas are replaced; if a vma allocation fails (for instance), then the
> > tree will have a marker (XA_ZERO_ENTRY) to denote where to stop
> > destroying vmas on the exit path.  This marker replaces a vma in the
> > tree and may be treated as a pointer to a vma in iterators besides the
> > special case exit_mmap() iterator.
> 
> Thanks for addressing this.

Thanks for looking at the patch.

> 
> In the original issue, the problem arose in vm_area_dup() which I tracked
> down to vma_iter_load() -> mas_walk() (see [0]).
> 
> I'm not sure if your change would actually have addressed this? Would the
> MMF_UNSTABLE flag prevent this code path from being taken (if so where?),
> if not don't we need to add a check for this somewhere?
> 
> [0]:https://lore.kernel.org/all/aa2c1930-becc-4bc5-adfb-96e88290acc7@lucifer.local/

Remote vma accesses use check_stable_address_space(), which checks
MMF_UNSTABLE, to avoid races with the oom killer.
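For reference, the helper is tiny; a userspace sketch of its shape
follows (the real inline lives in include/linux/oom.h and uses the
kernel's test_bit(); the bit number and the mocked names here are
illustrative only):

```c
#include <errno.h>

#define MMF_UNSTABLE_BIT 22   /* illustrative bit number, not the real one */

struct mm_struct_model {
	unsigned long flags;
};

/* Mock of the kernel's test_bit() for a single word of flags. */
static int test_bit_model(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Shape of check_stable_address_space(): -EFAULT when the mm is unstable. */
static int check_stable_address_space_model(struct mm_struct_model *mm)
{
	if (test_bit_model(MMF_UNSTABLE_BIT, &mm->flags))
		return -EFAULT;
	return 0;
}
```

Remote users are expected to take the mmap lock, call the helper, and
bail out on nonzero, which is exactly the check being proposed for
register_for_each_vma() below.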

Yeah, uprobe needs to do this and doesn't today.

Oleg suggested this be done in build_map_info(), but the MMF_UNSTABLE
bit is set under the mmap read lock and that function uses the read lock
on the rmap, takes an mm reference and records the address - then
revisits the mm later to get the vma in register_for_each_vma().

I think we need to check in register_for_each_vma(), prior to looking up
the vma.  This way we hold the write lock on the mm and will ensure the
vma is stable from both the fork setup and potential OOM events.  The
concern is that we have an invalid vma or the prev/next is invalid and
that will only be a concern when operating on the vma itself, so this
would be the safest place imo.

The problem that has been created is that the next vma may be invalid.
This arises when the rmap is used to find a vma and then change that vma.
In this scenario, the mmap lock of the vma needs to be held and that mm
needs to be checked as stable/not oom'ed.  So we can limit the checking
of the code to the scenario where another mm's vma is found and
modified - which may cause a merge and look at the next vma.

I haven't found anywhere else that looks up another mm_struct's vma and
makes a change to the vma.  Considering what that would mean, it makes
sense that this is rare.

So, I'll just do this:

--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1260,6 +1260,9 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
                 * returns NULL in find_active_uprobe_rcu().
                 */
                mmap_write_lock(mm);
+               if (check_stable_address_space(mm))
+                               goto unlock;
+




> 
> >
> > All the locks are dropped before the exit_mmap() call, but the
> > incomplete mm_struct can be reached through (at least) the rmap finding
> > the vmas which have a pointer back to the mm_struct.
> >
> > Up to this point, there have been no issues with being able to find an
> > mm_struct that was only partially initialised.  Syzbot was able to make
> > the incomplete mm_struct fail with recent forking changes, so it has
> > been proven unsafe to use an mm_struct that hasn't been initialised, as
> > referenced in the link below.
> >
> > Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
> > invalid mm") fixed the uprobe access, it does not completely remove the
> > race.
> >
> > This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on
> > the oom side (even though this is extremely unlikely to be selected as
> > an oom victim in the race window), and sets MMF_UNSTABLE to avoid other
> > potential users from using a partially initialised mm_struct.
> 
> Good to have it set also I think.
> 
> We were concerned that _somehow_ MMF_UNSTABLE might cause some issue in
> OOM, which was why when I first thought of this we decided maybe not just
> to be safe, but I am confident this will not be an issue as indicating that
> something is about to be torn down and is unstable when it is in fact
> unstable and about to be torn down is correct, plus I checked all the paths
> where this impacted and found no issues.
> 
> And in any case, setting this flag avoids the problem...!
> 
> >
> > Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
> > Fixes: d240629148377 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Peng Zhang <zhangpeng.00@bytedance.com>
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > ---
> >  kernel/fork.c | 17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index ded49f18cd95c..20b2120f019ca 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -760,7 +760,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> >  		mt_set_in_rcu(vmi.mas.tree);
> >  		ksm_fork(mm, oldmm);
> >  		khugepaged_fork(mm, oldmm);
> > -	} else if (mpnt) {
> > +	} else {
> > +
> >  		/*
> >  		 * The entire maple tree has already been duplicated. If the
> >  		 * mmap duplication fails, mark the failure point with
> > @@ -768,8 +769,18 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> >  		 * stop releasing VMAs that have not been duplicated after this
> >  		 * point.
> >  		 */
> > -		mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > -		mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > +		if (mpnt) {
> > +			mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > +			mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > +			/* Avoid OOM iterating a broken tree */
> > +			set_bit(MMF_OOM_SKIP, &mm->flags);
> > +		}
> > +		/*
> > +		 * The mm_struct is going to exit, but the locks will be dropped
> > +		 * first.  Set the mm_struct as unstable is advisable as it is
> > +		 * not fully initialised.
> > +		 */
> > +		set_bit(MMF_UNSTABLE, &mm->flags);
> 
> Do we still need to do this if !mpnt?

No, we probably don't need this now.  But it still makes sense to do
this considering what we are saying.  There is a failed mm, let's mark
it as unstable because it's going away in a moment - it's not stable,
it's not even a complete mm.

> 
> Also 'mpnt' is in the running for the worst variable name in mm...
> 

mpnt is the oldMm vma Pointer, Now isn'T it?  I think that's self
documenting.

Thanks,
Liam



* Re: [PATCH] kernel/fork: Be more careful about dup_mmap() failures
  2025-01-24 18:19   ` Liam R. Howlett
@ 2025-01-24 19:09     ` Lorenzo Stoakes
  0 siblings, 0 replies; 4+ messages in thread
From: Lorenzo Stoakes @ 2025-01-24 19:09 UTC (permalink / raw)
  To: Liam R. Howlett, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, Jann Horn, Peter Zijlstra, Michal Hocko,
	Peng Zhang, Oleg Nesterov

On Fri, Jan 24, 2025 at 01:19:45PM -0500, Liam R. Howlett wrote:
> Adding Oleg, who I forgot on the start of this patch..
>
>
> * Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [250124 06:15]:
> > On Thu, Jan 23, 2025 at 03:58:49PM -0500, Liam R. Howlett wrote:
> > > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> > >
> > > In the event that there is a failure during dup_mmap(), the maple tree
> > > can be left in an unsafe state for other iterators besides the exit
> > > path.
> > >
> > > The unsafe state is created after the tree is cloned, but before the
> > > vmas are replaced; if a vma allocation fails (for instance), then the
> > > tree will have a marker (XA_ZERO_ENTRY) to denote where to stop
> > > destroying vmas on the exit path.  This marker replaces a vma in the
> > > tree and may be treated as a pointer to a vma in iterators besides the
> > > special case exit_mmap() iterator.
> >
> > Thanks for addressing this.
>
> Thanks for looking at the patch.
>
> >
> > In the original issue, the problem arose in vm_area_dup() which I tracked
> > down to vma_iter_load() -> mas_walk() (see [0]).
> >
> > I'm not sure if your change would actually have addressed this? Would the
> > MMF_UNSTABLE flag prevent this code path from being taken (if so where?),
> > if not don't we need to add a check for this somewhere?
> >
> > [0]:https://lore.kernel.org/all/aa2c1930-becc-4bc5-adfb-96e88290acc7@lucifer.local/
>
> Remote vma accesses use check_stable_address_space() to avoid races
> with the oom killer, which checks MMF_UNSTABLE.
>
> Yeah, uprobe needs to do this and doesn't today.
>
> Oleg suggested this be done in build_map_info(), but the MMF_UNSTABLE
> bit is set under the mmap read lock and that function uses the read lock
> on the rmap, takes an mm reference and records the address - then
> revisits the mm later to get the vma in register_for_each_vma().
>
> I think we need to check in register_for_each_vma(), prior to looking up
> the vma.  This way we hold the write lock on the mm and will ensure the
> vma is stable from both the fork setup and potential OOM events.  The
> concern is that we have an invalid vma or the prev/next is invalid and
> that will only be a concern when operating on the vma itself, so this
> would be the safest place imo.
>
> The problem that has been created is that the next vma may be invalid.
> This arises when the rmap is used to find a vma then changes that vma.
> In this scenario, the mmap lock of the vma needs to be held and that mm
> needs to be checked as stable/not oom'ed.  So we can limit the checking
> of the code to the scenario where another mm's vma is found and
> modified - which may cause a merge and look at the next vma.
>
> I haven't found anywhere else that looks up another mm_struct's vma and
> makes a change to the vma.  Considering what that would mean, it makes
> sense that this is rare.
>
> So, I'll just do this:
>
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1260,6 +1260,9 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>                  * returns NULL in find_active_uprobe_rcu().
>                  */
>                 mmap_write_lock(mm);
> +               if (check_stable_address_space(mm))
> +                               goto unlock;
> +
>
>

Yeah I think on reflection this is sensible, at least as a proximate
solution. It certainly doesn't harm to apply the MMF_UNSTABLE flag on top
of this and we can always expand where we check/what we do with this.

>
>
> >
> > >
> > > All the locks are dropped before the exit_mmap() call, but the
> > > incomplete mm_struct can be reached through (at least) the rmap finding
> > > the vmas which have a pointer back to the mm_struct.
> > >
> > > Up to this point, there have been no issues with being able to find an
> > > mm_struct that was only partially initialised.  Syzbot was able to make
> > > the incomplete mm_struct fail with recent forking changes, so it has
> > > been proven unsafe to use an mm_struct that hasn't been initialised, as
> > > referenced in the link below.
> > >
> > > Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
> > > invalid mm") fixed the uprobe access, it does not completely remove the
> > > race.
> > >
> > > This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on
> > > the oom side (even though this is extremely unlikely to be selected as
> > > an oom victim in the race window), and sets MMF_UNSTABLE to avoid other
> > > potential users from using a partially initialised mm_struct.
> >
> > Good to have it set also I think.
> >
> > We were concerned that _somehow_ MMF_UNSTABLE might cause some issue in
> > OOM, which was why when I first thought of this we decided maybe not just
> > to be safe, but I am confident this will not be an issue as indicating that
> > something is about to be torn down and is unstable when it is in fact
> > unstable and about to be torn down is correct, plus I checked all the paths
> > where this impacted and found no issues.
> >
> > And in any case, setting this flag avoids the problem...!
> >
> > >
> > > Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
> > > Fixes: d240629148377 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
> > > Cc: Jann Horn <jannh@google.com>
> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Peng Zhang <zhangpeng.00@bytedance.com>
> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > > ---
> > >  kernel/fork.c | 17 ++++++++++++++---
> > >  1 file changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index ded49f18cd95c..20b2120f019ca 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -760,7 +760,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> > >  		mt_set_in_rcu(vmi.mas.tree);
> > >  		ksm_fork(mm, oldmm);
> > >  		khugepaged_fork(mm, oldmm);
> > > -	} else if (mpnt) {
> > > +	} else {
> > > +
> > >  		/*
> > >  		 * The entire maple tree has already been duplicated. If the
> > >  		 * mmap duplication fails, mark the failure point with
> > > @@ -768,8 +769,18 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> > >  		 * stop releasing VMAs that have not been duplicated after this
> > >  		 * point.
> > >  		 */
> > > -		mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > > -		mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > > +		if (mpnt) {
> > > +			mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
> > > +			mas_store(&vmi.mas, XA_ZERO_ENTRY);
> > > +			/* Avoid OOM iterating a broken tree */
> > > +			set_bit(MMF_OOM_SKIP, &mm->flags);
> > > +		}
> > > +		/*
> > > +		 * The mm_struct is going to exit, but the locks will be dropped
> > > +		 * first.  Set the mm_struct as unstable is advisable as it is
> > > +		 * not fully initialised.
> > > +		 */
> > > +		set_bit(MMF_UNSTABLE, &mm->flags);
> >
> > Do we still need to do this if !mpnt?
>
> No, we probably don't need this now.  But it still makes sense to do
> this considering what we are saying.  There is a failed mm, let's mark
> it as unstable because it's going away in a moment - it's not stable,
> it's not even a complete mm.
>
> >
> > Also 'mpnt' is in the running for the worst variable name in mm...
> >
>
> mpnt is the oldMm vma Pointer, Now isn'T it?  I think that's self
> documenting.

:)))

>
> Thanks,
> Liam



end of thread, other threads:[~2025-01-24 19:09 UTC | newest]

Thread overview: 4+ messages
-- links below jump to the message on this page --
2025-01-23 20:58 [PATCH] kernel/fork: Be more careful about dup_mmap() failures Liam R. Howlett
2025-01-24 11:15 ` Lorenzo Stoakes
2025-01-24 18:19   ` Liam R. Howlett
2025-01-24 19:09     ` Lorenzo Stoakes
