linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Question/problem with mmu_notifier_unregister.
@ 2013-01-15 16:29 Robin Holt
  2013-01-16 20:00 ` Robin Holt
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2013-01-15 16:29 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm

Andrea,

On a community or upcoming distro based kernel, I have one XPMEM test
which is failing.  The following patch (with lots of context) fixes it.
I do not yet understand the cause of the race condition.  I did hope it
would be obvious to you and save me the time of investigating further.

The first test finds mn->hlist is on the chain.  The new second test
does not.  I have verified that neither xpmem nor gru is calling to
unregister the same mmu_notifier twice.  It is very difficult to find the
problem as the test case requires a very long time to trip.  To increase
the likelihood, I run many copies in parallel.  Each is 2 processes each
with one pthread.  I run two per socket on an 8 core per socket system
with 256 socket system.  When one job hits the null pointer deref, I can
have many thousands of lines of debug before the other jobs stop running.
Each process has /dev/gru mapped into their address space.  They have
used xpmem to attach part of the address space of the other process.
The tasks are exiting at approximately the same time.  The race seems to
be with one pthread hitting the final mmput at about the same time as the
other task is getting into the filp_close->flush() callout.

I have a couple other things to work on today, and will likely try to
chase this failure more tomorrow.  With this patch, I have not been able
to trigger any other issues in many hours of testing.  The test likewise
runs for many hours on a RHEL 6.3 based system and a SLES11 SP2 based
system so it might have something to do with the change in srcu locking.

Thanks,
Robin Holt


diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 8a5ac8c..e2c9827 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -297,37 +299,38 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
        if (!hlist_unhashed(&mn->hlist)) {
                /*
                 * SRCU here will force exit_mmap to wait ->release to finish
                 * before freeing the pages.
                 */
                int id;
 
                id = srcu_read_lock(&srcu);
                /*
                 * exit_mmap will block in mmu_notifier_release to
                 * guarantee ->release is called before freeing the
                 * pages.
                 */
                if (mn->ops->release)
                        mn->ops->release(mn, mm);
                srcu_read_unlock(&srcu, id);
 
                spin_lock(&mm->mmu_notifier_mm->lock);
-               hlist_del_rcu(&mn->hlist);
+               if (!hlist_unhashed(&mn->hlist))
+                       hlist_del_rcu(&mn->hlist);
                spin_unlock(&mm->mmu_notifier_mm->lock);
        }
 
        /*
         * Wait any running method to finish, of course including
         * ->release if it was run by mmu_notifier_relase instead of us.
         */
        synchronize_srcu(&srcu);
 
        BUG_ON(atomic_read(&mm->mm_count) <= 0);
 
        mmdrop(mm);
 }
 EXPORT_SYMBOL_GPL(mmu_notifier_unregister);
 
 static int __init mmu_notifier_init(void)
 {
        return init_srcu_struct(&srcu);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question/problem with mmu_notifier_unregister.
  2013-01-15 16:29 Question/problem with mmu_notifier_unregister Robin Holt
@ 2013-01-16 20:00 ` Robin Holt
  2013-01-16 21:01   ` [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix Robin Holt
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2013-01-16 20:00 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm

On Tue, Jan 15, 2013 at 10:29:56AM -0600, Robin Holt wrote:
> Andrea,
> 
> On a community or upcoming distro based kernel, I have one XPMEM test
> which is failing.  The following patch (with lots of context) fixes it.
> I do not yet understand the cause of the race condition.  I did hope it
> would be obvious to you and save me the time of investigating further.
> 
> The first test finds mn->hlist is on the chain.  The new second test
> does not.  I have verified that neither xpmem nor gru is calling to
> unregister the same mmu_notifier twice.  It is very difficult to find the
> problem as the test case requires a very long time to trip.  To increase
> the likelihood, I run many copies in parallel.  Each is 2 processes each
> with one pthread.  I run two per socket on an 8 core per socket system
> with 256 socket system.  When one job hits the null pointer deref, I can
> have many thousands of lines of debug before the other jobs stop running.
> Each process has /dev/gru mapped into their address space.  They have
> used xpmem to attach part of the address space of the other process.
> The tasks are exiting at approximately the same time.  The race seems to
> be with one pthread hitting the final mmput at about the same time as the
> other task is getting into the filp_close->flush() callout.
> 
> I have a couple other things to work on today, and will likely try to
> chase this failure more tomorrow.  With this patch, I have not been able
> to trigger any other issues in many hours of testing.  The test likewise
> runs for many hours on a RHEL 6.3 based system and a SLES11 SP2 based
> system so it might have something to do with the change in srcu locking.

Using code inspection, I think I understand this issue some more.

Assume two tasks, one calling mmu_notifier_unregister() as a result
of a filp_close() ->flush() callout (task A), and the other calling
mmu_notifier_release() from an mmput() (task B).

		A				B
t1						srcu_read_lock()
t2		if (!hlist_unhashed())
t3		srcu_read_lock()
t4						srcu_read_unlock()
t5						hlist_del...()
t6						synchronize_srcu()
t7		srcu_read_unlock()
t8		hlist_del_rcu()  <--- NULL pointer deref.


Does this seem plausible?  Am I missing something?

My earlier patch feels wrong as I think we will see two ->release()
calls are made, but I am not sure of that.

Thanks,
Robin


> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 8a5ac8c..e2c9827 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -297,37 +299,38 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
>         if (!hlist_unhashed(&mn->hlist)) {
>                 /*
>                  * SRCU here will force exit_mmap to wait ->release to finish
>                  * before freeing the pages.
>                  */
>                 int id;
>  
>                 id = srcu_read_lock(&srcu);
>                 /*
>                  * exit_mmap will block in mmu_notifier_release to
>                  * guarantee ->release is called before freeing the
>                  * pages.
>                  */
>                 if (mn->ops->release)
>                         mn->ops->release(mn, mm);
>                 srcu_read_unlock(&srcu, id);
>  
>                 spin_lock(&mm->mmu_notifier_mm->lock);
> -               hlist_del_rcu(&mn->hlist);
> +               if (!hlist_unhashed(&mn->hlist))
> +                       hlist_del_rcu(&mn->hlist);
>                 spin_unlock(&mm->mmu_notifier_mm->lock);
>         }
>  
>         /*
>          * Wait any running method to finish, of course including
>          * ->release if it was run by mmu_notifier_relase instead of us.
>          */
>         synchronize_srcu(&srcu);
>  
>         BUG_ON(atomic_read(&mm->mm_count) <= 0);
>  
>         mmdrop(mm);
>  }
>  EXPORT_SYMBOL_GPL(mmu_notifier_unregister);
>  
>  static int __init mmu_notifier_init(void)
>  {
>         return init_srcu_struct(&srcu);
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-16 20:00 ` Robin Holt
@ 2013-01-16 21:01   ` Robin Holt
  2013-01-17  2:45     ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2013-01-16 21:01 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Wanpeng Li, Avi Kivity, Hugh Dickins, Marcelo Tosatti,
	Xiao Guangrong, Sagi Grimberg, Haggai Eran


There is a race condition between mmu_notifier_unregister() and
__mmu_notifier_release().

Assume two tasks, one calling mmu_notifier_unregister() as a result
of a filp_close() ->flush() callout (task A), and the other calling
mmu_notifier_release() from an mmput() (task B).

                A                               B
t1                                              srcu_read_lock()
t2              if (!hlist_unhashed())
t3                                              srcu_read_unlock()
t4              srcu_read_lock()
t5                                              hlist_del_init_rcu()
t6                                              synchronize_srcu()
t7              srcu_read_unlock()
t8              hlist_del_rcu()  <--- NULL pointer deref.

Tested with this patch applied.  My test case which was failing
approximately every 300th iteration passed 25,000 tests.

Signed-off-by: Robin Holt <holt@sgi.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Sagi Grimberg <sagig@mellanox.co.il>
Cc: Haggai Eran <haggaie@mellanox.com>
---
 mm/mmu_notifier.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 8a5ac8c..b873598 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -55,7 +55,6 @@ void __mmu_notifier_release(struct mm_struct *mm)
 		 */
 		if (mn->ops->release)
 			mn->ops->release(mn, mm);
-	srcu_read_unlock(&srcu, id);
 
 	spin_lock(&mm->mmu_notifier_mm->lock);
 	while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
@@ -72,6 +71,8 @@ void __mmu_notifier_release(struct mm_struct *mm)
 	}
 	spin_unlock(&mm->mmu_notifier_mm->lock);
 
+	srcu_read_unlock(&srcu, id);
+
 	/*
 	 * synchronize_srcu here prevents mmu_notifier_release to
 	 * return to exit_mmap (which would proceed freeing all pages
@@ -292,16 +293,24 @@ void __mmu_notifier_mm_destroy(struct mm_struct *mm)
  */
 void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
 {
+	int id;
+
 	BUG_ON(atomic_read(&mm->mm_count) <= 0);
 
+	if (hlist_unhashed(&mn->hlist))
+		goto released;
+
+	/*
+	 * SRCU here will force exit_mmap to wait ->release to finish
+	 * before freeing the pages.
+	 */
+	id = srcu_read_lock(&srcu);
+
+	spin_lock(&mm->mmu_notifier_mm->lock);
 	if (!hlist_unhashed(&mn->hlist)) {
-		/*
-		 * SRCU here will force exit_mmap to wait ->release to finish
-		 * before freeing the pages.
-		 */
-		int id;
+		hlist_del_rcu(&mn->hlist);
+		spin_unlock(&mm->mmu_notifier_mm->lock);
 
-		id = srcu_read_lock(&srcu);
 		/*
 		 * exit_mmap will block in mmu_notifier_release to
 		 * guarantee ->release is called before freeing the
@@ -309,16 +318,16 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
 		 */
 		if (mn->ops->release)
 			mn->ops->release(mn, mm);
-		srcu_read_unlock(&srcu, id);
 
-		spin_lock(&mm->mmu_notifier_mm->lock);
-		hlist_del_rcu(&mn->hlist);
+	} else
 		spin_unlock(&mm->mmu_notifier_mm->lock);
-	}
 
+	srcu_read_unlock(&srcu, id);
+
+released:
 	/*
 	 * Wait any running method to finish, of course including
-	 * ->release if it was run by mmu_notifier_relase instead of us.
+	 * ->release if it was run by __mmu_notifier_release instead of us.
 	 */
 	synchronize_srcu(&srcu);
 
-- 
1.8.0.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-16 21:01   ` [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix Robin Holt
@ 2013-01-17  2:45     ` Xiao Guangrong
  2013-01-17 11:12       ` Robin Holt
  0 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2013-01-17  2:45 UTC (permalink / raw)
  To: Robin Holt
  Cc: Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity, Hugh Dickins,
	Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On 01/17/2013 05:01 AM, Robin Holt wrote:
> 
> There is a race condition between mmu_notifier_unregister() and
> __mmu_notifier_release().
> 
> Assume two tasks, one calling mmu_notifier_unregister() as a result
> of a filp_close() ->flush() callout (task A), and the other calling
> mmu_notifier_release() from an mmput() (task B).
> 
>                 A                               B
> t1                                              srcu_read_lock()
> t2              if (!hlist_unhashed())
> t3                                              srcu_read_unlock()
> t4              srcu_read_lock()
> t5                                              hlist_del_init_rcu()
> t6                                              synchronize_srcu()
> t7              srcu_read_unlock()
> t8              hlist_del_rcu()  <--- NULL pointer deref.

The detailed code here is:
	hlist_del_rcu(&mn->hlist);

Can mn be NULL? I do not think so since mn is always the embedded struct
of the caller, it be freed after calling mmu_notifier_unregister.

> 
> Tested with this patch applied.  My test case which was failing
> approximately every 300th iteration passed 25,000 tests.

Could you please share your test case?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-17  2:45     ` Xiao Guangrong
@ 2013-01-17 11:12       ` Robin Holt
  2013-01-17 12:19         ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2013-01-17 11:12 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Robin Holt, Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity,
	Hugh Dickins, Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
> On 01/17/2013 05:01 AM, Robin Holt wrote:
> > 
> > There is a race condition between mmu_notifier_unregister() and
> > __mmu_notifier_release().
> > 
> > Assume two tasks, one calling mmu_notifier_unregister() as a result
> > of a filp_close() ->flush() callout (task A), and the other calling
> > mmu_notifier_release() from an mmput() (task B).
> > 
> >                 A                               B
> > t1                                              srcu_read_lock()
> > t2              if (!hlist_unhashed())
> > t3                                              srcu_read_unlock()
> > t4              srcu_read_lock()
> > t5                                              hlist_del_init_rcu()
> > t6                                              synchronize_srcu()
> > t7              srcu_read_unlock()
> > t8              hlist_del_rcu()  <--- NULL pointer deref.
> 
> The detailed code here is:
> 	hlist_del_rcu(&mn->hlist);
> 
> Can mn be NULL? I do not think so since mn is always the embedded struct
> of the caller, it be freed after calling mmu_notifier_unregister.

If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
it attempts to update *hlist->pprev = hlist->next and that is where it
takes the NULL pointer deref.

> 
> > 
> > Tested with this patch applied.  My test case which was failing
> > approximately every 300th iteration passed 25,000 tests.
> 
> Could you please share your test case?

I could but it would be very useless.  It depends upon having a SGI
UV system with GRUs and and xpmem kernel module loaded.  If you would
really like all the bits, I could provide them, but you will not be able
to reproduce the failure.

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-17 11:12       ` Robin Holt
@ 2013-01-17 12:19         ` Xiao Guangrong
  2013-01-17 13:45           ` Robin Holt
  0 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2013-01-17 12:19 UTC (permalink / raw)
  To: Robin Holt
  Cc: Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity, Hugh Dickins,
	Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On 01/17/2013 07:12 PM, Robin Holt wrote:
> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
>> On 01/17/2013 05:01 AM, Robin Holt wrote:
>>>
>>> There is a race condition between mmu_notifier_unregister() and
>>> __mmu_notifier_release().
>>>
>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
>>> of a filp_close() ->flush() callout (task A), and the other calling
>>> mmu_notifier_release() from an mmput() (task B).
>>>
>>>                 A                               B
>>> t1                                              srcu_read_lock()
>>> t2              if (!hlist_unhashed())
>>> t3                                              srcu_read_unlock()
>>> t4              srcu_read_lock()
>>> t5                                              hlist_del_init_rcu()
>>> t6                                              synchronize_srcu()
>>> t7              srcu_read_unlock()
>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
>>
>> The detailed code here is:
>> 	hlist_del_rcu(&mn->hlist);
>>
>> Can mn be NULL? I do not think so since mn is always the embedded struct
>> of the caller, it be freed after calling mmu_notifier_unregister.
> 
> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
> it attempts to update *hlist->pprev = hlist->next and that is where it
> takes the NULL pointer deref.

Yes, sorry for my careless. So, That can not be fixed by using
hlist_del_init_rcu instead?

> 
>>
>>>
>>> Tested with this patch applied.  My test case which was failing
>>> approximately every 300th iteration passed 25,000 tests.
>>
>> Could you please share your test case?
> 
> I could but it would be very useless.  It depends upon having a SGI
> UV system with GRUs and and xpmem kernel module loaded.  If you would
> really like all the bits, I could provide them, but you will not be able
> to reproduce the failure.

Oh, i see. :)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-17 12:19         ` Xiao Guangrong
@ 2013-01-17 13:45           ` Robin Holt
  2013-01-18  2:42             ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2013-01-17 13:45 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Robin Holt, Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity,
	Hugh Dickins, Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
> On 01/17/2013 07:12 PM, Robin Holt wrote:
> > On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
> >> On 01/17/2013 05:01 AM, Robin Holt wrote:
> >>>
> >>> There is a race condition between mmu_notifier_unregister() and
> >>> __mmu_notifier_release().
> >>>
> >>> Assume two tasks, one calling mmu_notifier_unregister() as a result
> >>> of a filp_close() ->flush() callout (task A), and the other calling
> >>> mmu_notifier_release() from an mmput() (task B).
> >>>
> >>>                 A                               B
> >>> t1                                              srcu_read_lock()
> >>> t2              if (!hlist_unhashed())
> >>> t3                                              srcu_read_unlock()
> >>> t4              srcu_read_lock()
> >>> t5                                              hlist_del_init_rcu()
> >>> t6                                              synchronize_srcu()
> >>> t7              srcu_read_unlock()
> >>> t8              hlist_del_rcu()  <--- NULL pointer deref.
> >>
> >> The detailed code here is:
> >> 	hlist_del_rcu(&mn->hlist);
> >>
> >> Can mn be NULL? I do not think so since mn is always the embedded struct
> >> of the caller, it be freed after calling mmu_notifier_unregister.
> > 
> > If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
> > which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
> > it attempts to update *hlist->pprev = hlist->next and that is where it
> > takes the NULL pointer deref.
> 
> Yes, sorry for my careless. So, That can not be fixed by using
> hlist_del_init_rcu instead?

The problem is the race described above.  Thread 'A' has checked to see
if n->pprev != NULL.  Based upon that, it did called the mn->release()
method.  While it was trying to call the release method, thread 'B' ended
up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
'A' got to run again and now it tries to do the hlist_del_rcu() which, as
part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
and then *pprev = n->next; hits the NULL pointer deref hits.

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-17 13:45           ` Robin Holt
@ 2013-01-18  2:42             ` Xiao Guangrong
  2013-01-18  2:48               ` Robin Holt
  0 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2013-01-18  2:42 UTC (permalink / raw)
  To: Robin Holt
  Cc: Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity, Hugh Dickins,
	Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On 01/17/2013 09:45 PM, Robin Holt wrote:
> On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
>> On 01/17/2013 07:12 PM, Robin Holt wrote:
>>> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
>>>> On 01/17/2013 05:01 AM, Robin Holt wrote:
>>>>>
>>>>> There is a race condition between mmu_notifier_unregister() and
>>>>> __mmu_notifier_release().
>>>>>
>>>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
>>>>> of a filp_close() ->flush() callout (task A), and the other calling
>>>>> mmu_notifier_release() from an mmput() (task B).
>>>>>
>>>>>                 A                               B
>>>>> t1                                              srcu_read_lock()
>>>>> t2              if (!hlist_unhashed())
>>>>> t3                                              srcu_read_unlock()
>>>>> t4              srcu_read_lock()
>>>>> t5                                              hlist_del_init_rcu()
>>>>> t6                                              synchronize_srcu()
>>>>> t7              srcu_read_unlock()
>>>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
>>>>
>>>> The detailed code here is:
>>>> 	hlist_del_rcu(&mn->hlist);
>>>>
>>>> Can mn be NULL? I do not think so since mn is always the embedded struct
>>>> of the caller, it be freed after calling mmu_notifier_unregister.
>>>
>>> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
>>> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
>>> it attempts to update *hlist->pprev = hlist->next and that is where it
>>> takes the NULL pointer deref.
>>
>> Yes, sorry for my careless. So, That can not be fixed by using
>> hlist_del_init_rcu instead?
> 
> The problem is the race described above.  Thread 'A' has checked to see
> if n->pprev != NULL.  Based upon that, it did called the mn->release()
> method.  While it was trying to call the release method, thread 'B' ended
> up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
> 'A' got to run again and now it tries to do the hlist_del_rcu() which, as
> part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
> and then *pprev = n->next; hits the NULL pointer deref hits.

I mean using hlist_del_init_rcu instead of hlist_del_rcu in
mmu_notifier_unregister(), hlist_del_init_rcu is aware of ->pprev.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-18  2:42             ` Xiao Guangrong
@ 2013-01-18  2:48               ` Robin Holt
  2013-01-18  3:04                 ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2013-01-18  2:48 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Robin Holt, Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity,
	Hugh Dickins, Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On Fri, Jan 18, 2013 at 10:42:07AM +0800, Xiao Guangrong wrote:
> On 01/17/2013 09:45 PM, Robin Holt wrote:
> > On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
> >> On 01/17/2013 07:12 PM, Robin Holt wrote:
> >>> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
> >>>> On 01/17/2013 05:01 AM, Robin Holt wrote:
> >>>>>
> >>>>> There is a race condition between mmu_notifier_unregister() and
> >>>>> __mmu_notifier_release().
> >>>>>
> >>>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
> >>>>> of a filp_close() ->flush() callout (task A), and the other calling
> >>>>> mmu_notifier_release() from an mmput() (task B).
> >>>>>
> >>>>>                 A                               B
> >>>>> t1                                              srcu_read_lock()
> >>>>> t2              if (!hlist_unhashed())
> >>>>> t3                                              srcu_read_unlock()
> >>>>> t4              srcu_read_lock()
> >>>>> t5                                              hlist_del_init_rcu()
> >>>>> t6                                              synchronize_srcu()
> >>>>> t7              srcu_read_unlock()
> >>>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
> >>>>
> >>>> The detailed code here is:
> >>>> 	hlist_del_rcu(&mn->hlist);
> >>>>
> >>>> Can mn be NULL? I do not think so since mn is always the embedded struct
> >>>> of the caller, it be freed after calling mmu_notifier_unregister.
> >>>
> >>> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
> >>> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
> >>> it attempts to update *hlist->pprev = hlist->next and that is where it
> >>> takes the NULL pointer deref.
> >>
> >> Yes, sorry for my careless. So, That can not be fixed by using
> >> hlist_del_init_rcu instead?
> > 
> > The problem is the race described above.  Thread 'A' has checked to see
> > if n->pprev != NULL.  Based upon that, it did called the mn->release()
> > method.  While it was trying to call the release method, thread 'B' ended
> > up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
> > 'A' got to run again and now it tries to do the hlist_del_rcu() which, as
> > part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
> > and then *pprev = n->next; hits the NULL pointer deref hits.
> 
> I mean using hlist_del_init_rcu instead of hlist_del_rcu in
> mmu_notifier_unregister(), hlist_del_init_rcu is aware of ->pprev.

How does that address the calling of the ->release() method twice?

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-18  2:48               ` Robin Holt
@ 2013-01-18  3:04                 ` Xiao Guangrong
  2013-01-18 12:14                   ` Robin Holt
  0 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2013-01-18  3:04 UTC (permalink / raw)
  To: Robin Holt
  Cc: Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity, Hugh Dickins,
	Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On 01/18/2013 10:48 AM, Robin Holt wrote:
> On Fri, Jan 18, 2013 at 10:42:07AM +0800, Xiao Guangrong wrote:
>> On 01/17/2013 09:45 PM, Robin Holt wrote:
>>> On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
>>>> On 01/17/2013 07:12 PM, Robin Holt wrote:
>>>>> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
>>>>>> On 01/17/2013 05:01 AM, Robin Holt wrote:
>>>>>>>
>>>>>>> There is a race condition between mmu_notifier_unregister() and
>>>>>>> __mmu_notifier_release().
>>>>>>>
>>>>>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
>>>>>>> of a filp_close() ->flush() callout (task A), and the other calling
>>>>>>> mmu_notifier_release() from an mmput() (task B).
>>>>>>>
>>>>>>>                 A                               B
>>>>>>> t1                                              srcu_read_lock()
>>>>>>> t2              if (!hlist_unhashed())
>>>>>>> t3                                              srcu_read_unlock()
>>>>>>> t4              srcu_read_lock()
>>>>>>> t5                                              hlist_del_init_rcu()
>>>>>>> t6                                              synchronize_srcu()
>>>>>>> t7              srcu_read_unlock()
>>>>>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
>>>>>>
>>>>>> The detailed code here is:
>>>>>> 	hlist_del_rcu(&mn->hlist);
>>>>>>
>>>>>> Can mn be NULL? I do not think so since mn is always the embedded struct
>>>>>> of the caller, it be freed after calling mmu_notifier_unregister.
>>>>>
>>>>> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
>>>>> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
>>>>> it attempts to update *hlist->pprev = hlist->next and that is where it
>>>>> takes the NULL pointer deref.
>>>>
>>>> Yes, sorry for my careless. So, That can not be fixed by using
>>>> hlist_del_init_rcu instead?
>>>
>>> The problem is the race described above.  Thread 'A' has checked to see
>>> if n->pprev != NULL.  Based upon that, it did called the mn->release()
>>> method.  While it was trying to call the release method, thread 'B' ended
>>> up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
>>> 'A' got to run again and now it tries to do the hlist_del_rcu() which, as
>>> part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
>>> and then *pprev = n->next; hits the NULL pointer deref hits.
>>
>> I mean using hlist_del_init_rcu instead of hlist_del_rcu in
>> mmu_notifier_unregister(), hlist_del_init_rcu is aware of ->pprev.
> 
> How does that address the calling of the ->release() method twice?

Hmm, what is the problem of it? If it is just for "performance issue", i think
it is not worth introducing so complex lock rule just for the really rare case.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-18  3:04                 ` Xiao Guangrong
@ 2013-01-18 12:14                   ` Robin Holt
  2013-01-18 12:51                     ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Holt @ 2013-01-18 12:14 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Robin Holt, Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity,
	Hugh Dickins, Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On Fri, Jan 18, 2013 at 11:04:10AM +0800, Xiao Guangrong wrote:
> On 01/18/2013 10:48 AM, Robin Holt wrote:
> > On Fri, Jan 18, 2013 at 10:42:07AM +0800, Xiao Guangrong wrote:
> >> On 01/17/2013 09:45 PM, Robin Holt wrote:
> >>> On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
> >>>> On 01/17/2013 07:12 PM, Robin Holt wrote:
> >>>>> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
> >>>>>> On 01/17/2013 05:01 AM, Robin Holt wrote:
> >>>>>>>
> >>>>>>> There is a race condition between mmu_notifier_unregister() and
> >>>>>>> __mmu_notifier_release().
> >>>>>>>
> >>>>>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
> >>>>>>> of a filp_close() ->flush() callout (task A), and the other calling
> >>>>>>> mmu_notifier_release() from an mmput() (task B).
> >>>>>>>
> >>>>>>>                 A                               B
> >>>>>>> t1                                              srcu_read_lock()
> >>>>>>> t2              if (!hlist_unhashed())
> >>>>>>> t3                                              srcu_read_unlock()
> >>>>>>> t4              srcu_read_lock()
> >>>>>>> t5                                              hlist_del_init_rcu()
> >>>>>>> t6                                              synchronize_srcu()
> >>>>>>> t7              srcu_read_unlock()
> >>>>>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
> >>>>>>
> >>>>>> The detailed code here is:
> >>>>>> 	hlist_del_rcu(&mn->hlist);
> >>>>>>
> >>>>>> Can mn be NULL? I do not think so since mn is always the embedded struct
> >>>>>> of the caller, it be freed after calling mmu_notifier_unregister.
> >>>>>
> >>>>> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
> >>>>> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
> >>>>> it attempts to update *hlist->pprev = hlist->next and that is where it
> >>>>> takes the NULL pointer deref.
> >>>>
> >>>> Yes, sorry for my careless. So, That can not be fixed by using
> >>>> hlist_del_init_rcu instead?
> >>>
> >>> The problem is the race described above.  Thread 'A' has checked to see
> >>> if n->pprev != NULL.  Based upon that, it did called the mn->release()
> >>> method.  While it was trying to call the release method, thread 'B' ended
> >>> up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
> >>> 'A' got to run again and now it tries to do the hlist_del_rcu() which, as
> >>> part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
> >>> and then *pprev = n->next; hits the NULL pointer deref hits.
> >>
> >> I mean using hlist_del_init_rcu instead of hlist_del_rcu in
> >> mmu_notifier_unregister(), hlist_del_init_rcu is aware of ->pprev.
> > 
> > How does that address the calling of the ->release() method twice?
> 
> Hmm, what is the problem of it? If it is just for "performance issue", i think
> it is not worth introducing so complex lock rule just for the really rare case.

Complex lock rule?  We merely moved the lock up earlier in code path.
Without this, we have some cases where you get called on ->release()
twice, while the majority of cases your notifier gets called once and
it hits a NULL pointer deref at that.  What is so complex about that?

I originally was going to change both the __mmu_notifier_release()
function and the mmu_notifier_unregister() function to make the sequence,
lock, unlink, unlock, callout, but I thought that, although being more
correct, would get push back despite the fact that the lock is structure
local and likely to only be contended from two threads at the same time.

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-18 12:14                   ` Robin Holt
@ 2013-01-18 12:51                     ` Xiao Guangrong
  2013-01-18 15:14                       ` Robin Holt
  0 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2013-01-18 12:51 UTC (permalink / raw)
  To: Robin Holt
  Cc: Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity, Hugh Dickins,
	Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On 01/18/2013 08:14 PM, Robin Holt wrote:
> On Fri, Jan 18, 2013 at 11:04:10AM +0800, Xiao Guangrong wrote:
>> On 01/18/2013 10:48 AM, Robin Holt wrote:
>>> On Fri, Jan 18, 2013 at 10:42:07AM +0800, Xiao Guangrong wrote:
>>>> On 01/17/2013 09:45 PM, Robin Holt wrote:
>>>>> On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
>>>>>> On 01/17/2013 07:12 PM, Robin Holt wrote:
>>>>>>> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
>>>>>>>> On 01/17/2013 05:01 AM, Robin Holt wrote:
>>>>>>>>>
>>>>>>>>> There is a race condition between mmu_notifier_unregister() and
>>>>>>>>> __mmu_notifier_release().
>>>>>>>>>
>>>>>>>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
>>>>>>>>> of a filp_close() ->flush() callout (task A), and the other calling
>>>>>>>>> mmu_notifier_release() from an mmput() (task B).
>>>>>>>>>
>>>>>>>>>                 A                               B
>>>>>>>>> t1                                              srcu_read_lock()
>>>>>>>>> t2              if (!hlist_unhashed())
>>>>>>>>> t3                                              srcu_read_unlock()
>>>>>>>>> t4              srcu_read_lock()
>>>>>>>>> t5                                              hlist_del_init_rcu()
>>>>>>>>> t6                                              synchronize_srcu()
>>>>>>>>> t7              srcu_read_unlock()
>>>>>>>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
>>>>>>>>
>>>>>>>> The detailed code here is:
>>>>>>>> 	hlist_del_rcu(&mn->hlist);
>>>>>>>>
>>>>>>>> Can mn be NULL? I do not think so since mn is always the embedded struct
>>>>>>>> of the caller, it be freed after calling mmu_notifier_unregister.
>>>>>>>
>>>>>>> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
>>>>>>> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
>>>>>>> it attempts to update *hlist->pprev = hlist->next and that is where it
>>>>>>> takes the NULL pointer deref.
>>>>>>
>>>>>> Yes, sorry for my careless. So, That can not be fixed by using
>>>>>> hlist_del_init_rcu instead?
>>>>>
>>>>> The problem is the race described above.  Thread 'A' has checked to see
>>>>> if n->pprev != NULL.  Based upon that, it did called the mn->release()
>>>>> method.  While it was trying to call the release method, thread 'B' ended
>>>>> up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
>>>>> 'A' got to run again and now it tries to do the hlist_del_rcu() which, as
>>>>> part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
>>>>> and then *pprev = n->next; hits the NULL pointer deref hits.
>>>>
>>>> I mean using hlist_del_init_rcu instead of hlist_del_rcu in
>>>> mmu_notifier_unregister(), hlist_del_init_rcu is aware of ->pprev.
>>>
>>> How does that address the calling of the ->release() method twice?
>>
>> Hmm, what is the problem of it? If it is just for "performance issue", i think
>> it is not worth introducing so complex lock rule just for the really rare case.
> 
> Complex lock rule?  We merely moved the lock up earlier in code path.
> Without this, we have some cases where you get called on ->release()
> twice, while the majority of cases your notifier gets called once and
> it hits a NULL pointer deref at that.  What is so complex about that?


Aha, if we use hlist_del_init_rcu() instead of hlist_del_rcu, can the NULL deref
bug be fixed?

- If yes, you'd better make it as a simple patch, it is good for backport. Then
  make the second patch to fix the "problem" of calling ->release twice.

- if no. Could you please detail the changelog. From the changelog, i only see
  the bug is cased by calling hlist_del_rcu on the unhashed node.

Thank you! :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-18 12:51                     ` Xiao Guangrong
@ 2013-01-18 15:14                       ` Robin Holt
  2013-01-23  2:00                         ` Wanpeng Li
  2013-01-23  2:00                         ` Wanpeng Li
  0 siblings, 2 replies; 15+ messages in thread
From: Robin Holt @ 2013-01-18 15:14 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Robin Holt, Andrea Arcangeli, linux-mm, Wanpeng Li, Avi Kivity,
	Hugh Dickins, Marcelo Tosatti, Sagi Grimberg, Haggai Eran

On Fri, Jan 18, 2013 at 08:51:46PM +0800, Xiao Guangrong wrote:
> On 01/18/2013 08:14 PM, Robin Holt wrote:
> > On Fri, Jan 18, 2013 at 11:04:10AM +0800, Xiao Guangrong wrote:
> >> On 01/18/2013 10:48 AM, Robin Holt wrote:
> >>> On Fri, Jan 18, 2013 at 10:42:07AM +0800, Xiao Guangrong wrote:
> >>>> On 01/17/2013 09:45 PM, Robin Holt wrote:
> >>>>> On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
> >>>>>> On 01/17/2013 07:12 PM, Robin Holt wrote:
> >>>>>>> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
> >>>>>>>> On 01/17/2013 05:01 AM, Robin Holt wrote:
> >>>>>>>>>
> >>>>>>>>> There is a race condition between mmu_notifier_unregister() and
> >>>>>>>>> __mmu_notifier_release().
> >>>>>>>>>
> >>>>>>>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
> >>>>>>>>> of a filp_close() ->flush() callout (task A), and the other calling
> >>>>>>>>> mmu_notifier_release() from an mmput() (task B).
> >>>>>>>>>
> >>>>>>>>>                 A                               B
> >>>>>>>>> t1                                              srcu_read_lock()
> >>>>>>>>> t2              if (!hlist_unhashed())
> >>>>>>>>> t3                                              srcu_read_unlock()
> >>>>>>>>> t4              srcu_read_lock()
> >>>>>>>>> t5                                              hlist_del_init_rcu()
> >>>>>>>>> t6                                              synchronize_srcu()
> >>>>>>>>> t7              srcu_read_unlock()
> >>>>>>>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
> >>>>>>>>
> >>>>>>>> The detailed code here is:
> >>>>>>>> 	hlist_del_rcu(&mn->hlist);
> >>>>>>>>
> >>>>>>>> Can mn be NULL? I do not think so since mn is always the embedded struct
> >>>>>>>> of the caller, it be freed after calling mmu_notifier_unregister.
> >>>>>>>
> >>>>>>> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
> >>>>>>> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
> >>>>>>> it attempts to update *hlist->pprev = hlist->next and that is where it
> >>>>>>> takes the NULL pointer deref.
> >>>>>>
> >>>>>> Yes, sorry for my careless. So, That can not be fixed by using
> >>>>>> hlist_del_init_rcu instead?
> >>>>>
> >>>>> The problem is the race described above.  Thread 'A' has checked to see
> >>>>> if n->pprev != NULL.  Based upon that, it did called the mn->release()
> >>>>> method.  While it was trying to call the release method, thread 'B' ended
> >>>>> up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
> >>>>> 'A' got to run again and now it tries to do the hlist_del_rcu() which, as
> >>>>> part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
> >>>>> and then *pprev = n->next; hits the NULL pointer deref hits.
> >>>>
> >>>> I mean using hlist_del_init_rcu instead of hlist_del_rcu in
> >>>> mmu_notifier_unregister(), hlist_del_init_rcu is aware of ->pprev.
> >>>
> >>> How does that address the calling of the ->release() method twice?
> >>
> >> Hmm, what is the problem of it? If it is just for "performance issue", i think
> >> it is not worth introducing so complex lock rule just for the really rare case.
> > 
> > Complex lock rule?  We merely moved the lock up earlier in code path.
> > Without this, we have some cases where you get called on ->release()
> > twice, while the majority of cases your notifier gets called once and
> > it hits a NULL pointer deref at that.  What is so complex about that?
> 
> 
> Aha, if we use hlist_del_init_rcu() instead of hlist_del_rcu, can the NULL deref
> bug be fixed?
> 
> - If yes, you'd better make it as a simple patch, it is good for backport. Then
>   make the second patch to fix the "problem" of calling ->release twice.
> 
> - if no. Could you please detail the changelog. From the changelog, i only see
>   the bug is cased by calling hlist_del_rcu on the unhashed node.

What is it about this patch makes you think it is complex?  There are:

1) 11 Lines relocated.
2) 5 new lines added (4 four an optimization, 1 for "else" case).
3) 5 blank line introduced.
4) 1 comment line fixed.

I would happily remove the optimization which brings us down to an else
case.  That could be removed by introducing a temporary variable as well,
but that seems pointless.

Bottom line, I do not see how this patch, as-is or with some slight
tweaking, is not already candidate material for the -stable trees.
It is certainly not complex and significantly improves an inconsistency
with how the unregister notifier has worked for a few years.  It is a new
behavior which is contrary to the comments in mmu_notifier.h which says:

         * Called either by mmu_notifier_unregister or when the mm is
         * being destroyed by exit_mmap, always before all pages are
...
        void (*release)(struct mmu_notifier *mn,

That does not say "and/or", it says "or" which used to be "once and
only once", but is now "once, unless it is twice".  This new behavior
was affecting some of our test jobs, but the failures were so sporadic
that we were not making any progress on identifying the failures until we
stumbled on a test case which more frequently failed, then refined that
test to trigger easily.

My reluctance to not improving the double callout is I would need to repeat
a significant amount of testing to ensure the other problems are also
fixed with just the removal of the NULL pointer deref.  I believe they
are, but I am not certain.

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-18 15:14                       ` Robin Holt
  2013-01-23  2:00                         ` Wanpeng Li
@ 2013-01-23  2:00                         ` Wanpeng Li
  1 sibling, 0 replies; 15+ messages in thread
From: Wanpeng Li @ 2013-01-23  2:00 UTC (permalink / raw)
  To: Robin Holt
  Cc: Andrea Arcangeli, linux-mm, Avi Kivity, Hugh Dickins,
	Marcelo Tosatti, Xiao Guangrong, Sagi Grimberg, Haggai Eran

On Fri, Jan 18, 2013 at 09:14:30AM -0600, Robin Holt wrote:
>On Fri, Jan 18, 2013 at 08:51:46PM +0800, Xiao Guangrong wrote:
>> On 01/18/2013 08:14 PM, Robin Holt wrote:
>> > On Fri, Jan 18, 2013 at 11:04:10AM +0800, Xiao Guangrong wrote:
>> >> On 01/18/2013 10:48 AM, Robin Holt wrote:
>> >>> On Fri, Jan 18, 2013 at 10:42:07AM +0800, Xiao Guangrong wrote:
>> >>>> On 01/17/2013 09:45 PM, Robin Holt wrote:
>> >>>>> On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
>> >>>>>> On 01/17/2013 07:12 PM, Robin Holt wrote:
>> >>>>>>> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
>> >>>>>>>> On 01/17/2013 05:01 AM, Robin Holt wrote:
>> >>>>>>>>>
>> >>>>>>>>> There is a race condition between mmu_notifier_unregister() and
>> >>>>>>>>> __mmu_notifier_release().
>> >>>>>>>>>
>> >>>>>>>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
>> >>>>>>>>> of a filp_close() ->flush() callout (task A), and the other calling
>> >>>>>>>>> mmu_notifier_release() from an mmput() (task B).
>> >>>>>>>>>
>> >>>>>>>>>                 A                               B
>> >>>>>>>>> t1                                              srcu_read_lock()
>> >>>>>>>>> t2              if (!hlist_unhashed())
>> >>>>>>>>> t3                                              srcu_read_unlock()
>> >>>>>>>>> t4              srcu_read_lock()
>> >>>>>>>>> t5                                              hlist_del_init_rcu()
>> >>>>>>>>> t6                                              synchronize_srcu()
>> >>>>>>>>> t7              srcu_read_unlock()
>> >>>>>>>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
>> >>>>>>>>
>> >>>>>>>> The detailed code here is:
>> >>>>>>>> 	hlist_del_rcu(&mn->hlist);
>> >>>>>>>>
>> >>>>>>>> Can mn be NULL? I do not think so since mn is always the embedded struct
>> >>>>>>>> of the caller, it be freed after calling mmu_notifier_unregister.
>> >>>>>>>
>> >>>>>>> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
>> >>>>>>> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
>> >>>>>>> it attempts to update *hlist->pprev = hlist->next and that is where it
>> >>>>>>> takes the NULL pointer deref.
>> >>>>>>
>> >>>>>> Yes, sorry for my careless. So, That can not be fixed by using
>> >>>>>> hlist_del_init_rcu instead?
>> >>>>>
>> >>>>> The problem is the race described above.  Thread 'A' has checked to see
>> >>>>> if n->pprev != NULL.  Based upon that, it did called the mn->release()
>> >>>>> method.  While it was trying to call the release method, thread 'B' ended
>> >>>>> up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
>> >>>>> 'A' got to run again and now it tries to do the hlist_del_rcu() which, as
>> >>>>> part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
>> >>>>> and then *pprev = n->next; hits the NULL pointer deref hits.
>> >>>>
>> >>>> I mean using hlist_del_init_rcu instead of hlist_del_rcu in
>> >>>> mmu_notifier_unregister(), hlist_del_init_rcu is aware of ->pprev.
>> >>>
>> >>> How does that address the calling of the ->release() method twice?
>> >>
>> >> Hmm, what is the problem of it? If it is just for "performance issue", i think
>> >> it is not worth introducing so complex lock rule just for the really rare case.
>> > 
>> > Complex lock rule?  We merely moved the lock up earlier in code path.
>> > Without this, we have some cases where you get called on ->release()
>> > twice, while the majority of cases your notifier gets called once and
>> > it hits a NULL pointer deref at that.  What is so complex about that?
>> 
>> 
>> Aha, if we use hlist_del_init_rcu() instead of hlist_del_rcu, can the NULL deref
>> bug be fixed?
>> 
>> - If yes, you'd better make it as a simple patch, it is good for backport. Then
>>   make the second patch to fix the "problem" of calling ->release twice.
>> 
>> - if no. Could you please detail the changelog. From the changelog, i only see
>>   the bug is cased by calling hlist_del_rcu on the unhashed node.
>
>What is it about this patch makes you think it is complex?  There are:
>
>1) 11 Lines relocated.
>2) 5 new lines added (4 four an optimization, 1 for "else" case).
>3) 5 blank line introduced.
>4) 1 comment line fixed.
>
>I would happily remove the optimization which brings us down to an else
>case.  That could be removed by introducing a temporary variable as well,
>but that seems pointless.
>
>Bottom line, I do not see how this patch, as-is or with some slight
>tweaking, is not already candidate material for the -stable trees.
>It is certainly not complex and significantly improves an inconsistency
>with how the unregister notifier has worked for a few years.  It is a new
>behavior which is contrary to the comments in mmu_notifier.h which says:
>
>         * Called either by mmu_notifier_unregister or when the mm is
>         * being destroyed by exit_mmap, always before all pages are
>...
>        void (*release)(struct mmu_notifier *mn,
>
>That does not say "and/or", it says "or" which used to be "once and
>only once", but is now "once, unless it is twice".  This new behavior
>was affecting some of our test jobs, but the failures were so sporadic
>that we were not making any progress on identifying the failures until we
>stumbled on a test case which more frequently failed, then refined that
>test to trigger easily.
>
>My reluctance to not improving the double callout is I would need to repeat
>a significant amount of testing to ensure the other problems are also
>fixed with just the removal of the NULL pointer deref.  I believe they
>are, but I am not certain.

Hi Robin,

It seems that the race lead to both NULL pointer deref and call release
twice, however, title and changelog just mentioned NULL pointer deref. 
Otherwise, fair enough to me. :)

Regards,
Wanpeng Li 

>
>Thanks,
>Robin
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix.
  2013-01-18 15:14                       ` Robin Holt
@ 2013-01-23  2:00                         ` Wanpeng Li
  2013-01-23  2:00                         ` Wanpeng Li
  1 sibling, 0 replies; 15+ messages in thread
From: Wanpeng Li @ 2013-01-23  2:00 UTC (permalink / raw)
  To: Robin Holt
  Cc: Andrea Arcangeli, linux-mm, Avi Kivity, Hugh Dickins,
	Marcelo Tosatti, Xiao Guangrong, Sagi Grimberg, Haggai Eran

On Fri, Jan 18, 2013 at 09:14:30AM -0600, Robin Holt wrote:
>On Fri, Jan 18, 2013 at 08:51:46PM +0800, Xiao Guangrong wrote:
>> On 01/18/2013 08:14 PM, Robin Holt wrote:
>> > On Fri, Jan 18, 2013 at 11:04:10AM +0800, Xiao Guangrong wrote:
>> >> On 01/18/2013 10:48 AM, Robin Holt wrote:
>> >>> On Fri, Jan 18, 2013 at 10:42:07AM +0800, Xiao Guangrong wrote:
>> >>>> On 01/17/2013 09:45 PM, Robin Holt wrote:
>> >>>>> On Thu, Jan 17, 2013 at 08:19:55PM +0800, Xiao Guangrong wrote:
>> >>>>>> On 01/17/2013 07:12 PM, Robin Holt wrote:
>> >>>>>>> On Thu, Jan 17, 2013 at 10:45:32AM +0800, Xiao Guangrong wrote:
>> >>>>>>>> On 01/17/2013 05:01 AM, Robin Holt wrote:
>> >>>>>>>>>
>> >>>>>>>>> There is a race condition between mmu_notifier_unregister() and
>> >>>>>>>>> __mmu_notifier_release().
>> >>>>>>>>>
>> >>>>>>>>> Assume two tasks, one calling mmu_notifier_unregister() as a result
>> >>>>>>>>> of a filp_close() ->flush() callout (task A), and the other calling
>> >>>>>>>>> mmu_notifier_release() from an mmput() (task B).
>> >>>>>>>>>
>> >>>>>>>>>                 A                               B
>> >>>>>>>>> t1                                              srcu_read_lock()
>> >>>>>>>>> t2              if (!hlist_unhashed())
>> >>>>>>>>> t3                                              srcu_read_unlock()
>> >>>>>>>>> t4              srcu_read_lock()
>> >>>>>>>>> t5                                              hlist_del_init_rcu()
>> >>>>>>>>> t6                                              synchronize_srcu()
>> >>>>>>>>> t7              srcu_read_unlock()
>> >>>>>>>>> t8              hlist_del_rcu()  <--- NULL pointer deref.
>> >>>>>>>>
>> >>>>>>>> The detailed code here is:
>> >>>>>>>> 	hlist_del_rcu(&mn->hlist);
>> >>>>>>>>
>> >>>>>>>> Can mn be NULL? I do not think so since mn is always the embedded struct
>> >>>>>>>> of the caller, it be freed after calling mmu_notifier_unregister.
>> >>>>>>>
>> >>>>>>> If you look at __mmu_notifier_release() it is using hlist_del_init_rcu()
>> >>>>>>> which will set the hlist->pprev to NULL.  When hlist_del_rcu() is called,
>> >>>>>>> it attempts to update *hlist->pprev = hlist->next and that is where it
>> >>>>>>> takes the NULL pointer deref.
>> >>>>>>
>> >>>>>> Yes, sorry for my careless. So, That can not be fixed by using
>> >>>>>> hlist_del_init_rcu instead?
>> >>>>>
>> >>>>> The problem is the race described above.  Thread 'A' has checked to see
>> >>>>> if n->pprev != NULL.  Based upon that, it did called the mn->release()
>> >>>>> method.  While it was trying to call the release method, thread 'B' ended
>> >>>>> up calling hlist_del_init_rcu() which set n->pprev = NULL.  Then thread
>> >>>>> 'A' got to run again and now it tries to do the hlist_del_rcu() which, as
>> >>>>> part of __hlist_del(), the pprev will be set to n->pprev (which is NULL)
>> >>>>> and then *pprev = n->next; hits the NULL pointer deref hits.
>> >>>>
>> >>>> I mean using hlist_del_init_rcu instead of hlist_del_rcu in
>> >>>> mmu_notifier_unregister(), hlist_del_init_rcu is aware of ->pprev.
>> >>>
>> >>> How does that address the calling of the ->release() method twice?
>> >>
>> >> Hmm, what is the problem of it? If it is just for "performance issue", i think
>> >> it is not worth introducing so complex lock rule just for the really rare case.
>> > 
>> > Complex lock rule?  We merely moved the lock up earlier in code path.
>> > Without this, we have some cases where you get called on ->release()
>> > twice, while the majority of cases your notifier gets called once and
>> > it hits a NULL pointer deref at that.  What is so complex about that?
>> 
>> 
>> Aha, if we use hlist_del_init_rcu() instead of hlist_del_rcu, can the NULL deref
>> bug be fixed?
>> 
>> - If yes, you'd better make it as a simple patch, it is good for backport. Then
>>   make the second patch to fix the "problem" of calling ->release twice.
>> 
>> - if no. Could you please detail the changelog. From the changelog, i only see
>>   the bug is cased by calling hlist_del_rcu on the unhashed node.
>
>What is it about this patch makes you think it is complex?  There are:
>
>1) 11 Lines relocated.
>2) 5 new lines added (4 four an optimization, 1 for "else" case).
>3) 5 blank line introduced.
>4) 1 comment line fixed.
>
>I would happily remove the optimization which brings us down to an else
>case.  That could be removed by introducing a temporary variable as well,
>but that seems pointless.
>
>Bottom line, I do not see how this patch, as-is or with some slight
>tweaking, is not already candidate material for the -stable trees.
>It is certainly not complex and significantly improves an inconsistency
>with how the unregister notifier has worked for a few years.  It is a new
>behavior which is contrary to the comments in mmu_notifier.h which says:
>
>         * Called either by mmu_notifier_unregister or when the mm is
>         * being destroyed by exit_mmap, always before all pages are
>...
>        void (*release)(struct mmu_notifier *mn,
>
>That does not say "and/or", it says "or" which used to be "once and
>only once", but is now "once, unless it is twice".  This new behavior
>was affecting some of our test jobs, but the failures were so sporadic
>that we were not making any progress on identifying the failures until we
>stumbled on a test case which more frequently failed, then refined that
>test to trigger easily.
>
>My reluctance to not improving the double callout is I would need to repeat
>a significant amount of testing to ensure the other problems are also
>fixed with just the removal of the NULL pointer deref.  I believe they
>are, but I am not certain.

Hi Robin,

It seems that the race lead to both NULL pointer deref and call release
twice, however, title and changelog just mentioned NULL pointer deref. 
Otherwise, fair enough to me. :)

Regards,
Wanpeng Li 

>
>Thanks,
>Robin
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-01-23  2:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-15 16:29 Question/problem with mmu_notifier_unregister Robin Holt
2013-01-16 20:00 ` Robin Holt
2013-01-16 21:01   ` [PATCH] [Patch] mmu_notifier_unregister NULL Pointer deref fix Robin Holt
2013-01-17  2:45     ` Xiao Guangrong
2013-01-17 11:12       ` Robin Holt
2013-01-17 12:19         ` Xiao Guangrong
2013-01-17 13:45           ` Robin Holt
2013-01-18  2:42             ` Xiao Guangrong
2013-01-18  2:48               ` Robin Holt
2013-01-18  3:04                 ` Xiao Guangrong
2013-01-18 12:14                   ` Robin Holt
2013-01-18 12:51                     ` Xiao Guangrong
2013-01-18 15:14                       ` Robin Holt
2013-01-23  2:00                         ` Wanpeng Li
2013-01-23  2:00                         ` Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox