* [PATCH 0/3] mm/kmemleak: Avoid soft lockup in kmemleak_scan()
@ 2022-06-12 18:32 Waiman Long
  2022-06-12 18:32 ` [PATCH 1/3] mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear() Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Waiman Long @ 2022-06-12 18:32 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton; +Cc: linux-mm, linux-kernel, Waiman Long

There are 3 RCU-based object iteration loops in kmemleak_scan(). Because
of the need to take the RCU read lock, we can't insert cond_resched()
into these loops as is done in other parts of the function. As there can
be millions of objects to be scanned, it takes a while to iterate over
all of them. The kmemleak functionality is usually enabled in a debug
kernel, which is much slower than a non-debug kernel. With a sufficient
number of kmemleak objects, the time to iterate over them all may exceed
22s, causing a soft lockup.

  watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kmemleak:625]

This patch series makes changes to the 3 object iteration loops in
kmemleak_scan() to prevent them from causing a soft lockup.
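
For illustration, the shape of the constraint (a rough sketch only, not
the actual kmemleak code): cond_resched() may sleep, and sleeping is not
allowed inside an RCU read-side critical section, so it cannot simply be
dropped into the loop body:

	rcu_read_lock();
	list_for_each_entry_rcu(object, &object_list, object_list) {
		/* per-object processing */
		cond_resched();	/* illegal here: may sleep inside an
				 * RCU read-side critical section */
	}
	rcu_read_unlock();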

Waiman Long (3):
  mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear()
  mm/kmemleak: Skip unlikely objects in kmemleak_scan() without taking
    lock
  mm/kmemleak: Prevent soft lockup in first object iteration loop of
    kmemleak_scan()

 mm/kmemleak.c | 52 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 11 deletions(-)

-- 
2.31.1




* [PATCH 1/3] mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear()
  2022-06-12 18:32 [PATCH 0/3] mm/kmemleak: Avoid soft lockup in kmemleak_scan() Waiman Long
@ 2022-06-12 18:32 ` Waiman Long
  2022-06-13  7:15   ` Muchun Song
  2022-06-14 15:57   ` Catalin Marinas
  2022-06-12 18:33 ` [PATCH 2/3] mm/kmemleak: Skip unlikely objects in kmemleak_scan() without taking lock Waiman Long
  2022-06-12 18:33 ` [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan() Waiman Long
  2 siblings, 2 replies; 12+ messages in thread
From: Waiman Long @ 2022-06-12 18:32 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton; +Cc: linux-mm, linux-kernel, Waiman Long

The kmemleak_scan() function is called only from the kmemleak scan
thread or from a write to the kmemleak debugfs file. Both are in task
context and so we can directly use the simpler _irq() lock/unlock calls
instead of the more complex _irqsave/_irqrestore variants.

Similarly, kmemleak_clear() is called only from a write to the kmemleak
debugfs file. The same change can be applied.
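
For reference, the difference between the two variants (an illustrative
snippet with a placeholder lock, not taken from kmemleak):

	static DEFINE_RAW_SPINLOCK(demo_lock);	/* placeholder lock */
	unsigned long flags;

	raw_spin_lock_irqsave(&demo_lock, flags);	/* save IRQ state, then disable IRQs */
	/* ... critical section ... */
	raw_spin_unlock_irqrestore(&demo_lock, flags);	/* restore the saved IRQ state */

	raw_spin_lock_irq(&demo_lock);		/* disable IRQs unconditionally */
	/* ... critical section ... */
	raw_spin_unlock_irq(&demo_lock);	/* re-enable IRQs unconditionally */

The _irq variants are only correct when interrupts are known to be
enabled at lock time, which is the case in task context here.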

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/kmemleak.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index a182f5ddaf68..dad9219c972c 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1413,7 +1413,6 @@ static void scan_gray_list(void)
  */
 static void kmemleak_scan(void)
 {
-	unsigned long flags;
 	struct kmemleak_object *object;
 	struct zone *zone;
 	int __maybe_unused i;
@@ -1424,7 +1423,7 @@ static void kmemleak_scan(void)
 	/* prepare the kmemleak_object's */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
-		raw_spin_lock_irqsave(&object->lock, flags);
+		raw_spin_lock_irq(&object->lock);
 #ifdef DEBUG
 		/*
 		 * With a few exceptions there should be a maximum of
@@ -1441,7 +1440,7 @@ static void kmemleak_scan(void)
 		if (color_gray(object) && get_object(object))
 			list_add_tail(&object->gray_list, &gray_list);
 
-		raw_spin_unlock_irqrestore(&object->lock, flags);
+		raw_spin_unlock_irq(&object->lock);
 	}
 	rcu_read_unlock();
 
@@ -1509,14 +1508,14 @@ static void kmemleak_scan(void)
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
-		raw_spin_lock_irqsave(&object->lock, flags);
+		raw_spin_lock_irq(&object->lock);
 		if (color_white(object) && (object->flags & OBJECT_ALLOCATED)
 		    && update_checksum(object) && get_object(object)) {
 			/* color it gray temporarily */
 			object->count = object->min_count;
 			list_add_tail(&object->gray_list, &gray_list);
 		}
-		raw_spin_unlock_irqrestore(&object->lock, flags);
+		raw_spin_unlock_irq(&object->lock);
 	}
 	rcu_read_unlock();
 
@@ -1536,7 +1535,7 @@ static void kmemleak_scan(void)
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
-		raw_spin_lock_irqsave(&object->lock, flags);
+		raw_spin_lock_irq(&object->lock);
 		if (unreferenced_object(object) &&
 		    !(object->flags & OBJECT_REPORTED)) {
 			object->flags |= OBJECT_REPORTED;
@@ -1546,7 +1545,7 @@ static void kmemleak_scan(void)
 
 			new_leaks++;
 		}
-		raw_spin_unlock_irqrestore(&object->lock, flags);
+		raw_spin_unlock_irq(&object->lock);
 	}
 	rcu_read_unlock();
 
@@ -1748,15 +1747,14 @@ static int dump_str_object_info(const char *str)
 static void kmemleak_clear(void)
 {
 	struct kmemleak_object *object;
-	unsigned long flags;
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
-		raw_spin_lock_irqsave(&object->lock, flags);
+		raw_spin_lock_irq(&object->lock);
 		if ((object->flags & OBJECT_REPORTED) &&
 		    unreferenced_object(object))
 			__paint_it(object, KMEMLEAK_GREY);
-		raw_spin_unlock_irqrestore(&object->lock, flags);
+		raw_spin_unlock_irq(&object->lock);
 	}
 	rcu_read_unlock();
 
-- 
2.31.1




* [PATCH 2/3] mm/kmemleak: Skip unlikely objects in kmemleak_scan() without taking lock
  2022-06-12 18:32 [PATCH 0/3] mm/kmemleak: Avoid soft lockup in kmemleak_scan() Waiman Long
  2022-06-12 18:32 ` [PATCH 1/3] mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear() Waiman Long
@ 2022-06-12 18:33 ` Waiman Long
  2022-06-14 16:54   ` Catalin Marinas
  2022-06-12 18:33 ` [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan() Waiman Long
  2 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2022-06-12 18:33 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton; +Cc: linux-mm, linux-kernel, Waiman Long

There are 3 RCU-based object iteration loops in kmemleak_scan(). Because
of the need to take the RCU read lock, we can't insert cond_resched()
into these loops as is done in other parts of the function. As there can
be millions of objects to be scanned, it takes a while to iterate over
all of them. The kmemleak functionality is usually enabled in a debug
kernel, which is much slower than a non-debug kernel. With a sufficient
number of kmemleak objects, the time to iterate over them all may exceed
22s, causing a soft lockup.

  watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kmemleak:625]

In this particular bug report, the soft lockup happened in the 2nd
iteration loop.

In the 2nd and 3rd loops, most of the objects are checked and then
skipped while holding the object lock. Only a select few are actually
modified, and those certainly need lock protection. However, the
lock/unlock operations are slow, especially with the interrupt disabling
and enabling included.

We can actually do a basic check like color_white() without taking the
lock and skip the object accordingly. Of course, such a check is racy
and may miss objects that are being modified concurrently. The cost of
a missed object, however, is just that it will be discovered in the next
scan instead. The advantage of doing so is that the iteration can be
done much faster, especially with LOCKDEP enabled in a debug kernel.
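
Conceptually this is the usual optimistic pre-check pattern: a lock-free
filter first, with the authoritative check repeated under the lock before
the object is acted upon. A rough sketch of the shape (placeholder names;
the actual change is in the diff below):

	if (!color_white(object))	/* racy, lock-free fast path */
		continue;
	raw_spin_lock_irq(&object->lock);
	if (color_white(object) && still_qualifies(object))
		act_on(object);		/* still_qualifies()/act_on() are placeholders;
					 * the color is re-checked under the lock */
	raw_spin_unlock_irq(&object->lock);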

With a debug kernel running on a 2-socket 96-thread x86-64 system
(HZ=1000), the speedup of the 2nd and 3rd iteration loops with this
patch, measured on the first kmemleak_scan() call after bootup, is shown
in the table below.

                   Before patch                    After patch
  Loop #    # of objects  Elapsed time     # of objects  Elapsed time
  ------    ------------  ------------     ------------  ------------
    2        2,599,850      2.392s          2,596,364       0.266s
    3        2,600,176      2.171s          2,597,061       0.260s

This patch reduces loop iteration times by about 88%. This will greatly
reduce the chance of a soft lockup happening in the 2nd or 3rd iteration
loops.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/kmemleak.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index dad9219c972c..7dd64139a7c7 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1508,6 +1508,13 @@ static void kmemleak_scan(void)
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
+		/*
+		 * This is racy but we can save the overhead of lock/unlock
+		 * calls. The missed objects, if any, should be caught in
+		 * the next scan.
+		 */
+		if (!color_white(object))
+			continue;
 		raw_spin_lock_irq(&object->lock);
 		if (color_white(object) && (object->flags & OBJECT_ALLOCATED)
 		    && update_checksum(object) && get_object(object)) {
@@ -1535,6 +1542,13 @@ static void kmemleak_scan(void)
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
+		/*
+		 * This is racy but we can save the overhead of lock/unlock
+		 * calls. The missed objects, if any, should be caught in
+		 * the next scan.
+		 */
+		if (!color_white(object))
+			continue;
 		raw_spin_lock_irq(&object->lock);
 		if (unreferenced_object(object) &&
 		    !(object->flags & OBJECT_REPORTED)) {
-- 
2.31.1




* [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan()
  2022-06-12 18:32 [PATCH 0/3] mm/kmemleak: Avoid soft lockup in kmemleak_scan() Waiman Long
  2022-06-12 18:32 ` [PATCH 1/3] mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear() Waiman Long
  2022-06-12 18:33 ` [PATCH 2/3] mm/kmemleak: Skip unlikely objects in kmemleak_scan() without taking lock Waiman Long
@ 2022-06-12 18:33 ` Waiman Long
  2022-06-14 17:15   ` Catalin Marinas
  2 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2022-06-12 18:33 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton; +Cc: linux-mm, linux-kernel, Waiman Long

The first RCU-based object iteration loop has to put almost all the
objects into the gray list and so cannot skip taking the object lock.

One way to avoid a soft lockup is to insert an occasional cond_resched()
call into the loop. This cannot be done while holding the RCU read lock,
which protects objects from removal. However, putting an object into the
gray list means taking a reference to it, which prevents the object from
being removed without the need to hold the RCU read lock. So insert a
cond_resched() call after every 64k objects added to the gray list.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/kmemleak.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7dd64139a7c7..a7c42e134fa1 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1417,12 +1417,15 @@ static void kmemleak_scan(void)
 	struct zone *zone;
 	int __maybe_unused i;
 	int new_leaks = 0;
+	int gray_list_cnt = 0;
 
 	jiffies_last_scan = jiffies;
 
 	/* prepare the kmemleak_object's */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
+		bool object_pinned = false;
+
 		raw_spin_lock_irq(&object->lock);
 #ifdef DEBUG
 		/*
@@ -1437,10 +1440,25 @@ static void kmemleak_scan(void)
 #endif
 		/* reset the reference count (whiten the object) */
 		object->count = 0;
-		if (color_gray(object) && get_object(object))
+		if (color_gray(object) && get_object(object)) {
 			list_add_tail(&object->gray_list, &gray_list);
+			gray_list_cnt++;
+			object_pinned = true;
+		}
 
 		raw_spin_unlock_irq(&object->lock);
+
+		/*
+		 * With object pinned by a positive reference count, it
+		 * won't go away and we can safely release the RCU read
+		 * lock and do a cond_resched() to avoid soft lockup every
+		 * 64k objects.
+		 */
+		if (object_pinned && !(gray_list_cnt & 0xffff)) {
+			rcu_read_unlock();
+			cond_resched();
+			rcu_read_lock();
+		}
 	}
 	rcu_read_unlock();
 
-- 
2.31.1




* Re: [PATCH 1/3] mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear()
  2022-06-12 18:32 ` [PATCH 1/3] mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear() Waiman Long
@ 2022-06-13  7:15   ` Muchun Song
  2022-06-14 15:57   ` Catalin Marinas
  1 sibling, 0 replies; 12+ messages in thread
From: Muchun Song @ 2022-06-13  7:15 UTC (permalink / raw)
  To: Waiman Long; +Cc: Catalin Marinas, Andrew Morton, linux-mm, linux-kernel

On Sun, Jun 12, 2022 at 02:32:59PM -0400, Waiman Long wrote:
> The kmemleak_scan() function is called only from the kmemleak scan
> thread or from a write to the kmemleak debugfs file. Both are in task
> context and so we can directly use the simpler _irq() lock/unlock calls
> instead of the more complex _irqsave/_irqrestore variants.
> 
> Similarly, kmemleak_clear() is called only from a write to the kmemleak
> debugfs file. The same change can be applied.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.



* Re: [PATCH 1/3] mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear()
  2022-06-12 18:32 ` [PATCH 1/3] mm/kmemleak: Use _irq lock/unlock variants in kmemleak_scan/_clear() Waiman Long
  2022-06-13  7:15   ` Muchun Song
@ 2022-06-14 15:57   ` Catalin Marinas
  1 sibling, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2022-06-14 15:57 UTC (permalink / raw)
  To: Waiman Long; +Cc: Andrew Morton, linux-mm, linux-kernel

On Sun, Jun 12, 2022 at 02:32:59PM -0400, Waiman Long wrote:
> The kmemleak_scan() function is called only from the kmemleak scan
> thread or from a write to the kmemleak debugfs file. Both are in task
> context and so we can directly use the simpler _irq() lock/unlock calls
> instead of the more complex _irqsave/_irqrestore variants.
> 
> Similarly, kmemleak_clear() is called only from a write to the kmemleak
> debugfs file. The same change can be applied.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>



* Re: [PATCH 2/3] mm/kmemleak: Skip unlikely objects in kmemleak_scan() without taking lock
  2022-06-12 18:33 ` [PATCH 2/3] mm/kmemleak: Skip unlikely objects in kmemleak_scan() without taking lock Waiman Long
@ 2022-06-14 16:54   ` Catalin Marinas
  2022-06-14 17:17     ` Waiman Long
  0 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2022-06-14 16:54 UTC (permalink / raw)
  To: Waiman Long; +Cc: Andrew Morton, linux-mm, linux-kernel

On Sun, Jun 12, 2022 at 02:33:00PM -0400, Waiman Long wrote:
> With a debug kernel running on a 2-socket 96-thread x86-64 system
> (HZ=1000), the speedup of the 2nd and 3rd iteration loops with this
> patch, measured on the first kmemleak_scan() call after bootup, is
> shown in the table below.
> 
>                    Before patch                    After patch
>   Loop #    # of objects  Elapsed time     # of objects  Elapsed time
>   ------    ------------  ------------     ------------  ------------
>     2        2,599,850      2.392s          2,596,364       0.266s
>     3        2,600,176      2.171s          2,597,061       0.260s
> 
> This patch reduces loop iteration times by about 88%. This will greatly
> reduce the chance of a soft lockup happening in the 2nd or 3rd iteration
> loops.

Nice numbers, thanks for digging into this.

But I'm slightly surprised that the first loop doesn't cause any
problems.

> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index dad9219c972c..7dd64139a7c7 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -1508,6 +1508,13 @@ static void kmemleak_scan(void)
>  	 */
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(object, &object_list, object_list) {
> +		/*
> +		 * This is racy but we can save the overhead of lock/unlock
> +		 * calls. The missed objects, if any, should be caught in
> +		 * the next scan.
> +		 */
> +		if (!color_white(object))
> +			continue;
>  		raw_spin_lock_irq(&object->lock);
>  		if (color_white(object) && (object->flags & OBJECT_ALLOCATED)
>  		    && update_checksum(object) && get_object(object)) {

It's not actually scanning (like tree look-ups) but only updating the
checksum of the potentially orphan objects. If the problem is caused by
object->lock, we should have seen it with the first loop as well.

It is possible that some large list is occasionally missed if there are
concurrent updates and a significant number of objects turn up "white",
forcing the checksum update. Otherwise this shouldn't be much different
from the first loop if there are no massive (false) leaks.

I think the race on color_white() can only be with a kmemleak_ignore()
or kmemleak_not_leak() call; otherwise the object colour shouldn't
change. Such objects can only turn from white to gray or black, so I
think the race is safe.

> @@ -1535,6 +1542,13 @@ static void kmemleak_scan(void)
>  	 */
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(object, &object_list, object_list) {
> +		/*
> +		 * This is racy but we can save the overhead of lock/unlock
> +		 * calls. The missed objects, if any, should be caught in
> +		 * the next scan.
> +		 */
> +		if (!color_white(object))
> +			continue;
>  		raw_spin_lock_irq(&object->lock);
>  		if (unreferenced_object(object) &&
>  		    !(object->flags & OBJECT_REPORTED)) {

Same here.

I did wonder whether it's worth keeping object->lock around; I even have
a stashed patch lying around from 2019. Instead we'd have the big
kmemleak_lock held for longer, though released periodically during
scanning. We can then move the lock outside the loop and traversal would
be faster but with an increased latency on slab allocation/freeing on
other CPUs. Right now we take the kmemleak_lock when scanning a single
block (e.g. object) to protect the rb-tree and rely on object->lock to
ensure the object isn't freed. Other concurrent allocs/frees would only
be blocked during single object scanning.

Anyway, I'm not entirely sure it's the lock causing the issue, as we
don't see it with the first loop. I'm more inclined to think it's the
checksum, and that skipping when !color_white() would do the trick.

Unless there's a better idea:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>



* Re: [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan()
  2022-06-12 18:33 ` [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan() Waiman Long
@ 2022-06-14 17:15   ` Catalin Marinas
  2022-06-14 17:27     ` Catalin Marinas
  0 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2022-06-14 17:15 UTC (permalink / raw)
  To: Waiman Long; +Cc: Andrew Morton, linux-mm, linux-kernel

On Sun, Jun 12, 2022 at 02:33:01PM -0400, Waiman Long wrote:
> @@ -1437,10 +1440,25 @@ static void kmemleak_scan(void)
>  #endif
>  		/* reset the reference count (whiten the object) */
>  		object->count = 0;
> -		if (color_gray(object) && get_object(object))
> +		if (color_gray(object) && get_object(object)) {
>  			list_add_tail(&object->gray_list, &gray_list);
> +			gray_list_cnt++;
> +			object_pinned = true;
> +		}
>  
>  		raw_spin_unlock_irq(&object->lock);
> +
> +		/*
> +		 * With object pinned by a positive reference count, it
> +		 * won't go away and we can safely release the RCU read
> +		 * lock and do a cond_resched() to avoid soft lockup every
> +		 * 64k objects.
> +		 */
> +		if (object_pinned && !(gray_list_cnt & 0xffff)) {
> +			rcu_read_unlock();
> +			cond_resched();
> +			rcu_read_lock();
> +		}

I'm not sure this gains much. There should be very few gray objects
initially (those passed to kmemleak_not_leak() for example). The
majority should be white objects.

If we drop the fine-grained object->lock, we could instead take
kmemleak_lock outside the loop with a cond_resched_lock(&kmemleak_lock)
within the loop. I think we can get away with not having an
rcu_read_lock() at all for list traversal with the big lock outside the
loop.

The reason I added it in the first kmemleak incarnation was to defer
kmemleak_object freeing as it was causing a re-entrant call into the
slab allocator. I later went for fine-grained locking and RCU list
traversal but I may have overdone it ;).

-- 
Catalin



* Re: [PATCH 2/3] mm/kmemleak: Skip unlikely objects in kmemleak_scan() without taking lock
  2022-06-14 16:54   ` Catalin Marinas
@ 2022-06-14 17:17     ` Waiman Long
  0 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2022-06-14 17:17 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Andrew Morton, linux-mm, linux-kernel

On 6/14/22 12:54, Catalin Marinas wrote:
> On Sun, Jun 12, 2022 at 02:33:00PM -0400, Waiman Long wrote:
>> With a debug kernel running on a 2-socket 96-thread x86-64 system
>> (HZ=1000), the speedup of the 2nd and 3rd iteration loops with this
>> patch, measured on the first kmemleak_scan() call after bootup, is
>> shown in the table below.
>>
>>                     Before patch                    After patch
>>    Loop #    # of objects  Elapsed time     # of objects  Elapsed time
>>    ------    ------------  ------------     ------------  ------------
>>      2        2,599,850      2.392s          2,596,364       0.266s
>>      3        2,600,176      2.171s          2,597,061       0.260s
>>
>> This patch reduces loop iteration times by about 88%. This will greatly
>> reduce the chance of a soft lockup happening in the 2nd or 3rd iteration
>> loops.
> Nice numbers, thanks for digging into this.
>
> But I'm slightly surprised that the first loop doesn't cause any
> problems.

The first loop is still problematic. It is just a bit faster with the
same number of objects; the corresponding elapsed time is about 1.7s.
The heuristic used in this patch cannot be applied to the first loop.
See patch 3 for how to avoid a soft lockup in the first loop.

>
>> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
>> index dad9219c972c..7dd64139a7c7 100644
>> --- a/mm/kmemleak.c
>> +++ b/mm/kmemleak.c
>> @@ -1508,6 +1508,13 @@ static void kmemleak_scan(void)
>>   	 */
>>   	rcu_read_lock();
>>   	list_for_each_entry_rcu(object, &object_list, object_list) {
>> +		/*
>> +		 * This is racy but we can save the overhead of lock/unlock
>> +		 * calls. The missed objects, if any, should be caught in
>> +		 * the next scan.
>> +		 */
>> +		if (!color_white(object))
>> +			continue;
>>   		raw_spin_lock_irq(&object->lock);
>>   		if (color_white(object) && (object->flags & OBJECT_ALLOCATED)
>>   		    && update_checksum(object) && get_object(object)) {
> It's not actually scanning (like tree look-ups) but only updating the
> checksum of the potentially orphan objects. If the problem is caused by
> object->lock, we should have seen it with the first loop as well.

See above. Maybe I should clarify in the patch description that a
similar change cannot be applied to the first loop.

>
> It is possible that some large list is occasionally missed if there are
> concurrent updates and a significant number of objects turn up "white",
> forcing the checksum update. Otherwise this shouldn't be much different
> from the first loop if there are no massive (false) leaks.
>
> I think the race on color_white() can only be with a kmemleak_ignore()
> or kmemleak_not_leak() call, otherwise the object colour shouldn't be
> changed. So such objects can only turn from white to gray or black, so
> the race I think is safe.
>
>> @@ -1535,6 +1542,13 @@ static void kmemleak_scan(void)
>>   	 */
>>   	rcu_read_lock();
>>   	list_for_each_entry_rcu(object, &object_list, object_list) {
>> +		/*
>> +		 * This is racy but we can save the overhead of lock/unlock
>> +		 * calls. The missed objects, if any, should be caught in
>> +		 * the next scan.
>> +		 */
>> +		if (!color_white(object))
>> +			continue;
>>   		raw_spin_lock_irq(&object->lock);
>>   		if (unreferenced_object(object) &&
>>   		    !(object->flags & OBJECT_REPORTED)) {
> Same here.
>
> I did wonder whether it's worth keeping object->lock around, I even have
> a stashed patch lying around from 2019. Instead we'd have the big
> kmemleak_lock held for longer, though released periodically during
> scanning. We can then move the lock outside the loop and traversal would
> be faster but with an increased latency on slab allocation/freeing on
> other CPUs. Right now we take the kmemleak_lock when scanning a single
> block (e.g. object) to protect the rb-tree and rely on object->lock to
> ensure the object isn't freed. Other concurrent allocs/frees would only
> be blocked during single object scanning.
>
> Anyway, I'm not entirely sure it's the lock causing the issue as we
> don't see it with the first loop. I'm more inclined to think it's the
> checksum and the skipping if !color_white() would do the trick.
>
> Unless there's a better idea:
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

The lock is a problem because of lockdep. Once I disable lockdep, the
elapsed time can drop to about 0.7s. However, lockdep is normally
enabled in a debug kernel. I will try to investigate whether there is a
way to optimize lockdep for such repeated lock/unlock loops.

Thanks,
Longman

>




* Re: [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan()
  2022-06-14 17:15   ` Catalin Marinas
@ 2022-06-14 17:27     ` Catalin Marinas
  2022-06-14 18:22       ` Waiman Long
  0 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2022-06-14 17:27 UTC (permalink / raw)
  To: Waiman Long; +Cc: Andrew Morton, linux-mm, linux-kernel

On Tue, Jun 14, 2022 at 06:15:14PM +0100, Catalin Marinas wrote:
> On Sun, Jun 12, 2022 at 02:33:01PM -0400, Waiman Long wrote:
> > @@ -1437,10 +1440,25 @@ static void kmemleak_scan(void)
> >  #endif
> >  		/* reset the reference count (whiten the object) */
> >  		object->count = 0;
> > -		if (color_gray(object) && get_object(object))
> > +		if (color_gray(object) && get_object(object)) {
> >  			list_add_tail(&object->gray_list, &gray_list);
> > +			gray_list_cnt++;
> > +			object_pinned = true;
> > +		}
> >  
> >  		raw_spin_unlock_irq(&object->lock);
> > +
> > +		/*
> > +		 * With object pinned by a positive reference count, it
> > +		 * won't go away and we can safely release the RCU read
> > +		 * lock and do a cond_resched() to avoid soft lockup every
> > +		 * 64k objects.
> > +		 */
> > +		if (object_pinned && !(gray_list_cnt & 0xffff)) {
> > +			rcu_read_unlock();
> > +			cond_resched();
> > +			rcu_read_lock();
> > +		}
> 
> I'm not sure this gains much. There should be very few gray objects
> initially (those passed to kmemleak_not_leak() for example). The
> majority should be white objects.
> 
> If we drop the fine-grained object->lock, we could instead take
> kmemleak_lock outside the loop with a cond_resched_lock(&kmemleak_lock)
> within the loop. I think we can get away with not having an
> rcu_read_lock() at all for list traversal with the big lock outside the
> loop.

Actually this doesn't work if the current object in the iteration is
freed. Does list_for_each_rcu_safe() help?

-- 
Catalin



* Re: [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan()
  2022-06-14 17:27     ` Catalin Marinas
@ 2022-06-14 18:22       ` Waiman Long
  2022-06-14 18:28         ` Waiman Long
  0 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2022-06-14 18:22 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Andrew Morton, linux-mm, linux-kernel

On 6/14/22 13:27, Catalin Marinas wrote:
> On Tue, Jun 14, 2022 at 06:15:14PM +0100, Catalin Marinas wrote:
>> On Sun, Jun 12, 2022 at 02:33:01PM -0400, Waiman Long wrote:
>>> @@ -1437,10 +1440,25 @@ static void kmemleak_scan(void)
>>>   #endif
>>>   		/* reset the reference count (whiten the object) */
>>>   		object->count = 0;
>>> -		if (color_gray(object) && get_object(object))
>>> +		if (color_gray(object) && get_object(object)) {
>>>   			list_add_tail(&object->gray_list, &gray_list);
>>> +			gray_list_cnt++;
>>> +			object_pinned = true;
>>> +		}
>>>   
I may have had the mistaken belief that setting count to 0 would make
most objects gray. Apparently, that may not be the case here.
>>>   		raw_spin_unlock_irq(&object->lock);
>>> +
>>> +		/*
>>> +		 * With object pinned by a positive reference count, it
>>> +		 * won't go away and we can safely release the RCU read
>>> +		 * lock and do a cond_resched() to avoid soft lockup every
>>> +		 * 64k objects.
>>> +		 */
>>> +		if (object_pinned && !(gray_list_cnt & 0xffff)) {
>>> +			rcu_read_unlock();
>>> +			cond_resched();
>>> +			rcu_read_lock();
>>> +		}
>> I'm not sure this gains much. There should be very few gray objects
>> initially (those passed to kmemleak_not_leak() for example). The
>> majority should be white objects.
>>
>> If we drop the fine-grained object->lock, we could instead take
>> kmemleak_lock outside the loop with a cond_resched_lock(&kmemleak_lock)
>> within the loop. I think we can get away with not having an
>> rcu_read_lock() at all for list traversal with the big lock outside the
>> loop.
> Actually this doesn't work is the current object in the iteration is
> freed. Does list_for_each_rcu_safe() help?

list_for_each_rcu_safe() helps if we are worried about the object being
freed. However, it won't help if object->next is freed instead.

How about something like:

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7dd64139a7c7..fd836e43cb16 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1417,12 +1417,16 @@ static void kmemleak_scan(void)
         struct zone *zone;
         int __maybe_unused i;
         int new_leaks = 0;
+       int loop1_cnt = 0;

         jiffies_last_scan = jiffies;

         /* prepare the kmemleak_object's */
         rcu_read_lock();
         list_for_each_entry_rcu(object, &object_list, object_list) {
+               bool obj_pinned = false;
+
+               loop1_cnt++;
                 raw_spin_lock_irq(&object->lock);
  #ifdef DEBUG
                 /*
@@ -1437,10 +1441,32 @@ static void kmemleak_scan(void)
  #endif
                 /* reset the reference count (whiten the object) */
                 object->count = 0;
-               if (color_gray(object) && get_object(object))
+               if (color_gray(object) && get_object(object)) {
                         list_add_tail(&object->gray_list, &gray_list);
+                       obj_pinned = true;
+               }

                 raw_spin_unlock_irq(&object->lock);
+
+               /*
+                * Do a cond_resched() to avoid soft lockup every 64k objects.
+                * Make sure a reference has been taken so that the object
+                * won't go away without RCU read lock.
+                */
+               if (loop1_cnt & 0xffff) {
+                       if (!obj_pinned && !get_object(object)) {
+                               /* Try the next object instead */
+                               loop1_cnt--;
+                               continue;
+                       }
+
+                       rcu_read_unlock();
+                       cond_resched();
+                       rcu_read_lock();
+
+                       if (!obj_pinned)
+                               put_object(object);
+               }
         }
         rcu_read_unlock();

Cheers,
Longman





* Re: [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan()
  2022-06-14 18:22       ` Waiman Long
@ 2022-06-14 18:28         ` Waiman Long
  0 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2022-06-14 18:28 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Andrew Morton, linux-mm, linux-kernel


On 6/14/22 14:22, Waiman Long wrote:
> On 6/14/22 13:27, Catalin Marinas wrote:
>
>>>> raw_spin_unlock_irq(&object->lock);
>>>> +
>>>> +        /*
>>>> +         * With object pinned by a positive reference count, it
>>>> +         * won't go away and we can safely release the RCU read
>>>> +         * lock and do a cond_resched() to avoid soft lockup every
>>>> +         * 64k objects.
>>>> +         */
>>>> +        if (object_pinned && !(gray_list_cnt & 0xffff)) {
>>>> +            rcu_read_unlock();
>>>> +            cond_resched();
>>>> +            rcu_read_lock();
>>>> +        }
>>> I'm not sure this gains much. There should be very few gray objects
>>> initially (those passed to kmemleak_not_leak() for example). The
>>> majority should be white objects.
>>>
>>> If we drop the fine-grained object->lock, we could instead take
>>> kmemleak_lock outside the loop with a cond_resched_lock(&kmemleak_lock)
>>> within the loop. I think we can get away with not having an
>>> rcu_read_lock() at all for list traversal with the big lock outside the
>>> loop.
>> Actually this doesn't work is the current object in the iteration is
>> freed. Does list_for_each_rcu_safe() help?
>
> list_for_each_rcu_safe() helps if we are worrying about object being 
> freed. However, it won't help if object->next is freed instead.
>
> How about something like:
>
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 7dd64139a7c7..fd836e43cb16 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -1417,12 +1417,16 @@ static void kmemleak_scan(void)
>         struct zone *zone;
>         int __maybe_unused i;
>         int new_leaks = 0;
> +       int loop1_cnt = 0;
>
>         jiffies_last_scan = jiffies;
>
>         /* prepare the kmemleak_object's */
>         rcu_read_lock();
>         list_for_each_entry_rcu(object, &object_list, object_list) {
> +               bool obj_pinned = false;
> +
> +               loop1_cnt++;
>                 raw_spin_lock_irq(&object->lock);
>  #ifdef DEBUG
>                 /*
> @@ -1437,10 +1441,32 @@ static void kmemleak_scan(void)
>  #endif
>                 /* reset the reference count (whiten the object) */
>                 object->count = 0;
> -               if (color_gray(object) && get_object(object))
> +               if (color_gray(object) && get_object(object)) {
>                         list_add_tail(&object->gray_list, &gray_list);
> +                       obj_pinned = true;
> +               }
>
>                 raw_spin_unlock_irq(&object->lock);
> +
> +               /*
> +                * Do a cond_resched() to avoid soft lockup every 64k objects.
> +                * Make sure a reference has been taken so that the object
> +                * won't go away without RCU read lock.
> +                */
> +               if (loop1_cnt & 0xffff) {


Sorry, should be "(!(loop1_cnt & 0xffff))".

> +                       if (!obj_pinned && !get_object(object)) {
> +                               /* Try the next object instead */
> +                               loop1_cnt--;
> +                               continue;
> +                       }
> +
> +                       rcu_read_unlock();
> +                       cond_resched();
> +                       rcu_read_lock();
> +
> +                       if (!obj_pinned)
> +                               put_object(object);
> +               }
>         }
>         rcu_read_unlock();
>
Cheers,
Longman


