linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* kmemleak handling of kfree_rcu
@ 2023-08-30 16:37 Christoph Paasch
  2023-09-04 21:22 ` Catalin Marinas
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Paasch @ 2023-08-30 16:37 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Andrew Morton, linux-mm, MPTCP Upstream

[-- Attachment #1: Type: text/plain, Size: 791 bytes --]

Hello,

for the MPTCP upstream project, we are running syzkaller [1] continuously to qualify our kernel changes.

We found one issue with kmemleak and its handling of kfree_rcu.


Specifically, it looks like kmemleak falsely reports memory-leaks when the object is being freed by kfree_rcu after a certain grace-period.

For example, https://github.com/multipath-tcp/mptcp_net-next/issues/398#issuecomment-1584819482 shows how the syzkaller program reliably produces a kmemleak report, although the object is not leaked (we confirmed that by simply increasing MSECS_MIN_AGE to something larger than the grace-period).


Is this a known limitation of kmemleak, or is there something else that needs to be done ?



Thanks,
Christoph

[1] https://github.com/google/syzkaller

[-- Attachment #2: Type: text/html, Size: 1267 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-08-30 16:37 kmemleak handling of kfree_rcu Christoph Paasch
@ 2023-09-04 21:22 ` Catalin Marinas
  2023-09-05 11:17   ` Joel Fernandes
  2023-09-05 21:22   ` Christoph Paasch
  0 siblings, 2 replies; 17+ messages in thread
From: Catalin Marinas @ 2023-09-04 21:22 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Andrew Morton, linux-mm, MPTCP Upstream, rcu

Hi Christoph,

Please try not to send html, many servers block such emails.

Also adding the RCU list.

On Wed, Aug 30, 2023 at 09:37:23AM -0700, Christoph Paasch wrote:
> for the MPTCP upstream project, we are running syzkaller [1] continuously to
> qualify our kernel changes.
> 
> We found one issue with kmemleak and its handling of kfree_rcu.
> 
> Specifically, it looks like kmemleak falsely reports memory-leaks when the
> object is being freed by kfree_rcu after a certain grace-period.
> 
> For example, https://github.com/multipath-tcp/mptcp_net-next/issues/398#
> issuecomment-1584819482 shows how the syzkaller program reliably produces a
> kmemleak report, although the object is not leaked (we confirmed that by simply
> increasing MSECS_MIN_AGE to something larger than the grace-period).

Not sure which RCU variant you are using but most likely the false
positives are caused by the original reference to the object being lost
and the pointer added to a new location that kmemleak does not track
(e.g. bnode->records[] in the tree-based variant).

A quick attempt (untested, not even compiled):

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1449cb69a0e0..681a1eb7560a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -53,6 +53,7 @@
 #include <linux/sysrq.h>
 #include <linux/kprobes.h>
 #include <linux/gfp.h>
+#include <linux/kmemleak.h>
 #include <linux/oom.h>
 #include <linux/smpboot.h>
 #include <linux/jiffies.h>
@@ -2934,6 +2935,7 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)
 
 	llist_for_each_safe(pos, n, page_list) {
 		free_page((unsigned long)pos);
+		kmemleak_free(pos);
 		freed++;
 	}
 
@@ -2962,8 +2964,16 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
 					rcu_state.name, bnode->records[i], 0);
 
 				vfree(bnode->records[i]);
+				/* avoid false negatives */
+				kmemleak_erase(&bnode->records[i]);
 			}
 		}
+		/*
+		 * Avoid kmemleak false negatives when such pointers are later
+		 * re-allocated.
+		 */
+		for (i = 0; i < bnode->nr_records; i++)
+			kmemleak_erase(&bnode->records[i]);
 		rcu_lock_release(&rcu_callback_map);
 	}
 
@@ -2972,8 +2982,10 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
 		bnode = NULL;
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
-	if (bnode)
+	if (bnode) {
 		free_page((unsigned long) bnode);
+		kmemleak_free(bnode);
+	}
 
 	cond_resched_tasks_rcu_qs();
 }
@@ -3241,6 +3253,12 @@ static void fill_page_cache_func(struct work_struct *work)
 			free_page((unsigned long) bnode);
 			break;
 		}
+
+		/*
+		 * Scan the bnode->records[] array until the objects are
+		 * actually freed.
+		 */
+		kmemleak_alloc(bnode, PAGE_SIZE, 0, GFP_KERNEL);
 	}
 
 	atomic_set(&krcp->work_in_progress, 0);
@@ -3308,6 +3326,11 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 			// scenarios.
 			bnode = (struct kvfree_rcu_bulk_data *)
 				__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+			/*
+			 * Scan the bnode->records[] array until the objects are
+			 * actually freed.
+			 */
+			kmemleak_alloc(bnode, PAGE_SIZE, 0, GFP_KERNEL);
 			raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
 		}
 

-- 
Catalin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-04 21:22 ` Catalin Marinas
@ 2023-09-05 11:17   ` Joel Fernandes
  2023-09-05 14:41     ` Catalin Marinas
  2023-09-05 21:22   ` Christoph Paasch
  1 sibling, 1 reply; 17+ messages in thread
From: Joel Fernandes @ 2023-09-05 11:17 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Christoph Paasch, Andrew Morton, linux-mm, MPTCP Upstream, rcu

On Mon, Sep 04, 2023 at 10:22:56PM +0100, Catalin Marinas wrote:
> Hi Christoph,
> 
> Please try not to send html, many servers block such emails.
> 
> Also adding the RCU list.
> 
> On Wed, Aug 30, 2023 at 09:37:23AM -0700, Christoph Paasch wrote:
> > for the MPTCP upstream project, we are running syzkaller [1] continuously to
> > qualify our kernel changes.
> > 
> > We found one issue with kmemleak and its handling of kfree_rcu.
> > 
> > Specifically, it looks like kmemleak falsely reports memory-leaks when the
> > object is being freed by kfree_rcu after a certain grace-period.
> > 
> > For example, https://github.com/multipath-tcp/mptcp_net-next/issues/398#
> > issuecomment-1584819482 shows how the syzkaller program reliably produces a
> > kmemleak report, although the object is not leaked (we confirmed that by simply
> > increasing MSECS_MIN_AGE to something larger than the grace-period).
> 
> Not sure which RCU variant you are using but most likely the false
> positives are caused by the original reference to the object being lost
> and the pointer added to a new location that kmemleak does not track
> (e.g. bnode->records[] in the tree-based variant).
> 
> A quick attempt (untested, not even compiled):
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c

I am not sure if that works. Correct me if I'm wrong but the issue is that
the allocated pointer being RCU-freed is no longer reachable by kmemleak (it
is scanned but not reachable), however it might still be reachable via an
RCU reader. In such a situation, it is a false-positive.

The correct fix then should probably be to mark the object as
kmemleak_not_leak() until a grace period elapses. This will cause the object
to not be reported but still be scanned until eventually the lower layers
will remove the object from kmemleak-tracking after the grace period. Per the
docs also, that API is used to prevent false-positives.

Instead what the below diff appears to do is to mark the bnode cache as a
kmemleak object itself, which smells a bit wrong. The bnode is not an
allocated object in the traditional sense, it is simple an internal data
structure. That may not solve the issue as the false-positive unreachable
object is still unreachable.

Or did I misunderstand how kmemleak works? (Quite possible).

thanks,

 - Joel


> index 1449cb69a0e0..681a1eb7560a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -53,6 +53,7 @@
>  #include <linux/sysrq.h>
>  #include <linux/kprobes.h>
>  #include <linux/gfp.h>
> +#include <linux/kmemleak.h>
>  #include <linux/oom.h>
>  #include <linux/smpboot.h>
>  #include <linux/jiffies.h>
> @@ -2934,6 +2935,7 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)
>  
>  	llist_for_each_safe(pos, n, page_list) {
>  		free_page((unsigned long)pos);
> +		kmemleak_free(pos);
>  		freed++;
>  	}
>  
> @@ -2962,8 +2964,16 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
>  					rcu_state.name, bnode->records[i], 0);
>  
>  				vfree(bnode->records[i]);
> +				/* avoid false negatives */
> +				kmemleak_erase(&bnode->records[i]);
>  			}
>  		}
> +		/*
> +		 * Avoid kmemleak false negatives when such pointers are later
> +		 * re-allocated.
> +		 */
> +		for (i = 0; i < bnode->nr_records; i++)
> +			kmemleak_erase(&bnode->records[i]);
>  		rcu_lock_release(&rcu_callback_map);
>  	}
>  
> @@ -2972,8 +2982,10 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
>  		bnode = NULL;
>  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  
> -	if (bnode)
> +	if (bnode) {
>  		free_page((unsigned long) bnode);
> +		kmemleak_free(bnode);
> +	}
>  
>  	cond_resched_tasks_rcu_qs();
>  }
> @@ -3241,6 +3253,12 @@ static void fill_page_cache_func(struct work_struct *work)
>  			free_page((unsigned long) bnode);
>  			break;
>  		}
> +
> +		/*
> +		 * Scan the bnode->records[] array until the objects are
> +		 * actually freed.
> +		 */
> +		kmemleak_alloc(bnode, PAGE_SIZE, 0, GFP_KERNEL);
>  	}
>  
>  	atomic_set(&krcp->work_in_progress, 0);
> @@ -3308,6 +3326,11 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
>  			// scenarios.
>  			bnode = (struct kvfree_rcu_bulk_data *)
>  				__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
> +			/*
> +			 * Scan the bnode->records[] array until the objects are
> +			 * actually freed.
> +			 */
> +			kmemleak_alloc(bnode, PAGE_SIZE, 0, GFP_KERNEL);
>  			raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
>  		}
>  
> 
> -- 
> Catalin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-05 11:17   ` Joel Fernandes
@ 2023-09-05 14:41     ` Catalin Marinas
  2023-09-06 14:35       ` Joel Fernandes
  0 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2023-09-05 14:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Christoph Paasch, Andrew Morton, linux-mm, MPTCP Upstream, rcu

On Tue, Sep 05, 2023 at 11:17:25AM +0000, Joel Fernandes wrote:
> On Mon, Sep 04, 2023 at 10:22:56PM +0100, Catalin Marinas wrote:
> > On Wed, Aug 30, 2023 at 09:37:23AM -0700, Christoph Paasch wrote:
> > > for the MPTCP upstream project, we are running syzkaller [1] continuously to
> > > qualify our kernel changes.
> > > 
> > > We found one issue with kmemleak and its handling of kfree_rcu.
> > > 
> > > Specifically, it looks like kmemleak falsely reports memory-leaks when the
> > > object is being freed by kfree_rcu after a certain grace-period.
> > > 
> > > For example, https://github.com/multipath-tcp/mptcp_net-next/issues/398#
> > > issuecomment-1584819482 shows how the syzkaller program reliably produces a
> > > kmemleak report, although the object is not leaked (we confirmed that by simply
> > > increasing MSECS_MIN_AGE to something larger than the grace-period).
> > 
> > Not sure which RCU variant you are using but most likely the false
> > positives are caused by the original reference to the object being lost
> > and the pointer added to a new location that kmemleak does not track
> > (e.g. bnode->records[] in the tree-based variant).
> > 
> > A quick attempt (untested, not even compiled):
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> 
> I am not sure if that works. Correct me if I'm wrong but the issue is that
> the allocated pointer being RCU-freed is no longer reachable by kmemleak (it
> is scanned but not reachable), however it might still be reachable via an
> RCU reader. In such a situation, it is a false-positive.

Yes, with a small correction that the object being RCU-freed is not
scanned if it cannot be reached by kmemleak (unless explicitly marked as
not a leak).

> The correct fix then should probably be to mark the object as
> kmemleak_not_leak() until a grace period elapses. This will cause the object
> to not be reported but still be scanned until eventually the lower layers
> will remove the object from kmemleak-tracking after the grace period. Per the
> docs also, that API is used to prevent false-positives.

This should work as well but I'd use kmemleak_ignore() instead of
kmemleak_not_leak(). The former, apart from masking the false positive,
also tells kmemleak not to scan the object. After a kvfree_rcu(), the
object shouldn't have any valid references to other objects, so not
worth scanning.

> Instead what the below diff appears to do is to mark the bnode cache as a
> kmemleak object itself, which smells a bit wrong. The bnode is not an
> allocated object in the traditional sense, it is simple an internal data
> structure.

We do this in other places for objects containing pointers and which
aren't tracked by kmemleak (it doesn't track page allocations as there
would be too many leading to lots of false positives or overlapping
objects - the slab itself uses the page allocator). An example where we
do something similar is alloc_large_system_hash(). We could add a
wrapper API if kmemleak_alloc() feels wrong but underlying in kmemleak
it would use the same mechanism for recording and scanning the bnode.

> That may not solve the issue as the false-positive unreachable
> object is still unreachable.

It should solve the issue as long as the bnode is reachable from the
root objects (e.g. the data/bss sections; I think it is via the
per_cpu_ptr(&krc, cpu)->bkvcache llist_head). When the bnode is scanned,
it will find the pointer to the RCU-freed object and mark it as
referenced (not a leak).

Anyway, as we trust the RCU freeing implementation not to leak data, we
can go with your simpler suggestion of a kmemleak_ignore() call on the
kvfree_rcu() path. Probably even better as the object pending to be
freed will no longer be scanned.

-- 
Catalin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-04 21:22 ` Catalin Marinas
  2023-09-05 11:17   ` Joel Fernandes
@ 2023-09-05 21:22   ` Christoph Paasch
  2023-09-06 17:21     ` Catalin Marinas
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Paasch @ 2023-09-05 21:22 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Andrew Morton, linux-mm, MPTCP Upstream, rcu

Hello Catalin,

> On Sep 4, 2023, at 2:22 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> Hi Christoph,
> 
> Please try not to send html, many servers block such emails.

Sorry for that… wasn’t expecting my mail-client to come into my way...

> 
> Also adding the RCU list.
> 
> On Wed, Aug 30, 2023 at 09:37:23AM -0700, Christoph Paasch wrote:
>> for the MPTCP upstream project, we are running syzkaller [1] continuously to
>> qualify our kernel changes.
>> 
>> We found one issue with kmemleak and its handling of kfree_rcu.
>> 
>> Specifically, it looks like kmemleak falsely reports memory-leaks when the
>> object is being freed by kfree_rcu after a certain grace-period.
>> 
>> For example, https://github.com/multipath-tcp/mptcp_net-next/issues/398#
>> issuecomment-1584819482 shows how the syzkaller program reliably produces a
>> kmemleak report, although the object is not leaked (we confirmed that by simply
>> increasing MSECS_MIN_AGE to something larger than the grace-period).
> 
> Not sure which RCU variant you are using but most likely the false
> positives are caused by the original reference to the object being lost
> and the pointer added to a new location that kmemleak does not track
> (e.g. bnode->records[] in the tree-based variant).
> 
> A quick attempt (untested, not even compiled):

I tried out your patch. It does resolve the false positive !

However, I am occasionally getting a report of a single object being leaked. When I try to visualize it with `cat /sys/kernel/debug/kmemleak`, the object does not show up anymore…

So, something else seems to be going on here as well now.


If you have an updated patch, let me know. I can test it.


Thanks,
Christoph

> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1449cb69a0e0..681a1eb7560a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -53,6 +53,7 @@
> #include <linux/sysrq.h>
> #include <linux/kprobes.h>
> #include <linux/gfp.h>
> +#include <linux/kmemleak.h>
> #include <linux/oom.h>
> #include <linux/smpboot.h>
> #include <linux/jiffies.h>
> @@ -2934,6 +2935,7 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)
> 
> llist_for_each_safe(pos, n, page_list) {
> free_page((unsigned long)pos);
> + kmemleak_free(pos);
> freed++;
> }
> 
> @@ -2962,8 +2964,16 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
> rcu_state.name, bnode->records[i], 0);
> 
> vfree(bnode->records[i]);
> + /* avoid false negatives */
> + kmemleak_erase(&bnode->records[i]);
> }
> }
> + /*
> + * Avoid kmemleak false negatives when such pointers are later
> + * re-allocated.
> + */
> + for (i = 0; i < bnode->nr_records; i++)
> + kmemleak_erase(&bnode->records[i]);
> rcu_lock_release(&rcu_callback_map);
> }
> 
> @@ -2972,8 +2982,10 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
> bnode = NULL;
> raw_spin_unlock_irqrestore(&krcp->lock, flags);
> 
> - if (bnode)
> + if (bnode) {
> free_page((unsigned long) bnode);
> + kmemleak_free(bnode);
> + }
> 
> cond_resched_tasks_rcu_qs();
> }
> @@ -3241,6 +3253,12 @@ static void fill_page_cache_func(struct work_struct *work)
> free_page((unsigned long) bnode);
> break;
> }
> +
> + /*
> + * Scan the bnode->records[] array until the objects are
> + * actually freed.
> + */
> + kmemleak_alloc(bnode, PAGE_SIZE, 0, GFP_KERNEL);
> }
> 
> atomic_set(&krcp->work_in_progress, 0);
> @@ -3308,6 +3326,11 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> // scenarios.
> bnode = (struct kvfree_rcu_bulk_data *)
> __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
> + /*
> + * Scan the bnode->records[] array until the objects are
> + * actually freed.
> + */
> + kmemleak_alloc(bnode, PAGE_SIZE, 0, GFP_KERNEL);
> raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
> }
> 
> 
> -- 
> Catalin
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-05 14:41     ` Catalin Marinas
@ 2023-09-06 14:35       ` Joel Fernandes
  2023-09-06 17:15         ` Catalin Marinas
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Fernandes @ 2023-09-06 14:35 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Christoph Paasch, Andrew Morton, linux-mm, MPTCP Upstream, rcu

Hi Catalin,

On Tue, Sep 05, 2023 at 03:41:32PM +0100, Catalin Marinas wrote:
> On Tue, Sep 05, 2023 at 11:17:25AM +0000, Joel Fernandes wrote:
> > On Mon, Sep 04, 2023 at 10:22:56PM +0100, Catalin Marinas wrote:
> > > On Wed, Aug 30, 2023 at 09:37:23AM -0700, Christoph Paasch wrote:
> > > > for the MPTCP upstream project, we are running syzkaller [1] continuously to
> > > > qualify our kernel changes.
> > > > 
> > > > We found one issue with kmemleak and its handling of kfree_rcu.
> > > > 
> > > > Specifically, it looks like kmemleak falsely reports memory-leaks when the
> > > > object is being freed by kfree_rcu after a certain grace-period.
> > > > 
> > > > For example, https://github.com/multipath-tcp/mptcp_net-next/issues/398#
> > > > issuecomment-1584819482 shows how the syzkaller program reliably produces a
> > > > kmemleak report, although the object is not leaked (we confirmed that by simply
> > > > increasing MSECS_MIN_AGE to something larger than the grace-period).
> > > 
> > > Not sure which RCU variant you are using but most likely the false
> > > positives are caused by the original reference to the object being lost
> > > and the pointer added to a new location that kmemleak does not track
> > > (e.g. bnode->records[] in the tree-based variant).
> > > 
> > > A quick attempt (untested, not even compiled):
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > 
> > I am not sure if that works. Correct me if I'm wrong but the issue is that
> > the allocated pointer being RCU-freed is no longer reachable by kmemleak (it
> > is scanned but not reachable), however it might still be reachable via an
> > RCU reader. In such a situation, it is a false-positive.
> 
> Yes, with a small correction that the object being RCU-freed is not
> scanned if it cannot be reached by kmemleak (unless explicitly marked as
> not a leak).

Yes I used wrong wording. I meant it is added to the kmemleak rbtree as
something that needs to be reachable, but cannot be reached. Hopefully that
makes sense.

> > The correct fix then should probably be to mark the object as
> > kmemleak_not_leak() until a grace period elapses. This will cause the object
> > to not be reported but still be scanned until eventually the lower layers
> > will remove the object from kmemleak-tracking after the grace period. Per the
> > docs also, that API is used to prevent false-positives.
> 
> This should work as well but I'd use kmemleak_ignore() instead of
> kmemleak_not_leak(). The former, apart from masking the false positive,
> also tells kmemleak not to scan the object. After a kvfree_rcu(), the
> object shouldn't have any valid references to other objects, so not
> worth scanning.

Yes I am also OK with that, however to me I consider the object as alive as
long as the grace period does not end. But I agree with you and it may not be
worth tracking them or scanning them.

> > Instead what the below diff appears to do is to mark the bnode cache as a
> > kmemleak object itself, which smells a bit wrong. The bnode is not an
> > allocated object in the traditional sense, it is simple an internal data
> > structure.
> 
> We do this in other places for objects containing pointers and which
> aren't tracked by kmemleak (it doesn't track page allocations as there
> would be too many leading to lots of false positives or overlapping
> objects - the slab itself uses the page allocator). An example where we
> do something similar is alloc_large_system_hash(). We could add a
> wrapper API if kmemleak_alloc() feels wrong but underlying in kmemleak
> it would use the same mechanism for recording and scanning the bnode.

Ah I see what you did, you basically made the bnode as a kmemleak object in
its own right, and perhaps the kmemleak detector can reach the object you
added. That actually (though sounding like a little of a hack) would work too.
I say hack because the object you added is not an allocated object in the
traditional sense :-D. But as you said, there is precendent for that.

> > That may not solve the issue as the false-positive unreachable
> > object is still unreachable.
> 
> It should solve the issue as long as the bnode is reachable from the
> root objects (e.g. the data/bss sections; I think it is via the
> per_cpu_ptr(&krc, cpu)->bkvcache llist_head). When the bnode is scanned,
> it will find the pointer to the RCU-freed object and mark it as
> referenced (not a leak).

Right! That's what I was missing.

> Anyway, as we trust the RCU freeing implementation not to leak data, we
> can go with your simpler suggestion of a kmemleak_ignore() call on the
> kvfree_rcu() path. Probably even better as the object pending to be
> freed will no longer be scanned.

Sounds good and thanks a lot Catalin! Unless you beat me to it, I'll send a
patch out along those lines by the weekend and CC you with your suggested-by
and the reported-by from the reporter ;-). Let me know if you have a preference.

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-06 14:35       ` Joel Fernandes
@ 2023-09-06 17:15         ` Catalin Marinas
  2023-09-06 19:11           ` Paul E. McKenney
  2023-09-12 13:15           ` Matthieu Baerts
  0 siblings, 2 replies; 17+ messages in thread
From: Catalin Marinas @ 2023-09-06 17:15 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Christoph Paasch, Andrew Morton, linux-mm, MPTCP Upstream, rcu

On Wed, Sep 06, 2023 at 02:35:29PM +0000, Joel Fernandes wrote:
> On Tue, Sep 05, 2023 at 03:41:32PM +0100, Catalin Marinas wrote:
> > On Tue, Sep 05, 2023 at 11:17:25AM +0000, Joel Fernandes wrote:
> > > The correct fix then should probably be to mark the object as
> > > kmemleak_not_leak() until a grace period elapses. This will cause the object
> > > to not be reported but still be scanned until eventually the lower layers
> > > will remove the object from kmemleak-tracking after the grace period. Per the
> > > docs also, that API is used to prevent false-positives.
> > 
> > This should work as well but I'd use kmemleak_ignore() instead of
> > kmemleak_not_leak(). The former, apart from masking the false positive,
> > also tells kmemleak not to scan the object. After a kvfree_rcu(), the
> > object shouldn't have any valid references to other objects, so not
> > worth scanning.
> 
> Yes I am also OK with that, however to me I consider the object as alive as
> long as the grace period does not end. But I agree with you and it may not be
> worth tracking them or scanning them.

I guess from an RCU perspective, the object is still alive. From the
kvfree_rcu() caller perspective though, it can disappear at any point
after the grace period, so it shouldn't rely on its content being valid
and referencing other objects (other than transiently e.g. in RCU list
traversal).

It probably only matters if we have some very long grace periods (I'm
not up to date with the recent RCU developments). In such cases, the
object still being scanned could introduce false negatives. That's my
reasoning for suggesting kmemleak_ignore() rather than
kmemleak_not_leak().

> > > Instead what the below diff appears to do is to mark the bnode cache as a
> > > kmemleak object itself, which smells a bit wrong. The bnode is not an
> > > allocated object in the traditional sense, it is simple an internal data
> > > structure.
> > 
> > We do this in other places for objects containing pointers and which
> > aren't tracked by kmemleak (it doesn't track page allocations as there
> > would be too many leading to lots of false positives or overlapping
> > objects - the slab itself uses the page allocator). An example where we
> > do something similar is alloc_large_system_hash(). We could add a
> > wrapper API if kmemleak_alloc() feels wrong but underlying in kmemleak
> > it would use the same mechanism for recording and scanning the bnode.
> 
> Ah I see what you did, you basically made the bnode as a kmemleak object in
> its own right, and perhaps the kmemleak detector can reach the object you
> added. That actually (though sounding like a little of a hack) would work too.
> I say hack because the object you added is not an allocated object in the
> traditional sense :-D. But as you said, there is precendent for that.

Even in kmemleak.c we have create_object() for the data/bss sections.
It's just that the kmemleak_alloc() wrapper API implies some form of
slab allocation (though it's any allocation really, not just slab).

> Sounds good and thanks a lot Catalin! Unless you beat me to it, I'll send a
> patch out along those lines by the weekend and CC you with your suggested-by
> and the reported-by from the reporter ;-). Let me know if you have a preference.

I had a patch already but got distracted by a few (real) leaks reported
while testing it. Feel free to pick it up and change _ignore to
_not_leak if you find that more suitable. Well, it would be good for
Christoph to test it as I haven't managed to reproduce the false
positive.

----------------------8<--------------------------
From b25350cb6f8a906a6164b625bfd57021190cb105 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 6 Sep 2023 17:52:45 +0100
Subject: [PATCH] rcu: kmemleak: Ignore kmemleak false positives when
 RCU-freeing objects

Since the actual slab freeing is deferred when calling kvfree_rcu(), so
is the kmemleak_free() callback informing kmemleak of the object
deletion. From the perspective of the kvfree_rcu() caller, the object is
freed and it may remove any references to it. Since kmemleak does not
scan the tree RCU internal data storing the pointer, it will report such
objects as leaks during the grace period.

Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note
that the tiny RCU implementation does not have such issue since the
objects can be tracked from the rcu_ctrlblk structure.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
---
 kernel/rcu/tree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cb1caefa8bd0..2ac39b5705df 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -31,6 +31,7 @@
 #include <linux/bitops.h>
 #include <linux/export.h>
 #include <linux/completion.h>
+#include <linux/kmemleak.h>
 #include <linux/moduleparam.h>
 #include <linux/panic.h>
 #include <linux/panic_notifier.h>
@@ -3388,6 +3389,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 		success = true;
 	}
 
+	/*
+	 * The kvfree_rcu() caller considers the pointer freed at this point
+	 * and likely removes any references to it. Since the the actual slab
+	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
+	 * this object (no scanning or false positives reporting).
+	 */
+	kmemleak_ignore(ptr);
+
 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
 		schedule_delayed_monitor_work(krcp);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-05 21:22   ` Christoph Paasch
@ 2023-09-06 17:21     ` Catalin Marinas
  2023-09-10 23:10       ` Joel Fernandes
  0 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2023-09-06 17:21 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Andrew Morton, linux-mm, MPTCP Upstream, rcu

On Tue, Sep 05, 2023 at 02:22:13PM -0700, Christoph Paasch wrote:
> On Sep 4, 2023, at 2:22 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Not sure which RCU variant you are using but most likely the false
> > positives are caused by the original reference to the object being lost
> > and the pointer added to a new location that kmemleak does not track
> > (e.g. bnode->records[] in the tree-based variant).
> > 
> > A quick attempt (untested, not even compiled):
> 
> I tried out your patch. It does resolve the false positive !
> 
> However, I am occasionally getting a report of a single object being
> leaked. When I try to visualize it with `cat
> /sys/kernel/debug/kmemleak`, the object does not show up anymore…

How often do you trigger the scanning? Since kmemleak does not stop the
world (as some garbage collectors do), there's potential for false
positives (e.g. a reference to it is in a register on some CPU while
being moved from one list to another). The heuristics employed for this
is to checksum the object and only report if the checksum has not
changed in successive scans. But this is still problematic if scanning
is done quickly in succession. The default 10min scanning (even 1min)
shouldn't be an issue.

I had a plan to do a "stop_scan" option (using stop_machine) but never
got the time to do this.

> If you have an updated patch, let me know. I can test it.

I sent it in reply to Joel.

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-06 17:15         ` Catalin Marinas
@ 2023-09-06 19:11           ` Paul E. McKenney
  2023-09-06 21:37             ` Catalin Marinas
  2023-09-12 13:15           ` Matthieu Baerts
  1 sibling, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2023-09-06 19:11 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Joel Fernandes, Christoph Paasch, Andrew Morton, linux-mm,
	MPTCP Upstream, rcu

On Wed, Sep 06, 2023 at 06:15:49PM +0100, Catalin Marinas wrote:
> On Wed, Sep 06, 2023 at 02:35:29PM +0000, Joel Fernandes wrote:
> > On Tue, Sep 05, 2023 at 03:41:32PM +0100, Catalin Marinas wrote:
> > > On Tue, Sep 05, 2023 at 11:17:25AM +0000, Joel Fernandes wrote:
> > > > The correct fix then should probably be to mark the object as
> > > > kmemleak_not_leak() until a grace period elapses. This will cause the object
> > > > to not be reported but still be scanned until eventually the lower layers
> > > > will remove the object from kmemleak-tracking after the grace period. Per the
> > > > docs also, that API is used to prevent false-positives.
> > > 
> > > This should work as well but I'd use kmemleak_ignore() instead of
> > > kmemleak_not_leak(). The former, apart from masking the false positive,
> > > also tells kmemleak not to scan the object. After a kvfree_rcu(), the
> > > object shouldn't have any valid references to other objects, so not
> > > worth scanning.
> > 
> > Yes I am also OK with that, however to me I consider the object as alive as
> > long as the grace period does not end. But I agree with you and it may not be
> > worth tracking them or scanning them.
> 
> I guess from an RCU perspective, the object is still alive. From the
> kvfree_rcu() caller perspective though, it can disappear at any point
> after the grace period, so it shouldn't rely on its content being valid
> and referencing other objects (other than transiently e.g. in RCU list
> traversal).
> 
> It probably only matters if we have some very long grace periods (I'm
> not up to date with the recent RCU developments). In such cases, the
> object still being scanned could introduce false negatives. That's my
> reasoning for suggesting kmemleak_ignore() rather than
> kmemleak_not_leak().

Very long RCU readers still result in very long RCU grace periods.  And,
after some tens of seconds, RCU CPU stall warnings.  So don't let your
RCU readers run for that long.  But you knew that already.  ;-)

> > > > Instead what the below diff appears to do is to mark the bnode cache as a
> > > > kmemleak object itself, which smells a bit wrong. The bnode is not an
> > > > allocated object in the traditional sense, it is simple an internal data
> > > > structure.
> > > 
> > > We do this in other places for objects containing pointers and which
> > > aren't tracked by kmemleak (it doesn't track page allocations as there
> > > would be too many leading to lots of false positives or overlapping
> > > objects - the slab itself uses the page allocator). An example where we
> > > do something similar is alloc_large_system_hash(). We could add a
> > > wrapper API if kmemleak_alloc() feels wrong but underlying in kmemleak
> > > it would use the same mechanism for recording and scanning the bnode.
> > 
> > Ah I see what you did, you basically made the bnode as a kmemleak object in
> > its own right, and perhaps the kmemleak detector can reach the object you
> > added. That actually (though sounding like a little of a hack) would work too.
> > I say hack because the object you added is not an allocated object in the
> > traditional sense :-D. But as you said, there is precendent for that.
> 
> Even in kmemleak.c we have create_object() for the data/bss sections.
> It's just that the kmemleak_alloc() wrapper API implies some form of
> slab allocation (though it's any allocation really, not just slab).
> 
> > Sounds good and thanks a lot Catalin! Unless you beat me to it, I'll send a
> > patch out along those lines by the weekend and CC you with your suggested-by
> > and the reported-by from the reporter ;-). Let me know if you have a preference.
> 
> I had a patch already but got distracted by a few (real) leaks reported
> while testing it. Feel free to pick it up and change _ignore to
> _not_leak if you find that more suitable. Well, it would be good for
> Christoph to test it as I haven't managed to reproduce the false
> positive.
> 
> ----------------------8<--------------------------
> >From b25350cb6f8a906a6164b625bfd57021190cb105 Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Wed, 6 Sep 2023 17:52:45 +0100
> Subject: [PATCH] rcu: kmemleak: Ignore kmemleak false positives when
>  RCU-freeing objects
> 
> Since the actual slab freeing is deferred when calling kvfree_rcu(), so
> is the kmemleak_free() callback informing kmemleak of the object
> deletion. From the perspective of the kvfree_rcu() caller, the object is
> freed and it may remove any references to it. Since kmemleak does not
> scan the tree RCU internal data storing the pointer, it will report such
> objects as leaks during the grace period.
> 
> Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note
> that the tiny RCU implementation does not have such issue since the
> objects can be tracked from the rcu_ctrlblk structure.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> ---
>  kernel/rcu/tree.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index cb1caefa8bd0..2ac39b5705df 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -31,6 +31,7 @@
>  #include <linux/bitops.h>
>  #include <linux/export.h>
>  #include <linux/completion.h>
> +#include <linux/kmemleak.h>
>  #include <linux/moduleparam.h>
>  #include <linux/panic.h>
>  #include <linux/panic_notifier.h>
> @@ -3388,6 +3389,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
>  		success = true;
>  	}
>  
> +	/*
> +	 * The kvfree_rcu() caller considers the pointer freed at this point
> +	 * and likely removes any references to it. Since the the actual slab
> +	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
> +	 * this object (no scanning or false positives reporting).
> +	 */
> +	kmemleak_ignore(ptr);

Do we want to un-ignore it at the end of the grace period?  Or will that
happen automatically when it is passed to kfree()?  (My guess is that
the answer to both questions is "yes", but I figured that I should ask.)

							Thanx, Paul

> +
>  	// Set timer to drain after KFREE_DRAIN_JIFFIES.
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
>  		schedule_delayed_monitor_work(krcp);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-06 19:11           ` Paul E. McKenney
@ 2023-09-06 21:37             ` Catalin Marinas
  2023-09-06 22:02               ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2023-09-06 21:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, Christoph Paasch, Andrew Morton, linux-mm,
	MPTCP Upstream, rcu

On Wed, Sep 06, 2023 at 12:11:12PM -0700, Paul E. McKenney wrote:
> On Wed, Sep 06, 2023 at 06:15:49PM +0100, Catalin Marinas wrote:
> > On Wed, Sep 06, 2023 at 02:35:29PM +0000, Joel Fernandes wrote:
> > > On Tue, Sep 05, 2023 at 03:41:32PM +0100, Catalin Marinas wrote:
> > > > On Tue, Sep 05, 2023 at 11:17:25AM +0000, Joel Fernandes wrote:
> > > > > The correct fix then should probably be to mark the object as
> > > > > kmemleak_not_leak() until a grace period elapses. This will cause the object
> > > > > to not be reported but still be scanned until eventually the lower layers
> > > > > will remove the object from kmemleak-tracking after the grace period. Per the
> > > > > docs also, that API is used to prevent false-positives.
> > > > 
> > > > This should work as well but I'd use kmemleak_ignore() instead of
> > > > kmemleak_not_leak(). The former, apart from masking the false positive,
> > > > also tells kmemleak not to scan the object. After a kvfree_rcu(), the
> > > > object shouldn't have any valid references to other objects, so not
> > > > worth scanning.
> > > 
> > > Yes I am also OK with that, however to me I consider the object as alive as
> > > long as the grace period does not end. But I agree with you and it may not be
> > > worth tracking them or scanning them.
> > 
> > I guess from an RCU perspective, the object is still alive. From the
> > kvfree_rcu() caller perspective though, it can disappear at any point
> > after the grace period, so it shouldn't rely on its content being valid
> > and referencing other objects (other than transiently e.g. in RCU list
> > traversal).
> > 
> > It probably only matters if we have some very long grace periods (I'm
> > not up to date with the recent RCU developments). In such cases, the
> > object still being scanned could introduce false negatives. That's my
> > reasoning for suggesting kmemleak_ignore() rather than
> > kmemleak_not_leak().
> 
> Very long RCU readers still result in very long RCU grace periods.  And,
> after some tens of seconds, RCU CPU stall warnings.  So don't let your
> RCU readers run for that long.  But you knew that already.  ;-)

That's still ok. I was more thinking of deferred freeing well past the
RCU readers completing.

> > @@ -3388,6 +3389,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> >  		success = true;
> >  	}
> >  
> > +	/*
> > +	 * The kvfree_rcu() caller considers the pointer freed at this point
> > +	 * and likely removes any references to it. Since the the actual slab
> > +	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
> > +	 * this object (no scanning or false positives reporting).
> > +	 */
> > +	kmemleak_ignore(ptr);
> 
> Do we want to un-ignore it at the end of the grace period?  Or will that
> happen automatically when it is passed to kfree()?  (My guess is that
> the answer to both questions is "yes", but I figured that I should ask.)

No need to un-ignore. This function only tells kmemleak it's not a
leak and doesn't have any interesting data to scan. Kmemleak still keeps
track of it until properly freed.

-- 
Catalin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-06 21:37             ` Catalin Marinas
@ 2023-09-06 22:02               ` Paul E. McKenney
  2023-09-06 23:10                 ` Joel Fernandes
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2023-09-06 22:02 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Joel Fernandes, Christoph Paasch, Andrew Morton, linux-mm,
	MPTCP Upstream, rcu, urezki

On Wed, Sep 06, 2023 at 10:37:40PM +0100, Catalin Marinas wrote:
> On Wed, Sep 06, 2023 at 12:11:12PM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 06, 2023 at 06:15:49PM +0100, Catalin Marinas wrote:
> > > On Wed, Sep 06, 2023 at 02:35:29PM +0000, Joel Fernandes wrote:
> > > > On Tue, Sep 05, 2023 at 03:41:32PM +0100, Catalin Marinas wrote:
> > > > > On Tue, Sep 05, 2023 at 11:17:25AM +0000, Joel Fernandes wrote:
> > > > > > The correct fix then should probably be to mark the object as
> > > > > > kmemleak_not_leak() until a grace period elapses. This will cause the object
> > > > > > to not be reported but still be scanned until eventually the lower layers
> > > > > > will remove the object from kmemleak-tracking after the grace period. Per the
> > > > > > docs also, that API is used to prevent false-positives.
> > > > > 
> > > > > This should work as well but I'd use kmemleak_ignore() instead of
> > > > > kmemleak_not_leak(). The former, apart from masking the false positive,
> > > > > also tells kmemleak not to scan the object. After a kvfree_rcu(), the
> > > > > object shouldn't have any valid references to other objects, so not
> > > > > worth scanning.
> > > > 
> > > > Yes I am also OK with that, however to me I consider the object as alive as
> > > > long as the grace period does not end. But I agree with you and it may not be
> > > > worth tracking them or scanning them.
> > > 
> > > I guess from an RCU perspective, the object is still alive. From the
> > > kvfree_rcu() caller perspective though, it can disappear at any point
> > > after the grace period, so it shouldn't rely on its content being valid
> > > and referencing other objects (other than transiently e.g. in RCU list
> > > traversal).
> > > 
> > > It probably only matters if we have some very long grace periods (I'm
> > > not up to date with the recent RCU developments). In such cases, the
> > > object still being scanned could introduce false negatives. That's my
> > > reasoning for suggesting kmemleak_ignore() rather than
> > > kmemleak_not_leak().
> > 
> > Very long RCU readers still result in very long RCU grace periods.  And,
> > after some tens of seconds, RCU CPU stall warnings.  So don't let your
> > RCU readers run for that long.  But you knew that already.  ;-)
> 
> That's still ok. I was more thinking of deferred freeing well past the
> RCU readers completing.

Ah, that can happen.  Some kernels are built with CONFIG_RCU_LAZY=y, which
delays freeing in order to reduce power consumption.  And kfree_rcu()
will also delay for a bit.  But in both cases, a flood of callbacks
should get things going again.

But an isolated kfree_rcu() might well see a few seconds delay.
Saving your battery!  ;-)

> > > @@ -3388,6 +3389,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> > >  		success = true;
> > >  	}
> > >  
> > > +	/*
> > > +	 * The kvfree_rcu() caller considers the pointer freed at this point
> > > +	 * and likely removes any references to it. Since the the actual slab
> > > +	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
> > > +	 * this object (no scanning or false positives reporting).
> > > +	 */
> > > +	kmemleak_ignore(ptr);
> > 
> > Do we want to un-ignore it at the end of the grace period?  Or will that
> > happen automatically when it is passed to kfree()?  (My guess is that
> > the answer to both questions is "yes", but I figured that I should ask.)
> 
> No need to un-ignore. This function only tells kmemleak it's not a
> leak and doesn't have any interesting data to scan. Kmemleak still keeps
> track of it until properly freed.

Got it, thank you!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-06 22:02               ` Paul E. McKenney
@ 2023-09-06 23:10                 ` Joel Fernandes
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Fernandes @ 2023-09-06 23:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Catalin Marinas, Christoph Paasch, Andrew Morton, linux-mm,
	MPTCP Upstream, rcu, urezki

On Wed, Sep 06, 2023 at 03:02:45PM -0700, Paul E. McKenney wrote:
> On Wed, Sep 06, 2023 at 10:37:40PM +0100, Catalin Marinas wrote:
> > On Wed, Sep 06, 2023 at 12:11:12PM -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 06, 2023 at 06:15:49PM +0100, Catalin Marinas wrote:
> > > > On Wed, Sep 06, 2023 at 02:35:29PM +0000, Joel Fernandes wrote:
> > > > > On Tue, Sep 05, 2023 at 03:41:32PM +0100, Catalin Marinas wrote:
> > > > > > On Tue, Sep 05, 2023 at 11:17:25AM +0000, Joel Fernandes wrote:
> > > > > > > The correct fix then should probably be to mark the object as
> > > > > > > kmemleak_not_leak() until a grace period elapses. This will cause the object
> > > > > > > to not be reported but still be scanned until eventually the lower layers
> > > > > > > will remove the object from kmemleak-tracking after the grace period. Per the
> > > > > > > docs also, that API is used to prevent false-positives.
> > > > > > 
> > > > > > This should work as well but I'd use kmemleak_ignore() instead of
> > > > > > kmemleak_not_leak(). The former, apart from masking the false positive,
> > > > > > also tells kmemleak not to scan the object. After a kvfree_rcu(), the
> > > > > > object shouldn't have any valid references to other objects, so not
> > > > > > worth scanning.
> > > > > 
> > > > > Yes I am also OK with that, however to me I consider the object as alive as
> > > > > long as the grace period does not end. But I agree with you and it may not be
> > > > > worth tracking them or scanning them.
> > > > 
> > > > I guess from an RCU perspective, the object is still alive. From the
> > > > kvfree_rcu() caller perspective though, it can disappear at any point
> > > > after the grace period, so it shouldn't rely on its content being valid
> > > > and referencing other objects (other than transiently e.g. in RCU list
> > > > traversal).
> > > > 
> > > > It probably only matters if we have some very long grace periods (I'm
> > > > not up to date with the recent RCU developments). In such cases, the
> > > > object still being scanned could introduce false negatives. That's my
> > > > reasoning for suggesting kmemleak_ignore() rather than
> > > > kmemleak_not_leak().
> > > 
> > > Very long RCU readers still result in very long RCU grace periods.  And,
> > > after some tens of seconds, RCU CPU stall warnings.  So don't let your
> > > RCU readers run for that long.  But you knew that already.  ;-)
> > 
> > That's still ok. I was more thinking of deferred freeing well past the
> > RCU readers completing.
> 
> Ah, that can happen.  Some kernels are built with CONFIG_RCU_LAZY=y, which
> delays freeing in order to reduce power consumption.  And kfree_rcu()
> will also delay for a bit.  But in both cases, a flood of callbacks
> should get things going again.
> 
> But an isolated kfree_rcu() might well see a few seconds delay.
> Saving your battery!  ;-)

I agree with both of you. I also think kmemleak_ignore() is the right thing
to do for kfree_rcu().

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-06 17:21     ` Catalin Marinas
@ 2023-09-10 23:10       ` Joel Fernandes
  2023-09-11 17:41         ` Christoph Paasch
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Fernandes @ 2023-09-10 23:10 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Christoph Paasch, Andrew Morton, linux-mm, MPTCP Upstream, rcu

Hi Christoph,

On Wed, Sep 6, 2023 at 1:21 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Sep 05, 2023 at 02:22:13PM -0700, Christoph Paasch wrote:
[..]
> > If you have an updated patch, let me know. I can test it.
>
> I sent it in reply to Joel.

Just checking if you got a chance to try Catalin's patch?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-10 23:10       ` Joel Fernandes
@ 2023-09-11 17:41         ` Christoph Paasch
  2023-09-12  9:52           ` Catalin Marinas
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Paasch @ 2023-09-11 17:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Catalin Marinas, Andrew Morton, linux-mm, MPTCP Upstream, rcu

Hello,

> On Sep 10, 2023, at 4:10 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> Hi Christoph,
> 
> On Wed, Sep 6, 2023 at 1:21 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>> 
>> On Tue, Sep 05, 2023 at 02:22:13PM -0700, Christoph Paasch wrote:
> [..]
>>> If you have an updated patch, let me know. I can test it.
>> 
>> I sent it in reply to Joel.
> 
> Just checking if you got a chance to try Catalin's patch?

sorry for the delay! Yes, I tested it and it works!

 Thanks,
Christoph

> 
> thanks,
> 
> - Joel
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-11 17:41         ` Christoph Paasch
@ 2023-09-12  9:52           ` Catalin Marinas
  0 siblings, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2023-09-12  9:52 UTC (permalink / raw)
  To: Christoph Paasch
  Cc: Joel Fernandes, Andrew Morton, linux-mm, MPTCP Upstream, rcu

On Mon, Sep 11, 2023 at 10:41:15AM -0700, Christoph Paasch wrote:
> On Sep 10, 2023, at 4:10 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Wed, Sep 6, 2023 at 1:21 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> On Tue, Sep 05, 2023 at 02:22:13PM -0700, Christoph Paasch wrote:
> > [..]
> >>> If you have an updated patch, let me know. I can test it.
> >> 
> >> I sent it in reply to Joel.
> > 
> > Just checking if you got a chance to try Catalin's patch?
> 
> sorry for the delay! Yes, I tested it and it works!

Great, thanks!

-- 
Catalin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-06 17:15         ` Catalin Marinas
  2023-09-06 19:11           ` Paul E. McKenney
@ 2023-09-12 13:15           ` Matthieu Baerts
  2023-09-12 13:32             ` Joel Fernandes
  1 sibling, 1 reply; 17+ messages in thread
From: Matthieu Baerts @ 2023-09-12 13:15 UTC (permalink / raw)
  To: Catalin Marinas, Joel Fernandes
  Cc: Christoph Paasch, Andrew Morton, linux-mm, MPTCP Upstream, rcu

Hi Catalin,

On 06/09/2023 19:15, Catalin Marinas wrote:

(...)

> I had a patch already but got distracted by a few (real) leaks reported
> while testing it. Feel free to pick it up and change _ignore to
> _not_leak if you find that more suitable. Well, it would be good for
> Christoph to test it as I haven't managed to reproduce the false
> positive.

Thank you for the patch!

> ----------------------8<--------------------------
> From b25350cb6f8a906a6164b625bfd57021190cb105 Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Wed, 6 Sep 2023 17:52:45 +0100
> Subject: [PATCH] rcu: kmemleak: Ignore kmemleak false positives when
>  RCU-freeing objects
> 
> Since the actual slab freeing is deferred when calling kvfree_rcu(), so
> is the kmemleak_free() callback informing kmemleak of the object
> deletion. From the perspective of the kvfree_rcu() caller, the object is
> freed and it may remove any references to it. Since kmemleak does not
> scan the tree RCU internal data storing the pointer, it will report such
> objects as leaks during the grace period.
> 
> Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note
> that the tiny RCU implementation does not have such issue since the
> objects can be tracked from the rcu_ctrlblk structure.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> ---
>  kernel/rcu/tree.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index cb1caefa8bd0..2ac39b5705df 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -31,6 +31,7 @@
>  #include <linux/bitops.h>
>  #include <linux/export.h>
>  #include <linux/completion.h>
> +#include <linux/kmemleak.h>
>  #include <linux/moduleparam.h>
>  #include <linux/panic.h>
>  #include <linux/panic_notifier.h>
> @@ -3388,6 +3389,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
>  		success = true;
>  	}
>  
> +	/*
> +	 * The kvfree_rcu() caller considers the pointer freed at this point
> +	 * and likely removes any references to it. Since the the actual slab

Just in case you didn't send this patch, checkpatch.pl noticed that the
word "the" was repeated in the comment here above ("Since the the actual").

Cheers,
Matt

> +	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
> +	 * this object (no scanning or false positives reporting).
> +	 */
> +	kmemleak_ignore(ptr);
> +
>  	// Set timer to drain after KFREE_DRAIN_JIFFIES.
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
>  		schedule_delayed_monitor_work(krcp);
> 

-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: kmemleak handling of kfree_rcu
  2023-09-12 13:15           ` Matthieu Baerts
@ 2023-09-12 13:32             ` Joel Fernandes
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Fernandes @ 2023-09-12 13:32 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Catalin Marinas, Christoph Paasch, Andrew Morton, linux-mm,
	MPTCP Upstream, rcu



> On Sep 12, 2023, at 9:15 AM, Matthieu Baerts <matthieu.baerts@tessares.net> wrote:
> Hi Catalin,
> 
> On 06/09/2023 19:15, Catalin Marinas wrote:
> 
> (...)
> 
>> I had a patch already but got distracted by a few (real) leaks reported
>> while testing it. Feel free to pick it up and change _ignore to
>> _not_leak if you find that more suitable. Well, it would be good for
>> Christoph to test it as I haven't managed to reproduce the false
>> positive.
> 
> Thank you for the patch!
> 
>> ----------------------8<--------------------------
>> From b25350cb6f8a906a6164b625bfd57021190cb105 Mon Sep 17 00:00:00 2001
>> From: Catalin Marinas <catalin.marinas@arm.com>
>> Date: Wed, 6 Sep 2023 17:52:45 +0100
>> Subject: [PATCH] rcu: kmemleak: Ignore kmemleak false positives when
>> RCU-freeing objects
>> 
>> Since the actual slab freeing is deferred when calling kvfree_rcu(), so
>> is the kmemleak_free() callback informing kmemleak of the object
>> deletion. From the perspective of the kvfree_rcu() caller, the object is
>> freed and it may remove any references to it. Since kmemleak does not
>> scan the tree RCU internal data storing the pointer, it will report such
>> objects as leaks during the grace period.
>> 
>> Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note
>> that the tiny RCU implementation does not have such issue since the
>> objects can be tracked from the rcu_ctrlblk structure.
>> 
>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>> Reported-by: Christoph Paasch <cpaasch@apple.com>
>> ---
>> kernel/rcu/tree.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>> 
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index cb1caefa8bd0..2ac39b5705df 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -31,6 +31,7 @@
>> #include <linux/bitops.h>
>> #include <linux/export.h>
>> #include <linux/completion.h>
>> +#include <linux/kmemleak.h>
>> #include <linux/moduleparam.h>
>> #include <linux/panic.h>
>> #include <linux/panic_notifier.h>
>> @@ -3388,6 +3389,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
>>        success = true;
>>    }
>> 
>> +    /*
>> +     * The kvfree_rcu() caller considers the pointer freed at this point
>> +     * and likely removes any references to it. Since the the actual slab
> 
> Just in case you didn't send this patch, checkpatch.pl noticed that the
> word "the" was repeated in the comment here above ("Since the the actual").

Sounds like you detected a leak in the word the. :-)

(Sorry could not resist).

 - Joel


> 
> Cheers,
> Matt
> 
>> +     * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
>> +     * this object (no scanning or false positives reporting).
>> +     */
>> +    kmemleak_ignore(ptr);
>> +
>>    // Set timer to drain after KFREE_DRAIN_JIFFIES.
>>    if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
>>        schedule_delayed_monitor_work(krcp);
> 
> -- 
> Tessares | Belgium | Hybrid Access Solutions
> www.tessares.net


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-09-12 13:33 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-30 16:37 kmemleak handling of kfree_rcu Christoph Paasch
2023-09-04 21:22 ` Catalin Marinas
2023-09-05 11:17   ` Joel Fernandes
2023-09-05 14:41     ` Catalin Marinas
2023-09-06 14:35       ` Joel Fernandes
2023-09-06 17:15         ` Catalin Marinas
2023-09-06 19:11           ` Paul E. McKenney
2023-09-06 21:37             ` Catalin Marinas
2023-09-06 22:02               ` Paul E. McKenney
2023-09-06 23:10                 ` Joel Fernandes
2023-09-12 13:15           ` Matthieu Baerts
2023-09-12 13:32             ` Joel Fernandes
2023-09-05 21:22   ` Christoph Paasch
2023-09-06 17:21     ` Catalin Marinas
2023-09-10 23:10       ` Joel Fernandes
2023-09-11 17:41         ` Christoph Paasch
2023-09-12  9:52           ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox