linux-mm.kvack.org archive mirror
* [PATCH RFC net-next/mm V1 0/3] page_pool: new approach for leak detection and shutdown phase
@ 2023-04-25 17:15 Jesper Dangaard Brouer
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 1/3] page_pool: Remove workqueue in new shutdown scheme Jesper Dangaard Brouer
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-25 17:15 UTC (permalink / raw)
  To: Ilias Apalodimas, netdev, Eric Dumazet, linux-mm, Mel Gorman
  Cc: Jesper Dangaard Brouer, lorenzo,
	Toke Høiland-Jørgensen, linyunsheng, bpf,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Andrew Morton,
	willy

Patchset change summary:
 - Remove the PP workqueue and inflight warnings; instead rely on
   inflight pages to trigger cleanup
 - Move leak detection to the MM-layer page allocator, activated when
   combined with CONFIG_DEBUG_VM

The page_pool (PP) workqueue calling page_pool_release_retry generates
too many false-positive reports. Furthermore, these reports of a
page_pool shutdown still having inflight packets are not very helpful
for tracking down the root cause.

In the past these reports have helped us catch driver bugs that leaked
pages by invoking put_page directly, often in code paths handling
error cases. Back then PP pages had a short lifespan (confined to
driver and XDP code paths). Since PP pages gained a recycle return
path for SKBs, the lifespan of a PP page can be much longer. Thus, it
is time to revisit the periodic release-retry mechanism. The default
60 sec lifespan assumption is obviously wrong/obsolete, as things like
TCP sockets can keep SKBs around for much longer (e.g. retransmits,
timeouts, NAPI defer schemes etc.).

An inflight report means one of two things: (1) an API user is still
holding on to pages, or (2) a page got leaked and will never be
returned to the PP. The PP needs to accept that it has no control over
(1), i.e. how long API users keep outstanding PP pages. What we really
want to catch is (2): pages that "leak", meaning they were never
properly returned via the PP APIs.

Leaked PP pages result in these issues: (A) we can never release the
page_pool memory structs, which (B) hold a refcnt on the struct device
used for DMA mapping, and (C) leak DMA mappings, which (D) means a
hardware device can potentially write into a page that has been
returned to the page allocator.
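
To illustrate the bug pattern behind (2), here is a hedged sketch of a
driver error path that leaks a PP page (driver name and the
mydrv_post_rx_buffer() helper are made up; only the PP calls are real
APIs):

/* Hypothetical driver RX-refill error path.  Freeing a PP page with
 * put_page() bypasses PP accounting and DMA unmap: the page keeps its
 * pp_magic signature and stays counted as inflight forever.
 */
static int mydrv_rx_refill(struct page_pool *pp, struct mydrv_ring *ring)
{
	struct page *page = page_pool_alloc_pages(pp, GFP_ATOMIC);

	if (!page)
		return -ENOMEM;

	if (mydrv_post_rx_buffer(ring, page)) {	/* hypothetical helper */
		put_page(page);		/* BUG: leaks from PP's point of view */
		/* Correct: page_pool_put_full_page(pp, page, false); */
		return -EIO;
	}
	return 0;
}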

---

Jesper Dangaard Brouer (3):
      page_pool: Remove workqueue in new shutdown scheme
      page_pool: Use static_key for shutdown phase
      mm/page_pool: catch page_pool memory leaks


 include/net/page_pool.h |  9 +++---
 mm/page_alloc.c         |  7 +++++
 net/core/page_pool.c    | 61 +++++++++++++++++++++--------------------
 3 files changed, 42 insertions(+), 35 deletions(-)

--




* [PATCH RFC net-next/mm V1 1/3] page_pool: Remove workqueue in new shutdown scheme
  2023-04-25 17:15 [PATCH RFC net-next/mm V1 0/3] page_pool: new approach for leak detection and shutdown phase Jesper Dangaard Brouer
@ 2023-04-25 17:15 ` Jesper Dangaard Brouer
  2023-04-27  0:57   ` Yunsheng Lin
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 2/3] page_pool: Use static_key for shutdown phase Jesper Dangaard Brouer
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 3/3] mm/page_pool: catch page_pool memory leaks Jesper Dangaard Brouer
  2 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-25 17:15 UTC (permalink / raw)
  To: Ilias Apalodimas, netdev, Eric Dumazet, linux-mm, Mel Gorman
  Cc: Jesper Dangaard Brouer, lorenzo,
	Toke Høiland-Jørgensen, linyunsheng, bpf,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Andrew Morton,
	willy

This removes the workqueue scheme that periodically tests whether
inflight has reached zero, such that page_pool memory can be freed.

Instead, this change adds a check in the fast-path free code for a
shutdown flag bit after returning PP pages.

Performance is very important for PP, as the fast path is used for
XDP_DROP use-cases where NIC drivers recycle PP pages directly into
the PP alloc cache.

The goal was for this code change to have zero impact on this fast
path. The slight code reorg of likely() is deliberate.
Micro-benchmarking done via a kernel module[1] on x86_64 shows this
code change only costs a single extra instruction (approx 0.3 nanosec
on a CPU E5-1650 @3.60GHz).

It is possible to make this code zero-impact via a static_key, but
that change is split out into the next patch, as we are unsure whether
it is worth the complexity.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
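
The rough shape of such a measurement, as an illustrative sketch (the
real module is at [1]; page_pool_alloc_pages() and
page_pool_recycle_direct() are real PP APIs, the rest is scaffolding):

/* Illustrative micro-benchmark sketch (kernel-module context assumed):
 * exercises the XDP_DROP-style fast path of alloc + direct recycle,
 * then reports the average cost per operation.
 */
static int bench_pp_fast_path(struct page_pool *pp, u32 loops)
{
	ktime_t start = ktime_get();
	struct page *page;
	u64 ns;
	u32 i;

	for (i = 0; i < loops; i++) {
		page = page_pool_alloc_pages(pp, GFP_ATOMIC);
		if (!page)
			return -ENOMEM;
		page_pool_recycle_direct(pp, page); /* refills alloc cache */
	}
	ns = ktime_to_ns(ktime_sub(ktime_get(), start));
	pr_info("page_pool fast path: %llu ns per op\n", ns / loops);
	return 0;
}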

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/page_pool.h |    9 +++----
 net/core/page_pool.c    |   59 +++++++++++++++++++----------------------------
 2 files changed, 28 insertions(+), 40 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index c8ec2f34722b..a71c0f2695b0 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -50,6 +50,9 @@
 				 PP_FLAG_DMA_SYNC_DEV |\
 				 PP_FLAG_PAGE_FRAG)
 
+/* Internal flag: PP in shutdown phase, waiting for inflight pages */
+#define PP_FLAG_SHUTDOWN	BIT(8)
+
 /*
  * Fast allocation side cache array/stack
  *
@@ -151,11 +154,6 @@ static inline u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
 struct page_pool {
 	struct page_pool_params p;
 
-	struct delayed_work release_dw;
-	void (*disconnect)(void *);
-	unsigned long defer_start;
-	unsigned long defer_warn;
-
 	u32 pages_state_hold_cnt;
 	unsigned int frag_offset;
 	struct page *frag_page;
@@ -165,6 +163,7 @@ struct page_pool {
 	/* these stats are incremented while in softirq context */
 	struct page_pool_alloc_stats alloc_stats;
 #endif
+	void (*disconnect)(void *);
 	u32 xdp_mem_id;
 
 	/*
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e212e9d7edcb..ce7e8dda6403 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -23,9 +23,6 @@
 
 #include <trace/events/page_pool.h>
 
-#define DEFER_TIME (msecs_to_jiffies(1000))
-#define DEFER_WARN_INTERVAL (60 * HZ)
-
 #define BIAS_MAX	LONG_MAX
 
 #ifdef CONFIG_PAGE_POOL_STATS
@@ -380,6 +377,10 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	struct page *page;
 	int i, nr_pages;
 
+	/* API usage BUG: PP in shutdown phase, cannot alloc new pages */
+	if (WARN_ON(pool->p.flags & PP_FLAG_SHUTDOWN))
+		return NULL;
+
 	/* Don't support bulk alloc for high-order pages */
 	if (unlikely(pp_order))
 		return __page_pool_alloc_page_order(pool, gfp);
@@ -489,10 +490,6 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
 	page_pool_set_dma_addr(page, 0);
 skip_dma_unmap:
 	page_pool_clear_pp_info(page);
-
-	/* This may be the last page returned, releasing the pool, so
-	 * it is not safe to reference pool afterwards.
-	 */
 	count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
 	trace_page_pool_state_release(pool, page, count);
 }
@@ -535,7 +532,7 @@ static bool page_pool_recycle_in_ring(struct page_pool *pool, struct page *page)
 static bool page_pool_recycle_in_cache(struct page *page,
 				       struct page_pool *pool)
 {
-	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE)) {
+	if (pool->alloc.count == PP_ALLOC_CACHE_SIZE) {
 		recycle_stat_inc(pool, cache_full);
 		return false;
 	}
@@ -546,6 +543,8 @@ static bool page_pool_recycle_in_cache(struct page *page,
 	return true;
 }
 
+static void page_pool_shutdown_attempt(struct page_pool *pool);
+
 /* If the page refcnt == 1, this will try to recycle the page.
  * if PP_FLAG_DMA_SYNC_DEV is set, we'll try to sync the DMA area for
  * the configured size min(dma_sync_size, pool->max_len).
@@ -572,7 +571,8 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
 			page_pool_dma_sync_for_device(pool, page,
 						      dma_sync_size);
 
-		if (allow_direct && in_softirq() &&
+		/* During PP shutdown, direct recycling must not occur */
+		if (likely(allow_direct && in_softirq()) &&
 		    page_pool_recycle_in_cache(page, pool))
 			return NULL;
 
@@ -609,6 +609,8 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
 		recycle_stat_inc(pool, ring_full);
 		page_pool_return_page(pool, page);
 	}
+	if (pool->p.flags & PP_FLAG_SHUTDOWN)
+		page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_put_defragged_page);
 
@@ -648,13 +650,17 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 
 	/* Hopefully all pages was return into ptr_ring */
 	if (likely(i == bulk_len))
-		return;
+		goto out;
 
 	/* ptr_ring cache full, free remaining pages outside producer lock
 	 * since put_page() with refcnt == 1 can be an expensive operation
 	 */
 	for (; i < bulk_len; i++)
 		page_pool_return_page(pool, data[i]);
+
+out:
+	if (pool->p.flags & PP_FLAG_SHUTDOWN)
+		page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_put_page_bulk);
 
@@ -808,27 +814,10 @@ static int page_pool_release(struct page_pool *pool)
 	return inflight;
 }
 
-static void page_pool_release_retry(struct work_struct *wq)
+noinline
+static void page_pool_shutdown_attempt(struct page_pool *pool)
 {
-	struct delayed_work *dwq = to_delayed_work(wq);
-	struct page_pool *pool = container_of(dwq, typeof(*pool), release_dw);
-	int inflight;
-
-	inflight = page_pool_release(pool);
-	if (!inflight)
-		return;
-
-	/* Periodic warning */
-	if (time_after_eq(jiffies, pool->defer_warn)) {
-		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
-
-		pr_warn("%s() stalled pool shutdown %d inflight %d sec\n",
-			__func__, inflight, sec);
-		pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
-	}
-
-	/* Still not ready to be disconnected, retry later */
-	schedule_delayed_work(&pool->release_dw, DEFER_TIME);
+	page_pool_release(pool);
 }
 
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
@@ -868,11 +857,11 @@ void page_pool_destroy(struct page_pool *pool)
 	if (!page_pool_release(pool))
 		return;
 
-	pool->defer_start = jiffies;
-	pool->defer_warn  = jiffies + DEFER_WARN_INTERVAL;
-
-	INIT_DELAYED_WORK(&pool->release_dw, page_pool_release_retry);
-	schedule_delayed_work(&pool->release_dw, DEFER_TIME);
+	/* PP has pages inflight, thus cannot immediately release memory.
+	 * Enter into shutdown phase, and retry release to handle races.
+	 */
+	pool->p.flags |= PP_FLAG_SHUTDOWN;
+	page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_destroy);
 





* [PATCH RFC net-next/mm V1 2/3] page_pool: Use static_key for shutdown phase
  2023-04-25 17:15 [PATCH RFC net-next/mm V1 0/3] page_pool: new approach for leak detection and shutdown phase Jesper Dangaard Brouer
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 1/3] page_pool: Remove workqueue in new shutdown scheme Jesper Dangaard Brouer
@ 2023-04-25 17:15 ` Jesper Dangaard Brouer
  2023-04-25 23:30   ` Alexei Starovoitov
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 3/3] mm/page_pool: catch page_pool memory leaks Jesper Dangaard Brouer
  2 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-25 17:15 UTC (permalink / raw)
  To: Ilias Apalodimas, netdev, Eric Dumazet, linux-mm, Mel Gorman
  Cc: Jesper Dangaard Brouer, lorenzo,
	Toke Høiland-Jørgensen, linyunsheng, bpf,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Andrew Morton,
	willy

Performance is very important for page pool (PP). This adds use of the
static_key APIs to regain a single instruction, which makes the new PP
shutdown scheme zero-impact.

We are uncertain whether this is 100% correct, because the static_key
APIs take a mutex lock and it is unclear whether all contexts that can
return pages allow this. We could spawn a workqueue (like the one we
just removed) to work around this issue.

Seeking input on whether this is worth the complexity.
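
For reference, a hedged sketch of the workqueue workaround mentioned
above (not part of this patch; 'key_dec_work' would be a new,
hypothetical field on struct page_pool):

/* static_branch_dec() may sleep (jump-label mutex, cpus_read_lock),
 * so defer it from the atomic page-return context to process context.
 */
static void pp_shutdown_key_dec(struct work_struct *work)
{
	static_branch_dec(&pp_shutdown_phase);	/* safe: process context */
}

static void pp_shutdown_key_dec_deferred(struct page_pool *pool)
{
	INIT_WORK(&pool->key_dec_work, pp_shutdown_key_dec);
	schedule_work(&pool->key_dec_work);	/* callable from softirq */
}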

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/page_pool.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ce7e8dda6403..3821d8874b15 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -25,6 +25,8 @@
 
 #define BIAS_MAX	LONG_MAX
 
+DEFINE_STATIC_KEY_FALSE(pp_shutdown_phase);
+
 #ifdef CONFIG_PAGE_POOL_STATS
 /* alloc_stat_inc is intended to be used in softirq context */
 #define alloc_stat_inc(pool, __stat)	(pool->alloc_stats.__stat++)
@@ -378,7 +380,8 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	int i, nr_pages;
 
 	/* API usage BUG: PP in shutdown phase, cannot alloc new pages */
-	if (WARN_ON(pool->p.flags & PP_FLAG_SHUTDOWN))
+	if (static_key_enabled(&pp_shutdown_phase) &&
+	    WARN_ON(pool->p.flags & PP_FLAG_SHUTDOWN))
 		return NULL;
 
 	/* Don't support bulk alloc for high-order pages */
@@ -609,7 +612,7 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
 		recycle_stat_inc(pool, ring_full);
 		page_pool_return_page(pool, page);
 	}
-	if (pool->p.flags & PP_FLAG_SHUTDOWN)
+	if (static_branch_unlikely(&pp_shutdown_phase))
 		page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_put_defragged_page);
@@ -659,7 +662,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 		page_pool_return_page(pool, data[i]);
 
 out:
-	if (pool->p.flags & PP_FLAG_SHUTDOWN)
+	if (static_branch_unlikely(&pp_shutdown_phase))
 		page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_put_page_bulk);
@@ -817,7 +820,15 @@ static int page_pool_release(struct page_pool *pool)
 noinline
 static void page_pool_shutdown_attempt(struct page_pool *pool)
 {
-	page_pool_release(pool);
+	int inflight;
+
+	if (!(pool->p.flags & PP_FLAG_SHUTDOWN))
+		return;
+
+	inflight = page_pool_release(pool);
+
+	if (static_key_enabled(&pp_shutdown_phase) && !inflight)
+		static_branch_dec(&pp_shutdown_phase);
 }
 
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
@@ -861,6 +872,7 @@ void page_pool_destroy(struct page_pool *pool)
 	 * Enter into shutdown phase, and retry release to handle races.
 	 */
 	pool->p.flags |= PP_FLAG_SHUTDOWN;
+	static_branch_inc(&pp_shutdown_phase);
 	page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_destroy);





* [PATCH RFC net-next/mm V1 3/3] mm/page_pool: catch page_pool memory leaks
  2023-04-25 17:15 [PATCH RFC net-next/mm V1 0/3] page_pool: new approach for leak detection and shutdown phase Jesper Dangaard Brouer
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 1/3] page_pool: Remove workqueue in new shutdown scheme Jesper Dangaard Brouer
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 2/3] page_pool: Use static_key for shutdown phase Jesper Dangaard Brouer
@ 2023-04-25 17:15 ` Jesper Dangaard Brouer
  2 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-25 17:15 UTC (permalink / raw)
  To: Ilias Apalodimas, netdev, Eric Dumazet, linux-mm, Mel Gorman
  Cc: Jesper Dangaard Brouer, lorenzo,
	Toke Høiland-Jørgensen, linyunsheng, bpf,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Andrew Morton,
	willy

Pages belonging to a page_pool (PP) instance must be freed through the
PP APIs in order to correctly release any DMA mappings and drop the
refcnt on the DMA device when freeing the PP instance. When PP
releases a page (page_pool_release_page) the page->pp_magic value is
cleared.

This patch detects a leaked PP page in free_page_is_bad() via the
unexpected state of the page->pp_magic value still being PP_SIGNATURE.

We choose to report and treat it as a bad page. It would be possible
to release the page by returning it to the PP instance, as the
page->pp pointer is likely still valid.

Notice this code is only activated when CONFIG_PAGE_POOL is enabled,
together with either CONFIG_DEBUG_VM or the boot cmdline
debug_pagealloc=on.
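
For reference, the marking scheme this check relies on looks roughly
like this in net/core/page_pool.c (simplified):

static void page_pool_set_pp_info(struct page_pool *pool, struct page *page)
{
	page->pp = pool;
	page->pp_magic |= PP_SIGNATURE;	/* mark page as PP-owned on alloc */
}

static void page_pool_clear_pp_info(struct page *page)
{
	page->pp_magic = 0;	/* cleared on proper PP release */
	page->pp = NULL;
}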

Reduced example output of leak with PP_SIGNATURE = dead000000000040:

 BUG: Bad page state in process swapper/0  pfn:110bbf
 page:000000005bc8cfb8 refcount:0 mapcount:0 mapping:0000000000000000 index:0x110bbf000 pfn:0x110bbf
 flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
 raw: 002fffff80000000 dead000000000040 ffff888117255000 0000000000000000
 raw: 0000000110bbf000 000000000000003e 00000000ffffffff 0000000000000000
 page dumped because: page_pool leak
 [...]

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 mm/page_alloc.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8e39705c7bdc..137b72f8ab8b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1247,6 +1247,9 @@ static inline bool page_expected_state(struct page *page,
 			page_ref_count(page) |
 #ifdef CONFIG_MEMCG
 			page->memcg_data |
+#endif
+#ifdef CONFIG_PAGE_POOL
+			((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
 #endif
 			(page->flags & check_flags)))
 		return false;
@@ -1273,6 +1276,10 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 #ifdef CONFIG_MEMCG
 	if (unlikely(page->memcg_data))
 		bad_reason = "page still charged to cgroup";
+#endif
+#ifdef CONFIG_PAGE_POOL
+	if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
+		bad_reason = "page_pool leak";
 #endif
 	return bad_reason;
 }





* Re: [PATCH RFC net-next/mm V1 2/3] page_pool: Use static_key for shutdown phase
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 2/3] page_pool: Use static_key for shutdown phase Jesper Dangaard Brouer
@ 2023-04-25 23:30   ` Alexei Starovoitov
  2023-04-26 15:15     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2023-04-25 23:30 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Ilias Apalodimas, netdev, Eric Dumazet, linux-mm, Mel Gorman,
	lorenzo, Toke Høiland-Jørgensen, linyunsheng, bpf,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Andrew Morton,
	willy

On Tue, Apr 25, 2023 at 07:15:43PM +0200, Jesper Dangaard Brouer wrote:
> Performance is very important for page pool (PP). This adds use of the
> static_key APIs to regain a single instruction, which makes the new PP
> shutdown scheme zero-impact.
> 
> We are uncertain whether this is 100% correct, because the static_key
> APIs take a mutex lock and it is unclear whether all contexts that can
> return pages allow this. We could spawn a workqueue (like the one we
> just removed) to work around this issue.

With debug atomic sleep the issue should be trivial to see.
iirc the callers of xdp_flush_frame_bulk() need to do it under an rcu_read_lock equivalent,
which is not sleepable, so mutexes should warn.
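
For context, the jump-label update path starts with a sleeping lock,
which is exactly what CONFIG_DEBUG_ATOMIC_SLEEP trips over (a
simplified sketch of the shape of kernel/jump_label.c):

/* Simplified: the static_key slow path takes a percpu-rwsem, which
 * may sleep and thus must not run in softirq/atomic context.
 */
void static_key_slow_dec(struct static_key *key)
{
	cpus_read_lock();	/* sleeping lock: might_sleep() fires here */
	__static_key_slow_dec_cpuslocked(key);
	cpus_read_unlock();
}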



* Re: [PATCH RFC net-next/mm V1 2/3] page_pool: Use static_key for shutdown phase
  2023-04-25 23:30   ` Alexei Starovoitov
@ 2023-04-26 15:15     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-26 15:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: brouer, Ilias Apalodimas, netdev, Eric Dumazet, linux-mm,
	Mel Gorman, lorenzo, Toke Høiland-Jørgensen,
	linyunsheng, bpf, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Andrew Morton, willy


On 26/04/2023 01.30, Alexei Starovoitov wrote:
> On Tue, Apr 25, 2023 at 07:15:43PM +0200, Jesper Dangaard Brouer wrote:
>> Performance is very important for page pool (PP). This adds use of the
>> static_key APIs to regain a single instruction, which makes the new PP
>> shutdown scheme zero-impact.
>>
>> We are uncertain whether this is 100% correct, because the static_key
>> APIs take a mutex lock and it is unclear whether all contexts that can
>> return pages allow this. We could spawn a workqueue (like the one we
>> just removed) to work around this issue.
> 
> With debug atomic sleep the issue should be trivial to see.
> iirc the callers of xdp_flush_frame_bulk() need to do it under an rcu_read_lock equivalent,
> which is not sleepable, so mutexes should warn.
> 

Sure, adding CONFIG_DEBUG_ATOMIC_SLEEP and (keeping) CONFIG_DEBUG_MUTEXES.

Testing this with the veth changes that added page_pool to veth,
commit 0ebab78cbcbf ("net: veth: add page_pool for page recycling"),
which makes it easier to create/trigger inflight packets
via softnet_data->defer_list.

It confirms my suspicion: this patch is not correct.

The dmesg warning looks like the output below.
(The XXX printk is just from my own debugging patch.)

The call site net_rx_action+0xe7 is skb_defer_free_flush
  $ ./scripts/faddr2line vmlinux net_rx_action+0xe7
  net_rx_action+0xe7/0x360:
  skb_defer_free_flush at net/core/dev.c:6615
  (inlined by) skb_defer_free_flush at net/core/dev.c:6601
  (inlined by) net_rx_action at net/core/dev.c:6677


[ 1540.073256] XXX page_pool_shutdown_attempt() inflight=24
[ 1540.078654] XXX page_pool_destroy() Enter into shutdown phase - inflight=24
[ 1540.094234] XXX page_pool_shutdown_attempt() inflight=23
[ 1540.099626] XXX page_pool_shutdown_attempt() inflight=22
[ 1540.105012] XXX page_pool_shutdown_attempt() inflight=21
[ 1540.110402] XXX page_pool_shutdown_attempt() inflight=20
[ 1540.115793] XXX page_pool_shutdown_attempt() inflight=19
[ 1540.121183] XXX page_pool_shutdown_attempt() inflight=18
[ 1540.126564] XXX page_pool_shutdown_attempt() inflight=17
[ 1540.131945] XXX page_pool_shutdown_attempt() inflight=16
[ 1540.137329] XXX page_pool_shutdown_attempt() inflight=15
[ 1540.142710] XXX page_pool_shutdown_attempt() inflight=14
[ 1540.148110] XXX page_pool_shutdown_attempt() inflight=13
[ 1540.153512] XXX page_pool_shutdown_attempt() inflight=12
[ 1540.312284] XXX page_pool_shutdown_attempt() inflight=11
[ 1540.317712] XXX page_pool_shutdown_attempt() inflight=10
[ 1540.323128] XXX page_pool_shutdown_attempt() inflight=9
[ 1540.328459] XXX page_pool_shutdown_attempt() inflight=8
[ 1540.333788] XXX page_pool_shutdown_attempt() inflight=7
[ 1540.339111] XXX page_pool_shutdown_attempt() inflight=6
[ 1540.344439] XXX page_pool_shutdown_attempt() inflight=5
[ 1540.349768] XXX page_pool_shutdown_attempt() inflight=4
[ 1540.355091] XXX page_pool_shutdown_attempt() inflight=3
[ 1540.360420] XXX page_pool_shutdown_attempt() inflight=2
[ 1540.365741] XXX page_pool_shutdown_attempt() inflight=1
[ 1540.371064] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 1540.380052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 26501, name: netperf
[ 1540.388171] preempt_count: 101, expected: 0
[ 1540.392452] RCU nest depth: 2, expected: 0
[ 1540.396640] CPU: 2 PID: 26501 Comm: netperf Not tainted 6.3.0-rc7-net-next-pp-shutdown-vm-lock-dbg+ #67
[ 1540.406147] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[ 1540.413749] Call Trace:
[ 1540.416289]  <IRQ>
[ 1540.418395]  dump_stack_lvl+0x32/0x50
[ 1540.422160]  __might_resched+0x11c/0x160
[ 1540.426182]  cpus_read_lock+0x16/0x60
[ 1540.429942]  static_key_slow_dec+0x17/0x50
[ 1540.434137]  page_pool_shutdown_attempt+0x50/0x60
[ 1540.438940]  page_pool_return_skb_page+0x68/0xe0
[ 1540.443654]  skb_free_head+0x4f/0x90
[ 1540.447326]  skb_release_data+0x142/0x1a0
[ 1540.451435]  napi_consume_skb+0x6b/0x180
[ 1540.455448]  net_rx_action+0xe7/0x360
[ 1540.459209]  __do_softirq+0xcb/0x2b1
[ 1540.462885]  do_softirq+0x63/0x90
[ 1540.466297]  </IRQ>
[ 1540.468487]  <TASK>
[ 1540.470671]  __local_bh_enable_ip+0x64/0x70
[ 1540.474948]  __dev_queue_xmit+0x257/0x8a0
[ 1540.479049]  ? __ip_local_out+0x48/0x160
[ 1540.483070]  ip_finish_output2+0x25c/0x540
[ 1540.487255]  __ip_queue_xmit+0x171/0x420
[ 1540.491266]  ? ___slab_alloc+0xd1/0x670
[ 1540.495202]  __tcp_transmit_skb+0x83c/0x950
[ 1540.499486]  tcp_write_xmit+0x373/0xa60
[ 1540.503419]  __tcp_push_pending_frames+0x32/0xf0
[ 1540.508125]  tcp_sendmsg_locked+0x3a0/0xa10
[ 1540.512408]  tcp_sendmsg+0x27/0x40
[ 1540.515907]  sock_sendmsg+0x8b/0xa0
[ 1540.519494]  ? sockfd_lookup_light+0x12/0x70
[ 1540.523856]  __sys_sendto+0xeb/0x130
[ 1540.527531]  ? __switch_to_asm+0x3a/0x80
[ 1540.531549]  ? rseq_get_rseq_cs+0x32/0x290
[ 1540.535736]  ? rseq_ip_fixup+0x14f/0x1e0
[ 1540.539751]  __x64_sys_sendto+0x20/0x30
[ 1540.543686]  do_syscall_64+0x3a/0x90
[ 1540.547357]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1540.552502] RIP: 0033:0x7f389f713680
[ 1540.556174] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
[ 1540.575075] RSP: 002b:00007ffcb4d4a538 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 1540.582755] RAX: ffffffffffffffda RBX: 000000000063fd28 RCX: 00007f389f713680
[ 1540.589983] RDX: 0000000000004000 RSI: 0000000001ac3c40 RDI: 0000000000000004
[ 1540.597208] RBP: 00007ffcb4d4a570 R08: 0000000000000000 R09: 0000000000000000
[ 1540.604438] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000063fcf8
[ 1540.611664] R13: 00007ffcb4d4a910 R14: 0000000000000000 R15: 00007f389fa87000
[ 1540.618896]  </TASK>
[ 1540.621199] ------------[ cut here ]------------
[ 1540.625911] WARNING: CPU: 2 PID: 26501 at kernel/smp.c:912 smp_call_function_many_cond+0x2c1/0x2e0
[ 1540.635001] Modules linked in: veth nf_defrag_ipv6 nf_defrag_ipv4 ib_umad rdma_ucm ib_ipoib rdma_cm iw_cm irdma ib_cm ice intel_uncore_frequency intel_uncore_frequency_common coretemp kvm_intel mlx5_ib kvm ib_uverbs irqbypass rapl intel_cstate ib_core i2c_i801 intel_uncore pcspkr i2c_smbus bfq acpi_ipmi wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad sch_fq_codel drm fuse zram zsmalloc mlx5_core ixgbe i40e igb igc sd_mod t10_pi mlxfw psample hid_generic crc64_rocksoft_generic i2c_algo_bit crc64_rocksoft i2c_core mdio ptp pps_core [last unloaded: bpfilter]
[ 1540.684653] CPU: 2 PID: 26501 Comm: netperf Tainted: G        W          6.3.0-rc7-net-next-pp-shutdown-vm-lock-dbg+ #67
[ 1540.695642] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[ 1540.703239] RIP: 0010:smp_call_function_many_cond+0x2c1/0x2e0
[ 1540.709090] Code: fe ff ff 8b 44 24 14 48 c7 c6 a0 04 fc 82 8b 15 85 f4 e6 01 48 8b 3c 24 8d 48 01 48 63 c9 e8 36 c5 4b 00 89 c2 e9 c5 fd ff ff <0f> 0b e9 86 fd ff ff 8b 7c 24 28 e8 4f b8 f7 ff eb af 66 66 2e 0f
[ 1540.727999] RSP: 0018:ffffc9000003cda8 EFLAGS: 00010206
[ 1540.733322] RAX: 0000000000000102 RBX: ffffffff8383b2ea RCX: 0000000000000003
[ 1540.740552] RDX: 0000000000000000 RSI: ffffffff810271a0 RDI: ffffffff82fc04a0
[ 1540.747782] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 1540.755008] R10: 000000087c9e9000 R11: ffffffff82c4f620 R12: 0000000000000000
[ 1540.762241] R13: ffffffff8383b300 R14: 0000000000000002 R15: 0000000000000001
[ 1540.769474] FS:  00007f389f954740(0000) GS:ffff88887fc80000(0000) knlGS:0000000000000000
[ 1540.777685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1540.783528] CR2: 00007f1e27582000 CR3: 000000015775c004 CR4: 00000000003706e0
[ 1540.790761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1540.797987] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1540.805220] Call Trace:
[ 1540.807763]  <IRQ>
[ 1540.809868]  ? __pfx_do_sync_core+0x10/0x10
[ 1540.814146]  on_each_cpu_cond_mask+0x20/0x40
[ 1540.818514]  text_poke_bp_batch+0x9e/0x250
[ 1540.822708]  text_poke_finish+0x1b/0x30
[ 1540.826644]  arch_jump_label_transform_apply+0x16/0x30
[ 1540.831880]  __static_key_slow_dec_cpuslocked.part.0+0x2f/0x40
[ 1540.837809]  static_key_slow_dec+0x30/0x50
[ 1540.841999]  page_pool_shutdown_attempt+0x50/0x60
[ 1540.846803]  page_pool_return_skb_page+0x68/0xe0
[ 1540.851517]  skb_free_head+0x4f/0x90
[ 1540.855194]  skb_release_data+0x142/0x1a0
[ 1540.859300]  napi_consume_skb+0x6b/0x180
[ 1540.863320]  net_rx_action+0xe7/0x360
[ 1540.867082]  __do_softirq+0xcb/0x2b1
[ 1540.870756]  do_softirq+0x63/0x90
[ 1540.874171]  </IRQ>
[ 1540.876359]  <TASK>
[ 1540.878553]  __local_bh_enable_ip+0x64/0x70
[ 1540.882829]  __dev_queue_xmit+0x257/0x8a0
[ 1540.886938]  ? __ip_local_out+0x48/0x160
[ 1540.890959]  ip_finish_output2+0x25c/0x540
[ 1540.895155]  __ip_queue_xmit+0x171/0x420
[ 1540.899175]  ? ___slab_alloc+0xd1/0x670
[ 1540.903108]  __tcp_transmit_skb+0x83c/0x950
[ 1540.907393]  tcp_write_xmit+0x373/0xa60
[ 1540.911327]  __tcp_push_pending_frames+0x32/0xf0
[ 1540.916041]  tcp_sendmsg_locked+0x3a0/0xa10
[ 1540.920324]  tcp_sendmsg+0x27/0x40
[ 1540.923821]  sock_sendmsg+0x8b/0xa0
[ 1540.927411]  ? sockfd_lookup_light+0x12/0x70
[ 1540.931780]  __sys_sendto+0xeb/0x130
[ 1540.935456]  ? __switch_to_asm+0x3a/0x80
[ 1540.939477]  ? rseq_get_rseq_cs+0x32/0x290
[ 1540.943670]  ? rseq_ip_fixup+0x14f/0x1e0
[ 1540.947693]  __x64_sys_sendto+0x20/0x30
[ 1540.951626]  do_syscall_64+0x3a/0x90
[ 1540.955300]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1540.960451] RIP: 0033:0x7f389f713680
[ 1540.964124] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
[ 1540.983035] RSP: 002b:00007ffcb4d4a538 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 1540.990724] RAX: ffffffffffffffda RBX: 000000000063fd28 RCX: 00007f389f713680
[ 1540.997957] RDX: 0000000000004000 RSI: 0000000001ac3c40 RDI: 0000000000000004
[ 1541.005187] RBP: 00007ffcb4d4a570 R08: 0000000000000000 R09: 0000000000000000
[ 1541.012422] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000063fcf8
[ 1541.019649] R13: 00007ffcb4d4a910 R14: 0000000000000000 R15: 00007f389fa87000
[ 1541.026881]  </TASK>
[ 1541.029166] ---[ end trace 0000000000000000 ]---
[ 1541.033925] XXX page_pool_shutdown_attempt() inflight=0







* Re: [PATCH RFC net-next/mm V1 1/3] page_pool: Remove workqueue in new shutdown scheme
  2023-04-25 17:15 ` [PATCH RFC net-next/mm V1 1/3] page_pool: Remove workqueue in new shutdown scheme Jesper Dangaard Brouer
@ 2023-04-27  0:57   ` Yunsheng Lin
  2023-04-27 10:47     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Yunsheng Lin @ 2023-04-27  0:57 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Ilias Apalodimas, netdev, Eric Dumazet,
	linux-mm, Mel Gorman
  Cc: lorenzo, Toke Høiland-Jørgensen, bpf, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Andrew Morton, willy

On 2023/4/26 1:15, Jesper Dangaard Brouer wrote:
> @@ -609,6 +609,8 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
>  		recycle_stat_inc(pool, ring_full);
>  		page_pool_return_page(pool, page);
>  	}
> +	if (pool->p.flags & PP_FLAG_SHUTDOWN)
> +		page_pool_shutdown_attempt(pool);

It seems we have allowed page_pool_shutdown_attempt() to be called
concurrently here. Isn't there a time window between atomic_inc_return_relaxed()
on pool->pages_state_release_cnt and page_pool_inflight(), which may cause
page_pool_free() to be called twice?



* Re: [PATCH RFC net-next/mm V1 1/3] page_pool: Remove workqueue in new shutdown scheme
  2023-04-27  0:57   ` Yunsheng Lin
@ 2023-04-27 10:47     ` Jesper Dangaard Brouer
  2023-04-27 18:29       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-27 10:47 UTC (permalink / raw)
  To: Yunsheng Lin, Ilias Apalodimas, netdev, Eric Dumazet, linux-mm,
	Mel Gorman
  Cc: brouer, lorenzo, Toke Høiland-Jørgensen, bpf,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Andrew Morton,
	willy



On 27/04/2023 02.57, Yunsheng Lin wrote:
> On 2023/4/26 1:15, Jesper Dangaard Brouer wrote:
>> @@ -609,6 +609,8 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
>>   		recycle_stat_inc(pool, ring_full);
>>   		page_pool_return_page(pool, page);
>>   	}
>> +	if (pool->p.flags & PP_FLAG_SHUTDOWN)
>> +		page_pool_shutdown_attempt(pool);
> 
> It seems we have allowed page_pool_shutdown_attempt() to be called
> concurrently here. Isn't there a time window between atomic_inc_return_relaxed()
> on pool->pages_state_release_cnt and page_pool_inflight(), which may cause
> page_pool_free() to be called twice?
> 

Yes, I think that is correct.
I actually woke up this morning thinking about this double-free case
and this time window.  Thanks for spotting and confirming this issue.

Basically: two concurrent CPUs executing page_pool_shutdown_attempt()
can both end up seeing inflight equal zero, resulting in both of them
kfreeing the memory (in page_pool_free()) as each thinks it is the
last user of the PP instance.
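
Spelled out, the interleaving looks something like this (timing is
illustrative; function names are from the patch):

/* Both CPUs are in the shutdown phase; one page is still inflight.
 *
 *  CPU A: page_pool_release_page()
 *           atomic_inc_return_relaxed(&release_cnt)  // last page returned
 *  CPU B: page_pool_shutdown_attempt()
 *           page_pool_inflight() == 0                // observes A's inc
 *           page_pool_free(pool)                     // frees the pool
 *  CPU A: page_pool_shutdown_attempt()
 *           page_pool_inflight() == 0                // also sees zero
 *           page_pool_free(pool)                     // double free!
 */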

I've been thinking about how to address this.
This is my current idea:

(1) An atomic inc-and-test (or cmpxchg) that resolves the last-user race.
(2) Defer the free to a call_rcu callback to let other CPUs finish.
(3) Might need rcu_read_lock() in page_pool_shutdown_attempt().
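
A hedged sketch of how (1)-(3) could fit together (illustrative only,
not the posted fix; the 'destroy_cnt' atomic and 'rcu' head are
hypothetical fields on struct page_pool):

static void page_pool_free_rcu(struct rcu_head *rcu)
{
	struct page_pool *pool = container_of(rcu, struct page_pool, rcu);

	page_pool_free(pool);	/* (2) deferred until readers finish */
}

static void page_pool_shutdown_attempt(struct page_pool *pool)
{
	rcu_read_lock();	/* (3) keep pool alive while inspecting */
	if (!page_pool_inflight(pool) &&
	    atomic_inc_return(&pool->destroy_cnt) == 1)	/* (1) one winner */
		call_rcu(&pool->rcu, page_pool_free_rcu);
	rcu_read_unlock();
}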

--Jesper




* Re: [PATCH RFC net-next/mm V1 1/3] page_pool: Remove workqueue in new shutdown scheme
  2023-04-27 10:47     ` Jesper Dangaard Brouer
@ 2023-04-27 18:29       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-27 18:29 UTC (permalink / raw)
  To: Yunsheng Lin, Ilias Apalodimas, netdev, Eric Dumazet, linux-mm,
	Mel Gorman
  Cc: brouer, lorenzo, Toke Høiland-Jørgensen, bpf,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Andrew Morton,
	willy

[-- Attachment #1: Type: text/plain, Size: 1643 bytes --]


On 27/04/2023 12.47, Jesper Dangaard Brouer wrote:
> 
> On 27/04/2023 02.57, Yunsheng Lin wrote:
>> On 2023/4/26 1:15, Jesper Dangaard Brouer wrote:
>>> @@ -609,6 +609,8 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
>>>           recycle_stat_inc(pool, ring_full);
>>>           page_pool_return_page(pool, page);
>>>       }
>>> +    if (pool->p.flags & PP_FLAG_SHUTDOWN)
>>> +        page_pool_shutdown_attempt(pool);
>>
>> It seems we have allowed page_pool_shutdown_attempt() to be called
>> concurrently here. Isn't there a time window between
>> atomic_inc_return_relaxed() on pool->pages_state_release_cnt and
>> page_pool_inflight(), which may cause page_pool_free() to be called
>> twice?
>>
> 
> Yes, I think that is correct.
> I actually woke up this morning thinking about this double-free case
> and this time window.  Thanks for spotting and confirming this issue.
> 
> Basically: two concurrent CPUs executing page_pool_shutdown_attempt()
> can both end up seeing inflight equal zero, resulting in both of them
> kfreeing the memory (in page_pool_free()) as each thinks it is the
> last user of the PP instance.
> 
> I've been thinking about how to address this.
> This is my current idea:
> 
> (1) An atomic inc-and-test (or cmpxchg) that resolves the last-user race.
> (2) Defer the free to a call_rcu callback to let other CPUs finish.
> (3) Might need rcu_read_lock() in page_pool_shutdown_attempt().
> 

I think I found a simpler approach (adjustment patch attached).
It avoids races and any call_rcu callbacks.

Will post a V2.

--Jesper

[-- Attachment #2: 02-fix-race --]
[-- Type: text/plain, Size: 4504 bytes --]

fix race

From: Jesper Dangaard Brouer <brouer@redhat.com>

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/page_pool.c |   48 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ce7e8dda6403..25139b162674 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -451,9 +451,8 @@ EXPORT_SYMBOL(page_pool_alloc_pages);
  */
 #define _distance(a, b)	(s32)((a) - (b))
 
-static s32 page_pool_inflight(struct page_pool *pool)
+static s32 __page_pool_inflight(struct page_pool *pool, u32 release_cnt)
 {
-	u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
 	u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);
 	s32 inflight;
 
@@ -465,6 +464,14 @@ static s32 page_pool_inflight(struct page_pool *pool)
 	return inflight;
 }
 
+static s32 page_pool_inflight(struct page_pool *pool)
+{
+	u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
+	return __page_pool_inflight(pool, release_cnt);
+}
+
+static int page_pool_free_attempt(struct page_pool *pool, u32 release_cnt);
+
 /* Disconnects a page (from a page_pool).  API users can have a need
  * to disconnect a page (from a page_pool), to allow it to be used as
  * a regular page (that will eventually be returned to the normal
@@ -473,7 +480,7 @@ static s32 page_pool_inflight(struct page_pool *pool)
 void page_pool_release_page(struct page_pool *pool, struct page *page)
 {
 	dma_addr_t dma;
-	int count;
+	u32 count;
 
 	if (!(pool->p.flags & PP_FLAG_DMA_MAP))
 		/* Always account for inflight pages, even if we didn't
@@ -490,8 +497,12 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
 	page_pool_set_dma_addr(page, 0);
 skip_dma_unmap:
 	page_pool_clear_pp_info(page);
-	count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
+	count = atomic_inc_return(&pool->pages_state_release_cnt);
 	trace_page_pool_state_release(pool, page, count);
+
+	/* In shutdown phase, last page will free pool instance */
+	if (pool->p.flags & PP_FLAG_SHUTDOWN)
+		page_pool_free_attempt(pool, count);
 }
 EXPORT_SYMBOL(page_pool_release_page);
 
@@ -543,7 +554,7 @@ static bool page_pool_recycle_in_cache(struct page *page,
 	return true;
 }
 
-static void page_pool_shutdown_attempt(struct page_pool *pool);
+static void page_pool_empty_ring(struct page_pool *pool);
 
 /* If the page refcnt == 1, this will try to recycle the page.
  * if PP_FLAG_DMA_SYNC_DEV is set, we'll try to sync the DMA area for
@@ -610,7 +621,7 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
 		page_pool_return_page(pool, page);
 	}
 	if (pool->p.flags & PP_FLAG_SHUTDOWN)
-		page_pool_shutdown_attempt(pool);
+		page_pool_empty_ring(pool);
 }
 EXPORT_SYMBOL(page_pool_put_defragged_page);
 
@@ -660,7 +671,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 
 out:
 	if (pool->p.flags & PP_FLAG_SHUTDOWN)
-		page_pool_shutdown_attempt(pool);
+		page_pool_empty_ring(pool);
 }
 EXPORT_SYMBOL(page_pool_put_page_bulk);
 
@@ -743,6 +754,7 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 }
 EXPORT_SYMBOL(page_pool_alloc_frag);
 
+noinline
 static void page_pool_empty_ring(struct page_pool *pool)
 {
 	struct page *page;
@@ -802,22 +814,28 @@ static void page_pool_scrub(struct page_pool *pool)
 	page_pool_empty_ring(pool);
 }
 
-static int page_pool_release(struct page_pool *pool)
+noinline
+static int page_pool_free_attempt(struct page_pool *pool, u32 release_cnt)
 {
 	int inflight;
 
-	page_pool_scrub(pool);
-	inflight = page_pool_inflight(pool);
+	inflight = __page_pool_inflight(pool, release_cnt);
 	if (!inflight)
 		page_pool_free(pool);
 
 	return inflight;
 }
 
-noinline
-static void page_pool_shutdown_attempt(struct page_pool *pool)
+static int page_pool_release(struct page_pool *pool)
 {
-	page_pool_release(pool);
+	int inflight;
+
+	page_pool_scrub(pool);
+	inflight = page_pool_inflight(pool);
+	if (!inflight)
+		page_pool_free(pool);
+
+	return inflight;
 }
 
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
@@ -861,7 +879,9 @@ void page_pool_destroy(struct page_pool *pool)
 	 * Enter into shutdown phase, and retry release to handle races.
 	 */
 	pool->p.flags |= PP_FLAG_SHUTDOWN;
-	page_pool_shutdown_attempt(pool);
+
+	/* Concurrent CPUs could have returned last pages into ptr_ring */
+	page_pool_empty_ring(pool);
 }
 EXPORT_SYMBOL(page_pool_destroy);
 

