On 27/04/2023 12.47, Jesper Dangaard Brouer wrote:
>
> On 27/04/2023 02.57, Yunsheng Lin wrote:
>> On 2023/4/26 1:15, Jesper Dangaard Brouer wrote:
>>> @@ -609,6 +609,8 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
>>>           recycle_stat_inc(pool, ring_full);
>>>           page_pool_return_page(pool, page);
>>>       }
>>> +    if (pool->p.flags & PP_FLAG_SHUTDOWN)
>>> +        page_pool_shutdown_attempt(pool);
>>
>> It seems we have allowed page_pool_shutdown_attempt() to be called
>> concurrently here. Isn't there a time window between
>> atomic_inc_return_relaxed() and page_pool_inflight() for
>> pool->pages_state_release_cnt, which may cause a double call of
>> page_pool_free()?
>>
>
> Yes, I think that is correct.
> I actually woke up this morning thinking of this case of double freeing,
> and this time window.  Thanks for spotting and confirming this issue.
>
> Basically: two concurrent CPUs executing page_pool_shutdown_attempt()
> can both end up seeing inflight equal to zero, resulting in both of
> them kfreeing the memory (in page_pool_free()), as they both think
> they are the last user of the PP instance.
>
> I've been thinking about how to address this.
> This is my current idea:
>
> (1) An atomic inc-and-test (or cmpxchg) that resolves the last-user race.
> (2) Defer the free to a call_rcu callback to let other CPUs finish.
> (3) Might need rcu_read_lock() in page_pool_shutdown_attempt().
>

I think I found a simpler approach (adjustment patch attached) that
avoids the race and any call_rcu callbacks. Will post a V2.

--Jesper
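
To make idea (1) above concrete, below is a minimal, self-contained
userspace sketch of the inc-and-test pattern, written with C11 atomics.
It is not the attached adjustment patch (which is not reproduced here);
struct fake_pool and its fields are invented for illustration, loosely
modeled on pool->pages_state_release_cnt. The key property is that the
increment and the "am I the last user?" test are a single atomic
operation, so the window between atomic_inc_return_relaxed() and a
separate page_pool_inflight() check disappears, and exactly one CPU can
conclude it is last.

/*
 * Sketch of idea (1): an atomic release counter where only the
 * caller that performs the final release frees the object.
 * NOT the actual patch; fake_pool, hold_cnt and release_cnt are
 * hypothetical names.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct fake_pool {
	atomic_uint release_cnt;   /* pages released so far */
	unsigned int hold_cnt;     /* pages handed out; fixed at shutdown */
};

/* Called once per returned page after shutdown has been flagged. */
static void shutdown_attempt(struct fake_pool *pool)
{
	/*
	 * atomic_fetch_add() returns the old value, so even if many
	 * CPUs race here, exactly one caller observes the count that
	 * completes hold_cnt: the increment and the last-user test
	 * cannot be separated by another CPU's increment.
	 */
	unsigned int prev = atomic_fetch_add(&pool->release_cnt, 1);

	if (prev + 1 == pool->hold_cnt) {
		printf("last user: freeing pool\n");
		free(pool);
	}
}

int main(void)
{
	struct fake_pool *pool = malloc(sizeof(*pool));

	atomic_init(&pool->release_cnt, 0);
	pool->hold_cnt = 3; /* pretend 3 pages were in flight */

	shutdown_attempt(pool); /* releases page 1 */
	shutdown_attempt(pool); /* releases page 2 */
	shutdown_attempt(pool); /* releases page 3: frees the pool */
	return 0;
}

In the kernel, the free in the winning branch is where ideas (2) and
(3) would come in: replacing the direct free with a call_rcu() callback
(plus rcu_read_lock() in page_pool_shutdown_attempt()) would let CPUs
still executing inside the function finish before the memory goes away.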