From: Yunsheng Lin <linyunsheng@huawei.com>
To: Jesper Dangaard Brouer <hawk@kernel.org>, <davem@davemloft.net>,
	<kuba@kernel.org>, <pabeni@redhat.com>
Cc: <zhangkun09@huawei.com>, <liuyonglong@huawei.com>,
	<fanghaiqing@huawei.com>, Robin Murphy <robin.murphy@arm.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	IOMMU <iommu@lists.linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Dumazet <edumazet@google.com>,
	Simon Horman <horms@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver has already unbound
Date: Thu, 16 Jan 2025 20:52:58 +0800	[thread overview]
Message-ID: <2b5a58f3-d67a-4bf7-921a-033326958ac6@huawei.com> (raw)
In-Reply-To: <921c827c-41b7-40af-8c01-c21adbe8f41f@kernel.org>

On 2025/1/16 0:29, Jesper Dangaard Brouer wrote:
> 
> 
> On 10/01/2025 14.06, Yunsheng Lin wrote:
> [...]
>> In order not to call DMA APIs to do DMA unmapping after the driver
>> has already unbound, and not to stall the unloading of the networking
>> driver, use some pre-allocated item blocks to record inflight pages,
>> including the ones which are handed over to the network stack, so
>> that the page_pool can do the DMA unmapping for those pages when
>> page_pool_destroy() is called. As the pre-allocated item blocks need
>> to be large enough to avoid performance degradation, add an
>> 'item_fast_empty' stat to indicate the unavailability of the
>> pre-allocated item blocks.
>>
> 

...

>> +
>> +static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
>> +                             netmem_ref netmem,
>> +                             bool destroyed)
>> +{
>> +    struct page_pool_item *item;
>> +    dma_addr_t dma;
>> +
>> +    if (!pool->dma_map)
>> +        /* Always account for inflight pages, even if we didn't
>> +         * map them
>> +         */
>> +        return;
>> +
>> +    dma = page_pool_get_dma_addr_netmem(netmem);
>> +    item = netmem_get_pp_item(netmem);
>> +
>> +    /* DMA unmapping is always needed when page_pool_destroy() has
>> +     * not been called yet.
>> +     */
>> +    DEBUG_NET_WARN_ON_ONCE(!destroyed && !page_pool_item_is_mapped(item));
>> +    if (unlikely(destroyed && !page_pool_item_is_mapped(item)))
>> +        return;
>> +
>> +    /* When page is unmapped, it cannot be returned to our pool */
>> +    dma_unmap_page_attrs(pool->p.dev, dma,
>> +                 PAGE_SIZE << pool->p.order, pool->p.dma_dir,
>> +                 DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
>> +    page_pool_set_dma_addr_netmem(netmem, 0);
>> +    page_pool_item_clear_mapped(item);
>> +}
>> +
> 
> I have a hard time reading/reviewing/maintaining below code, without
> some design description.  This code needs more comments on what is the
> *intend* and design it's trying to achieve.
> 
> From patch description the only hint I have is:
>  "use some pre-allocated item blocks to record inflight pages"
> 
> E.g. Why is it needed/smart to hijack the page->pp pointer?

Mainly because there is no space available for keeping track of inflight
pages: using page->pp can only find the page_pool owning the page, but the
page_pool is not able to keep track of an inflight page while the page is
being handled by the networking stack.

By using the page_pool_item below, the 'state' field tells whether a
specific item is in use/DMA mapped, which can be checked by scanning all
the item blocks in pool->item_blocks. If a specific item is used by a page,
'pp_netmem' points to that page so that DMA unmapping can be done for that
page when page_pool_destroy() is called; otherwise the free item sits in
pool->hold_items or pool->release_items via 'lentry':

struct page_pool_item {
	unsigned long state;	/* tracks whether the item is in use/DMA mapped */

	union {
		netmem_ref pp_netmem;		/* in use: the inflight page being tracked */
		struct llist_node lentry;	/* free: node in hold_items/release_items */
	};
};
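
For the page_pool_destroy() case, the scan looks roughly like the below
(untested sketch, not the exact patch code; page_pool_item_unmap_all() is
just a name picked for illustration, while page_pool_item_is_mapped() and
__page_pool_release_page_dma() are the helpers from this patch):

static void page_pool_item_unmap_all(struct page_pool *pool)
{
	struct page_pool_item_block *block;

	/* Walk every pre-allocated item block and unmap any inflight
	 * page whose item is still recorded as mapped.
	 */
	list_for_each_entry(block, &pool->item_blocks, list) {
		unsigned int i;

		for (i = 0; i < ITEMS_PER_PAGE; i++) {
			struct page_pool_item *item = &block->items[i];

			if (!page_pool_item_is_mapped(item))
				continue;

			__page_pool_release_page_dma(pool, item->pp_netmem,
						     true);
		}
	}
}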

When a page is added to the page_pool, an item is removed from
pool->hold_items or pool->release_items, its 'pp_netmem' is set to point to
that page, and its 'state' is set accordingly in order to keep track of
that page.

When a page is removed from the page_pool, the page_pool the page belongs
to can be found using the functions below, and after the 'state' is
cleared, the item is added back to pool->release_items so that it can be
reused for new pages (a rough sketch of both paths follows after those
functions).

static inline struct page_pool_item_block *
page_pool_item_to_block(struct page_pool_item *item)
{
	return (struct page_pool_item_block *)((unsigned long)item & PAGE_MASK);
}

static inline struct page_pool *page_pool_get_pp(struct page *page)
{
	return page_pool_item_to_block(page->pp_item)->pp;
}
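
The add/remove paths then look roughly like the below (untested sketch;
page_pool_item_set_used()/page_pool_item_clear_used() stand in for whatever
the actual state helpers end up being):

static struct page_pool_item *page_pool_item_get(struct page_pool *pool,
						 netmem_ref netmem)
{
	/* release_items is refilled lock-free by frees, so use the
	 * atomic llist_del_first(); hold_items is only consumed from
	 * the allocation context, so __llist_del_first() is enough.
	 */
	struct llist_node *node = llist_del_first(&pool->release_items);
	struct page_pool_item *item;

	if (!node)
		node = __llist_del_first(&pool->hold_items);

	if (!node)	/* no free item, bump the 'item_fast_empty' stat */
		return NULL;

	item = llist_entry(node, struct page_pool_item, lentry);
	item->pp_netmem = netmem;	/* record the inflight page */
	page_pool_item_set_used(item);	/* hypothetical state helper */
	return item;
}

static void page_pool_item_put(struct page_pool *pool,
			       struct page_pool_item *item)
{
	/* Clear the state and recycle the item for new pages. */
	page_pool_item_clear_used(item);	/* hypothetical state helper */
	llist_add(&item->lentry, &pool->release_items);
}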


> 
>> +static void __page_pool_item_init(struct page_pool *pool, struct page *page)
>> +{
> 
> Function name is confusing.  First I though this was init'ing a single
> item, but looking at the code it is iterating over ITEMS_PER_PAGE.
> 
> Maybe it should be called page_pool_item_block_init ?

__page_pool_item_init() was added to make the page_pool_item_init()
function below more readable and maintainable; renaming it to
page_pool_item_block_init() doesn't seem consistent with that?

> 
>> +    struct page_pool_item_block *block = page_address(page);
>> +    struct page_pool_item *items = block->items;
>> +    unsigned int i;
>> +
>> +    list_add(&block->list, &pool->item_blocks);
>> +    block->pp = pool;
>> +
>> +    for (i = 0; i < ITEMS_PER_PAGE; i++) {
>> +        page_pool_item_init_state(&items[i]);
>> +        __llist_add(&items[i].lentry, &pool->hold_items);
>> +    }
>> +}
>> +
>> +static int page_pool_item_init(struct page_pool *pool)
>> +{
>> +#define PAGE_POOL_MIN_INFLIGHT_ITEMS        512
>> +    struct page_pool_item_block *block;
>> +    int item_cnt;
>> +
>> +    INIT_LIST_HEAD(&pool->item_blocks);
>> +    init_llist_head(&pool->hold_items);
>> +    init_llist_head(&pool->release_items);
>> +
>> +    item_cnt = pool->p.pool_size * 2 + PP_ALLOC_CACHE_SIZE +
>> +        PAGE_POOL_MIN_INFLIGHT_ITEMS;
>> +    while (item_cnt > 0) {
>> +        struct page *page;
>> +
>> +        page = alloc_pages_node(pool->p.nid, GFP_KERNEL, 0);
>> +        if (!page)
>> +            goto err;
>> +
>> +        __page_pool_item_init(pool, page);
>> +        item_cnt -= ITEMS_PER_PAGE;
>> +    }
>> +
>> +    return 0;
>> +err:
>> +    list_for_each_entry(block, &pool->item_blocks, list)
>> +        put_page(virt_to_page(block));

This one also has a use-after-free problem, like page_pool_item_uninit() in
the previous version: put_page() frees the page backing 'block', including
its embedded list node, before the iterator fetches the next entry. A
possible fix is sketched below the quoted code.

>> +
>> +    return -ENOMEM;
>> +}
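
Something like the below should avoid it (untested; the _safe iterator
caches the next entry before the current block is freed):

	struct page_pool_item_block *block, *tmp;

	...

err:
	list_for_each_entry_safe(block, tmp, &pool->item_blocks, list) {
		list_del(&block->list);
		put_page(virt_to_page(block));
	}

	return -ENOMEM;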
>> +


