From: Yunsheng Lin <linyunsheng@huawei.com>
To: Jesper Dangaard Brouer <hawk@kernel.org>, <davem@davemloft.net>,
<kuba@kernel.org>, <pabeni@redhat.com>
Cc: <zhangkun09@huawei.com>, <liuyonglong@huawei.com>,
<fanghaiqing@huawei.com>, Robin Murphy <robin.murphy@arm.com>,
Alexander Duyck <alexander.duyck@gmail.com>,
IOMMU <iommu@lists.linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
<netdev@vger.kernel.org>,
kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver has already unbound
Date: Fri, 17 Jan 2025 19:56:43 +0800 [thread overview]
Message-ID: <92bb3a19-a619-4bf7-9ef5-b9eb12a57983@huawei.com> (raw)
In-Reply-To: <95f258b2-52f5-4a80-a670-b9a182caec7c@kernel.org>
On 2025/1/17 0:09, Jesper Dangaard Brouer wrote:
...
>> Mainly because there is no space available for keeping tracking of inflight
>> pages, using page->pp can only find the page_pool owning the page, but page_pool
>> is not able to keep track of the inflight page when the page is handled by
>> networking stack.
>>
>> By using page_pool_item as below, the state is used to tell if a specific
>> item is being used/dma mapped or not by scanning all the item blocks in
>> pool->item_blocks. If a specific item is used by a page, then 'pp_netmem'
>> will point to that page so that dma unmapping can be done for that page
>> when page_pool_destroy() is called, otherwise free items sit in the
>> pool->hold_items or pool->release_items by using 'lentry':
>>
>> struct page_pool_item {
>> unsigned long state;
>>
>> union {
>> netmem_ref pp_netmem;
>> struct llist_node lentry;
>> };
>> };
>
> pahole -C page_pool_item vmlinux
> struct page_pool_item {
> /* An 'encoded_next' is a pointer to next item, lower 2 bits is used to
> * indicate the state of current item.
> */
> long unsigned int encoded_next; /* 0 8 */
> union {
> netmem_ref pp_netmem; /* 8 8 */
> struct llist_node lentry; /* 8 8 */
> }; /* 8 8 */
>
> /* size: 16, cachelines: 1, members: 2 */
> /* last cacheline: 16 bytes */
> };
>
>
>> When a page is added to the page_pool, a item is deleted from pool->hold_items
>> or pool->release_items and set the 'pp_netmem' pointing to that page and set
>> 'state' accordingly in order to keep track of that page.
>>
>> When a page is deleted from the page_pool, it is able to tell which page_pool
>> this page belong to by using the below function, and after clearing the 'state',
>> the item is added back to pool->release_items so that the item is reused for new
>> pages.
>>
>
> To understand below, I'm listing struct page_pool_item_block for other
> reviewers:
>
> pahole -C page_pool_item_block vmlinux
> struct page_pool_item_block {
> struct page_pool * pp; /* 0 8 */
> struct list_head list; /* 8 16 */
> unsigned int flags; /* 24 4 */
> refcount_t ref; /* 28 4 */
> struct page_pool_item items[]; /* 32 0 */
>
> /* size: 32, cachelines: 1, members: 5 */
> /* last cacheline: 32 bytes */
> };
>
>> static inline struct page_pool_item_block *
>> page_pool_item_to_block(struct page_pool_item *item)
>> {
>> return (struct page_pool_item_block *)((unsigned long)item & PAGE_MASK);
>
> This trick requires some comments explaining what is going on!
> Please correct me if I'm wrong: Here you a masking off the lower bits of
> the pointer to page_pool_item *item, as you know that a struct
> page_pool_item_block is stored in the top of a struct page. This trick
> is like a "container_of" for going from page_pool_item to
> page_pool_item_block, right?
Yes, you are right.
>
> I do notice that you have a comment above struct page_pool_item_block
> (that says "item_block is always PAGE_SIZE"), which is nice, but to be
> more explicit/clear:
> I want a big comment block (placed above the main code here) that
> explains the design and intention behind this newly invented
> "item-block" scheme, like e.g. the connection between
> page_pool_item_block and page_pool_item. Like the advantage/trick that
> allows page->pp pointer to be an "item" and be mapped back to a "block"
> to find the page_pool object it belongs to. Don't write *what* the code
> does, but write about the intended purpose and design reasons behind the
> code.
The comment for page_pool_item_block is below, it seems I also wrote about
intended purpose and design reasons here.
/* The size of item_block is always PAGE_SIZE, so that the address of item_block
* for a specific item can be calculated using 'item & PAGE_MASK'
*/
Anyway, If putting something like above for page_pool_item_to_block() does
make it clearer, will add some comment for page_pool_item_to_block() too.
>
>
>> }
>>
>> static inline struct page_pool *page_pool_get_pp(struct page *page)
>> {
>> return page_pool_item_to_block(page->pp_item)->pp;
>> }
>>
>>
>>>
>>>> +static void __page_pool_item_init(struct page_pool *pool, struct page *page)
>>>> +{
>>>
>>> Function name is confusing. First I though this was init'ing a single
>>> item, but looking at the code it is iterating over ITEMS_PER_PAGE.
>>>
>>> Maybe it should be called page_pool_item_block_init ?
>>
>> The __page_pool_item_init() is added to make the below
>> page_pool_item_init() function more readable or maintainable, changing
>> it to page_pool_item_block_init doesn't seems consistent?
>
> You (of-cause) also have to rename the other function, I though that was
> implicitly understood.
>
> BUT does my suggested rename make sense? What I'm seeing is that all
> the *items* in the "block" is getting inited. But we are also setting up
> the "block" (e.g. "block->pp=pool").
I am not really sure about that, as using the PAGE_SIZE block to hold the
item seems like a implementation detail, which might change in the future,
renaming other function to something like that doesn't seem right to me IMHO.
Also the next patch will add page_pool_item_blk_add() to support unlimited
inflight pages, it seems a better name is needed for that too, perheps rename
page_pool_item_blk_add() to page_pool_dynamic_item_add()?
For __page_pool_item_init(), perhaps just inline it back to page_pool_item_init()
as __page_pool_item_init() is only used by page_pool_item_init(), and both of them
are not really large function.
next prev parent reply other threads:[~2025-01-17 11:56 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-10 13:06 [PATCH net-next v7 0/8] fix two bugs related to page_pool Yunsheng Lin
2025-01-10 13:06 ` [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver has already unbound Yunsheng Lin
2025-01-15 16:29 ` Jesper Dangaard Brouer
2025-01-16 12:52 ` Yunsheng Lin
2025-01-16 16:09 ` Jesper Dangaard Brouer
2025-01-17 11:56 ` Yunsheng Lin [this message]
2025-01-17 16:56 ` Jesper Dangaard Brouer
2025-01-18 13:36 ` Yunsheng Lin
2025-01-14 14:31 ` [PATCH net-next v7 0/8] fix two bugs related to page_pool Jesper Dangaard Brouer
2025-01-15 11:33 ` Yunsheng Lin
2025-01-15 17:40 ` Jesper Dangaard Brouer
2025-01-16 12:52 ` Yunsheng Lin
2025-01-16 18:02 ` Jesper Dangaard Brouer
2025-01-17 11:35 ` Yunsheng Lin
2025-01-18 8:04 ` Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=92bb3a19-a619-4bf7-9ef5-b9eb12a57983@huawei.com \
--to=linyunsheng@huawei.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fanghaiqing@huawei.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=iommu@lists.linux.dev \
--cc=kernel-team@cloudflare.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liuyonglong@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=robin.murphy@arm.com \
--cc=zhangkun09@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox