linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>,
	davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: zhangkun09@huawei.com, liuyonglong@huawei.com,
	fanghaiqing@huawei.com, Robin Murphy <robin.murphy@arm.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	IOMMU <iommu@lists.linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Dumazet <edumazet@google.com>,
	Simon Horman <horms@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver has already unbound
Date: Fri, 17 Jan 2025 17:56:29 +0100	[thread overview]
Message-ID: <eb2381d2-34a4-4915-b0b5-b07cc81646d3@kernel.org> (raw)
In-Reply-To: <92bb3a19-a619-4bf7-9ef5-b9eb12a57983@huawei.com>



On 17/01/2025 12.56, Yunsheng Lin wrote:
> On 2025/1/17 0:09, Jesper Dangaard Brouer wrote:
> 
> ...
> 
>>> Mainly because there is no space available for keeping tracking of inflight
>>> pages, using page->pp can only find the page_pool owning the page, but page_pool
>>> is not able to keep track of the inflight page when the page is handled by
>>> networking stack.
>>>
>>> By using page_pool_item as below, the state is used to tell if a specific
>>> item is being used/dma mapped or not by scanning all the item blocks in
>>> pool->item_blocks. If a specific item is used by a page, then 'pp_netmem'
>>> will point to that page so that dma unmapping can be done for that page
>>> when page_pool_destroy() is called, otherwise free items sit in the
>>> pool->hold_items or pool->release_items by using 'lentry':
>>>
>>> struct page_pool_item {
>>>      unsigned long state;
>>>      
>>>      union {
>>>          netmem_ref pp_netmem;
>>>          struct llist_node lentry;
>>>      };
>>> };
>>
>> pahole  -C page_pool_item vmlinux
>> struct page_pool_item {
>>      /* An 'encoded_next' is a pointer to next item, lower 2 bits is used to
>>       * indicate the state of current item.
>>       */
>>      long unsigned int          encoded_next;     /*     0     8 */
>>      union {
>>          netmem_ref         pp_netmem;        /*     8     8 */
>>          struct llist_node  lentry;           /*     8     8 */
>>      };                                           /*     8     8 */
>>
>>      /* size: 16, cachelines: 1, members: 2 */
>>      /* last cacheline: 16 bytes */
>> };
>>
>>
>>> When a page is added to the page_pool, a item is deleted from pool->hold_items
>>> or pool->release_items and set the 'pp_netmem' pointing to that page and set
>>> 'state' accordingly in order to keep track of that page.
>>>
>>> When a page is deleted from the page_pool, it is able to tell which page_pool
>>> this page belong to by using the below function, and after clearing the 'state',
>>> the item is added back to pool->release_items so that the item is reused for new
>>> pages.
>>>
>>
>> To understand below, I'm listing struct page_pool_item_block for other
>> reviewers:
>>
>> pahole  -C page_pool_item_block vmlinux
>> struct page_pool_item_block {
>>      struct page_pool *         pp;               /*     0     8 */
>>      struct list_head           list;             /*     8    16 */
>>      unsigned int               flags;            /*    24     4 */
>>      refcount_t                 ref;              /*    28     4 */
>>      struct page_pool_item      items[];          /*    32     0 */
>>
>>      /* size: 32, cachelines: 1, members: 5 */
>>      /* last cacheline: 32 bytes */
>> };
>>
>>> static inline struct page_pool_item_block *
>>> page_pool_item_to_block(struct page_pool_item *item)
>>> {
>>>      return (struct page_pool_item_block *)((unsigned long)item & PAGE_MASK);
>>
>> This trick requires some comments explaining what is going on!
>> Please correct me if I'm wrong: Here you a masking off the lower bits of
>> the pointer to page_pool_item *item, as you know that a struct
>> page_pool_item_block is stored in the top of a struct page.  This trick
>> is like a "container_of" for going from page_pool_item to
>> page_pool_item_block, right?
> 
> Yes, you are right.
> 
>>
>> I do notice that you have a comment above struct page_pool_item_block
>> (that says "item_block is always PAGE_SIZE"), which is nice, but to be
>> more explicit/clear:
>>   I want a big comment block (placed above the main code here) that
>> explains the design and intention behind this newly invented
>> "item-block" scheme, like e.g. the connection between
>> page_pool_item_block and page_pool_item. Like the advantage/trick that
>> allows page->pp pointer to be an "item" and be mapped back to a "block"
>> to find the page_pool object it belongs to.  Don't write *what* the code
>> does, but write about the intended purpose and design reasons behind the
>> code.
> 
> The comment for page_pool_item_block is below, it seems I also wrote about
> intended purpose and design reasons here.
> 
> /* The size of item_block is always PAGE_SIZE, so that the address of item_block
>   * for a specific item can be calculated using 'item & PAGE_MASK'
>   */
> 
> Anyway, If putting something like above for page_pool_item_to_block() does
> make it clearer, will add some comment for page_pool_item_to_block() too.
> 
>>
>>
>>> }
>>>
>>>    static inline struct page_pool *page_pool_get_pp(struct page *page)
>>>    {
>>>         return page_pool_item_to_block(page->pp_item)->pp;
>>>    }
>>>
>>>
>>>>
>>>>> +static void __page_pool_item_init(struct page_pool *pool, struct page *page)
>>>>> +{
>>>>
>>>> Function name is confusing.  First I though this was init'ing a single
>>>> item, but looking at the code it is iterating over ITEMS_PER_PAGE.
>>>>
>>>> Maybe it should be called page_pool_item_block_init ?
>>>
>>> The __page_pool_item_init() is added to make the below
>>> page_pool_item_init() function more readable or maintainable, changing
>>> it to page_pool_item_block_init doesn't seems consistent?
>>
>> You (of-cause) also have to rename the other function, I though that was
>> implicitly understood.
>>
>> BUT does my suggested rename make sense?  What I'm seeing is that all
>> the *items* in the "block" is getting inited. But we are also setting up
>> the "block" (e.g.  "block->pp=pool").
> 
> I am not really sure about that, as using the PAGE_SIZE block to hold the
> item seems like a implementation detail, which might change in the future,
> renaming other function to something like that doesn't seem right to me IMHO.
> 
> Also the next patch will add page_pool_item_blk_add() to support unlimited
> inflight pages, it seems a better name is needed for that too, perheps rename
> page_pool_item_blk_add() to page_pool_dynamic_item_add()?
> 

Hmmm... not sure about this.
I think I prefer page_pool_item_blk_add() over page_pool_dynamic_item_add().

> For __page_pool_item_init(), perhaps just inline it back to page_pool_item_init()
> as __page_pool_item_init() is only used by page_pool_item_init(), and both of them
> are not really large function.

I like that you had a helper function. So, don't merge 
__page_pool_item_init() into page_pool_item_init() just to avoid naming 
it differently.

Let me be more explicit what I'm asking for:

IMHO you should rename:
  - __page_pool_item_init() to __page_pool_item_block_init()
and rename:
  - page_pool_item_init() to page_pool_item_block_init()

I hope this make it more clear what I'm saying.

--Jesper


  reply	other threads:[~2025-01-17 16:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-10 13:06 [PATCH net-next v7 0/8] fix two bugs related to page_pool Yunsheng Lin
2025-01-10 13:06 ` [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver has already unbound Yunsheng Lin
2025-01-15 16:29   ` Jesper Dangaard Brouer
2025-01-16 12:52     ` Yunsheng Lin
2025-01-16 16:09       ` Jesper Dangaard Brouer
2025-01-17 11:56         ` Yunsheng Lin
2025-01-17 16:56           ` Jesper Dangaard Brouer [this message]
2025-01-18 13:36             ` Yunsheng Lin
2025-01-14 14:31 ` [PATCH net-next v7 0/8] fix two bugs related to page_pool Jesper Dangaard Brouer
2025-01-15 11:33   ` Yunsheng Lin
2025-01-15 17:40     ` Jesper Dangaard Brouer
2025-01-16 12:52       ` Yunsheng Lin
2025-01-16 18:02         ` Jesper Dangaard Brouer
2025-01-17 11:35           ` Yunsheng Lin
2025-01-18  8:04             ` Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eb2381d2-34a4-4915-b0b5-b07cc81646d3@kernel.org \
    --to=hawk@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fanghaiqing@huawei.com \
    --cc=horms@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=iommu@lists.linux.dev \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linyunsheng@huawei.com \
    --cc=liuyonglong@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=robin.murphy@arm.com \
    --cc=zhangkun09@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox