From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 17 Jan 2025 17:56:29 +0100
From: Jesper Dangaard Brouer <hawk@kernel.org>
Subject: Re: [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver
 has already unbound
To: Yunsheng Lin, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: zhangkun09@huawei.com, liuyonglong@huawei.com, fanghaiqing@huawei.com,
 Robin Murphy, Alexander Duyck, IOMMU, Andrew Morton, Eric Dumazet,
 Simon Horman, Ilias Apalodimas, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kernel-team
References: <20250110130703.3814407-1-linyunsheng@huawei.com>
 <20250110130703.3814407-4-linyunsheng@huawei.com>
 <921c827c-41b7-40af-8c01-c21adbe8f41f@kernel.org>
 <2b5a58f3-d67a-4bf7-921a-033326958ac6@huawei.com>
 <95f258b2-52f5-4a80-a670-b9a182caec7c@kernel.org>
 <92bb3a19-a619-4bf7-9ef5-b9eb12a57983@huawei.com>
In-Reply-To: <92bb3a19-a619-4bf7-9ef5-b9eb12a57983@huawei.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 17/01/2025 12.56, Yunsheng Lin wrote:
> On 2025/1/17 0:09, Jesper Dangaard Brouer wrote:
>
...
>
>>> Mainly because there is no space available for keeping track of inflight
>>> pages: using page->pp can only find the page_pool owning the page, but the
>>> page_pool is not able to keep track of an inflight page while the page is
>>> handled by the networking stack.
>>>
>>> By using page_pool_item as below, the state is used to tell if a specific
>>> item is being used/dma mapped or not by scanning all the item blocks in
>>> pool->item_blocks. If a specific item is used by a page, then 'pp_netmem'
>>> will point to that page so that dma unmapping can be done for that page
>>> when page_pool_destroy() is called; otherwise free items sit in
>>> pool->hold_items or pool->release_items via 'lentry':
>>>
>>> struct page_pool_item {
>>>     unsigned long state;
>>>
>>>     union {
>>>         netmem_ref pp_netmem;
>>>         struct llist_node lentry;
>>>     };
>>> };
>>
>> pahole -C page_pool_item vmlinux
>> struct page_pool_item {
>>     /* An 'encoded_next' is a pointer to the next item; the lower 2 bits
>>      * are used to indicate the state of the current item.
>>      */
>>     long unsigned int          encoded_next;     /*     0     8 */
>>     union {
>>         netmem_ref         pp_netmem;            /*     8     8 */
>>         struct llist_node  lentry;               /*     8     8 */
>>     };                                           /*     8     8 */
>>
>>     /* size: 16, cachelines: 1, members: 2 */
>>     /* last cacheline: 16 bytes */
>> };
>>
>>
>>> When a page is added to the page_pool, an item is deleted from pool->hold_items
>>> or pool->release_items, 'pp_netmem' is set to point to that page, and 'state'
>>> is set accordingly in order to keep track of that page.
>>>
>>> When a page is deleted from the page_pool, it is possible to tell which
>>> page_pool this page belongs to by using the below function, and after
>>> clearing the 'state', the item is added back to pool->release_items so that
>>> the item is reused for new pages.
>>>
>>
>> To understand the below, I'm listing struct page_pool_item_block for other
>> reviewers:
>>
>> pahole -C page_pool_item_block vmlinux
>> struct page_pool_item_block {
>>     struct page_pool *         pp;               /*     0     8 */
>>     struct list_head           list;             /*     8    16 */
>>     unsigned int               flags;            /*    24     4 */
>>     refcount_t                 ref;              /*    28     4 */
>>     struct page_pool_item      items[];          /*    32     0 */
>>
>>     /* size: 32, cachelines: 1, members: 5 */
>>     /* last cacheline: 32 bytes */
>> };
>>
>>> static inline struct page_pool_item_block *
>>> page_pool_item_to_block(struct page_pool_item *item)
>>> {
>>>     return (struct page_pool_item_block *)((unsigned long)item & PAGE_MASK);
>>
>> This trick requires some comments explaining what is going on!
>> Please correct me if I'm wrong: here you are masking off the lower bits of
>> the pointer to page_pool_item *item, as you know that a struct
>> page_pool_item_block is stored at the top of a struct page.
This trick
>> is like a "container_of" for going from page_pool_item to
>> page_pool_item_block, right?
>
> Yes, you are right.
>
>>
>> I do notice that you have a comment above struct page_pool_item_block
>> (that says "item_block is always PAGE_SIZE"), which is nice, but to be
>> more explicit/clear:
>> I want a big comment block (placed above the main code here) that
>> explains the design and intention behind this newly invented
>> "item-block" scheme, e.g. the connection between page_pool_item_block
>> and page_pool_item, and the advantage/trick that allows the page->pp
>> pointer to be an "item" and be mapped back to a "block" to find the
>> page_pool object it belongs to. Don't write *what* the code does, but
>> write about the intended purpose and design reasons behind the code.
>
> The comment for page_pool_item_block is below; it seems I also wrote about
> the intended purpose and design reasons here.
>
> /* The size of item_block is always PAGE_SIZE, so that the address of item_block
>  * for a specific item can be calculated using 'item & PAGE_MASK'
>  */
>
> Anyway, if putting something like the above on page_pool_item_to_block() does
> make it clearer, I will add some comment for page_pool_item_to_block() too.
>
>>
>>
>>> }
>>>
>>>   static inline struct page_pool *page_pool_get_pp(struct page *page)
>>>   {
>>>        return page_pool_item_to_block(page->pp_item)->pp;
>>>   }
>>>
>>>
>>>>
>>>>> +static void __page_pool_item_init(struct page_pool *pool, struct page *page)
>>>>> +{
>>>>
>>>> Function name is confusing. First I thought this was init'ing a single
>>>> item, but looking at the code it is iterating over ITEMS_PER_PAGE.
>>>>
>>>> Maybe it should be called page_pool_item_block_init?
>>>
>>> The __page_pool_item_init() is added to make the below
>>> page_pool_item_init() function more readable or maintainable; changing
>>> it to page_pool_item_block_init doesn't seem consistent?
>>
>> You (of course) also have to rename the other function; I thought that was
>> implicitly understood.
>>
>> BUT does my suggested rename make sense? What I'm seeing is that all
>> the *items* in the "block" are getting inited. But we are also setting up
>> the "block" (e.g. "block->pp = pool").
>
> I am not really sure about that, as using the PAGE_SIZE block to hold the
> items seems like an implementation detail, which might change in the future;
> renaming the other function to something like that doesn't seem right to me IMHO.
>
> Also, the next patch will add page_pool_item_blk_add() to support unlimited
> inflight pages, and it seems a better name is needed for that too; perhaps
> rename page_pool_item_blk_add() to page_pool_dynamic_item_add()?
>

Hmmm... not sure about this. I think I prefer page_pool_item_blk_add()
over page_pool_dynamic_item_add().

> For __page_pool_item_init(), perhaps just inline it back to page_pool_item_init(),
> as __page_pool_item_init() is only used by page_pool_item_init(), and both of
> them are not really large functions.

I like that you had a helper function. So, don't merge
__page_pool_item_init() into page_pool_item_init() just to avoid naming
it differently.

Let me be more explicit about what I'm asking for. IMHO you should rename:
 - __page_pool_item_init() to __page_pool_item_block_init()

and rename:
 - page_pool_item_init() to page_pool_item_block_init()

I hope this makes it clearer what I'm saying.

--Jesper