Message-ID: <92bb3a19-a619-4bf7-9ef5-b9eb12a57983@huawei.com>
Date: Fri, 17 Jan 2025 19:56:43 +0800
Subject: Re: [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver has already unbound
From: Yunsheng Lin
To: Jesper Dangaard Brouer
CC: Robin Murphy, Alexander Duyck, IOMMU, Andrew Morton, Eric Dumazet, Simon Horman, Ilias Apalodimas, kernel-team
References: <20250110130703.3814407-1-linyunsheng@huawei.com> <20250110130703.3814407-4-linyunsheng@huawei.com> <921c827c-41b7-40af-8c01-c21adbe8f41f@kernel.org> <2b5a58f3-d67a-4bf7-921a-033326958ac6@huawei.com> <95f258b2-52f5-4a80-a670-b9a182caec7c@kernel.org>
In-Reply-To: <95f258b2-52f5-4a80-a670-b9a182caec7c@kernel.org>

On 2025/1/17 0:09, Jesper Dangaard Brouer
wrote:
...
>> Mainly because there is no space available for keeping track of inflight
>> pages: using page->pp can only find the page_pool owning the page, but the
>> page_pool is not able to keep track of an inflight page while the page is
>> handled by the networking stack.
>>
>> By using page_pool_item as below, the 'state' is used to tell whether a
>> specific item is being used/dma mapped or not by scanning all the item
>> blocks in pool->item_blocks. If a specific item is used by a page, then
>> 'pp_netmem' will point to that page so that dma unmapping can be done for
>> that page when page_pool_destroy() is called; otherwise free items sit in
>> pool->hold_items or pool->release_items, linked through 'lentry':
>>
>> struct page_pool_item {
>>     unsigned long state;
>>
>>     union {
>>         netmem_ref pp_netmem;
>>         struct llist_node lentry;
>>     };
>> };
>
> pahole -C page_pool_item vmlinux
> struct page_pool_item {
>     /* An 'encoded_next' is a pointer to the next item; the lower 2 bits
>      * are used to indicate the state of the current item.
>      */
>     long unsigned int          encoded_next;     /*     0     8 */
>     union {
>         netmem_ref         pp_netmem;        /*     8     8 */
>         struct llist_node  lentry;           /*     8     8 */
>     };                                           /*     8     8 */
>
>     /* size: 16, cachelines: 1, members: 2 */
>     /* last cacheline: 16 bytes */
> };
>
>
>> When a page is added to the page_pool, an item is deleted from
>> pool->hold_items or pool->release_items, its 'pp_netmem' is set to point
>> to that page, and 'state' is set accordingly in order to keep track of
>> that page.
>>
>> When a page is deleted from the page_pool, it is possible to tell which
>> page_pool this page belongs to by using the function below, and after
>> clearing the 'state', the item is added back to pool->release_items so
>> that the item is reused for new pages.
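To make the bookkeeping described above concrete, here is a minimal user-space sketch of the idea. The names mirror the structs in the patch, but the list handling is simplified and illustrative; it is not the actual page_pool code:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel types (not the real definitions). */
typedef void *netmem_ref;
struct llist_node { struct llist_node *next; };

struct page_pool_item {
	unsigned long state;          /* non-zero: item tracks an inflight page */
	union {
		netmem_ref pp_netmem;     /* in use: the page being tracked */
		struct llist_node lentry; /* free: linkage in hold/release list */
	};
};

/* Pop a free item off the free list and attach it to a page that is
 * being handed out, so the pool can still find the page at destroy time.
 */
static struct page_pool_item *item_track(struct llist_node **free_list,
					 netmem_ref page)
{
	struct llist_node *n = *free_list;
	struct page_pool_item *item;

	if (!n)
		return NULL;
	*free_list = n->next;
	item = (struct page_pool_item *)
		((char *)n - offsetof(struct page_pool_item, lentry));
	item->state = 1;
	item->pp_netmem = page;	/* overwrites 'lentry': the union is free now */
	return item;
}

/* Clear the state and push the item back so it is reused for new pages. */
static void item_untrack(struct llist_node **free_list,
			 struct page_pool_item *item)
{
	item->state = 0;
	item->lentry.next = *free_list;
	*free_list = &item->lentry;
}
```

The key point the sketch tries to show is the union: a given item is either on a free list (via 'lentry') or pointing at an inflight page (via 'pp_netmem'), never both, so tracking costs no extra space per item.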
>>
>
> To understand the below, I'm listing struct page_pool_item_block for
> other reviewers:
>
> pahole -C page_pool_item_block vmlinux
> struct page_pool_item_block {
>     struct page_pool *         pp;               /*     0     8 */
>     struct list_head           list;             /*     8    16 */
>     unsigned int               flags;            /*    24     4 */
>     refcount_t                 ref;              /*    28     4 */
>     struct page_pool_item      items[];          /*    32     0 */
>
>     /* size: 32, cachelines: 1, members: 5 */
>     /* last cacheline: 32 bytes */
> };
>
>> static inline struct page_pool_item_block *
>> page_pool_item_to_block(struct page_pool_item *item)
>> {
>>     return (struct page_pool_item_block *)((unsigned long)item & PAGE_MASK);
>
> This trick requires some comments explaining what is going on!
> Please correct me if I'm wrong: here you are masking off the lower bits
> of the pointer to page_pool_item *item, as you know that a struct
> page_pool_item_block is stored at the top of a struct page.  This trick
> is like a "container_of" for going from page_pool_item to
> page_pool_item_block, right?

Yes, you are right.

>
> I do notice that you have a comment above struct page_pool_item_block
> (that says "item_block is always PAGE_SIZE"), which is nice, but to be
> more explicit/clear:
> I want a big comment block (placed above the main code here) that
> explains the design and intention behind this newly invented
> "item-block" scheme, e.g. the connection between page_pool_item_block
> and page_pool_item, like the advantage/trick that allows the page->pp
> pointer to be an "item" and be mapped back to a "block" to find the
> page_pool object it belongs to.  Don't write *what* the code does, but
> write about the intended purpose and design reasons behind the code.

The comment for page_pool_item_block is below; it seems I also wrote about
the intended purpose and design reasons there.
/* The size of item_block is always PAGE_SIZE, so that the address of the
 * item_block for a specific item can be calculated using 'item & PAGE_MASK'.
 */

Anyway, if putting something like the above on page_pool_item_to_block()
makes it clearer, I will add a comment for page_pool_item_to_block() too.

>
>
>> }
>>
>>   static inline struct page_pool *page_pool_get_pp(struct page *page)
>>   {
>>        return page_pool_item_to_block(page->pp_item)->pp;
>>   }
>>
>>
>>>
>>>> +static void __page_pool_item_init(struct page_pool *pool, struct page *page)
>>>> +{
>>>
>>> Function name is confusing.  First I thought this was init'ing a single
>>> item, but looking at the code it is iterating over ITEMS_PER_PAGE.
>>>
>>> Maybe it should be called page_pool_item_block_init?
>>
>> The __page_pool_item_init() is added to make the below
>> page_pool_item_init() function more readable and maintainable; changing
>> it to page_pool_item_block_init() doesn't seem consistent?
>
> You (of course) also have to rename the other function; I thought that
> was implicitly understood.
>
> BUT does my suggested rename make sense?  What I'm seeing is that all
> the *items* in the "block" are getting inited.  But we are also setting
> up the "block" (e.g. "block->pp = pool").

I am not really sure about that, as using a PAGE_SIZE block to hold the
items seems like an implementation detail which might change in the future;
renaming the other function to something like that doesn't seem right to me
IMHO.

Also, the next patch will add page_pool_item_blk_add() to support unlimited
inflight pages, and it seems a better name is needed for that too; perhaps
rename page_pool_item_blk_add() to page_pool_dynamic_item_add()?

For __page_pool_item_init(), perhaps just inline it back into
page_pool_item_init(), as __page_pool_item_init() is only used by
page_pool_item_init() and neither of them is a really large function.
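For other reviewers, the 'item & PAGE_MASK' trick discussed above can be demonstrated with a small user-space sketch. Here BLOCK_SIZE stands in for PAGE_SIZE and aligned_alloc() for the kernel's page allocation; the struct is trimmed down and illustrative, not the kernel definition:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 4096UL              /* stand-in for PAGE_SIZE */
#define BLOCK_MASK (~(BLOCK_SIZE - 1)) /* stand-in for PAGE_MASK */

struct page_pool;                      /* opaque for this sketch */

struct page_pool_item {
	unsigned long state;
	void *pp_netmem;
};

/* The block header lives at the very start of a BLOCK_SIZE-aligned
 * allocation, with the items[] array packed after it.  Because the whole
 * block is BLOCK_SIZE big and BLOCK_SIZE-aligned, every item inside it
 * shares the block's high address bits, so masking any item pointer
 * recovers the block header -- a container_of() that needs no
 * back-pointer stored in each item.
 */
struct page_pool_item_block {
	struct page_pool *pp;
	struct page_pool_item items[];
};

static struct page_pool_item_block *
item_to_block(struct page_pool_item *item)
{
	return (struct page_pool_item_block *)((uintptr_t)item & BLOCK_MASK);
}
```

Given an item pointer anywhere inside the block, item_to_block() reaches the header (and from there the owning pool via '->pp') in a single AND instruction, which is why the design insists every item block is exactly one page.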