From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mina Almasry <almasrymina@google.com>
Date: Fri, 14 Feb 2025 12:58:05 -0800
Subject: Re: [PATCH net-next v9 2/4] page_pool: fix IOMMU crash when driver has already unbound
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	zhangkun09@huawei.com, liuyonglong@huawei.com, fanghaiqing@huawei.com,
	Robin Murphy, Alexander Duyck, IOMMU, Andrew Morton, Eric Dumazet,
	Simon Horman, Jesper Dangaard Brouer, Ilias Apalodimas,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
References: <20250212092552.1779679-1-linyunsheng@huawei.com> <20250212092552.1779679-3-linyunsheng@huawei.com>
In-Reply-To: <20250212092552.1779679-3-linyunsheng@huawei.com>
Content-Type: text/plain; charset="UTF-8"

On Wed, Feb 12, 2025 at 1:34 AM Yunsheng Lin wrote:
>
> A networking driver with page_pool support may hand a page that still
> has a DMA mapping over to the network stack, and reuse that page once
> the network stack is done with it and has passed it back to the
> page_pool, to avoid the penalty of DMA mapping/unmapping. With all
> the caching in the network stack, some pages may be held in the
> network stack without being returned to the page_pool soon enough,
> and a VF disable can cause the driver to unbind. The page_pool does
> not stop the driver from doing its unbinding work; instead the
> page_pool uses a workqueue to periodically check whether some pages
> have come back from the network stack, and if so, it does the
> DMA-unmapping-related cleanup work.
>
> As mentioned in [1], attempting DMA unmaps after the driver has
> already unbound may leak resources or, at worst, corrupt memory.
> Fundamentally, the page pool code cannot allow DMA mappings to
> outlive the driver they belong to.
>
> Currently there seem to be at least two cases in which a page is not
> released fast enough, causing the DMA unmapping to be done after the
> driver has already unbound:
> 1. IPv4 packet defragmentation timeout: this seems to cause delays
>    of up to 30 secs.
> 2. skb_defer_free_flush(): this may cause an infinite delay if there
>    is nothing to trigger net_rx_action().
>
> In order not to call DMA APIs to do DMA unmapping after the driver
> has already unbound, and not to stall the unloading of the networking
> driver, use some pre-allocated item blocks to record inflight pages,
> including the ones handed over to the network stack, so that the
> page_pool can do the DMA unmapping for those pages when
> page_pool_destroy() is called. As the pre-allocated item blocks need
> to be large enough to avoid performance degradation, add an
> 'item_fast_empty' stat to indicate the unavailability of the
> pre-allocated item blocks.
>
> By using the 'struct page_pool_item' referenced by page->pp_item, the
> page_pool is not only able to keep track of an inflight page in order
> to do the DMA unmapping if some pages are still held in the
> networking stack when page_pool_destroy() is called, the networking
> stack is also able to find the page_pool owning a page when returning
> pages back into the page_pool:
> 1. When a page is added to the page_pool, an item is taken from
>    pool->hold_items, and item->pp_netmem and item->state are set
>    accordingly in order to keep track of that page; pool->hold_items
>    is refilled from pool->release_items when it is empty, and an item
>    is taken from pool->slow_items when the fast items run out.
> 2. When a page is released from the page_pool, the page_pool this
>    page belongs to can be found by masking off the lower bits of the
>    'struct page_pool_item *item' pointer, as the 'struct
>    page_pool_item_block' is stored at the top of a struct page. After
>    clearing item->state, the item for the released page is added back
>    to pool->release_items so that it can be reused for new pages, or
>    it is simply freed when it came from pool->slow_items.
> 3. When page_pool_destroy() is called, item->state is used to tell
>    whether a specific item is in use/DMA mapped by scanning all the
>    item blocks in pool->item_blocks, and item->netmem can then be
>    used to do the DMA unmapping if the corresponding inflight page is
>    DMA mapped.
>
> The overhead of tracking inflight pages is about 10ns~20ns, which
> causes about 10% performance degradation for the test case of
> time_bench_page_pool03_slow() in [2].
>
> Note, the devmem patchset seems to make the bug harder to fix, and
> may make backporting harder too. As there is no actual user for
> devmem and the fix for devmem is unclear for now, this patch does not
> consider fixing the case for devmem yet.
>
> 1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
> 2. https://github.com/netoptimizer/prototype-kernel
> CC: Robin Murphy
> CC: Alexander Duyck
> CC: IOMMU
> Fixes: f71fec47c2df ("page_pool: make sure struct device is stable")
> Signed-off-by: Yunsheng Lin
> Tested-by: Yonglong Liu

[...]

> +
> +/* The size of item_block is always PAGE_SIZE, so that the address of item_block
> + * for a specific item can be calculated using 'item & PAGE_MASK'
> + */
> +struct page_pool_item_block {
> +	struct page_pool *pp;
> +	struct list_head list;
> +	struct page_pool_item items[];
> +};
> +
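For what it's worth, the PAGE_SIZE invariant in the comment above is
what lets the item-to-pool lookup work without a per-item back-pointer.
Roughly (an illustrative sketch only, with hypothetical helper names,
not the patch's actual code):

static inline struct page_pool_item_block *
item_to_block(struct page_pool_item *item)
{
	/* each item_block occupies exactly one PAGE_SIZE-aligned page,
	 * so masking the item pointer yields its containing block
	 */
	return (struct page_pool_item_block *)((unsigned long)item & PAGE_MASK);
}

static inline struct page_pool *item_to_pp(struct page_pool_item *item)
{
	/* the owning pool is then one dereference away */
	return item_to_block(item)->pp;
}

This is also why the blocks have to be whole pages: a smaller
allocation would not guarantee the alignment the mask relies on.
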
I think this feedback was mentioned in earlier iterations of the
series: can we not hold a struct list_head in the page_pool that keeps
track of the inflight netmems that we need to dma-unmap on
page_pool_destroy? Why do we have to modify the pp entry in the struct
page and struct net_iov? The decision to modify the pp entry in struct
page and struct net_iov is making this patchset bigger and harder to
review IMO.
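The alternative being suggested would look roughly like the sketch
below (hypothetical field and helper names; locking and the
release-path lookup are elided):

struct pp_dma_item {
	struct list_head list;
	netmem_ref netmem;	/* inflight page or net_iov */
};

/* on DMA map: remember the netmem on a list owned by the pool */
static void pp_track_dma_map(struct page_pool *pool, netmem_ref netmem)
{
	/* pool->dma_items is a hypothetical struct list_head */
	struct pp_dma_item *it = kzalloc(sizeof(*it), GFP_ATOMIC);

	if (it) {
		it->netmem = netmem;
		list_add(&it->list, &pool->dma_items);
	}
}

/* from page_pool_destroy(): unmap whatever is still inflight */
static void pp_unmap_inflight(struct page_pool *pool)
{
	struct pp_dma_item *it, *tmp;

	list_for_each_entry_safe(it, tmp, &pool->dma_items, list) {
		/* page_pool_dma_unmap_netmem() is hypothetical */
		page_pool_dma_unmap_netmem(pool, it->netmem);
		list_del(&it->list);
		kfree(it);
	}
}

This keeps the bookkeeping entirely inside the pool, at the cost of an
allocation per mapped netmem and some way to find the entry again on
the release path.

-- 
Thanks,
Mina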