Date: Wed, 26 Mar 2025 14:54:26 +0100
Subject: Re: [PATCH net-next v2 3/3] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
To: Toke Høiland-Jørgensen, "David S. Miller", Jakub Kicinski,
 Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
 Eric Dumazet, Paolo Abeni, Ilias Apalodimas, Simon Horman,
 Andrew Morton, Mina Almasry, Yonglong Liu, Yunsheng Lin,
 Pavel Begunkov, Matthew Wilcox
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
 linux-rdma@vger.kernel.org, linux-mm@kvack.org, Qiuling Ren, Yuying Ma
References: <20250325-page-pool-track-dma-v2-0-113ebc1946f3@redhat.com>
 <20250325-page-pool-track-dma-v2-3-113ebc1946f3@redhat.com>
From: Jesper Dangaard Brouer
In-Reply-To: <20250325-page-pool-track-dma-v2-3-113ebc1946f3@redhat.com>

On 25/03/2025 16.45, Toke Høiland-Jørgensen wrote:
> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
> they are released from the pool, to avoid the overhead of re-mapping the
> pages every time they are used. This causes resource leaks and/or
> crashes when there are pages still outstanding while the device is torn
> down, because page_pool will attempt an unmap through a non-existent DMA
> device on the subsequent page return.
>
> To fix this, implement a simple tracking of outstanding DMA-mapped pages
> in page pool using an xarray. This was first suggested by Mina[0], and
> turns out to be fairly straightforward: We simply store pointers to
> pages directly in the xarray with xa_alloc() when they are first DMA
> mapped, and remove them from the array on unmap. Then, when a page pool
> is torn down, it can simply walk the xarray and unmap all pages still
> present there before returning, which also allows us to get rid of the
> get/put_device() calls in page_pool. Using xa_cmpxchg(), no additional
> synchronisation is needed, as a page will only ever be unmapped once.
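
A minimal userspace sketch of the tracking scheme described above, not the patch's code. A fixed array of atomic slots stands in for the xarray, sketch_track() plays the role of xa_alloc(), and sketch_untrack() mimics the xa_cmpxchg() step so that only one caller ever performs the unmap. All sketch_* names, the 64-slot capacity and the "dma_mapped" flag are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_SLOTS 64	/* arbitrary capacity chosen for this sketch */

struct sketch_page {
	int dma_mapped;		/* 1 while a (pretend) DMA mapping exists */
};

struct sketch_pool {
	/* Stand-in for the xarray: slot index == ID handed back to the caller */
	_Atomic(struct sketch_page *) slots[SKETCH_SLOTS];
	atomic_int unmap_calls;	/* how many unmaps were actually performed */
};

/* Analogue of xa_alloc(): claim the first free slot and return its ID. */
static int sketch_track(struct sketch_pool *pool, struct sketch_page *page)
{
	page->dma_mapped = 1;	/* pretend the DMA mapping succeeded */
	for (int id = 0; id < SKETCH_SLOTS; id++) {
		struct sketch_page *expected = NULL;

		if (atomic_compare_exchange_strong(&pool->slots[id], &expected, page))
			return id;
	}
	page->dma_mapped = 0;	/* no free slot: undo the pretend mapping */
	return -1;
}

/*
 * Analogue of xa_cmpxchg(id, page, NULL): the slot is cleared only if it
 * still holds this page, so exactly one caller wins and does the unmap.
 */
static int sketch_untrack(struct sketch_pool *pool, int id, struct sketch_page *page)
{
	struct sketch_page *expected = page;

	assert(id >= 0 && id < SKETCH_SLOTS);
	if (!atomic_compare_exchange_strong(&pool->slots[id], &expected,
					    (struct sketch_page *)NULL))
		return 0;	/* someone else already unmapped this page */
	page->dma_mapped = 0;
	atomic_fetch_add(&pool->unmap_calls, 1);
	return 1;
}

/* Teardown: walk every slot and unmap whatever is still outstanding. */
static int sketch_destroy(struct sketch_pool *pool)
{
	int unmapped = 0;

	for (int id = 0; id < SKETCH_SLOTS; id++) {
		struct sketch_page *page = atomic_load(&pool->slots[id]);

		if (page && sketch_untrack(pool, id, page))
			unmapped++;
	}
	return unmapped;
}
```

In this model a page returned after sketch_destroy() simply loses the compare-and-swap and skips the unmap, which is the property that lets the pool stop holding a reference on the device.
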
>
> To avoid having to walk the entire xarray on unmap to find the page
> reference, we stash the ID assigned by xa_alloc() into the page
> structure itself, using the upper bits of the pp_magic field. This
> requires a couple of defines to avoid conflicting with the
> POISON_POINTER_DELTA define, but this is all evaluated at compile-time,
> so does not affect run-time performance. The bitmap calculations in this
> patch give the following number of bits for different architectures:
>
> - 23 bits on 32-bit architectures
> - 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
> - 32 bits on other 64-bit architectures
>
> Stashing a value into the unused bits of pp_magic does have the effect
> that it can make the value stored there lie outside the unmappable
> range (as governed by the mmap_min_addr sysctl), for architectures that
> don't define ILLEGAL_POINTER_VALUE. This means that if one of the
> pointers that is aliased to the pp_magic field (such as page->lru.next)
> is dereferenced while the page is owned by page_pool, that could lead to
> a dereference into userspace, which is a security concern. The risk of
> this is mitigated by the fact that (a) we always clear pp_magic before
> releasing a page from page_pool, and (b) this would need a
> use-after-free bug for struct page, which can have many other risks
> since page->lru.next is used as a generic list pointer in multiple
> places in the kernel. As such, with this patch we take the position that
> this risk is negligible in practice. For more discussion, see [1].
>
> Since all the tracking added in this patch is performed on DMA
> map/unmap, no additional code is needed in the fast path, meaning the
> performance overhead of this tracking is negligible there. A
> micro-benchmark shows that the total overhead of the tracking itself is
> about 400 ns (39 cycles(tsc) 395.218 ns; sum for both map and unmap[2]).
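
Going back to the pp_magic stashing described further up, here is a minimal sketch of packing an ID into the upper bits of a magic word and of the bit-budget arithmetic. It is an illustration under stated assumptions, not the patch's defines: the signature value 0x40, the shift of 7, the two reserved top bits and the poison delta with lowest set bit 28 are all assumed, chosen only because they reproduce the 23/21/32 figures quoted above. All sketch_* and SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real defines may differ. */
#define SKETCH_SIGNATURE	0x40ULL	/* hypothetical magic in the low bits */
#define SKETCH_ID_SHIFT		7	/* first bit above the assumed signature */
#define SKETCH_ID_BITS		32	/* ID width on a generic 64-bit arch */
#define SKETCH_ID_MASK		(((1ULL << SKETCH_ID_BITS) - 1) << SKETCH_ID_SHIFT)

/* Store an ID in the upper bits while leaving the signature bits intact. */
static uint64_t sketch_stash_id(uint64_t magic, uint32_t id)
{
	return (magic & ~SKETCH_ID_MASK) | ((uint64_t)id << SKETCH_ID_SHIFT);
}

/* Recover the ID without searching any data structure. */
static uint32_t sketch_get_id(uint64_t magic)
{
	return (uint32_t)((magic & SKETCH_ID_MASK) >> SKETCH_ID_SHIFT);
}

/* The signature check has to ignore the bits that now carry the ID. */
static int sketch_is_pp(uint64_t magic)
{
	return (magic & ~SKETCH_ID_MASK) == SKETCH_SIGNATURE;
}

/*
 * Bit budget for the ID, capped at 32:
 *  - with a poison delta, the ID must stay below its lowest set bit;
 *  - without one, assume the two topmost bits of the word stay unused.
 */
static unsigned int sketch_id_bits(unsigned int word_bits, uint64_t poison_delta)
{
	unsigned int avail;

	assert(word_bits > SKETCH_ID_SHIFT + 2);
	if (poison_delta) {
		unsigned int lowest = 0;

		while (!((poison_delta >> lowest) & 1))
			lowest++;
		assert(lowest > SKETCH_ID_SHIFT);
		avail = lowest - SKETCH_ID_SHIFT;
	} else {
		avail = word_bits - SKETCH_ID_SHIFT - 2;
	}
	return avail < 32 ? avail : 32;
}
```

Under these assumptions sketch_id_bits(32, 0) yields 23, sketch_id_bits(64, 1ULL << 28) yields 21, and sketch_id_bits(64, 0) yields 32. Since the inputs are constants, the same arithmetic can be done entirely in the preprocessor, which fits the point that none of it costs anything at run time.
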
> Since this cost is only paid on DMA map and unmap, it seems like an
> acceptable cost to fix the late unmap issue. Further optimisation can
> narrow the cases where this cost is paid (for instance by eliding the
> tracking when DMA map/unmap is a no-op).
>
> The extra memory needed to track the pages is neatly encapsulated inside
> xarray, which uses the 'struct xa_node' structure to track items. This
> structure is 576 bytes long, with slots for 64 items, meaning that a
> full node incurs only 9 bytes of overhead per slot it tracks (in
> practice, it probably won't be this efficient, but in any case it should
> be an acceptable overhead).
>
> [0] https://lore.kernel.org/all/CAHS8izPg7B5DwKfSuzz-iOop_YRbk3Sd6Y4rX7KBG9DcVJcyWg@mail.gmail.com/
> [1] https://lore.kernel.org/r/20250320023202.GA25514@openwall.com
> [2] https://lore.kernel.org/r/ae07144c-9295-4c9d-a400-153bb689fe9e@huawei.com
>
> Reported-by: Yonglong Liu
> Closes: https://lore.kernel.org/r/8743264a-9700-4227-a556-5f931c720211@huawei.com
> Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
> Suggested-by: Mina Almasry
> Reviewed-by: Mina Almasry
> Reviewed-by: Jesper Dangaard Brouer
> Tested-by: Jesper Dangaard Brouer
> Tested-by: Qiuling Ren
> Tested-by: Yuying Ma
> Tested-by: Yonglong Liu
> Signed-off-by: Toke Høiland-Jørgensen
> ---
>  include/linux/poison.h        |  4 +++
>  include/net/page_pool/types.h | 49 +++++++++++++++++++++++---
>  net/core/netmem_priv.h        | 28 ++++++++++++++-
>  net/core/page_pool.c          | 82 ++++++++++++++++++++++++++++++++++++-------
>  4 files changed, 145 insertions(+), 18 deletions(-)

Acked-by: Jesper Dangaard Brouer