From mboxrd@z Thu Jan 1 00:00:00 1970
From: Leon Romanovsky
To: Bjorn Helgaas, Logan Gunthorpe, Jens Axboe, Robin Murphy,
	Joerg Roedel, Will Deacon, Marek Szyprowski, Jason Gunthorpe,
	Leon Romanovsky, Andrew Morton, Jonathan Corbet, Sumit Semwal,
	Christian König, Kees Cook, "Gustavo A. R. Silva", Ankit Agrawal,
	Yishai Hadas, Shameer Kolothum, Kevin Tian, Alex Williamson
Cc: Krishnakant Jaju, Matt Ochs, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	iommu@lists.linux.dev, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, kvm@vger.kernel.org,
	linux-hardening@vger.kernel.org, Alex Mastro, Nicolin Chen
Subject: [PATCH v8 01/11] PCI/P2PDMA: Separate the mmap() support from the core logic
Date: Tue, 11 Nov 2025 11:57:43 +0200
Message-ID: <20251111-dmabuf-vfio-v8-1-fd9aa5df478f@nvidia.com>
In-Reply-To: <20251111-dmabuf-vfio-v8-0-fd9aa5df478f@nvidia.com>
References: <20251111-dmabuf-vfio-v8-0-fd9aa5df478f@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
X-Mailer: b4 0.15-dev-3ae27

From: Leon Romanovsky

Currently the P2PDMA code requires a pgmap and a struct page to
function. These were serving three important purposes:

 - DMA API compatibility, where scatterlist required a struct page as
   input

 - Life cycle management, the percpu_ref is used to prevent UAF during
   device hot unplug

 - A way to get the P2P provider data through the pci_p2pdma_pagemap

The DMA API now has a new flow, and has gained phys_addr_t support, so
it no longer needs struct pages to perform P2P mapping.
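To make the phys_addr_t flow concrete, here is a minimal illustrative
sketch (not part of this patch): the provider's bus_offset, carried by
the struct p2pdma_provider introduced below, is derived from existing
PCI core helpers and applied to a CPU physical address inside the BAR.
The function name and parameters are hypothetical.

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

/* Illustration only: translate a BAR phys_addr to the peer-visible bus address */
static dma_addr_t example_bar_phys_to_bus(struct pci_dev *pdev, int bar,
					   resource_size_t offset)
{
	struct p2pdma_provider mem = {
		.owner = &pdev->dev,
		.bus_offset = pci_bus_address(pdev, bar) -
			      pci_resource_start(pdev, bar),
	};
	phys_addr_t paddr = pci_resource_start(pdev, bar) + offset;

	/* same arithmetic pci_p2pdma_bus_addr_map() performs for MAP_BUS_ADDR */
	return paddr + mem.bus_offset;
}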
Lifecycle management can be delegated to the user; DMABUF, for
instance, has a suitable invalidation protocol that does not require
struct page. Finding the P2P provider data can also be managed by the
caller without needing to look it up from the phys_addr.

Split the P2PDMA code into two layers. The optional upper layer
provides a way to mmap() P2P memory into a VMA by providing struct
page, pgmap, a genalloc and sysfs. The lower layer provides the actual
P2P infrastructure and is wrapped up in a new struct p2pdma_provider.

Rework the mmap layer to use the new p2pdma_provider based APIs.

Drivers that do not want to put P2P memory into VMAs can allocate a
struct p2pdma_provider after probe() starts and free it before
remove() completes, as sketched at the end of this description. When
DMA mapping, the driver must convey the struct p2pdma_provider to the
DMA mapping code along with a phys_addr of the MMIO BAR slice to map.
The driver must ensure that no DMA mapping outlives the lifetime of
the struct p2pdma_provider.

The intended target of this new API layer is DMABUF. There is usually
only a single p2pdma_provider for a DMABUF exporter. Most drivers can
establish the p2pdma_provider during probe, access the single instance
during DMABUF attach and use that to drive the DMA mapping.

DMABUF provides an invalidation mechanism that can guarantee all DMA
is halted and the DMA mappings are undone prior to destroying the
struct p2pdma_provider. This ensures there is no UAF through DMABUFs
that are lingering past driver removal.

The new p2pdma_provider layer cannot be used to create P2P memory that
can be mapped into VMAs, used with pin_user_pages(), O_DIRECT, and so
on. These use cases must still use the mmap() layer. The
p2pdma_provider layer is principally for DMABUF-like use cases where
DMABUF natively manages the life cycle and access instead of
VMAs/pin_user_pages()/struct page.

In addition, remove the bus_off field from pci_p2pdma_map_state since
it duplicates information already available in the pgmap structure.
The bus_offset is only used in one location (pci_p2pdma_bus_addr_map)
and is always identical to pgmap->bus_offset.
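As a rough illustration of the lifetime rules above (not part of this
patch; all my_* names are hypothetical and the provider setup is
open-coded purely for the example), a driver keeps the provider alive
from probe() to remove() and guarantees, e.g. through DMABUF
invalidation, that no DMA mapping outlives it:

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

struct my_drv {
	struct pci_dev *pdev;
	struct p2pdma_provider mem;	/* valid from probe() until remove() */
};

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct my_drv *drv = devm_kzalloc(&pdev->dev, sizeof(*drv), GFP_KERNEL);

	if (!drv)
		return -ENOMEM;

	drv->pdev = pdev;
	/* Open-coded for illustration; describes BAR 0 of this device */
	drv->mem.owner = &pdev->dev;
	drv->mem.bus_offset = pci_bus_address(pdev, 0) -
			      pci_resource_start(pdev, 0);
	pci_set_drvdata(pdev, drv);
	return 0;
}

static void my_remove(struct pci_dev *pdev)
{
	/*
	 * Before this returns, the driver must have revoked (e.g. via
	 * DMABUF move_notify) every DMA mapping built against the
	 * p2pdma_provider in the drvdata, since the provider's lifetime
	 * ends with remove().
	 */
}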
Signed-off-by: Jason Gunthorpe
Tested-by: Alex Mastro
Tested-by: Nicolin Chen
Signed-off-by: Leon Romanovsky
---
 drivers/pci/p2pdma.c       | 43 +++++++++++++++++++++++--------------------
 include/linux/pci-p2pdma.h | 19 ++++++++++++++-----
 2 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 78e108e47254..59cd6fb40e83 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -28,9 +28,8 @@ struct pci_p2pdma {
 };
 
 struct pci_p2pdma_pagemap {
-	struct pci_dev *provider;
-	u64 bus_offset;
 	struct dev_pagemap pgmap;
+	struct p2pdma_provider mem;
 };
 
 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
@@ -204,8 +203,8 @@ static void p2pdma_page_free(struct page *page)
 {
 	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page_pgmap(page));
 	/* safe to dereference while a reference is held to the percpu ref */
-	struct pci_p2pdma *p2pdma =
-		rcu_dereference_protected(pgmap->provider->p2pdma, 1);
+	struct pci_p2pdma *p2pdma = rcu_dereference_protected(
+		to_pci_dev(pgmap->mem.owner)->p2pdma, 1);
 	struct percpu_ref *ref;
 
 	gen_pool_free_owner(p2pdma->pool, (uintptr_t)page_to_virt(page),
@@ -270,14 +269,15 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
 
 static void pci_p2pdma_unmap_mappings(void *data)
 {
-	struct pci_dev *pdev = data;
+	struct pci_p2pdma_pagemap *p2p_pgmap = data;
 
 	/*
 	 * Removing the alloc attribute from sysfs will call
 	 * unmap_mapping_range() on the inode, teardown any existing userspace
 	 * mappings and prevent new ones from being created.
 	 */
-	sysfs_remove_file_from_group(&pdev->dev.kobj, &p2pmem_alloc_attr.attr,
+	sysfs_remove_file_from_group(&p2p_pgmap->mem.owner->kobj,
+				     &p2pmem_alloc_attr.attr,
 				     p2pmem_group.name);
 }
 
@@ -328,10 +328,9 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	pgmap->nr_range = 1;
 	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
 	pgmap->ops = &p2pdma_pgmap_ops;
-
-	p2p_pgmap->provider = pdev;
-	p2p_pgmap->bus_offset = pci_bus_address(pdev, bar) -
-		pci_resource_start(pdev, bar);
+	p2p_pgmap->mem.owner = &pdev->dev;
+	p2p_pgmap->mem.bus_offset =
+		pci_bus_address(pdev, bar) - pci_resource_start(pdev, bar);
 
 	addr = devm_memremap_pages(&pdev->dev, pgmap);
 	if (IS_ERR(addr)) {
@@ -340,7 +339,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	}
 
 	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings,
-					 pdev);
+					 p2p_pgmap);
 	if (error)
 		goto pages_free;
 
@@ -972,16 +971,16 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
-						    struct device *dev)
+static enum pci_p2pdma_map_type
+pci_p2pdma_map_type(struct p2pdma_provider *provider, struct device *dev)
 {
 	enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
-	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
+	struct pci_dev *pdev = to_pci_dev(provider->owner);
 	struct pci_dev *client;
 	struct pci_p2pdma *p2pdma;
 	int dist;
 
-	if (!provider->p2pdma)
+	if (!pdev->p2pdma)
 		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
 	if (!dev_is_pci(dev))
@@ -990,7 +989,7 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 	client = to_pci_dev(dev);
 
 	rcu_read_lock();
-	p2pdma = rcu_dereference(provider->p2pdma);
+	p2pdma = rcu_dereference(pdev->p2pdma);
 
 	if (p2pdma)
 		type = xa_to_value(xa_load(&p2pdma->map_types,
@@ -998,7 +997,7 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 	rcu_read_unlock();
 
 	if (type == PCI_P2PDMA_MAP_UNKNOWN)
-		return calc_map_type_and_dist(provider, client, &dist, true);
+		return calc_map_type_and_dist(pdev, client, &dist, true);
 
 	return type;
 }
@@ -1006,9 +1005,13 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state,
 			       struct device *dev, struct page *page)
 {
-	state->pgmap = page_pgmap(page);
-	state->map = pci_p2pdma_map_type(state->pgmap, dev);
-	state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+	struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(page_pgmap(page));
+
+	if (state->mem == &p2p_pgmap->mem)
+		return;
+
+	state->mem = &p2p_pgmap->mem;
+	state->map = pci_p2pdma_map_type(&p2p_pgmap->mem, dev);
 }
 
 /**
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 951f81a38f3a..1400f3ad4299 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -16,6 +16,16 @@
 struct block_device;
 struct scatterlist;
 
+/**
+ * struct p2pdma_provider
+ *
+ * A p2pdma provider is a range of MMIO address space available to the CPU.
+ */
+struct p2pdma_provider {
+	struct device *owner;
+	u64 bus_offset;
+};
+
 #ifdef CONFIG_PCI_P2PDMA
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 			    u64 offset);
@@ -139,11 +149,11 @@ enum pci_p2pdma_map_type {
 };
 
 struct pci_p2pdma_map_state {
-	struct dev_pagemap *pgmap;
+	struct p2pdma_provider *mem;
 	enum pci_p2pdma_map_type map;
-	u64 bus_off;
 };
 
+
 /* helper for pci_p2pdma_state(), do not use directly */
 void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state,
 		struct device *dev, struct page *page);
@@ -162,8 +172,7 @@ pci_p2pdma_state(struct pci_p2pdma_map_state *state, struct device *dev,
 		 struct page *page)
 {
 	if (IS_ENABLED(CONFIG_PCI_P2PDMA) && is_pci_p2pdma_page(page)) {
-		if (state->pgmap != page_pgmap(page))
-			__pci_p2pdma_update_state(state, dev, page);
+		__pci_p2pdma_update_state(state, dev, page);
 		return state->map;
 	}
 	return PCI_P2PDMA_MAP_NONE;
@@ -181,7 +190,7 @@ static inline dma_addr_t
 pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t paddr)
 {
 	WARN_ON_ONCE(state->map != PCI_P2PDMA_MAP_BUS_ADDR);
-	return paddr + state->bus_off;
+	return paddr + state->mem->bus_offset;
 }
 
 #endif /* _LINUX_PCI_P2P_H */
-- 
2.51.1
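For context on the pci-p2pdma.h changes above, a minimal consumer-side
sketch (illustrative only; the my_* naming and return convention are
hypothetical) of how per-page mapping code uses the reworked state.
pci_p2pdma_state() now caches the provider in state->mem, so
consecutive pages from the same provider skip the type lookup, and the
bus offset comes from the provider rather than the removed bus_off
field.

#include <linux/pci-p2pdma.h>
#include <linux/mm.h>

/*
 * Illustration only: returns 0 when *out holds a P2P bus address, 1 when the
 * caller should fall back to its normal (IOMMU/direct) mapping path, and a
 * negative errno when P2P between the two devices is not possible.
 */
static int my_map_one(struct device *dev, struct pci_p2pdma_map_state *state,
		      struct page *page, dma_addr_t *out)
{
	switch (pci_p2pdma_state(state, dev, page)) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
		/* routed through the switch: apply the provider's bus_offset */
		*out = pci_p2pdma_bus_addr_map(state, page_to_phys(page));
		return 0;
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
	case PCI_P2PDMA_MAP_NONE:
		/* mapped like regular memory by the normal DMA path */
		return 1;
	default:
		return -EREMOTEIO;
	}
}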