From: Alistair Popple <apopple@nvidia.com>
To: Hou Tao <houtao@huaweicloud.com>
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-mm@kvack.org, linux-nvme@lists.infradead.org,
Bjorn Helgaas <bhelgaas@google.com>,
Logan Gunthorpe <logang@deltatee.com>,
Leon Romanovsky <leonro@nvidia.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Tejun Heo <tj@kernel.org>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Danilo Krummrich <dakr@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
houtao1@huawei.com
Subject: Re: [PATCH 10/13] PCI/P2PDMA: support compound page in p2pmem_alloc_mmap()
Date: Thu, 8 Jan 2026 16:20:58 +1100
Message-ID: <ru35ev2clily7277fh2uwxuiellerlocfexhjkqim7stixuact@7fp5h7fdmz5h>
In-Reply-To: <20251220040446.274991-11-houtao@huaweicloud.com>
On 2025-12-20 at 15:04 +1100, Hou Tao <houtao@huaweicloud.com> wrote...
> From: Hou Tao <houtao1@huawei.com>
>
> P2PDMA memory already supports compound pages, and the helpers for
> inserting compound pages into a vma are also ready, so add compound
> page support to p2pmem_alloc_mmap() as well. This greatly reduces the
> overhead of mmap() and get_user_pages() when compound pages are
> enabled for p2pdma memory.
>
> The use of vm_private_data to save the alignment of the p2pdma memory
> needs explanation. The normal way to get the alignment is through the
> pci_dev, either by invoking kernfs_of() and sysfs_file_kobj(), or by
> defining a new struct kernfs_vm_ops to pass the kobject to the
> ->may_split() and ->pagesize() callbacks. The former approach depends
> too much on kernfs implementation details, and the latter would lead
> to excessive churn, so take the simpler route of saving the alignment
> in vm_private_data instead.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> drivers/pci/p2pdma.c | 48 ++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index e97f5da73458..4a133219ac43 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -128,6 +128,25 @@ static unsigned long p2pmem_get_unmapped_area(struct file *filp, struct kobject
> return mm_get_unmapped_area(filp, uaddr, len, pgoff, flags);
> }
>
> +static int p2pmem_may_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> + size_t align = (uintptr_t)vma->vm_private_data;
> +
> + if (!IS_ALIGNED(addr, align))
> + return -EINVAL;
> + return 0;
> +}
> +
> +static unsigned long p2pmem_pagesize(struct vm_area_struct *vma)
> +{
> + return (uintptr_t)vma->vm_private_data;
> +}
> +
> +static const struct vm_operations_struct p2pmem_vm_ops = {
> + .may_split = p2pmem_may_split,
> + .pagesize = p2pmem_pagesize,
> +};
> +
> static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> const struct bin_attribute *attr, struct vm_area_struct *vma)
> {
> @@ -136,6 +155,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> struct pci_p2pdma *p2pdma;
> struct percpu_ref *ref;
> unsigned long vaddr;
> + size_t align;
> void *kaddr;
> int ret;
>
> @@ -161,6 +181,16 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> goto out;
> }
>
> + align = p2pdma->align;
> + if (vma->vm_start & (align - 1) || vma->vm_end & (align - 1)) {
> + pci_info_ratelimited(pdev,
> + "%s: unaligned vma (%#lx~%#lx, %#lx)\n",
> + current->comm, vma->vm_start, vma->vm_end,
> + align);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> kaddr = (void *)gen_pool_alloc_owner(p2pdma->pool, len, (void **)&ref);
> if (!kaddr) {
> ret = -ENOMEM;
> @@ -178,7 +208,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> }
> rcu_read_unlock();
>
> - for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
> + for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += align) {
> struct page *page = virt_to_page(kaddr);
>
> /*
> @@ -188,7 +218,12 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> */
> VM_WARN_ON_ONCE_PAGE(page_ref_count(page), page);
> set_page_count(page, 1);
> - ret = vm_insert_page(vma, vaddr, page);
> + if (align == PUD_SIZE)
> + ret = vm_insert_folio_pud(vma, vaddr, page_folio(page));
> + else if (align == PMD_SIZE)
> + ret = vm_insert_folio_pmd(vma, vaddr, page_folio(page));
This doesn't look quite right to me - where do you initialise the folio
metadata? I'd expect a call to prep_compound_page() or some equivalent somewhere
- for example calling something like zone_device_page_init() to set the correct
folio order, etc.
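Something along these lines is roughly what I had in mind (rough,
untested sketch; prep_compound_page() is mm-internal, so in practice
this would need to sit behind a proper mm helper, similar to how
zone_device_page_init() works):

	struct page *page = virt_to_page(kaddr);

	/*
	 * Hypothetical: initialise the compound/folio metadata (head
	 * page, order) before handing the folio to
	 * vm_insert_folio_pud()/vm_insert_folio_pmd(), otherwise
	 * folio_order() and the folio refcounting won't behave as
	 * expected.
	 */
	if (align > PAGE_SIZE)
		prep_compound_page(page, get_order(align));

Without something like that, page_folio() on the head page just gives
back an order-0 folio as far as the rest of mm is concerned (unless
the pgmap setup already does this somewhere I've missed).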
- Alistair
> + else
> + ret = vm_insert_page(vma, vaddr, page);
> if (ret) {
> gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
> percpu_ref_put(ref);
> @@ -196,10 +231,15 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> }
> percpu_ref_get(ref);
> put_page(page);
> - kaddr += PAGE_SIZE;
> - len -= PAGE_SIZE;
> + kaddr += align;
> + len -= align;
> }
>
> + /* Disable unaligned splitting due to vma merge */
> + vm_flags_set(vma, VM_DONTEXPAND);
> + vma->vm_ops = &p2pmem_vm_ops;
> + vma->vm_private_data = (void *)(uintptr_t)align;
> +
> percpu_ref_put(ref);
>
> return 0;
> --
> 2.29.2
>