From: Zi Yan <ziy@nvidia.com>
To: Jordan Niethe <jniethe@nvidia.com>
Cc: linux-mm@kvack.org, balbirs@nvidia.com, matthew.brost@intel.com,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
dri-devel@lists.freedesktop.org, david@redhat.com,
apopple@nvidia.com, lorenzo.stoakes@oracle.com, lyude@redhat.com,
dakr@kernel.org, airlied@gmail.com, simona@ffwll.ch,
rcampbell@nvidia.com, mpenttil@redhat.com, jgg@nvidia.com,
willy@infradead.org, linuxppc-dev@lists.ozlabs.org,
intel-xe@lists.freedesktop.org, jgg@ziepe.ca,
Felix.Kuehling@amd.com
Subject: Re: [PATCH v2 11/11] mm: Remove device private pages from the physical address space
Date: Tue, 20 Jan 2026 17:53:41 -0500
Message-ID: <6C5F185E-BB12-4B01-8283-F2C956E84AA3@nvidia.com>
In-Reply-To: <c9afedc6-f763-410f-b78b-522b98122f06@nvidia.com>

On 20 Jan 2026, at 17:33, Jordan Niethe wrote:
> On 14/1/26 07:04, Zi Yan wrote:
>> On 7 Jan 2026, at 4:18, Jordan Niethe wrote:
>>
>>> Currently, when creating these device private struct pages, the first
>>> step is to use request_free_mem_region() to get a range of physical
>>> address space large enough to represent the device's memory. This
>>> allocated physical address range is then remapped as device private
>>> memory using memremap_pages().
>>>
>>> Needing allocation of physical address space has some problems:
>>>
>>> 1) There may be insufficient physical address space to represent the
>>> device memory. KASLR reducing the physical address space and VM
>>> configurations with limited physical address space increase the
>>> likelihood of hitting this, especially as device memory sizes grow.
>>> This has been observed to prevent device private memory from being
>>> initialized.
>>>
>>> 2) Attempting to add the device private pages to the linear map at
>>> addresses beyond the actual physical memory causes issues on
>>> architectures like aarch64, meaning the feature does not work there.
>>>
>>> Instead of using the physical address space, introduce a device private
>>> address space and allocate device regions from there to represent the
>>> device private pages.
>>>
>>> Introduce a new interface memremap_device_private_pagemap() that
>>> allocates a requested amount of device private address space and creates
>>> the necessary device private pages.
>>>
>>> To support this new interface, struct dev_pagemap needs some changes:
>>>
>>> - Add a new dev_pagemap::nr_pages field as an input parameter.
>>> - Add a new dev_pagemap::pages array to store the device
>>> private pages.
>>>
>>> When using memremap_device_private_pagemap(), rather than passing in
>>> dev_pagemap::ranges[dev_pagemap::nr_ranges] of physical address space to
>>> be remapped, dev_pagemap::nr_ranges will always be 1, and the device
>>> private range that is reserved is returned in dev_pagemap::range.
>>>
>>> Forbid calling memremap_pages() with dev_pagemap::ranges::type =
>>> MEMORY_DEVICE_PRIVATE.
>>>
>>> Represent this device private address space using a new
>>> device_private_pgmap_tree maple tree. This tree maps a given device
>>> private address to a struct dev_pagemap, where a specific device private
>>> page may then be looked up in that dev_pagemap::pages array.
>>>
>>> Device private address space can be reclaimed and the associated device
>>> private pages freed using the corresponding new
>>> memunmap_device_private_pagemap() interface.
>>>
>>> Because the device private pages now live outside the physical address
>>> space, they no longer have a normal PFN. This means that page_to_pfn(),
>>> et al. are no longer meaningful.
>>>
>>> Introduce helpers:
>>>
>>> - device_private_page_to_offset()
>>> - device_private_folio_to_offset()
>>>
>>> to take a given device private page / folio and return its offset within
>>> the device private address space.
>>>
>>> Update the places where we previously converted a device private page to
>>> a PFN to use these new helpers. When we encounter a device private
>>> offset, look up its page with device_private_offset_to_page() instead of
>>> looking it up within the pagemap.
>>>
>>> Update the existing users:
>>>
>>> - lib/test_hmm.c
>>> - ppc ultravisor
>>> - drm/amd/amdkfd
>>> - gpu/drm/xe
>>> - gpu/drm/nouveau
>>>
>>> to use the new memremap_device_private_pagemap() interface.
>>>
>>> Signed-off-by: Jordan Niethe <jniethe@nvidia.com>
>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>>
>>> ---
>>>
>>> NOTE: The updates to the existing drivers have only been compile tested.
>>> I'll need some help in testing these drivers.
>>>
>>> v1:
>>> - Include NUMA node parameter for memremap_device_private_pagemap()
>>> - Add devm_memremap_device_private_pagemap() and friends
>>> - Update existing users of memremap_pages():
>>> - ppc ultravisor
>>> - drm/amd/amdkfd
>>> - gpu/drm/xe
>>> - gpu/drm/nouveau
>>> - Update for HMM huge page support
>>> - Guard device_private_offset_to_page and friends with CONFIG_ZONE_DEVICE
>>>
>>> v2:
>>> - Make sure the last member of struct dev_pagemap remains DECLARE_FLEX_ARRAY(struct range, ranges);
>>> ---
>>> Documentation/mm/hmm.rst | 11 +-
>>> arch/powerpc/kvm/book3s_hv_uvmem.c | 41 ++---
>>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 23 +--
>>> drivers/gpu/drm/nouveau/nouveau_dmem.c | 35 ++--
>>> drivers/gpu/drm/xe/xe_svm.c | 28 +---
>>> include/linux/hmm.h | 3 +
>>> include/linux/leafops.h | 16 +-
>>> include/linux/memremap.h | 64 +++++++-
>>> include/linux/migrate.h | 6 +-
>>> include/linux/mm.h | 2 +
>>> include/linux/rmap.h | 5 +-
>>> include/linux/swapops.h | 10 +-
>>> lib/test_hmm.c | 69 ++++----
>>> mm/debug.c | 9 +-
>>> mm/memremap.c | 193 ++++++++++++++++++-----
>>> mm/mm_init.c | 8 +-
>>> mm/page_vma_mapped.c | 19 ++-
>>> mm/rmap.c | 43 +++--
>>> mm/util.c | 5 +-
>>> 19 files changed, 391 insertions(+), 199 deletions(-)
>>>
>> <snip>
>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index e65329e1969f..b36599ab41ba 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -2038,6 +2038,8 @@ static inline unsigned long memdesc_section(memdesc_flags_t mdf)
>>> */
>>> static inline unsigned long folio_pfn(const struct folio *folio)
>>> {
>>> + VM_BUG_ON(folio_is_device_private(folio));
>>
>> Please use VM_WARN_ON instead.
>
> ack.
>
>>
>>> +
>>> return page_to_pfn(&folio->page);
>>> }
>>>
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index 57c63b6a8f65..c1561a92864f 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -951,7 +951,7 @@ static inline unsigned long page_vma_walk_pfn(unsigned long pfn)
>>> static inline unsigned long folio_page_vma_walk_pfn(const struct folio *folio)
>>> {
>>> if (folio_is_device_private(folio))
>>> - return page_vma_walk_pfn(folio_pfn(folio)) |
>>> + return page_vma_walk_pfn(device_private_folio_to_offset(folio)) |
>>> PVMW_PFN_DEVICE_PRIVATE;
>>>
>>> return page_vma_walk_pfn(folio_pfn(folio));
>>> @@ -959,6 +959,9 @@ static inline unsigned long folio_page_vma_walk_pfn(const struct folio *folio)
>>>
>>> static inline struct page *page_vma_walk_pfn_to_page(unsigned long pvmw_pfn)
>>> {
>>> + if (pvmw_pfn & PVMW_PFN_DEVICE_PRIVATE)
>>> + return device_private_offset_to_page(pvmw_pfn >> PVMW_PFN_SHIFT);
>>> +
>>> return pfn_to_page(pvmw_pfn >> PVMW_PFN_SHIFT);
>>> }
>>
>> <snip>
>>
>>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>>> index 96c525785d78..141fe5abd33f 100644
>>> --- a/mm/page_vma_mapped.c
>>> +++ b/mm/page_vma_mapped.c
>>> @@ -107,6 +107,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
>>> static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
>>> {
>>> unsigned long pfn;
>>> + bool device_private = false;
>>> pte_t ptent = ptep_get(pvmw->pte);
>>>
>>> if (pvmw->flags & PVMW_MIGRATION) {
>>> @@ -115,6 +116,9 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
>>> if (!softleaf_is_migration(entry))
>>> return false;
>>>
>>> + if (softleaf_is_migration_device_private(entry))
>>> + device_private = true;
>>> +
>>> pfn = softleaf_to_pfn(entry);
>>> } else if (pte_present(ptent)) {
>>> pfn = pte_pfn(ptent);
>>> @@ -127,8 +131,14 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
>>> return false;
>>>
>>> pfn = softleaf_to_pfn(entry);
>>> +
>>> + if (softleaf_is_device_private(entry))
>>> + device_private = true;
>>> }
>>>
>>> + if ((device_private) ^ !!(pvmw->pfn & PVMW_PFN_DEVICE_PRIVATE))
>>> + return false;
>>> +
>>> if ((pfn + pte_nr - 1) < (pvmw->pfn >> PVMW_PFN_SHIFT))
>>> return false;
>>> if (pfn > ((pvmw->pfn >> PVMW_PFN_SHIFT) + pvmw->nr_pages - 1))
>>> @@ -137,8 +147,11 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
>>> }
>>>
>>> /* Returns true if the two ranges overlap. Careful to not overflow. */
>>> -static bool check_pmd(unsigned long pfn, struct page_vma_mapped_walk *pvmw)
>>> +static bool check_pmd(unsigned long pfn, bool device_private, struct page_vma_mapped_walk *pvmw)
>>> {
>>> + if ((device_private) ^ !!(pvmw->pfn & PVMW_PFN_DEVICE_PRIVATE))
>>> + return false;
>>> +
>>> if ((pfn + HPAGE_PMD_NR - 1) < (pvmw->pfn >> PVMW_PFN_SHIFT))
>>> return false;
>>> if (pfn > (pvmw->pfn >> PVMW_PFN_SHIFT) + pvmw->nr_pages - 1)
>>> @@ -255,6 +268,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>>
>>> if (!softleaf_is_migration(entry) ||
>>> !check_pmd(softleaf_to_pfn(entry),
>>> + softleaf_is_device_private(entry) ||
>>> + softleaf_is_migration_device_private(entry),
>>> pvmw))
>>> return not_found(pvmw);
>>> return true;
>>> @@ -262,7 +277,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>> if (likely(pmd_trans_huge(pmde))) {
>>> if (pvmw->flags & PVMW_MIGRATION)
>>> return not_found(pvmw);
>>> - if (!check_pmd(pmd_pfn(pmde), pvmw))
>>> + if (!check_pmd(pmd_pfn(pmde), false, pvmw))
>>> return not_found(pvmw);
>>> return true;
>>> }
>>
>> It seems to me that you can add a new flag like “bool is_device_private” to
>> indicate whether pfn is a device private index instead of pfn without
>> manipulating pvmw->pfn itself.
>
> We could do it like that; however, my concern with using a new parameter
> was that storing this info separately might make it easier to misuse a
> device private index as a regular pfn.
>
> It seemed like it would be easy to overlook, both when creating the pvmw
> and then when accessing the pfn.

That is why I asked for a helper function, like page_vma_walk_pfn(pvmw), to
return the converted pfn instead of using pvmw->pfn directly. You can add a
comment asking people to use the helper function and even mark pvmw->pfn
with /* do not use directly */.
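
Something along these lines is what I had in mind. This is only a rough
sketch and the names are illustrative, not a request for these exact ones:

    static inline unsigned long
    pvmw_walk_pfn(const struct page_vma_mapped_walk *pvmw)
    {
            /* Strip the flag bits so callers never see the raw encoding. */
            return pvmw->pfn >> PVMW_PFN_SHIFT;
    }

    static inline bool
    pvmw_is_device_private(const struct page_vma_mapped_walk *pvmw)
    {
            return !!(pvmw->pfn & PVMW_PFN_DEVICE_PRIVATE);
    }

Then check_pte(), check_pmd() and friends would always go through the
accessors and never open-code the shift or the flag test.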

In addition, your patch manipulates the pfn by left-shifting it by 1. Are
you sure there is no weird arch with pfns that have bit 63 set? Your change
could break such a case, right?
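
To make that concern concrete, here is a sketch assuming PVMW_PFN_SHIFT is
1, as in this series:

    unsigned long pfn = 1UL << (BITS_PER_LONG - 1); /* hypothetical arch */
    unsigned long encoded = pfn << PVMW_PFN_SHIFT;  /* top bit shifted out */
    unsigned long decoded = encoded >> PVMW_PFN_SHIFT; /* decoded != pfn */

The encode/decode round trip silently loses the top bit.
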
Best Regards,
Yan, Zi