linux-mm.kvack.org archive mirror
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Jordan Niethe <jniethe@nvidia.com>, linux-mm@kvack.org
Cc: balbirs@nvidia.com, matthew.brost@intel.com,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, ziy@nvidia.com,
	apopple@nvidia.com, lorenzo.stoakes@oracle.com, lyude@redhat.com,
	dakr@kernel.org, airlied@gmail.com, simona@ffwll.ch,
	rcampbell@nvidia.com, mpenttil@redhat.com, jgg@nvidia.com,
	willy@infradead.org, linuxppc-dev@lists.ozlabs.org,
	intel-xe@lists.freedesktop.org, jgg@ziepe.ca,
	Felix.Kuehling@amd.com, jhubbard@nvidia.com, maddy@linux.ibm.com,
	mpe@ellerman.id.au, ying.huang@linux.alibaba.com
Subject: Re: [PATCH v6 00/13] Remove device private pages from physical address space
Date: Fri, 6 Mar 2026 17:16:09 +0100
Message-ID: <4b5b222a-18e8-4d48-9acb-39e5bfe4e5f7@kernel.org>
In-Reply-To: <20260202113642.59295-1-jniethe@nvidia.com>

On 2/2/26 12:36, Jordan Niethe wrote:
> Introduction
> ------------
> 
> The existing design of device private memory imposes limitations which
> render it non-functional for certain systems and configurations where
> the physical address space is limited.
> 
> Limited available address space
> -------------------------------
> 
> Device private memory is implemented by first reserving a region of the
> physical address space. This is a problem. The physical address space is
> not a resource that is directly under the kernel's control. Availability
> of suitable physical address space is constrained by the underlying
> hardware and firmware and may not always be available. 
> 
> Device private memory assumes that it will be able to reserve a
> physical address range as large as the device memory. However, nothing
> guarantees that this will succeed, and there are a number of factors
> that increase the likelihood of failure. We need to consider what else
> may exist in the physical address space. It has been observed that
> certain VM configurations place very large PCI windows immediately after
> RAM, large enough that no physical address space is left at all for
> device private memory. This is more likely to occur on systems with a
> 43-bit physical address width, which have less physical address space.
> 
> The fundamental issue is that the physical address space is not a
> resource the kernel can rely on being able to allocate from at will.
> 
> New implementation
> ------------------
> 
> This series changes device private memory so that it no longer requires
> allocating physical address space, which avoids these problems. Instead
> of using the physical address space, we introduce a "device private
> address space" and allocate from there.
> 
> A consequence of placing the device private pages outside of the
> physical address space is that they no longer have a PFN. However, it is
> still necessary to be able to look up a corresponding device private
> page from a device private PTE entry, which means that we still require
> some way to index into this device private address space. Instead of a
> PFN, device private pages use an offset into this device private address
> space to look up device private struct pages.
> 
> The problem that then needs to be addressed is how to avoid confusing
> these device private offsets with PFNs. It is the limited usage of the
> device private pages themselves which makes this possible. A device
> private page is only used for userspace mappings, so we do not need to
> be concerned with it being used more broadly within the mm. This means
> that the only way the core kernel looks up these pages is via the page
> table, where the PTE already indicates whether it refers to a device
> private page via its swap type, e.g. SWP_DEVICE_WRITE. We can use this
> information to determine whether the PTE contains a PFN which should be
> looked up in the page map, or a device private offset which should be
> looked up elsewhere.
> 
> The same applies when creating PTE entries for device private pages:
> because they have their own type, they already must be handled
> separately, so it is a small step to convert them to use a device
> private PFN as well.
> 
> The first part of the series updates callers where device private
> offsets might now be encountered to track this extra state.
> 
> The last patch contains the bulk of the work, where we change how we
> convert between device private pages and device private offsets and
> then use a new interface for allocating device private pages without
> the need to reserve physical address space.
> 
> By removing the device private pages from the physical address space,
> this series also opens up the possibility of moving away from tracking
> device private memory using struct pages in the future. This is
> desirable because, on systems with large amounts of memory, these
> device private struct pages use a significant amount of memory and take
> a significant amount of time to initialize.
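A rough sense of that struct page cost, assuming a 64-byte struct page (typical for 64-bit builds, but configuration dependent), 4 KiB base pages, and a hypothetical device with 80 GiB of memory:

```latex
\[
\frac{\text{memmap size}}{\text{device memory size}}
  = \frac{\mathtt{sizeof(struct\ page)}}{\mathtt{PAGE\_SIZE}}
  = \frac{64\ \mathrm{B}}{4096\ \mathrm{B}} = \frac{1}{64} \approx 1.56\,\%
\]
\[
\text{memmap}(80\ \mathrm{GiB})
  = \frac{80 \cdot 2^{30}\ \mathrm{B}}{4096\ \mathrm{B}} \times 64\ \mathrm{B}
  = 20\,971\,520 \times 64\ \mathrm{B} = 1.25\ \mathrm{GiB}
\]
```

The overhead scales linearly, so every additional device adds its own 1/64 of its memory size in metadata that must also be initialized.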

I now went through all of the patches (skimming a bit over some parts
that need splitting or rework).

In general, a noble goal and a reasonable approach.

But I get the sense that we are just hacking in yet another zone-device
thing. This series certainly makes core-mm more complicated. I provided
some inputs on how to make some things less hacky, and will provide
further input as you move forward.

We really have to minimize the impact, otherwise we'll just keep
breaking stuff all the time when we forget a single test for
device-private pages in one magical path.

I am not 100% sure how much the additional tests for device-private
pages all over the place will cost us. At least it can get compiled out,
but most distros will just always have it compiled in.

-- 
Cheers,

David


Thread overview: 30+ messages
2026-02-02 11:36 Jordan Niethe
2026-02-02 11:36 ` [PATCH v6 01/13] mm/migrate_device: Introduce migrate_pfn_from_page() helper Jordan Niethe
2026-02-27 21:11   ` David Hildenbrand (Arm)
2026-03-01 23:38     ` Jordan Niethe
2026-03-02  9:22       ` David Hildenbrand (Arm)
2026-03-03  5:52         ` Jordan Niethe
2026-03-03 16:32           ` David Hildenbrand (Arm)
2026-02-02 11:36 ` [PATCH v6 02/13] drm/amdkfd: Use migrate pfns internally Jordan Niethe
2026-03-03 16:40   ` David Hildenbrand (Arm)
2026-02-02 11:36 ` [PATCH v6 03/13] mm/migrate_device: Make migrate_device_{pfns,range}() take mpfns Jordan Niethe
2026-03-03 16:52   ` David Hildenbrand (Arm)
2026-02-02 11:36 ` [PATCH v6 04/13] mm/migrate_device: Add migrate PFN flag to track device private pages Jordan Niethe
2026-03-03 16:58   ` David Hildenbrand (Arm)
2026-02-02 11:36 ` [PATCH v6 05/13] mm/page_vma_mapped: Add flag to page_vma_mapped_walk::flags " Jordan Niethe
2026-03-06 15:44   ` David Hildenbrand (Arm)
2026-02-02 11:36 ` [PATCH v6 06/13] mm: Add helpers to create migration entries from struct pages Jordan Niethe
2026-03-06 15:59   ` David Hildenbrand (Arm)
2026-02-02 11:36 ` [PATCH v6 07/13] mm: Add a new swap type for migration entries of device private pages Jordan Niethe
2026-02-02 11:36 ` [PATCH v6 08/13] mm: Add softleaf support for device private migration entries Jordan Niethe
2026-02-02 11:36 ` [PATCH v6 09/13] mm: Begin creating " Jordan Niethe
2026-02-02 11:36 ` [PATCH v6 10/13] mm: Add helpers to create device private entries from struct pages Jordan Niethe
2026-02-02 11:36 ` [PATCH v6 11/13] mm/util: Add flag to track device private pages in page snapshots Jordan Niethe
2026-03-06 16:02   ` David Hildenbrand (Arm)
2026-03-06 16:03     ` David Hildenbrand (Arm)
2026-02-02 11:36 ` [PATCH v6 12/13] mm/hmm: Add flag to track device private pages Jordan Niethe
2026-03-06 16:05   ` David Hildenbrand (Arm)
2026-02-02 11:36 ` [PATCH v6 13/13] mm: Remove device private pages from the physical address space Jordan Niethe
2026-03-06 16:11   ` David Hildenbrand (Arm)
2026-02-06 13:08 ` [PATCH v6 00/13] Remove device private pages from " David Hildenbrand (Arm)
2026-03-06 16:16 ` David Hildenbrand (Arm) [this message]
