linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Felix Kuehling <felix.kuehling@amd.com>
To: Dan Williams <dan.j.williams@intel.com>, akpm@linux-foundation.org
Cc: "Matthew Wilcox" <willy@infradead.org>, "Jan Kara" <jack@suse.cz>,
	"Darrick J. Wong" <djwong@kernel.org>,
	"Christoph Hellwig" <hch@lst.de>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	"David Airlie" <airlied@linux.ie>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Karol Herbst" <kherbst@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	linux-mm@kvack.org, dri-devel@lists.freedesktop.org,
	nvdimm@lists.linux.dev
Subject: Re: [PATCH] mm/memremap: Introduce pgmap_request_folio() using pgmap offsets
Date: Fri, 21 Oct 2022 14:31:59 -0400	[thread overview]
Message-ID: <0d2efd01-956c-3e61-6bd0-81e449fad4f9@amd.com> (raw)
In-Reply-To: <6351d7105fe92_4da329467@dwillia2-xfh.jf.intel.com.notmuch>


On 2022-10-20 19:17, Dan Williams wrote:
> Felix Kuehling wrote:
>> Am 2022-10-20 um 17:56 schrieb Dan Williams:
>>> A 'struct dev_pagemap' (pgmap) represents a collection of ZONE_DEVICE
>>> pages. The pgmap is a reference counted object that serves a similar
>>> role as a 'struct request_queue'. Live references are obtained for each
>>> in flight request / page, and once a page's reference count drops to
>>> zero the associated pin of the pgmap is dropped as well. While a page is
>>> idle nothing should be accessing it because that is effectively a
>>> use-after-free situation. Unfortunately, all current ZONE_DEVICE
>>> implementations deploy a layering violation to manage requests to
>>> activate pages owned by a pgmap. Specifically, they take steps like walk
>>> the pfns that were previously assigned at memremap_pages() time and use
>>> pfn_to_page() to recall metadata like page->pgmap, or make use of other
>>> data like page->zone_device_data.
>>>
>>> The first step towards correcting that situation is to provide a
>>> API to get access to a pgmap page that does not require the caller to
>>> know the pfn, nor access any fields of an idle page. Ideally this API
>>> would be able to support dynamic page creation instead of the current
>>> status quo of pre-allocating and initializing pages.
>> If I understand it correctly, the current code works because the struct
>> pages are pre-allocated. Is the end-goal here to make the struct page
>> allocation for ZONE_DEVICE pages dynamic.
> Some DEVICE_PRIVATE users have already open-coded their own coarse
> grained dynamic ZONE_DEVICE pages by waiting to allocate chunks on
> demand.
>
> The users that would benefit from a general dynamic ZONE_DEVICE facility
> are cases like VMs backed by device-dax instances. Unless the VM calls
> for bare metal services there is no need to map pages for the device-dax
> instance in the hypervisor.
>
> So, the end goal here is to just add some sanity to ZONE_DEVICE page
> reference counting to allow for requiring an arbitration for page access
> rather than just pfn_to_page() and assuming the page is already there.
> Dynamic ZONE_DEVICE becomes something that is possible once that sanity
> is in place.
>
>>> On a prompt from Jason, introduce pgmap_request_folio() that operates on
>>> an offset into a pgmap.
>> This looks like it would make it fairly easy to request larger (higher
>> order) folios for physically contiguous device memory allocations in the
>> future.
>>
>>
>>>    It replaces the shortlived
>>> pgmap_request_folios() that was continuing the layering violation of
>>> assuming pages are available to be consulted before asking the pgmap to
>>> make them available.
>>>
>>> For now this only converts the callers to lookup the pgmap and generate
>>> the pgmap offset, but it does not do the deeper cleanup of teaching
>>> those call sites to generate those arguments without walking the page
>>> metadata. For next steps it appears the DEVICE_PRIVATE implementations
>>> could plumb the pgmap into the necessary callsites and switch to using
>>> gen_pool_alloc() to track which offsets of a pgmap are allocated.
>> Wouldn't that duplicate whatever device memory allocator we already have
>> in our driver? Couldn't I just take the memory allocation from our TTM
>> allocator and make necessary pgmap_request_folio calls to allocate the
>> corresponding pages from the pgmap?
> I think you could, as long as the output from that allocator is a
> pgmap_offset rather than a pfn.
>
>> Or does the pgmap allocation need a finer granularity than the device
>> memory allocation?
> I would say the pgmap *allocation* happens at memremap_pages() time.
> pgmap_request_folio() is a request to put a pgmap page into service.
>
> So, yes, I think you can bring your own allocator for what offsets are
> in/out of service in pgmap space.

Thank you for the explanation. The patch is

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>





  reply	other threads:[~2022-10-21 18:32 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-20 21:56 Dan Williams
2022-10-20 22:32 ` Felix Kuehling
2022-10-20 23:17   ` Dan Williams
2022-10-21 18:31     ` Felix Kuehling [this message]
2022-10-24  1:44       ` Alistair Popple
2022-10-21 20:36 ` Lyude Paul
2022-10-28 18:33 ` Jason Gunthorpe
2022-12-01  0:22   ` Dan Williams
2022-12-01 23:12     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0d2efd01-956c-3e61-6bd0-81e449fad4f9@amd.com \
    --to=felix.kuehling@amd.com \
    --cc=Xinhui.Pan@amd.com \
    --cc=airlied@linux.ie \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.deucher@amd.com \
    --cc=apopple@nvidia.com \
    --cc=bskeggs@redhat.com \
    --cc=christian.koenig@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=djwong@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=kherbst@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=lyude@redhat.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox