From: Matthew Brost <matthew.brost@intel.com>
To: Zi Yan <ziy@nvidia.com>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Balbir Singh" <balbirs@nvidia.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Madhavan Srinivasan" <maddy@linux.ibm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"David Hildenbrand" <david@kernel.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	nouveau@lists.freedesktop.org, linux-pci@vger.kernel.org,
	linux-mm@kvack.org, linux-cxl@vger.kernel.org
Subject: Re: [PATCH v4 1/7] mm/zone_device: Add order argument to folio_free callback
Date: Mon, 12 Jan 2026 15:22:19 -0800
Message-ID: <aWWCK0C23CUl9zEq@lstrano-desk.jf.intel.com>
In-Reply-To: <45A4E73B-F6C2-44B7-8C81-13E24ED12127@nvidia.com>

On Mon, Jan 12, 2026 at 06:15:26PM -0500, Zi Yan wrote:
> On 12 Jan 2026, at 16:49, Matthew Brost wrote:
> 
> > On Mon, Jan 12, 2026 at 09:45:10AM -0400, Jason Gunthorpe wrote:
> >
> > Hi, catching up here.
> >
> >> On Sun, Jan 11, 2026 at 07:51:01PM -0500, Zi Yan wrote:
> >>> On 11 Jan 2026, at 19:19, Balbir Singh wrote:
> >>>
> >>>> On 1/12/26 08:35, Matthew Wilcox wrote:
> >>>>> On Sun, Jan 11, 2026 at 09:55:40PM +0100, Francois Dugast wrote:
> >>>>>> The core MM splits the folio before calling folio_free, restoring the
> >>>>>> zone pages associated with the folio to an initialized state (e.g.,
> >>>>>> non-compound, pgmap valid, etc...). The order argument represents the
> >>>>>> folio’s order prior to the split which can be used driver side to know
> >>>>>> how many pages are being freed.
> >>>>>
> >>>>> This really feels like the wrong way to fix this problem.
> >>>>>
> >>>
> >>> Hi Matthew,
> >>>
> >>> I think the wording is confusing, since the actual issue is that:
> >>>
> >>> 1. zone_device_page_init() calls prep_compound_page() to form a large folio,
> >>> 2. but free_zone_device_folio() never reverses course,
> >>> 3. the undo of prep_compound_page() in free_zone_device_folio() needs to
> >>>    be done before driver callback ->folio_free(), since once ->folio_free()
> >>>    is called, the folio can be reallocated immediately,
> >>> 4. after the undo of prep_compound_page(), folio_order() can no longer provide
> >>>    the original order information, thus, folio_free() needs that for proper
> >>>    device side ref manipulation.
> >>
> >> There is something wrong with the driver if the "folio can be
> >> reallocated immediately".
> >>
> >> The flow generally expects there to be a driver allocator linked to
> >> folio_free()
> >>
> >> 1) Allocator finds free memory
> >> 2) zone_device_page_init() allocates the memory and makes refcount=1
> >> 3) __folio_put() knows the refcount is 0.
> >> 4) free_zone_device_folio() calls folio_free(), but it doesn't
> >>    actually need to undo prep_compound_page() because *NOTHING* can
> >>    use the page pointer at this point.
> >
> > Correct—nothing can use the folio prior to calling folio_free(). Once
> > folio_free() returns, the driver side is free to immediately reallocate
> > the folio (or a subset of its pages).
> >
> >> 5) Driver puts the memory back into the allocator and now #1 can
> >>    happen. It knows how much memory to put back because folio->order
> >>    is valid from #2
> >> 6) #1 happens again, then #2 happens again and the folio is in the
> >>    right state for use. The successor #2 fully undoes the work of the
> >>    predecessor #2.
> >>
> >> If you have races where #1 can happen immediately after #3 then the
> >> driver design is fundamentally broken and passing around order isn't
> >> going to help anything.
> >>
> >
> > The above race does not exist; if it did, I agree we’d be solving
> > nothing here.
> >
> >> If the allocator is using the struct page memory then step #5 should
> >> also clean up the struct page with the allocator data before returning
> >> it to the allocator.
> >>
> >
> > We could move the call to free_zone_device_folio_prepare() [1] into the
> > driver-side implementation of ->folio_free() and drop the order argument
> > here. Zi didn’t particularly like that; he preferred calling
> > free_zone_device_folio_prepare() [2] before invoking ->folio_free(),
> > which is why this patch exists.
> 
> On a second thought, if calling free_zone_device_folio_prepare() in
> ->folio_free() works, feel free to do so.
> 

+1, testing this change right now and it does indeed work.
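
For anyone following along, here is a rough userspace model of the
ordering being tested: the driver-side ->folio_free() reads the order
first, then resets the folio, then hands the pages back to its
allocator. This is only a sketch, not the kernel code. Every toy_*
name is hypothetical, and the real signature of
free_zone_device_folio_prepare() is not shown in this message, so
toy_folio_prepare() only models the effect described in the thread
(folio becomes non-compound and loses its order).

```c
#include <assert.h>

/* Toy stand-in for a device folio; NOT the kernel's struct folio. */
struct toy_folio {
	unsigned int order;	/* 0 once the folio has been reset */
	int compound;		/* 1 while the pages form a large folio */
};

/* Toy stand-in for a driver VRAM allocator; it only counts free pages. */
struct toy_allocator {
	unsigned long free_pages;
};

/*
 * Hypothetical model of the reset step: undo the compound setup so the
 * pages are back in an initialized state. After this, the order is gone.
 */
static void toy_folio_prepare(struct toy_folio *folio)
{
	folio->compound = 0;
	folio->order = 0;
}

/*
 * Model of a driver-side ->folio_free(): because the reset happens
 * inside the callback, the order can be read before it is cleared and
 * no extra order argument is needed.
 */
static unsigned long toy_folio_free(struct toy_allocator *alloc,
				    struct toy_folio *folio)
{
	unsigned long nr_pages = 1UL << folio->order;	/* read order first */

	toy_folio_prepare(folio);			/* then reset */
	alloc->free_pages += nr_pages;			/* then release */
	return nr_pages;
}

/* Free one order-9 folio and return how many pages went back. */
static unsigned long toy_demo(void)
{
	struct toy_allocator alloc = { .free_pages = 0 };
	struct toy_folio folio = { .order = 9, .compound = 1 };
	unsigned long freed = toy_folio_free(&alloc, &folio);

	assert(folio.order == 0 && !folio.compound);
	assert(alloc.free_pages == freed);
	return freed;
}
```

If the reset ran before the callback instead, toy_folio_free() would
see order 0 and return a single page, which is the case where the
order would have to be passed in separately.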

Matt

> >
> > FWIW, I do not have a strong opinion here—either way works. Xe doesn’t
> > actually need the order regardless of where
> > free_zone_device_folio_prepare() is called, but Nouveau does need the
> > order if free_zone_device_folio_prepare() is called before
> > ->folio_free().
> >
> > [1] https://patchwork.freedesktop.org/patch/697877/?series=159120&rev=4
> > [2] https://patchwork.freedesktop.org/patch/697709/?series=159120&rev=3#comment_1282405
> >
> >> I vaguely remember talking about this before in the context of the Xe
> >> driver.. You can't just take an existing VRAM allocator and layer it
> >> on top of the folios and have it broadly ignore the folio_free
> >> callback.
> >>
> >
> > We are definitely not ignoring the ->folio_free callback—that is the
> > point at which we tell our VRAM allocator (DRM buddy) it is okay to
> > release the allocation and make it available for reuse.
> >
> > Matt
> >
> >> Jason
> 
> 
> Best Regards,
> Yan, Zi

