linux-mm.kvack.org archive mirror
From: Alistair Popple <apopple@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
	"Francois Dugast" <francois.dugast@intel.com>,
	intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Zi Yan" <ziy@nvidia.com>,
	"Madhavan Srinivasan" <maddy@linux.ibm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Hildenbrand" <david@kernel.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Balbir Singh" <balbirs@nvidia.com>,
	linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	nouveau@lists.freedesktop.org, linux-mm@kvack.org,
	linux-cxl@vger.kernel.org
Subject: Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
Date: Mon, 19 Jan 2026 16:41:42 +1100
Message-ID: <y7dm2sqf5t5txirxkbu7hlmsfsnlbtdirgn4ts2l4st3z4kawo@qpa56ysy5v3t>
In-Reply-To: <20260117001921.GB1134360@nvidia.com>

On 2026-01-17 at 11:19 +1100, Jason Gunthorpe <jgg@nvidia.com> wrote...
> On Fri, Jan 16, 2026 at 08:17:22PM +0100, Vlastimil Babka wrote:
> > >> +#ifdef NR_PAGES_IN_LARGE_FOLIO
> > >> +		/*
> > >> +		 * This pointer math looks odd, but new_page could have been
> > >> +		 * part of a previous higher order folio, which sets _nr_pages
> > >> +		 * in page + 1 (new_page). Therefore, we use pointer casting to
> > >> +		 * correctly locate the _nr_pages bits within new_page which
> > >> +		 * could have been modified by a previous higher order folio.
> > >> +		 */
> > >> +		((struct folio *)(new_page - 1))->_nr_pages = 0;
> > >> +#endif
> > > 
> > > This seems too weird, why is it in the loop?  There is only one
> > > _nr_pages per folio.

Yeah, I don't really know the motivation for going via the folio field, which
needs the odd pointer math, rather than just setting page->memcg_data = 0
directly (the folio's _nr_pages overlays memcg_data of new_page). That would
work equally well and would have avoided a lot of confusion.
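A minimal userspace sketch of why the two writes hit the same bytes. It assumes a hypothetical, heavily simplified layout in which the folio's second page slot places _nr_pages at the same offset as page->memcg_data; the real struct page and struct folio have many more fields. The helper names nr_pages_slot() and memcg_data_slot() are invented for illustration. The sketch compares addresses rather than writing through the cast, because outside the kernel's -fno-strict-aliasing build that write would be undefined in ISO C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified struct page: NOT the real kernel layout. */
struct page {
	unsigned long flags;
	unsigned long compound_head;
	unsigned long memcg_data;
};

/*
 * Hypothetical simplified folio spanning two struct pages, laid out so that
 * _nr_pages (second page) overlays page[1].memcg_data.
 */
struct folio {
	/* first struct page */
	unsigned long flags;
	unsigned long lru;
	unsigned long memcg_data;
	/* second struct page */
	unsigned long _flags_1;
	unsigned long _head_1;
	unsigned long _nr_pages;
};

/* The layout assumption the sketch rests on, checked at compile time. */
_Static_assert(offsetof(struct folio, _nr_pages) ==
	       sizeof(struct page) + offsetof(struct page, memcg_data),
	       "_nr_pages must overlay page[1].memcg_data");

/*
 * Address targeted by the patch's pointer math: new_page is treated as page[1]
 * of a folio starting at new_page - 1, so new_page must have a predecessor in
 * the same array. Only the address is formed; nothing is read or written.
 */
static unsigned long *nr_pages_slot(struct page *new_page)
{
	return &((struct folio *)(new_page - 1))->_nr_pages;
}

/* Address targeted by the direct alternative. */
static unsigned long *memcg_data_slot(struct page *new_page)
{
	return &new_page->memcg_data;
}
```

Under this assumed layout both helpers return the same address for any page that has a predecessor, which is why `new_page->memcg_data = 0` would have the same effect without the cast.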

> > I suppose we could be getting say an order-9 folio that was previously used
> > as two order-8 folios? And each of them had their _nr_pages in their head
> > and we can't know that at this point so we have to reset everything?
> 
> Er, did I miss something - who reads _nr_pages from a random tail
> page? Doesn't everything working with random tail pages read order,
> compute the head page, cast to folio and then access _nr_pages?
> 
> > Or maybe you mean that stray _nr_pages in some tail page from previous
> > lifetimes can't affect the current lifetime in a wrong way for something
> > looking at said page? I don't know immediately.
> 
> Yes, exactly.
> 
> Basically, what bytes exactly need to be set to what in tail pages for
> the system to work? Those should be set.
> 
> And if we want to have things set on free that's fine too, but there
> should be reasons for doing stuff, and this weird thing above makes
> zero sense.

You can't think of these as tail pages or head pages. They are just random
struct pages, possibly order-0 or PageHead or PageTail, with fields in a
"random" state based on what they were last used for.

All this function should be trying to do is initialise that random state to
something sane, as defined by the core-mm, for the core-mm to consume. Yes, some
pages might later end up as tail (or head) pages if order > 0 and
prep_compound_page() is called. But the point of this function, and of the loop,
is to initialise each struct page as an order-0 page with "sane" fields, either
to pass to the core-mm or to call prep_compound_page() on.

This could, for example, just use memset(new_page, 0, sizeof(struct page)) and
then refill all the fields correctly (although Vlastimil pointed out that some
page flags need to be preserved). But a big part of the problem is that there is
no single definition (AFAIK) of what state a struct page should be in before it
is handed to the core-mm via vm_insert_page()/vm_insert_pages()/etc. or
migrate_vma_*(), nor of what state the kernel leaves it in once freed.
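A rough sketch of the memset() approach with flag preservation. The struct layout, the bit split and the PRESERVED_FLAGS mask are hypothetical: the upper byte merely stands in for bits such as the node/zone information that the real page->flags encodes, and which flags actually need preserving is exactly the open question here, so the mask is a placeholder rather than a statement about the kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simplified struct page: NOT the real kernel layout. */
struct page {
	unsigned long flags;
	unsigned long compound_head;
	unsigned long memcg_data;
	unsigned long private;
};

/*
 * Hypothetical mask: bits of page->flags that must survive reinitialisation.
 * Everything outside the mask is treated as per-lifetime state.
 */
#define PRESERVED_FLAGS 0xff00UL

static void reset_page(struct page *page)
{
	unsigned long keep = page->flags & PRESERVED_FLAGS;

	/* Wipe every field left over from the previous lifetime... */
	memset(page, 0, sizeof(*page));
	/* ...then restore only the bits that must be preserved. */
	page->flags = keep;
}
```

The appeal of this shape is that new fields added to struct page are cleared by default, so only the preserved bits need an explicit decision.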

I would like to see this addressed because it leads to all sorts of weirdness.
For example, vm_insert_page() and migrate_vma_*() both require the page refcount
to be 1 for no good reason: drivers usually have to drop that reference
immediately after the call, and they implicitly own the ZONE_DEVICE page
lifetimes anyway, so why make them hold a reference just to map the page? Yet
only migrate_vma_*() requires the page to be locked, so other ZONE_DEVICE users
just have to unlock it immediately.

And I presume page->memcg_data must be set to zero, or Matthew wouldn't have
run into the problems that prompted him to reinitialise it. But I don't really
know what other requirements there are for setting page fields; they all come
more or less implicitly from the vm_insert_page()/migrate_vma_*() APIs.
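One way to make these implicit requirements explicit is a single precondition check that could run before a page is handed over. This is a hypothetical sketch: the simplified struct page, enum handoff_api and page_ready_for() are invented, and the check encodes only the requirements named in this message (a refcount of 1 for both paths, the page lock for migrate_vma_*() only, and a presumed zero memcg_data), not a complete or authoritative list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified struct page: NOT the real kernel layout. */
struct page {
	int refcount;
	bool locked;
	unsigned long memcg_data;
};

/* Hypothetical labels for the two hand-off paths discussed above. */
enum handoff_api {
	HANDOFF_VM_INSERT_PAGE, /* vm_insert_page()/vm_insert_pages() */
	HANDOFF_MIGRATE_VMA,    /* migrate_vma_*() */
};

/* Returns true if @page meets the requirements listed above for @api. */
static bool page_ready_for(const struct page *page, enum handoff_api api)
{
	/* Both paths currently expect exactly one reference. */
	if (page->refcount != 1)
		return false;
	/* Presumed: no stale memcg_data from a previous lifetime. */
	if (page->memcg_data != 0)
		return false;
	/* Only migrate_vma_*() additionally requires the page lock. */
	if (api == HANDOFF_MIGRATE_VMA && !page->locked)
		return false;
	return true;
}
```

Writing the contract down in one place like this would also make the asymmetry visible: the lock requirement applies to one path but not the other.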

 - Alistair

> Jason



Thread overview: 44+ messages
2026-01-16 11:10 [PATCH v6 0/5] Enable THP support in drm_pagemap Francois Dugast
2026-01-16 11:10 ` [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast
2026-01-16 13:10   ` Balbir Singh
2026-01-16 16:07   ` Vlastimil Babka
2026-01-16 17:20     ` Jason Gunthorpe
2026-01-16 17:27       ` Vlastimil Babka
2026-01-22  8:02     ` Vlastimil Babka
2026-01-16 17:49   ` Jason Gunthorpe
2026-01-16 19:17     ` Vlastimil Babka
2026-01-16 20:31       ` Matthew Brost
2026-01-17  0:51         ` Jason Gunthorpe
2026-01-17  3:55           ` Matthew Brost
2026-01-17  4:42             ` Balbir Singh
2026-01-17  5:27               ` Matthew Brost
2026-01-19  5:59                 ` Alistair Popple
2026-01-19 14:20                   ` Jason Gunthorpe
2026-01-19 20:09                     ` Zi Yan
2026-01-19 20:35                       ` Jason Gunthorpe
2026-01-19 22:15                         ` Balbir Singh
2026-01-20  2:50                           ` Zi Yan
2026-01-20 13:53                             ` Jason Gunthorpe
2026-01-21  3:01                               ` Zi Yan
2026-01-22  7:19                                 ` Matthew Brost
2026-01-22  8:00                                   ` Vlastimil Babka
2026-01-22  9:10                                     ` Balbir Singh
2026-01-22 21:41                                       ` Andrew Morton
2026-01-22 22:53                                         ` Alistair Popple
2026-01-23  6:45                                         ` Vlastimil Babka
2026-01-22 14:29                                   ` Jason Gunthorpe
2026-01-22 15:46                                 ` Jason Gunthorpe
2026-01-23  2:41                                   ` Zi Yan
2026-01-23 14:19                                     ` Jason Gunthorpe
2026-01-21  3:51                             ` Balbir Singh
2026-01-17  0:19       ` Jason Gunthorpe
2026-01-19  5:41         ` Alistair Popple [this message]
2026-01-19 14:24           ` Jason Gunthorpe
2026-01-16 22:34   ` Andrew Morton
2026-01-16 22:36     ` Matthew Brost
2026-01-16 11:10 ` [PATCH v6 2/5] drm/pagemap: Unlock and put folios when possible Francois Dugast
2026-01-16 11:10 ` [PATCH v6 3/5] drm/pagemap: Add helper to access zone_device_data Francois Dugast
2026-01-16 11:10 ` [PATCH v6 4/5] drm/pagemap: Correct cpages calculation for migrate_vma_setup Francois Dugast
2026-01-16 11:37   ` Balbir Singh
2026-01-16 12:02     ` Francois Dugast
2026-01-16 11:10 ` [PATCH v6 5/5] drm/pagemap: Enable THP support for GPU memory migration Francois Dugast
