From: Jason Gunthorpe <jgg@nvidia.com>
To: Zi Yan <ziy@nvidia.com>
Cc: "Matthew Wilcox" <willy@infradead.org>,
"Alistair Popple" <apopple@nvidia.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Balbir Singh" <balbirs@nvidia.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Francois Dugast" <francois.dugast@intel.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
"Madhavan Srinivasan" <maddy@linux.ibm.com>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
"Felix Kuehling" <Felix.Kuehling@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Hildenbrand" <david@kernel.org>,
"Oscar Salvador" <osalvador@suse.de>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Leon Romanovsky" <leon@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
nouveau@lists.freedesktop.org, linux-mm@kvack.org,
linux-cxl@vger.kernel.org
Subject: Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
Date: Mon, 19 Jan 2026 16:35:51 -0400
Message-ID: <20260119203551.GQ1134360@nvidia.com>
In-Reply-To: <96926697-070C-45DE-AD26-559652625859@nvidia.com>
On Mon, Jan 19, 2026 at 03:09:00PM -0500, Zi Yan wrote:
> > diff --git a/mm/internal.h b/mm/internal.h
> > index e430da900430a1..a7d3f5e4b85e49 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -806,14 +806,21 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
> >  		atomic_set(&folio->_pincount, 0);
> >  		atomic_set(&folio->_entire_mapcount, -1);
> >  	}
> > -	if (order > 1)
> > +	if (order > 1) {
> >  		INIT_LIST_HEAD(&folio->_deferred_list);
> > +	} else {
> > +		folio->mapping = NULL;
> > +#ifdef CONFIG_MEMCG
> > +		folio->memcg_data = 0;
> > +#endif
> > +	}
>
> prep_compound_head() is only called on >0 order pages. The above
> code means when order == 1, folio->mapping and folio->memcg_data are
> assigned NULL.
OK, fair enough, the conditionals would have to change and maybe it
shouldn't be called "compound_head" if it also cleans up normal pages.
> >  static inline void prep_compound_tail(struct page *head, int tail_idx)
> >  {
> >  	struct page *p = head + tail_idx;
> >
> > +	p->flags.f &= ~0xffUL;	/* Clear possible order, page head */
>
> No one cares about tail page flags if it is not checked in check_new_page()
> from mm/page_alloc.c.
At least page_fixed_fake_head() does check PG_head in some
configurations. It does seem safer to clear it. Possibly order is
never used, but it is free to clear it.
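To make the effect of that mask concrete, here is a minimal user-space sketch. The bit positions below are assumptions for illustration only (the real ones come from the kernel's page flag definitions and the folio order encoding), and every DEMO_/demo_ name is hypothetical:

```c
#include <assert.h>

/* Assumed layout, for illustration only. */
#define DEMO_ORDER_MASK	0x3fUL		/* assumed: order kept in the low bits */
#define DEMO_PG_HEAD	(1UL << 6)	/* assumed: head bit inside the low byte */
#define DEMO_HIGH_BITS	(0xabcUL << 8)	/* stand-in for the bits above the low byte */

/* Same operation as `p->flags.f &= ~0xffUL`: drop the low byte, keep the rest. */
static unsigned long demo_clear_low_byte(unsigned long flags)
{
	unsigned long cleared = flags & ~0xffUL;

	assert((cleared & 0xffUL) == 0);
	return cleared;
}
```

Under those assumptions, a stale value such as DEMO_HIGH_BITS | DEMO_PG_HEAD | 9 comes back as plain DEMO_HIGH_BITS, so a page that used to be a head or first tail no longer looks like one.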
> > -	if (order)
> > -		prep_compound_page(page, order);
> > +	prep_compound_page(page, order);
>
> prep_compound_page() should only be called for >0 order pages. This creates
> another weirdness in device pages by assuming all pages are
> compound.
OK
> > +	folio = page_folio(page);
> > +	folio->pgmap = pgmap;
> > +	folio_lock(folio);
> > +	folio_set_count(folio, 1);
>
> 	/* clear possible previous page->mapping */
> 	folio->mapping = NULL;
>
> 	/* clear possible previous page->_nr_pages */
> #ifdef CONFIG_MEMCG
> 	folio->memcg_data = 0;
> #endif
This is reasonable too, but prep_compound_head() was doing more than
that: it also clears the order, and this path needs to clear the head
bit as well. That's why it was appealing to reuse those functions, but
you are right that they are not ideal.

I suppose we want some prep_single_page(page) and some reorg to share
code with the other prep function.
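Roughly the shape I have in mind, as a user-space model rather than kernel code. struct demo_page is not struct page or struct folio, it only carries the fields discussed here, and all demo_ names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the few fields this thread talks about. */
struct demo_page {
	bool head;			/* models the head bit */
	unsigned int order;		/* models the stored order */
	void *mapping;
	unsigned long memcg_data;
};

/* Reset shared by both paths: drop state left by a previous user. */
static void demo_reset_common(struct demo_page *p)
{
	p->head = false;
	p->order = 0;
	p->mapping = NULL;
	p->memcg_data = 0;
}

/* Order-0 case: only the reset, no compound setup. */
static void demo_prep_single_page(struct demo_page *p)
{
	demo_reset_common(p);
}

/* Order > 0 case: same reset, then mark the head and record the order. */
static void demo_prep_compound_head(struct demo_page *p, unsigned int order)
{
	assert(order > 0);
	demo_reset_common(p);
	p->head = true;
	p->order = order;
}
```

The point of the split is that the order-0 caller never has to go through something named "compound", while the stale-state cleanup lives in one place.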
> This patch mixed the concepts of page and folio together, thus
> causing confusion. Core MM sees page and folio as two separate things:
> 1. page is the smallest internal physical memory management unit,
> 2. folio is an abstraction on top of pages, and other abstractions can be
> slab, ptdesc, and more (https://kernelnewbies.org/MatthewWilcox/Memdescs).
I think the users of zone_device_page_init() are principally trying to
create something that can be installed in a non-special PTE, meaning
the output is always a folio because it is going to be read as a folio
in the page walkers.

Thus, the job of this function is to take the memory range of 2^order
pages starting at page and turn it into a single valid folio with a
refcount of 1.
> If device pages have to initialize on top of pages with obsolete states,
> at least it should be first initialized as pages, then as folios to avoid
> confusion.
I don't think so. It should do the above job efficiently and iterate
over the page list exactly once.
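As a sketch of what "exactly once" means here, again a user-space model with hypothetical sim_ names and not the real zone_device_page_init(): one loop over the 2^order entries both clears stale state and sets up the new head/tail relationship.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-page state discussed in this thread. */
struct sim_page {
	bool head;
	unsigned int order;		/* meaningful on the head only */
	struct sim_page *compound_head;	/* NULL on the head or an order-0 page */
	void *mapping;
	unsigned long memcg_data;
	int refcount;
};

/*
 * Turn pages[0 .. (1 << order) - 1] into one folio-like unit with a
 * refcount of 1, visiting each entry exactly once. Returns the number
 * of entries visited.
 */
static size_t sim_folio_init(struct sim_page *pages, unsigned int order)
{
	size_t nr = (size_t)1 << order;
	size_t i;

	for (i = 0; i < nr; i++) {
		struct sim_page *p = &pages[i];

		/* Drop whatever a previous, differently sized folio left behind. */
		p->mapping = NULL;
		p->memcg_data = 0;
		p->order = 0;
		p->head = false;

		if (i == 0) {
			p->compound_head = NULL;
			p->head = order > 0;
			p->order = order;
			p->refcount = 1;
		} else {
			p->compound_head = &pages[0];
			p->refcount = 0;
		}
	}
	return nr;
}
```

With order == 0 the loop runs once and produces a plain single page with refcount 1, so the same function covers both cases without a separate page-then-folio pass.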
Jason