From: Vlastimil Babka <vbabka@suse.cz>
To: Matthew Brost <matthew.brost@intel.com>, Zi Yan <ziy@nvidia.com>
Cc: "Jason Gunthorpe" <jgg@nvidia.com>,
"Balbir Singh" <balbirs@nvidia.com>,
"Matthew Wilcox" <willy@infradead.org>,
"Alistair Popple" <apopple@nvidia.com>,
"Francois Dugast" <francois.dugast@intel.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
"Madhavan Srinivasan" <maddy@linux.ibm.com>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
"Felix Kuehling" <Felix.Kuehling@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Hildenbrand" <david@kernel.org>,
"Oscar Salvador" <osalvador@suse.de>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Leon Romanovsky" <leon@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
nouveau@lists.freedesktop.org, linux-mm@kvack.org,
linux-cxl@vger.kernel.org
Subject: Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
Date: Thu, 22 Jan 2026 09:00:49 +0100
Message-ID: <9077ab5b-f2c8-4c8d-8441-631e7c2cf384@suse.cz>
In-Reply-To: <aXHPkQfwhMHU/oP6@lstrano-desk.jf.intel.com>
On 1/22/26 08:19, Matthew Brost wrote:
> On Tue, Jan 20, 2026 at 10:01:18PM -0500, Zi Yan wrote:
>> On 20 Jan 2026, at 8:53, Jason Gunthorpe wrote:
>>
>
> This whole thread makes my head hurt, as does core MM.
>
> IMO the TL;DR is:
>
> - Why is Intel the only one proving this stuff works? We can debate all
> day about what should or should not work — but someone else needs to
> actually prove it, rather than type hypotheticals.
>
> - Intel has demonstrated that this works and is still getting blocked.
>
> - This entire thread is about a fixes patch for large device pages.
> Changing prep_compound_page is completely out of scope for a fixes
> patch, and honestly so is most of the rest of what’s being proposed.

FWIW I'm ok if this lands as a fix patch, and perceived the discussion to be
about how to refactor things more properly afterwards, going forward.

> - At a minimum, you must clear every page’s flags in the loop. So why not
> conservatively clear anything else a folio might have set before calling
> an existing core-MM function, ensuring the pages are in a known state?
> This is a fixes patch.
>
> - Given the current state of the discussion, I don’t think large device
> pages should be in 6.19. And if so, why didn’t the entire device pages
> series receive this level of scrutiny earlier? It’s my mistake for not
> saying “no” until the reallocation at different sizes issue was resolved.
>
> @Andrew - I'd revert large device pages in 6.19 as it doesn't work and
> we seemingly cannot close on this.
>
> Matt
>
>> > On Mon, Jan 19, 2026 at 09:50:16PM -0500, Zi Yan wrote:
>> >>>> I suppose we want some prep_single_page(page) and some reorg to share
>> >>>> code with the other prep function.
>> >>
>> >> This is just an unnecessary need that comes from a lack of knowledge of, or
>> >> unwillingness to investigate, core MM page and folio initialization code.
>> >
>> > It will be better to keep this related code together, not spread all
>> > around.
>>
>> Or clarify what code is for preparing pages, which would go away at memdesc
>> time, and what code is for preparing folios, which would stay.
>>
>> >
>> >>>> I don't think so. It should do the above job efficiently and iterate
>> >>>> over the page list exactly once.
>> >>
>> >> folio initialization should not iterate over any page list, since folio is
>> >> supposed to be treated as a whole instead of individual pages.
>> >
>> > The tail pages need to have the right data in them or compound_head
>> > won't work.
>>
>> That is done by set_compound_head() in prep_compound_tail().
>> prep_compound_page() takes care of it. As long as it is called, even if
>> the pages in that compound page have random states before, the compound
>> page should function correctly afterwards.
>>
>> >
>> >> folio->mapping = NULL;
>> >> folio->memcg_data = 0;
>> >> folio->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
>> >>
>> >> should be enough.
>> >
>> > This seems believable to me for setting up an order 0 page.
>>
>> It works for any folio, regardless of its order. Fields used in the second
>> or third subpages are all taken care of by prep_compound_page().
>>
>> >
>> >> if (order)
>> >> folio_set_large_rmappable(folio);
>> >
>> > That one is in zone_device_folio_init()
>>
>> Yes. And the code location looks right to me.
>>
>> >
>> > And maybe the naming has got really confused if we have both functions
>> > now :\
>>
>> Yes. One of the issues is that device private code used to only handle
>> order-0 pages and was converted to use high order folio directly without
>> using high order page (namely compound page) as an intermediate step.
>> This two-step-in-one caused confusion. But the key thing to avoid the
>> confusion is that to form a high order folio, a list of contiguous pages
>> would become a compound page by calling prep_compound_page(), then
>> the compound page becomes a folio by calling folio_set_large_rmappable().
>>
>> BTW, the code in prep_compound_head() after folio_set_order(folio, order)
>> should belong to folio_set_large_rmappable() and they are causing confusion,
>> since they are only applicable to rmappable large folios. I am going to
>> send a patch to fix it.
>>
>>
>> Best Regards,
>> Yan, Zi
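
For readers following the compound_head() point quoted above, here is a
userspace toy sketch of the tail-to-head encoding and of reusing the same
pages at a different order. It is not kernel code: struct toy_page, the
toy_* helpers and TOY_FLAGS_CHECK_AT_PREP are hypothetical stand-ins, and
the layout does not match struct page. Whether the kernel's
prep_compound_page() alone is sufficient is exactly what the thread is
debating, so the sketch resets stale per-page state in a separate, explicit
helper rather than assuming either answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask of flags that must be clear before a page is reused. */
#define TOY_FLAGS_CHECK_AT_PREP 0xffUL

/* Toy stand-in for struct page; NOT the kernel layout. */
struct toy_page {
	uintptr_t compound_head;	/* tail: (head pointer | 1), otherwise 0 */
	unsigned long flags;		/* may hold stale bits from a previous user */
	unsigned int order;		/* meaningful on the head page only */
};

/* Conservatively clear anything a previous user may have left behind. */
static void toy_reset_page(struct toy_page *page)
{
	page->compound_head = 0;
	page->flags &= ~TOY_FLAGS_CHECK_AT_PREP;
	page->order = 0;
}

/* Same idea as set_compound_head(): tag the head pointer with bit 0. */
static void toy_set_compound_head(struct toy_page *tail, struct toy_page *head)
{
	tail->compound_head = (uintptr_t)head | (uintptr_t)1;
}

/* Same idea as compound_head(): follow the tagged pointer if bit 0 is set. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	if (page->compound_head & (uintptr_t)1)
		return (struct toy_page *)(page->compound_head - (uintptr_t)1);
	return page;
}

/* Record the order on the head and point every tail back at the head. */
static void toy_prep_compound_page(struct toy_page *head, unsigned int order)
{
	size_t nr = (size_t)1 << order;

	head->order = order;
	for (size_t i = 1; i < nr; i++)
		toy_set_compound_head(&head[i], head);
}
```

In this model, four pages first prepared as one order-2 unit and later
reused as two order-1 units only resolve correctly if the page that becomes
the second head no longer carries its old tail pointer. That is the
"reallocation at different sizes" case mentioned above.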