From: Balbir Singh <balbirs@nvidia.com>
To: Matthew Brost <matthew.brost@intel.com>,
Jason Gunthorpe <jgg@nvidia.com>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
"Francois Dugast" <francois.dugast@intel.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
"Zi Yan" <ziy@nvidia.com>, "Alistair Popple" <apopple@nvidia.com>,
"adhavan Srinivasan" <maddy@linux.ibm.com>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
"Felix Kuehling" <Felix.Kuehling@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Hildenbrand" <david@kernel.org>,
"Oscar Salvador" <osalvador@suse.de>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Leon Romanovsky" <leon@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
nouveau@lists.freedesktop.org, linux-mm@kvack.org,
linux-cxl@vger.kernel.org
Subject: Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
Date: Sat, 17 Jan 2026 15:42:16 +1100
Message-ID: <eb94d115-18a6-455b-b020-f18f372e283a@nvidia.com>
In-Reply-To: <aWsIT8A2dLciFvhj@lstrano-desk.jf.intel.com>

On 1/17/26 14:55, Matthew Brost wrote:
> On Fri, Jan 16, 2026 at 08:51:14PM -0400, Jason Gunthorpe wrote:
>> On Fri, Jan 16, 2026 at 12:31:25PM -0800, Matthew Brost wrote:
>>>> I suppose we could be getting say an order-9 folio that was previously used
>>>> as two order-8 folios? And each of them had their _nr_pages in their head
>>>
>>> Yes, this is a good example. At this point we have idea what previous
>>> allocation(s) order(s) were - we could have multiple places in the loop
>>> where _nr_pages is populated, thus we have to clear this everywhere.
>>
>> Why? The fact you have to use such a crazy expression to even access
>> _nr_pages strongly says nothing will read it as _nr_pages.
>>
>> Explain each thing:
>>
>> new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */
>>
>> OK, the tail page flags need to be set right, and prep_compound_page()
>> called later depends on them being zero.
>>
>> ((struct folio *)(new_page - 1))->_nr_pages = 0;
>>
>> Can't see a reason, nothing reads _nr_pages from a random tail
>> page. _nr_pages is the last 8 bytes of struct page so it overlaps
>> memcg_data, which is also not supposed to be read from a tail page?
>>
>> new_folio->mapping = NULL;
>>
>> Pointless, prep_compound_page() -> prep_compound_tail() -> p->mapping = TAIL_MAPPING;
>>
>> new_folio->pgmap = pgmap; /* Also clear compound head */
>>
>> Pointless, compound_head is set in prep_compound_tail(): set_compound_head(p, head);
>>
>> new_folio->share = 0; /* fsdax only, unused for device private */
>>
>> Not sure, certainly share isn't read from a tail page..
>>
>>>>> Why can't this use the normal helpers, like memmap_init_compound()?
>>>>>
>>>>> struct folio *new_folio = page
>>>>>
>>>>> /* First 4 tail pages are part of struct folio */
>>>>> for (i = 4; i < (1UL << order); i++) {
>>>>> prep_compound_tail(..)
>>>>> }
>>>>>
>>>>> prep_comound_head(page, order)
>>>>> new_folio->_nr_pages = 0
>>>>>
>>>>> ??
>>>
>>> I've beat this to death with Alistair, normal helpers do not work here.
>>
>> What do you mean? It already calls prep_compound_page()! The issue
>> seems to be that prep_compound_page() makes assumptions about what
>> values are in flags already?
>>
>> So how about move that page flags mask logic into
>> prep_compound_tail()? I think that would help Vlastimil's
>> concern. That function is already touching most of the cache line so
>> an extra word shouldn't make a performance difference.
>>
>>> An order zero allocation could have _nr_pages set in its page,
>>> new_folio->_nr_pages is page + 1 memory.
>>
>> An order zero allocation does not have _nr_pages because it is in page
>> +1 memory that doesn't exist.
>>
>> An order zero allocation might have memcg_data in the same slot, does
>> it need zeroing? If so why not add that to prep_compound_head() ?
>>
>> Also, prep_compound_head() handles order 0 too:
>>
>> if (IS_ENABLED(CONFIG_64BIT) || order > 1) {
>> atomic_set(&folio->_pincount, 0);
>> atomic_set(&folio->_entire_mapcount, -1);
>> }
>> if (order > 1)
>> INIT_LIST_HEAD(&folio->_deferred_list);
>>
>> So some of the problem here looks to be not calling it:
>>
>> if (order)
>> prep_compound_page(page, order);
>>
>> So, remove that if ? Also shouldn't it be moved above the
>> set_page_count/lock_page ?
>>
>
> I'm not addressing each comment, some might be valid, others are not.
>
> Ok, can I rework this in a follow-up - I will commit to that? Anything
> we touch here is extremely sensitive to failures - Intel is the primary
> test vector for any modification to device pages for what I can tell.
>
> The fact is that large device pages do not really work without this
> patch, or prior revs. I’ve spent a lot of time getting large device
> pages stable — both here and in the initial series, commiting to help in
> follow on series touch SVM related things.
>
Matthew, I feel your frustration and appreciate your help.

For the current state of 6.19, your changes work for me, and I added a
Reviewed-by to the patch. It affects a small number of drivers and
makes them work for zone device folios. I am happy to maintain the
changes sent out as part of zone_device_page_init().

We can rework the details in a follow-up series; there are many ideas
and ways of doing this (Jason, Alistair and Zi have good ideas as
well).
> I’m going to miss my merge window with this (RB’d) patch blocked for
> large device pages. Expect my commitment to helping other vendors to
> drop if this happens. I’ll maybe just say: that doesn’t work in my CI,
> try again.
>
> Or perhaps we just revert large device pages in 6.19 if we can't get a
> consensus here as we shouldn't ship a non-functional kernel.
>
> Matt
>
>> Jason