linux-mm.kvack.org archive mirror
From: Matthew Brost <matthew.brost@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
	"Francois Dugast" <francois.dugast@intel.com>,
	intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Zi Yan" <ziy@nvidia.com>, "Alistair Popple" <apopple@nvidia.com>,
	"Madhavan Srinivasan" <maddy@linux.ibm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Hildenbrand" <david@kernel.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Balbir Singh" <balbirs@nvidia.com>,
	linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	nouveau@lists.freedesktop.org, linux-mm@kvack.org,
	linux-cxl@vger.kernel.org
Subject: Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
Date: Fri, 16 Jan 2026 19:55:59 -0800
Message-ID: <aWsIT8A2dLciFvhj@lstrano-desk.jf.intel.com>
In-Reply-To: <20260117005114.GC1134360@nvidia.com>

On Fri, Jan 16, 2026 at 08:51:14PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 16, 2026 at 12:31:25PM -0800, Matthew Brost wrote:
> > > I suppose we could be getting say an order-9 folio that was previously used
> > > as two order-8 folios? And each of them had their _nr_pages in their head
> > 
> > Yes, this is a good example. At this point we have no idea what the previous
> > allocation(s) order(s) were - we could have multiple places in the loop
> > where _nr_pages is populated, thus we have to clear this everywhere.
> 
> Why? The fact you have to use such a crazy expression to even access
> _nr_pages strongly says nothing will read it as _nr_pages.
> 
> Explain each thing:
> 
> 		new_page->flags.f &= ~0xffUL;	/* Clear possible order, page head */
> 
> OK, the tail page flags need to be set right, and prep_compound_page()
> called later depends on them being zero.
> 
> 		((struct folio *)(new_page - 1))->_nr_pages = 0;
> 
> Can't see a reason, nothing reads _nr_pages from a random tail
> page. _nr_pages is the last 8 bytes of struct page so it overlaps
> memcg_data, which is also not supposed to be read from a tail page?
> 
> 		new_folio->mapping = NULL;
> 
> Pointless, prep_compound_page() -> prep_compound_tail() -> p->mapping = TAIL_MAPPING;
> 
> 		new_folio->pgmap = pgmap;	/* Also clear compound head */
> 
> Pointless, compound_head is set in prep_compound_tail(): set_compound_head(p, head);
> 
> 		new_folio->share = 0;   /* fsdax only, unused for device private */
> 
> Not sure, certainly share isn't read from a tail page..
> 
> > > > Why can't this use the normal helpers, like memmap_init_compound()?
> > > > 
> > > >  struct folio *new_folio = page
> > > > 
> > > >  /* First 4 tail pages are part of struct folio */
> > > >  for (i = 4; i < (1UL << order); i++) {
> > > >      prep_compound_tail(..)
> > > >  }
> > > > 
> > > >  prep_compound_head(page, order)
> > > >  new_folio->_nr_pages = 0
> > > > 
> > > > ??
> > 
> > I've beat this to death with Alistair, normal helpers do not work here.
> 
> What do you mean? It already calls prep_compound_page()! The issue
> seems to be that prep_compound_page() makes assumptions about what
> values are in flags already?
> 
> So how about move that page flags mask logic into
> prep_compound_tail()? I think that would help Vlastimil's
> concern. That function is already touching most of the cache line so
> an extra word shouldn't make a performance difference.
> 
> > An order zero allocation could have _nr_pages set in its page,
> > new_folio->_nr_pages is page + 1 memory.
> 
> An order zero allocation does not have _nr_pages because it is in page
> +1 memory that doesn't exist.
> 
> An order zero allocation might have memcg_data in the same slot, does
> it need zeroing? If so why not add that to prep_compound_head() ?
> 
> Also, prep_compound_head() handles order 0 too:
> 
> 	if (IS_ENABLED(CONFIG_64BIT) || order > 1) {
> 		atomic_set(&folio->_pincount, 0);
> 		atomic_set(&folio->_entire_mapcount, -1);
> 	}
> 	if (order > 1)
> 		INIT_LIST_HEAD(&folio->_deferred_list);
> 
> So some of the problem here looks to be not calling it:
> 
> 	if (order)
> 		prep_compound_page(page, order);
> 
> So, remove that if ? Also shouldn't it be moved above the
> set_page_count/lock_page ?
> 

I'm not addressing each comment; some might be valid, others are not.

OK, can I rework this in a follow-up? I will commit to that. Anything
we touch here is extremely sensitive to failures - Intel is the primary
test vector for any modification to device pages, from what I can tell.

The fact is that large device pages do not really work without this
patch, or prior revs. I’ve spent a lot of time getting large device
pages stable, both here and in the initial series, committing to help in
follow-on series that touch SVM-related things.

I’m going to miss my merge window with this (RB’d) patch blocked for
large device pages. Expect my commitment to helping other vendors to
drop if this happens. I’ll maybe just say: that doesn’t work in my CI,
try again.

Or perhaps we just revert large device pages in 6.19 if we can't get a
consensus here as we shouldn't ship a non-functional kernel.

Matt

> Jason

