From: Alistair Popple <apopple@nvidia.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
Oscar Salvador <osalvador@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Balbir Singh <balbirs@nvidia.com>,
linux-mm@kvack.org, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/zone_device: Do not touch device folio after calling ->folio_free()
Date: Fri, 17 Apr 2026 09:40:05 +1000
Message-ID: <aeFva5qU7gSmlQpE@nvdebian.thelocal>
In-Reply-To: <7c5cffc5-f4db-490f-b8aa-6604c32b34f2@kernel.org>

On 2026-04-16 at 18:52 +1000, "David Hildenbrand (Arm)" <david@kernel.org> wrote...
> On 4/13/26 06:06, Alistair Popple wrote:
> > On 2026-04-11 at 09:03 +1000, Matthew Brost <matthew.brost@intel.com> wrote...
> >> The contents of a device folio can immediately change after calling
> >> ->folio_free(), as the folio may be reallocated by a driver with a
> >> different order. Instead of touching the folio again to extract the
> >> pgmap, use the local stack variable when calling percpu_ref_put_many().
> >>
> >> Cc: David Hildenbrand <david@kernel.org>
> >> Cc: Oscar Salvador <osalvador@suse.de>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Balbir Singh <balbirs@nvidia.com>
> >> Cc: linux-mm@kvack.org
> >> Cc: linux-cxl@vger.kernel.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios")
> >> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>
> >> ---
> >> Stack trace:
> >>
> >> [ 631.875165] [IGT] xe_exec_system_allocator: starting subtest threads-many-new-prefetch
> >> [ 632.282992] Oops: general protection fault, probably for non-canonical address 0x900000000000000: 0000 [#1] SMP NOPTI
> >> [ 632.293469] CPU: 8 UID: 0 PID: 59267 Comm: xe_exec_system_ Not tainted 7.0.0-rc7-xe+ #281 PREEMPT(full)
> >> [ 632.316023] RIP: 0010:free_zone_device_folio+0x149/0x240
> >> [ 632.339782] RSP: 0000:ffffc90023d1fd00 EFLAGS: 00010206
> >> [ 632.344947] RAX: 0900000000000000 RBX: 0000000000000001 RCX: 0000000094472d4d
> >> [ 632.351991] RDX: ffffffff8155c76f RSI: 000000006f2213bf RDI: 000000008e84943a
> >> [ 632.359042] RBP: ffffea0ff4030001 R08: 0000000000000000 R09: 0000000000000001
> >> [ 632.366094] R10: 0000000000000028 R11: 0000000000000000 R12: ffff88811828e400
> >> [ 632.373145] R13: 0000000000000000 R14: 000fffffc0000000 R15: 0000000000100073
> >> [ 632.380194] FS: 00007f2f0fdfe6c0(0000) GS:ffff88890a7e7000(0000) knlGS:0000000000000000
> >> [ 632.388186] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 632.393870] CR2: 00007f2f002e90f8 CR3: 0000000106708002 CR4: 0000000000f70ef0
> >> [ 632.400919] PKRU: 55555554
> >> [ 632.403605] Call Trace:
> >> [ 632.406039] <TASK>
> >> [ 632.408131] do_swap_page+0x146d/0x18c0
> >> [ 632.411938] ? __pte_offset_map+0x3e/0x190
> >> [ 632.415994] __handle_mm_fault+0x6e8/0x8d0
> >> [ 632.420053] handle_mm_fault+0xbf/0x250
> >> [ 632.423855] ? lock_mm_and_find_vma+0x41/0x6f0
> >> [ 632.428256] do_user_addr_fault+0x168/0x690
> >> [ 632.432399] exc_page_fault+0x74/0x200
> >> [ 632.436117] asm_exc_page_fault+0x26/0x30
> >> [ 632.440092] RIP: 0033:0x5587554ff70d
> >> [ 632.462142] RSP: 002b:00007f2f0fdfc970 EFLAGS: 00010246
> >> [ 632.467308] RAX: 0000000000003fc0 RBX: 00007f2f082e1fc0 RCX: 00007f2f12b3287d
> >> [ 632.474355] RDX: 0000000000000000 RSI: 00000000c048644a RDI: 0000000000000003
> >> [ 632.481404] RBP: 00007f2f082e1fc0 R08: 00007f2f0fdfc958 R09: 0000000000000066
> >> [ 632.488450] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> >> [ 632.495495] R13: 00007f2f082de000 R14: 0000000000c00002 R15: 00007f2f1319e000
> >> [ 632.502547] </TASK>
> >
> > I'm not sure, but I think Andrew likes the stack traces included in the actual
> > commit messages. I've certainly found it helpful when debugging traces reported
> > from the field so would prefer it there.
>
> Agreed.
>
> >
> >> ---
> >> mm/memremap.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/memremap.c b/mm/memremap.c
> >> index ac7be07e3361..053842d45cb1 100644
> >> --- a/mm/memremap.c
> >> +++ b/mm/memremap.c
> >> @@ -454,7 +454,7 @@ void free_zone_device_folio(struct folio *folio)
> >> if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
> >> break;
> >> pgmap->ops->folio_free(folio);
> >> - percpu_ref_put_many(&folio->pgmap->ref, nr);
> >> + percpu_ref_put_many(&pgmap->ref, nr);
> >
>
> I assume the ref keeps pgmap alive, such that that cannot go away after
> the folio_free().

Drivers keep the pgmap alive by holding the initial pgmap->ref taken by
percpu_ref_init() in memremap_pages(). They release it as part of
memunmap_pages() when done with the range, and the driver should only call that
once all pages have been freed (at least for PRIVATE/COHERENT pages).
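
As a rough userspace sketch of that lifetime (not kernel code: every toy_* name
is hypothetical and pgmap->ref is modelled as a plain counter), the fixed free
path caches the pgmap before the callback and the driver's initial reference is
what remains afterwards:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct toy_pgmap {
	long ref;			/* models pgmap->ref as a plain counter */
};

struct toy_folio {
	struct toy_pgmap *pgmap;	/* may change once the driver reuses the folio */
	unsigned int order;
};

/* Models a driver ->folio_free() that immediately reallocates the folio. */
static void toy_folio_free(struct toy_folio *folio)
{
	folio->pgmap = NULL;		/* contents are no longer ours to read */
	folio->order = 0;
}

/* Models the fixed free path: read pgmap and nr before the callback. */
static void toy_free_zone_device_folio(struct toy_folio *folio)
{
	struct toy_pgmap *pgmap = folio->pgmap;
	long nr = 1L << folio->order;

	assert(pgmap != NULL);
	toy_folio_free(folio);
	pgmap->ref -= nr;		/* uses the local, never folio->pgmap */
}

/* Returns the references left after freeing one order-2 folio. */
static long toy_demo(void)
{
	/* ref = 1 models the initial reference the driver holds. */
	struct toy_pgmap pgmap = { .ref = 1 };
	struct toy_folio folio = { .pgmap = &pgmap, .order = 2 };

	pgmap.ref += 1L << folio.order;	/* one reference per page in the folio */
	toy_free_zone_device_folio(&folio);
	return pgmap.ref;
}
```

Calling toy_demo() from a small main() should leave only the initial reference
standing, which is the property that keeps the pgmap valid across the callback.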

In practice we could drop the whole pgmap->ref counting for ZONE_DEVICE pages
(certainly for the PRIVATE and COHERENT variants anyway). Now that they are
refcounted normally, we could just scan the page range as a BUG/WARN_ON check
to see if any are still in use. I haven't really felt the need to do that
though, because the check already exists and scanning the whole pgmap range for
pages with a non-zero refcount would be slow just for a debug check.

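
For illustration only, a scan of that kind might look roughly like the
following userspace sketch. The toy_page type and the helper name are
hypothetical, and only the refcount is modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct toy_page {
	int refcount;
};

/*
 * Returns true if any page in [pages, pages + nr_pages) is still referenced.
 * The cost is linear in nr_pages, which is why it would be slow over a
 * whole pgmap range.
 */
static bool toy_range_has_busy_page(const struct toy_page *pages,
				    size_t nr_pages)
{
	assert(pages != NULL || nr_pages == 0);

	for (size_t i = 0; i < nr_pages; i++) {
		if (pages[i].refcount != 0)
			return true;
	}
	return false;
}
```

A teardown path could warn when this returns true, at the price of walking
every page in the range.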
- Alistair
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>
> --
> Cheers,
>
> David