* [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
       [not found] <CABXGCs03XcXt5GDae7d74ynC6P6G2gLw3ZrwAYvSQ3PwP0mGXA@mail.gmail.com>
@ 2026-02-06 17:40 ` Mikhail Gavrilov
  2026-02-06 18:08   ` Zi Yan
  0 siblings, 1 reply; 42+ messages in thread
From: Mikhail Gavrilov @ 2026-02-06 17:40 UTC (permalink / raw)
  To: linux-mm
  Cc: akpm, vbabka, chrisl, kasong, hughd, ryncsn, Mikhail Gavrilov, stable

When vmalloc allocates high-order pages and splits them via split_page(),
tail pages may retain stale page->private values from previous use by the
buddy allocator. This causes a use-after-free in the swap subsystem.

The swap code uses vmalloc_to_page() to get struct page pointers for
swap_map, then uses page->private to track swap count continuations. In
add_swap_count_continuation(), the condition "if (!page_private(head))"
assumes fresh pages have page->private == 0, but tail pages from
split_page() may have non-zero stale values.

When page->private accidentally contains a value like SWP_CONTINUED (32),
swap_count_continued() incorrectly assumes the continuation list is valid
and iterates over uninitialized page->lru, which may contain LIST_POISON
values from a previous list_del(), causing a crash:

  KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107]
  RIP: 0010:__do_sys_swapoff+0x1151/0x1860

Fix this by clearing page->private for tail pages in split_page(). Note
that we don't touch page->lru to avoid breaking split_free_page() which
may have the head page on a list.

Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
 mm/page_alloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cbf758e27aa2..3604a00e2118 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3122,8 +3122,14 @@ void split_page(struct page *page, unsigned int order)
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 
-	for (i = 1; i < (1 << order); i++)
+	for (i = 1; i < (1 << order); i++) {
 		set_page_refcounted(page + i);
+		/*
+		 * Tail pages may have stale page->private from buddy
+		 * allocator or previous use. Clear it.
+		 */
+		set_page_private(page + i, 0);
+	}
 	split_page_owner(page, order, 0);
 	pgalloc_tag_split(page_folio(page), order, 0);
 	split_page_memcg(page, order);
-- 
2.53.0

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
  2026-02-06 17:40 ` [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages Mikhail Gavrilov
@ 2026-02-06 18:08   ` Zi Yan
  2026-02-06 18:21     ` Mikhail Gavrilov
  2026-02-06 18:24     ` [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages Kairui Song
  0 siblings, 2 replies; 42+ messages in thread
From: Zi Yan @ 2026-02-06 18:08 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, ryncsn, stable,
      David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb

+willy, david, and others included in Andrew’s mm-commit email.

On 6 Feb 2026, at 12:40, Mikhail Gavrilov wrote:

> When vmalloc allocates high-order pages and splits them via split_page(),
> tail pages may retain stale page->private values from previous use by the
> buddy allocator.

Do you have a reproducer for this issue?

Last time I checked page->private usage, I found that users clear
->private before freeing a page. I wonder which one I was missing.

The comment above page_private() does say ->private can be used on tail
pages. If pages are freed with non-zero private in tail pages, we need
to either correct the violating user or clear all pages' ->private in
post_alloc_hook() in addition to the head one. Clearing ->private in
split_page() looks like a hack instead of a fix.

>
> This causes a use-after-free in the swap subsystem. The swap code uses
> vmalloc_to_page() to get struct page pointers for swap_map, then uses
> page->private to track swap count continuations. In
> add_swap_count_continuation(), the condition "if (!page_private(head))"
> assumes fresh pages have page->private == 0, but tail pages from
> split_page() may have non-zero stale values.
>
> When page->private accidentally contains a value like SWP_CONTINUED (32),
> swap_count_continued() incorrectly assumes the continuation list is valid
> and iterates over uninitialized page->lru, which may contain LIST_POISON
> values from a previous list_del(), causing a crash:
>
> KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107]
> RIP: 0010:__do_sys_swapoff+0x1151/0x1860
>
> Fix this by clearing page->private for tail pages in split_page(). Note
> that we don't touch page->lru to avoid breaking split_free_page() which
> may have the head page on a list.
>
> Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---
> mm/page_alloc.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cbf758e27aa2..3604a00e2118 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3122,8 +3122,14 @@ void split_page(struct page *page, unsigned int order)
>  	VM_BUG_ON_PAGE(PageCompound(page), page);
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>
> -	for (i = 1; i < (1 << order); i++)
> +	for (i = 1; i < (1 << order); i++) {
>  		set_page_refcounted(page + i);
> +		/*
> +		 * Tail pages may have stale page->private from buddy
> +		 * allocator or previous use. Clear it.
> +		 */
> +		set_page_private(page + i, 0);
> +	}
>  	split_page_owner(page, order, 0);
>  	pgalloc_tag_split(page_folio(page), order, 0);
>  	split_page_memcg(page, order);
> --
> 2.53.0

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
  2026-02-06 18:08 ` Zi Yan
@ 2026-02-06 18:21   ` Mikhail Gavrilov
  2026-02-06 18:29     ` Zi Yan
  2026-02-06 18:24   ` [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages Kairui Song
  1 sibling, 1 reply; 42+ messages in thread
From: Mikhail Gavrilov @ 2026-02-06 18:21 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, ryncsn, stable,
      David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb

Hi, Yan

On Fri, Feb 6, 2026 at 11:08 PM Zi Yan <ziy@nvidia.com> wrote:
>
> Do you have a reproducer for this issue?

Yes, I have a stress test that reliably reproduces the crash.
It cycles swapon/swapoff on 8GB zram under memory pressure:
https://gist.github.com/NTMan/4ed363793ebd36bd702a39283f06cee1

> Last time I checked page->private usage, I find users clears ->private before free a page.
> I wonder which one I was missing.

The issue is not about freeing - it's about allocation.
When buddy allocator merges/splits pages, it uses page->private to store order.
When a high-order page is later allocated and split via split_page(),
tail pages still have their old page->private values.
The path is:
1. Page freed → free_pages_prepare() does NOT clear page->private
2. Page goes to buddy allocator → buddy uses page->private for order
3. Page allocated as high-order → post_alloc_hook() only clears head
page's private
4. split_page() called → tail pages keep stale page->private

> Clearing ->private in split_page() looks like a hack instead of a fix.

I discussed this with Kairui Song earlier in the thread. We considered:

1. Fix in post_alloc_hook() - would need to clear all pages, not just head
2. Fix in swapfile.c - doesn't work because stale value could
accidentally equal SWP_CONTINUED
3. Fix in split_page() - ensures pages are properly initialized for
independent use

The comment in vmalloc.c says split pages should be usable
independently ("some use page->mapping, page->lru, etc."), so
split_page() initializing the pages seems appropriate.

But I agree post_alloc_hook() might be a cleaner place. Would you
prefer a patch there instead?

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
  2026-02-06 18:21 ` Mikhail Gavrilov
@ 2026-02-06 18:29   ` Zi Yan
  2026-02-06 18:33     ` Zi Yan
  0 siblings, 1 reply; 42+ messages in thread
From: Zi Yan @ 2026-02-06 18:29 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, ryncsn, stable,
      David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb,
      Kairui Song

On 6 Feb 2026, at 13:21, Mikhail Gavrilov wrote:

> Hi, Yan
>
> On Fri, Feb 6, 2026 at 11:08 PM Zi Yan <ziy@nvidia.com> wrote:
>>
>> Do you have a reproducer for this issue?
>
> Yes, I have a stress test that reliably reproduces the crash.
> It cycles swapon/swapoff on 8GB zram under memory pressure:
> https://gist.github.com/NTMan/4ed363793ebd36bd702a39283f06cee1

Got it.

Merging replies from Kairui from another email:

This patch is from previous discussion:
https://lore.kernel.org/linux-mm/CABXGCsO3XcXt5GDae7d74ynC6P6G2gLw3ZrwAYvSQ3PwP0mGXA@mail.gmail.com/

>
>> Last time I checked page->private usage, I find users clears ->private before free a page.
>> I wonder which one I was missing.
>
> The issue is not about freeing - it's about allocation.
> When buddy allocator merges/splits pages, it uses page->private to store order.
> When a high-order page is later allocated and split via split_page(),
> tail pages still have their old page->private values.
> The path is:
> 1. Page freed → free_pages_prepare() does NOT clear page->private
> 2. Page goes to buddy allocator → buddy uses page->private for order
> 3. Page allocated as high-order → post_alloc_hook() only clears head
> page's private
> 4. split_page() called → tail pages keep stale page->private
>
>> Clearing ->private in split_page() looks like a hack instead of a fix.
>
> I discussed this with Kairui Song earlier in the thread. We considered:
>
> 1. Fix in post_alloc_hook() - would need to clear all pages, not just head
> 2. Fix in swapfile.c - doesn't work because stale value could
> accidentally equal SWP_CONTINUED
> 3. Fix in split_page() - ensures pages are properly initialized for
> independent use
>
> The comment in vmalloc.c says split pages should be usable
> independently ("some use page->mapping, page->lru, etc."), so
> split_page() initializing the pages seems appropriate.
>
> But I agree post_alloc_hook() might be a cleaner place. Would you
> prefer a patch there instead?

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
  2026-02-06 18:29 ` Zi Yan
@ 2026-02-06 18:33   ` Zi Yan
  2026-02-06 19:58     ` Zi Yan
  0 siblings, 1 reply; 42+ messages in thread
From: Zi Yan @ 2026-02-06 18:33 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, ryncsn, stable,
      David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb,
      Kairui Song

Hit send too soon, sorry about that.

On 6 Feb 2026, at 13:29, Zi Yan wrote:

> On 6 Feb 2026, at 13:21, Mikhail Gavrilov wrote:
>
>> Hi, Yan
>>
>> On Fri, Feb 6, 2026 at 11:08 PM Zi Yan <ziy@nvidia.com> wrote:
>>>
>>> Do you have a reproducer for this issue?
>>
>> Yes, I have a stress test that reliably reproduces the crash.
>> It cycles swapon/swapoff on 8GB zram under memory pressure:
>> https://gist.github.com/NTMan/4ed363793ebd36bd702a39283f06cee1

Got it.

Merging replies from Kairui from another email:

This patch is from previous discussion:
https://lore.kernel.org/linux-mm/CABXGCsO3XcXt5GDae7d74ynC6P6G2gLw3ZrwAYvSQ3PwP0mGXA@mail.gmail.com/

It looks odd to me too. That bug starts with vmalloc dropping
__GFP_COMP in commit 3b8000ae185c, because with __GFP_COMP, the
allocator does clean the ->private of tail pages on allocation with
prep_compound_page. Without __GFP_COMP, these ->private fields are
left as they are.

>>
>>> Last time I checked page->private usage, I find users clears ->private before free a page.
>>> I wonder which one I was missing.
>>
>> The issue is not about freeing - it's about allocation.

I assume everyone zeros a used ->private, head or tail, so PageBuddy
pages all have zeroed ->private.

>> When buddy allocator merges/splits pages, it uses page->private to store order.
>> When a high-order page is later allocated and split via split_page(),
>> tail pages still have their old page->private values.

No, in __free_one_page(), if a free page is merged to a higher order,
it is deleted from the free list and its ->private is zeroed. There
should not be any non-zero private.

>> The path is:
>> 1. Page freed → free_pages_prepare() does NOT clear page->private

Right. The code assumes page->private is zero for all pages, head or
tail if it is compound.

>> 2. Page goes to buddy allocator → buddy uses page->private for order
>> 3. Page allocated as high-order → post_alloc_hook() only clears head
>> page's private
>> 4. split_page() called → tail pages keep stale page->private
>>
>>> Clearing ->private in split_page() looks like a hack instead of a fix.
>>
>> I discussed this with Kairui Song earlier in the thread. We considered:
>>
>> 1. Fix in post_alloc_hook() - would need to clear all pages, not just head
>> 2. Fix in swapfile.c - doesn't work because stale value could
>> accidentally equal SWP_CONTINUED
>> 3. Fix in split_page() - ensures pages are properly initialized for
>> independent use
>>
>> The comment in vmalloc.c says split pages should be usable
>> independently ("some use page->mapping, page->lru, etc."), so
>> split_page() initializing the pages seems appropriate.
>>
>> But I agree post_alloc_hook() might be a cleaner place. Would you
>> prefer a patch there instead?

I think it is better to find out which code causes non-zero ->private
at page free time.

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages 2026-02-06 18:33 ` Zi Yan @ 2026-02-06 19:58 ` Zi Yan 2026-02-06 20:49 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-06 19:58 UTC (permalink / raw) To: Mikhail Gavrilov Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, ryncsn, stable, David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb, Kairui Song On 6 Feb 2026, at 13:33, Zi Yan wrote: > Hit send too soon, sorry about that. > > On 6 Feb 2026, at 13:29, Zi Yan wrote: > >> On 6 Feb 2026, at 13:21, Mikhail Gavrilov wrote: >> >>> Hi, Yan >>> >>> On Fri, Feb 6, 2026 at 11:08 PM Zi Yan <ziy@nvidia.com> wrote: >>>> >>>> Do you have a reproducer for this issue? >>> >>> Yes, I have a stress test that reliably reproduces the crash. >>> It cycles swapon/swapoff on 8GB zram under memory pressure: >>> https://gist.github.com/NTMan/4ed363793ebd36bd702a39283f06cee1 > > Got it. > > Merging replies from Kairui from another email: > > This patch is from previous discussion: > https://lore.kernel.org/linux-mm/CABXGCsO3XcXt5GDae7d74ynC6P6G2gLw3ZrwAYvSQ3PwP0mGXA@mail.gmail.com/ > > It looks odd to me too. That bug starts with vmalloc dropping > __GFP_COMP in commit 3b8000ae185c, because with __GFP_COMP, the > allocator does clean the ->private of tail pages on allocation with > prep_compound_page. Without __GFP_COMP, these ->private fields are > left as it is. > >>> >>>> Last time I checked page->private usage, I find users clears ->private before free a page. >>>> I wonder which one I was missing. >>> >>> The issue is not about freeing - it's about allocation. > > I assume everyone zeros used ->private, head or tail, so PageBuddy has > all zeroed ->private. > >>> When buddy allocator merges/splits pages, it uses page->private to store order. >>> When a high-order page is later allocated and split via split_page(), >>> tail pages still have their old page->private values. 
> > No, in __free_one_page(), if a free page is merged to a higher order, > it is deleted from free list and its ->private is zeroed. There should not > be any non zero private. > >>> The path is: >>> 1. Page freed → free_pages_prepare() does NOT clear page->private > > Right. The code assume page->private is zero for all pages, head or tail > if it is compound. > >>> 2. Page goes to buddy allocator → buddy uses page->private for order >>> 3. Page allocated as high-order → post_alloc_hook() only clears head >>> page's private >>> 4. split_page() called → tail pages keep stale page->private >>> >>>> Clearing ->private in split_page() looks like a hack instead of a fix. >>> >>> I discussed this with Kairui Song earlier in the thread. We considered: >>> >>> 1. Fix in post_alloc_hook() - would need to clear all pages, not just head >>> 2. Fix in swapfile.c - doesn't work because stale value could >>> accidentally equal SWP_CONTINUED >>> 3. Fix in split_page() - ensures pages are properly initialized for >>> independent use >>> >>> The comment in vmalloc.c says split pages should be usable >>> independently ("some use page->mapping, page->lru, etc."), so >>> split_page() initializing the pages seems appropriate. >>> >>> But I agree post_alloc_hook() might be a cleaner place. Would you >>> prefer a patch there instead? > > I think it is better to find out which code causes non zero ->private > at page free time. Hi Mikhail, Do you mind sharing the kernel config? I am trying to reproduce it locally but have no luck (Iteration 111 and going) so far. Thanks. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages 2026-02-06 19:58 ` Zi Yan @ 2026-02-06 20:49 ` Zi Yan 2026-02-06 22:16 ` Mikhail Gavrilov 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-06 20:49 UTC (permalink / raw) To: Mikhail Gavrilov Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, ryncsn, stable, David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb, Kairui Song On 6 Feb 2026, at 14:58, Zi Yan wrote: > On 6 Feb 2026, at 13:33, Zi Yan wrote: > >> Hit send too soon, sorry about that. >> >> On 6 Feb 2026, at 13:29, Zi Yan wrote: >> >>> On 6 Feb 2026, at 13:21, Mikhail Gavrilov wrote: >>> >>>> Hi, Yan >>>> >>>> On Fri, Feb 6, 2026 at 11:08 PM Zi Yan <ziy@nvidia.com> wrote: >>>>> >>>>> Do you have a reproducer for this issue? >>>> >>>> Yes, I have a stress test that reliably reproduces the crash. >>>> It cycles swapon/swapoff on 8GB zram under memory pressure: >>>> https://gist.github.com/NTMan/4ed363793ebd36bd702a39283f06cee1 >> >> Got it. >> >> Merging replies from Kairui from another email: >> >> This patch is from previous discussion: >> https://lore.kernel.org/linux-mm/CABXGCsO3XcXt5GDae7d74ynC6P6G2gLw3ZrwAYvSQ3PwP0mGXA@mail.gmail.com/ >> >> It looks odd to me too. That bug starts with vmalloc dropping >> __GFP_COMP in commit 3b8000ae185c, because with __GFP_COMP, the >> allocator does clean the ->private of tail pages on allocation with >> prep_compound_page. Without __GFP_COMP, these ->private fields are >> left as it is. >> >>>> >>>>> Last time I checked page->private usage, I find users clears ->private before free a page. >>>>> I wonder which one I was missing. >>>> >>>> The issue is not about freeing - it's about allocation. >> >> I assume everyone zeros used ->private, head or tail, so PageBuddy has >> all zeroed ->private. >> >>>> When buddy allocator merges/splits pages, it uses page->private to store order. 
>>>> When a high-order page is later allocated and split via split_page(), >>>> tail pages still have their old page->private values. >> >> No, in __free_one_page(), if a free page is merged to a higher order, >> it is deleted from free list and its ->private is zeroed. There should not >> be any non zero private. >> >>>> The path is: >>>> 1. Page freed → free_pages_prepare() does NOT clear page->private >> >> Right. The code assume page->private is zero for all pages, head or tail >> if it is compound. >> >>>> 2. Page goes to buddy allocator → buddy uses page->private for order >>>> 3. Page allocated as high-order → post_alloc_hook() only clears head >>>> page's private >>>> 4. split_page() called → tail pages keep stale page->private >>>> >>>>> Clearing ->private in split_page() looks like a hack instead of a fix. >>>> >>>> I discussed this with Kairui Song earlier in the thread. We considered: >>>> >>>> 1. Fix in post_alloc_hook() - would need to clear all pages, not just head >>>> 2. Fix in swapfile.c - doesn't work because stale value could >>>> accidentally equal SWP_CONTINUED >>>> 3. Fix in split_page() - ensures pages are properly initialized for >>>> independent use >>>> >>>> The comment in vmalloc.c says split pages should be usable >>>> independently ("some use page->mapping, page->lru, etc."), so >>>> split_page() initializing the pages seems appropriate. >>>> >>>> But I agree post_alloc_hook() might be a cleaner place. Would you >>>> prefer a patch there instead? >> >> I think it is better to find out which code causes non zero ->private >> at page free time. > > Hi Mikhail, > > Do you mind sharing the kernel config? I am trying to reproduce it locally > but have no luck (Iteration 111 and going) so far. > It seems that I reproduced it locally after enabling KASAN. And page owner seems to tell that it is KASAN code causing the issue. I added the patch below to dump_page() and dump_stack() when a freeing page’s private is not zero. 
It is on top of 6.19-rc7. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cbf758e27aa2..2151c847c35d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1402,6 +1402,10 @@ __always_inline bool free_pages_prepare(struct page *page, #endif } for (i = 1; i < (1 << order); i++) { + if ((page + i)->private) { + dump_page(page + i, "non zero private"); + dump_stack(); + } if (compound) bad += free_tail_page_prepare(page, page + i); if (is_check_pages_enabled()) { Kernel dump below says the page with non zero private was allocated in kasan_save_stack() and freed in kasan_save_stack(). So fix kasan instead? ;) qemu-vm login: [ 59.753874] zram: Added device: zram0 [ 61.112878] zram0: detected capacity change from 0 to 16777216 [ 61.131201] Adding 8388604k swap on /dev/zram0. Priority:100 extents:1 across:8388604k SS [ 71.001984] zram0: detected capacity change from 16777216 to 0 [ 71.089084] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888131a9da00 pfn:0x131a9d [ 71.090751] flags: 0x100000000000000(node=0|zone=2) [ 71.091643] raw: 0100000000000000 dead000000000100 dead000000000122 0000000000000000 [ 71.092913] raw: ffff888131a9da00 0000000000100000 00000000ffffffff 0000000000000000 [ 71.094336] page dumped because: non zero private [ 71.095000] page_owner tracks the page as allocated [ 71.095871] page last allocated via order 2, migratetype Unmovable, gfp_mask 0x92cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_NOMEMALLOC), pid 834, tgid 834 (rmmod), ts 71089064250, free_ts 67872485904 [ 71.099315] get_page_from_freelist+0x79b/0x3fa0 [ 71.100216] __alloc_frozen_pages_noprof+0x245/0x2160 [ 71.101177] alloc_pages_mpol+0x14c/0x360 [ 71.101704] alloc_pages_noprof+0xfa/0x320 [ 71.102497] stack_depot_save_flags+0x81c/0x8e0 [ 71.103100] kasan_save_stack+0x3f/0x50 [ 71.103624] kasan_save_track+0x17/0x60 [ 71.104168] __kasan_slab_alloc+0x63/0x80 [ 71.104694] kmem_cache_alloc_lru_noprof+0x143/0x550 [ 71.105385] __d_alloc+0x2f/0x850 [ 71.105831] 
d_alloc_parallel+0xcd/0xc50 [ 71.106395] __lookup_slow+0xec/0x320 [ 71.106880] lookup_slow+0x4f/0x80 [ 71.107463] lookup_noperm_positive_unlocked+0x7d/0xb0 [ 71.108173] debugfs_lookup+0x74/0xe0 [ 71.108660] debugfs_lookup_and_remove+0xa/0x70 [ 71.109363] page last free pid 808 tgid 808 stack trace: [ 71.110058] register_dummy_stack+0x6d/0xb0 [ 71.110749] init_page_owner+0x2e/0x680 [ 71.111296] page_ext_init+0x485/0x4b0 [ 71.111902] mm_core_init+0x157/0x170 [ 71.112422] CPU: 1 UID: 0 PID: 834 Comm: rmmod Not tainted 6.19.0-rc7-dirty #201 PREEMPT(voluntary) [ 71.112427] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 [ 71.112431] Call Trace: [ 71.112434] <TASK> [ 71.112436] dump_stack_lvl+0x4d/0x70 [ 71.112441] __free_frozen_pages+0xef3/0x1100 [ 71.112444] stack_depot_save_flags+0x4d6/0x8e0 [ 71.112447] ? __d_alloc+0x2f/0x850 [ 71.112450] kasan_save_stack+0x3f/0x50 [ 71.112454] ? kasan_save_stack+0x30/0x50 [ 71.112458] ? kasan_save_track+0x17/0x60 [ 71.112461] ? __kasan_slab_alloc+0x63/0x80 [ 71.112464] ? kmem_cache_alloc_lru_noprof+0x143/0x550 [ 71.112469] ? __d_alloc+0x2f/0x850 [ 71.112473] ? d_alloc_parallel+0xcd/0xc50 [ 71.112477] ? __lookup_slow+0xec/0x320 [ 71.112480] ? lookup_slow+0x4f/0x80 [ 71.112484] ? lookup_noperm_positive_unlocked+0x7d/0xb0 [ 71.112488] ? debugfs_lookup+0x74/0xe0 [ 71.112492] ? debugfs_lookup_and_remove+0xa/0x70 [ 71.112495] ? kmem_cache_destroy+0xbe/0x1a0 [ 71.112500] ? zs_destroy_pool+0x145/0x200 [zsmalloc] [ 71.112506] ? zram_reset_device+0x210/0x5e0 [zram] [ 71.112514] ? zram_remove.part.0.cold+0x8f/0x37f [zram] [ 71.112522] ? idr_for_each+0x10b/0x200 [ 71.112526] ? destroy_devices+0x21/0x57 [zram] [ 71.112533] ? __do_sys_delete_module+0x33f/0x500 [ 71.112537] ? do_syscall_64+0xa4/0xf80 [ 71.112541] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 71.112548] kasan_save_track+0x17/0x60 [ 71.112551] __kasan_slab_alloc+0x63/0x80 [ 71.112556] kmem_cache_alloc_lru_noprof+0x143/0x550 [ 71.112560] ? 
kernfs_put.part.0+0x14d/0x340 [ 71.112564] __d_alloc+0x2f/0x850 [ 71.112568] ? destroy_devices+0x21/0x57 [zram] [ 71.112575] ? __do_sys_delete_module+0x33f/0x500 [ 71.112579] d_alloc_parallel+0xcd/0xc50 [ 71.112583] ? __pfx_d_alloc_parallel+0x10/0x10 [ 71.112586] __lookup_slow+0xec/0x320 [ 71.112590] ? __pfx___lookup_slow+0x10/0x10 [ 71.112594] ? down_read+0x132/0x240 [ 71.112598] ? __pfx_down_read+0x10/0x10 [ 71.112601] ? __d_lookup+0x17b/0x1e0 [ 71.112605] lookup_slow+0x4f/0x80 [ 71.112610] lookup_noperm_positive_unlocked+0x7d/0xb0 [ 71.112614] debugfs_lookup+0x74/0xe0 [ 71.112618] debugfs_lookup_and_remove+0xa/0x70 [ 71.112623] kmem_cache_destroy+0xbe/0x1a0 [ 71.112626] zs_destroy_pool+0x145/0x200 [zsmalloc] [ 71.112631] ? __pfx_zram_remove_cb+0x10/0x10 [zram] [ 71.112638] zram_reset_device+0x210/0x5e0 [zram] [ 71.112645] ? __pfx_zram_remove_cb+0x10/0x10 [zram] [ 71.112651] ? __pfx_zram_remove_cb+0x10/0x10 [zram] [ 71.112658] zram_remove.part.0.cold+0x8f/0x37f [zram] [ 71.112665] ? __pfx_zram_remove_cb+0x10/0x10 [zram] [ 71.112671] idr_for_each+0x10b/0x200 [ 71.112675] ? kasan_save_track+0x25/0x60 [ 71.112678] ? __pfx_idr_for_each+0x10/0x10 [ 71.112681] ? kfree+0x16e/0x490 [ 71.112685] destroy_devices+0x21/0x57 [zram] [ 71.112692] __do_sys_delete_module+0x33f/0x500 [ 71.112696] ? __pfx___do_sys_delete_module+0x10/0x10 [ 71.112702] do_syscall_64+0xa4/0xf80 [ 71.112706] entry_SYSCALL_64_after_hwframe+0x77/0x7f Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
  2026-02-06 20:49 ` Zi Yan
@ 2026-02-06 22:16   ` Mikhail Gavrilov
  2026-02-06 22:37     ` Mikhail Gavrilov
  0 siblings, 1 reply; 42+ messages in thread
From: Mikhail Gavrilov @ 2026-02-06 22:16 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, stable,
      David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb,
      Kairui Song

On Sat, Feb 7, 2026 at 1:49 AM Zi Yan <ziy@nvidia.com> wrote:
>
> It seems that I reproduced it locally after enabling KASAN. And page owner
> seems to tell that it is KASAN code causing the issue. I added the patch
> below to dump_page() and dump_stack() when a freeing page's private
> is not zero. It is on top of 6.19-rc7.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cbf758e27aa2..2151c847c35d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1402,6 +1402,10 @@ __always_inline bool free_pages_prepare(struct page *page,
>  #endif
>  	}
>  	for (i = 1; i < (1 << order); i++) {
> +		if ((page + i)->private) {
> +			dump_page(page + i, "non zero private");
> +			dump_stack();
> +		}
>  		if (compound)
>  			bad += free_tail_page_prepare(page, page + i);
>  		if (is_check_pages_enabled()) {
>
> Kernel dump below says the page with non zero private was allocated
> in kasan_save_stack() and freed in kasan_save_stack().
>
> So fix kasan instead? ;)
>

Hi Zi,
Thanks for the deep investigation!
So the actual culprit is KASAN's kasan_save_stack() leaving non-zero
page->private.
That explains why it only reproduces with KASAN enabled.
Looking at the code, kasan_save_stack() doesn't seem to use
page->private directly - it goes through stack_depot. Is stack_depot
the actual culprit?
Happy to help investigate further if needed.
Regarding the fix location - even if we fix KASAN/stack_depot,
split_page() clearing page->private still seems like the right
defensive fix.
The contract for split_page() is that it produces independent usable
pages, and page->private being clean is part of that.
Other code could potentially leave stale values too.
I can share my .config if still needed, but it sounds like you've
already reproduced it.

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
  2026-02-06 22:16 ` Mikhail Gavrilov
@ 2026-02-06 22:37   ` Mikhail Gavrilov
  2026-02-06 23:06     ` Zi Yan
  0 siblings, 1 reply; 42+ messages in thread
From: Mikhail Gavrilov @ 2026-02-06 22:37 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, stable,
      David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb,
      Kairui Song

On Sat, Feb 7, 2026 at 3:16 AM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> Hi Zi,
> Thanks for the deep investigation!
> So the actual culprit is KASAN's kasan_save_stack() leaving non-zero
> page->private.
> That explains why it only reproduces with KASAN enabled.
> Looking at the code, kasan_save_stack() doesn't seem to use
> page->private directly - it goes through stack_depot. Is stack_depot
> the actual culprit?
> Happy to help investigate further if needed.
> Regarding the fix location - even if we fix KASAN/stack_depot,
> split_page() clearing page->private still seems like the right
> defensive fix.
> The contract for split_page() is that it produces independent usable
> pages, and page->private being clean is part of that.
> Other code could potentially leave stale values too.
> I can share my .config if still needed, but it sounds like you've
> already reproduced it.
>

I think I found it. Looking at mm/internal.h:811, prep_compound_tail()
clears page->private for tail pages,
but it's only called for compound pages (__GFP_COMP).
Before commit 3b8000ae185c, vmalloc used __GFP_COMP, so tail pages got
their page->private cleared via prep_compound_tail().
After that commit dropped __GFP_COMP, tail pages keep stale values
from buddy allocator (which uses page->private for order).
So the stale value comes from buddy allocator's set_buddy_order() at
mm/page_alloc.c:755,
and __del_page_from_free_list() at line 898 only clears the head page's private.
This confirms the split_page() fix is the right place - it ensures
tail pages are properly initialized for independent use after
splitting.

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
  2026-02-06 22:37 ` Mikhail Gavrilov
@ 2026-02-06 23:06   ` Zi Yan
  2026-02-07  3:28     ` Zi Yan
  0 siblings, 1 reply; 42+ messages in thread
From: Zi Yan @ 2026-02-06 23:06 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, stable,
      David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb,
      Kairui Song

On 6 Feb 2026, at 17:37, Mikhail Gavrilov wrote:

> On Sat, Feb 7, 2026 at 3:16 AM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
>>
>> Hi Zi,
>> Thanks for the deep investigation!
>> So the actual culprit is KASAN's kasan_save_stack() leaving non-zero
>> page->private.
>> That explains why it only reproduces with KASAN enabled.
>> Looking at the code, kasan_save_stack() doesn't seem to use
>> page->private directly - it goes through stack_depot. Is stack_depot
>> the actual culprit?
>> Happy to help investigate further if needed.
>> Regarding the fix location - even if we fix KASAN/stack_depot,
>> split_page() clearing page->private still seems like the right
>> defensive fix.
>> The contract for split_page() is that it produces independent usable
>> pages, and page->private being clean is part of that.
>> Other code could potentially leave stale values too.
>> I can share my .config if still needed, but it sounds like you've
>> already reproduced it.
>>
>
> I think I found it. Looking at mm/internal.h:811, prep_compound_tail()
> clears page->private for tail pages,
> but it's only called for compound pages (__GFP_COMP).
> Before commit 3b8000ae185c, vmalloc used __GFP_COMP, so tail pages got
> their page->private cleared via prep_compound_tail().
> After that commit dropped __GFP_COMP, tail pages keep stale values
> from buddy allocator (which uses page->private for order).
> So the stale value comes from buddy allocator's set_buddy_order() at
> mm/page_alloc.c:755,
> and __del_page_from_free_list() at line 898 only clears the head page's private.

set_buddy_order() also only sets the head page’s private. And at each
buddy page merge, any buddy found in the free list gets its head page’s
private cleared in __del_page_from_free_list(). The final merged free
page gets its private set by set_buddy_order() at done_merging. There
should not be any stale values in any page’s private, if I read the
code correctly.

If it is the problem of the buddy allocator leaving stale private
values, the problem would be reproducible with and without KASAN.

> This confirms the split_page() fix is the right place - it ensures
> tail pages are properly initialized for independent use after
> splitting.
>
> --
> Best Regards,
> Mike Gavrilov.

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages 2026-02-06 23:06 ` Zi Yan @ 2026-02-07 3:28 ` Zi Yan 2026-02-07 14:25 ` Mikhail Gavrilov 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-07 3:28 UTC (permalink / raw) To: Mikhail Gavrilov Cc: linux-mm, akpm, chrisl, kasong, hughd, stable, David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb, vbabka, Kairui Song On 6 Feb 2026, at 18:06, Zi Yan wrote: > On 6 Feb 2026, at 17:37, Mikhail Gavrilov wrote: > >> On Sat, Feb 7, 2026 at 3:16 AM Mikhail Gavrilov >> <mikhail.v.gavrilov@gmail.com> wrote: >>> >>> Hi Zi, >>> Thanks for the deep investigation! >>> So the actual culprit is KASAN's kasan_save_stack() leaving non-zero >>> page->private. >>> That explains why it only reproduces with KASAN enabled. >>> Looking at the code, kasan_save_stack() doesn't seem to use >>> page->private directly - it goes through stack_depot. Is stack_depot >>> the actual culprit? >>> Happy to help investigate further if needed. >>> Regarding the fix location - even if we fix KASAN/stack_depot, >>> split_page() clearing page->private still seems like the right >>> defensive fix. >>> The contract for split_page() is that it produces independent usable >>> pages, and page->private being clean is part of that. >>> Other code could potentially leave stale values too. >>> I can share my .config if still needed, but it sounds like you've >>> already reproduced it. >>> >> >> I think I found it. Looking at mm/internal.h:811, prep_compound_tail() >> clears page->private for tail pages, >> but it's only called for compound pages (__GFP_COMP). >> Before commit 3b8000ae185c, vmalloc used __GFP_COMP, so tail pages got >> their page->private cleared via prep_compound_tail(). >> After that commit dropped __GFP_COMP, tail pages keep stale values >> from buddy allocator (which uses page->private for order). 
>> So the stale value comes from buddy allocator's set_buddy_order() at >> mm/page_alloc.c:755, >> and __del_page_from_free_list() at line 898 only clears the head page's private. > > set_buddy_order() also only set head page’s private. And at each buddy > page merge, any buddy found in free list gets its head page’s private > cleared in __del_page_from_free_list(). The final merged free page > gets its private set by set_buddy_order() at done_merging. There should > not be any stale values in any page’s private, if I read the code correctly. > > If it is the problem of buddy allocator leaving stale private values, > the problem would be reproducible with and without KASAN. > OK, it seems that both slub and shmem do not reset ->private when freeing pages/folios. And tail page's private is not zero, because when a page with non zero private is freed and gets merged with a lower buddy, its private is not set to 0 in the code path. The patch below seems to fix the issue, since I am at Iteration 104 and counting. I also put a VM_BUG_ON(page->private) in free_pages_prepare() and it is not triggered either. diff --git a/mm/shmem.c b/mm/shmem.c index ec6c01378e9d..546e193ef993 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2437,8 +2437,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index, failed_nolock: if (skip_swapcache) swapcache_clear(si, folio->swap, folio_nr_pages(folio)); - if (folio) + if (folio) { + folio->swap.val = 0; folio_put(folio); + } put_swap_device(si); return error; diff --git a/mm/slub.c b/mm/slub.c index f77b7407c51b..2cdab6d66e1a 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3311,6 +3311,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab) __slab_clear_pfmemalloc(slab); page->mapping = NULL; + page->private = 0; __ClearPageSlab(page); mm_account_reclaimed_pages(pages); unaccount_slab(slab, order, s); But I am not sure if that is all. 
Maybe the patch below on top is needed to find all violators and still keep the system running. I also would like to hear from others on whether page->private should be reset or not before free_pages_prepare(). diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cbf758e27aa2..9058f94b0667 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1430,6 +1430,8 @@ __always_inline bool free_pages_prepare(struct page *page, page_cpupid_reset_last(page); page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; + VM_WARN_ON_ONCE(page->private); + page->private = 0; reset_page_owner(page, order); page_table_check_free(page, order); pgalloc_tag_sub(page, 1 << order); -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages 2026-02-07 3:28 ` Zi Yan @ 2026-02-07 14:25 ` Mikhail Gavrilov 2026-02-07 14:32 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: Mikhail Gavrilov @ 2026-02-07 14:25 UTC (permalink / raw) To: Zi Yan Cc: linux-mm, akpm, chrisl, kasong, hughd, stable, David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb, vbabka, Kairui Song On Sat, Feb 7, 2026 at 8:28 AM Zi Yan <ziy@nvidia.com> wrote: > > OK, it seems that both slub and shmem do not reset ->private when freeing > pages/folios. And tail page's private is not zero, because when a page > with non zero private is freed and gets merged with a lower buddy, its > private is not set to 0 in the code path. > > The patch below seems to fix the issue, since I am at Iteration 104 and counting. > I also put a VM_BUG_ON(page->private) in free_pages_prepare() and it is not > triggered either. > > > diff --git a/mm/shmem.c b/mm/shmem.c > index ec6c01378e9d..546e193ef993 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -2437,8 +2437,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index, > failed_nolock: > if (skip_swapcache) > swapcache_clear(si, folio->swap, folio_nr_pages(folio)); > - if (folio) > + if (folio) { > + folio->swap.val = 0; > folio_put(folio); > + } > put_swap_device(si); > > return error; > diff --git a/mm/slub.c b/mm/slub.c > index f77b7407c51b..2cdab6d66e1a 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -3311,6 +3311,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab) > > __slab_clear_pfmemalloc(slab); > page->mapping = NULL; > + page->private = 0; > __ClearPageSlab(page); > mm_account_reclaimed_pages(pages); > unaccount_slab(slab, order, s); > > > > But I am not sure if that is all. Maybe the patch below on top is needed to find all violators > and still keep the system running. 
I also would like to hear from others on whether page->private > should be reset or not before free_pages_prepare(). > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cbf758e27aa2..9058f94b0667 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1430,6 +1430,8 @@ __always_inline bool free_pages_prepare(struct page *page, > > page_cpupid_reset_last(page); > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > + VM_WARN_ON_ONCE(page->private); > + page->private = 0; > reset_page_owner(page, order); > page_table_check_free(page, order); > pgalloc_tag_sub(page, 1 << order); > > > -- > Best Regards, > Yan, Zi I tested your patch. The VM_WARN_ON_ONCE caught another violator - TTM (GPU memory manager): ------------[ cut here ]------------ WARNING: mm/page_alloc.c:1433 at __free_pages_ok+0xe1e/0x12c0, CPU#16: gnome-shell/5841 Modules linked in: overlay uinput rfcomm snd_seq_dummy snd_hrtimer xt_mark xt_cgroup xt_MASQUERADE ip6t_REJECT ipt_REJECT nft_compat nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr uhid bnep sunrpc amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib btusb mt76 btmtk btrtl btbcm btintel vfat edac_mce_amd spd5118 bluetooth fat snd_hda_codec_atihdmi asus_ec_sensors mac80211 snd_hda_codec_hdmi kvm_amd snd_hda_intel uvcvideo snd_usb_audio snd_hda_codec uvc videobuf2_vmalloc kvm videobuf2_memops snd_hda_core joydev videobuf2_v4l2 snd_intel_dspcfg videobuf2_common snd_usbmidi_lib videodev snd_intel_sdw_acpi snd_ump irqbypass snd_hwdep asus_nb_wmi mc snd_rawmidi rapl snd_seq asus_wmi cfg80211 sparse_keymap snd_seq_device platform_profile wmi_bmof pcspkr snd_pcm snd_timer rfkill igc snd libarc4 i2c_piix4 soundcore k10temp i2c_smbus gpio_amdpt gpio_generic nfnetlink zram lz4hc_compress lz4_compress amdgpu amdxcp i2c_algo_bit 
drm_ttm_helper ttm drm_exec drm_panel_backlight_quirks gpu_sched drm_suballoc_helper nvme video nvme_core drm_buddy ghash_clmulni_intel drm_display_helper nvme_keyring nvme_auth cec sp5100_tco hkdf wmi uas usb_storage fuse ntsync i2c_dev CPU: 16 UID: 1000 PID: 5841 Comm: gnome-shell Tainted: G W 6.19.0-rc8-f14faaf3a1fb-with-fix-reset-private-when-freeing+ #82 PREEMPT(lazy) Tainted: [W]=WARN Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI, BIOS 3602 11/13/2025 RIP: 0010:__free_pages_ok+0xe1e/0x12c0 Code: ef 48 89 c6 e8 f3 59 ff ff 83 44 24 20 01 49 ba 00 00 00 00 00 fc ff df e9 71 fe ff ff 41 c7 45 30 ff ff ff ff e9 f5 f4 ff ff <0f> 0b e9 73 f5 ff ff e8 86 4c 0e 00 e9 02 fb ff ff 48 c7 44 24 30 RSP: 0018:ffffc9000e0cf878 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000f80 RCX: 1ffffd40028c6000 RDX: 1ffffd40028c6005 RSI: 0000000000000004 RDI: ffffea0014630038 RBP: ffffea0014630028 R08: ffffffff9e58e2de R09: 1ffffd40028c6006 R10: fffff940028c6007 R11: fffff940028c6007 R12: ffffffffa27376d8 R13: ffffea0014630000 R14: ffff889054e559c0 R15: 0000000000000000 FS: 00007f510f914000(0000) GS:ffff8890317a8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005607eaf70168 CR3: 00000001dfd6a000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> ttm_pool_unmap_and_free+0x30c/0x520 [ttm] ? dma_resv_iter_first_unlocked+0x2f9/0x470 ttm_pool_free_range+0xef/0x160 [ttm] ? __pfx_drm_gem_close_ioctl+0x10/0x10 ttm_pool_free+0x70/0xe0 [ttm] ? rcu_is_watching+0x15/0xe0 ttm_tt_unpopulate+0xa2/0x2d0 [ttm] ttm_bo_cleanup_memtype_use+0xec/0x200 [ttm] ttm_bo_release+0x371/0xb00 [ttm] ? __pfx_ttm_bo_release+0x10/0x10 [ttm] ? drm_vma_node_revoke+0x1a/0x1e0 ? local_clock+0x15/0x30 ? __pfx_drm_gem_close_ioctl+0x10/0x10 drm_gem_object_release_handle+0xcd/0x1f0 drm_gem_handle_delete+0x6a/0xc0 ? drm_dev_exit+0x35/0x50 drm_ioctl_kernel+0x172/0x2e0 ? __lock_release.isra.0+0x1a2/0x370 ? 
__pfx_drm_ioctl_kernel+0x10/0x10 drm_ioctl+0x571/0xb50 ? __pfx_drm_gem_close_ioctl+0x10/0x10 ? __pfx_drm_ioctl+0x10/0x10 ? rcu_is_watching+0x15/0xe0 ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 ? trace_hardirqs_on+0x18/0x140 ? lockdep_hardirqs_on+0x90/0x130 ? __raw_spin_unlock_irqrestore+0x5d/0x80 ? __raw_spin_unlock_irqrestore+0x46/0x80 amdgpu_drm_ioctl+0xd3/0x190 [amdgpu] __x64_sys_ioctl+0x13c/0x1d0 ? syscall_trace_enter+0x15c/0x2a0 do_syscall_64+0x9c/0x4e0 ? __lock_release.isra.0+0x1a2/0x370 ? do_user_addr_fault+0x87a/0xf60 ? fpregs_assert_state_consistent+0x8f/0x100 ? trace_hardirqs_on_prepare+0x101/0x140 ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 ? irqentry_exit+0x99/0x600 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f5113af889d Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 RSP: 002b:00007fff83c100c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00005607ed127c50 RCX: 00007f5113af889d RDX: 00007fff83c10150 RSI: 0000000040086409 RDI: 000000000000000e RBP: 00007fff83c10110 R08: 00005607ead46d50 R09: 0000000000000000 R10: 0000000000000031 R11: 0000000000000246 R12: 00007fff83c10150 R13: 0000000040086409 R14: 000000000000000e R15: 00005607ead46d50 </TASK> irq event stamp: 5186663 hardirqs last enabled at (5186669): [<ffffffff9dc9ce6e>] __up_console_sem+0x7e/0x90 hardirqs last disabled at (5186674): [<ffffffff9dc9ce53>] __up_console_sem+0x63/0x90 softirqs last enabled at (5186538): [<ffffffff9da5325b>] handle_softirqs+0x54b/0x810 softirqs last disabled at (5186531): [<ffffffff9da53654>] __irq_exit_rcu+0x124/0x240 ---[ end trace 0000000000000000 ]--- So there are more violators than just slub and shmem. I also tested the post_alloc_hook() fix (clearing page->private for all pages at allocation) - 1600+ iterations without crash. 
Given multiple violators, maybe a defensive fix (either in split_page() which is already in mm-unstable, or in post_alloc_hook()) is the right approach, rather than hunting down each violator? -- Best Regards, Mike Gavrilov.
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages 2026-02-07 14:25 ` Mikhail Gavrilov @ 2026-02-07 14:32 ` Zi Yan 2026-02-07 15:03 ` Mikhail Gavrilov 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-07 14:32 UTC (permalink / raw) To: Mikhail Gavrilov Cc: linux-mm, akpm, chrisl, kasong, hughd, stable, David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb, vbabka, Kairui Song On 7 Feb 2026, at 9:25, Mikhail Gavrilov wrote: > On Sat, Feb 7, 2026 at 8:28 AM Zi Yan <ziy@nvidia.com> wrote: >> >> OK, it seems that both slub and shmem do not reset ->private when freeing >> pages/folios. And tail page's private is not zero, because when a page >> with non zero private is freed and gets merged with a lower buddy, its >> private is not set to 0 in the code path. >> >> The patch below seems to fix the issue, since I am at Iteration 104 and counting. >> I also put a VM_BUG_ON(page->private) in free_pages_prepare() and it is not >> triggered either. >> >> >> diff --git a/mm/shmem.c b/mm/shmem.c >> index ec6c01378e9d..546e193ef993 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -2437,8 +2437,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index, >> failed_nolock: >> if (skip_swapcache) >> swapcache_clear(si, folio->swap, folio_nr_pages(folio)); >> - if (folio) >> + if (folio) { >> + folio->swap.val = 0; >> folio_put(folio); >> + } >> put_swap_device(si); >> >> return error; >> diff --git a/mm/slub.c b/mm/slub.c >> index f77b7407c51b..2cdab6d66e1a 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -3311,6 +3311,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab) >> >> __slab_clear_pfmemalloc(slab); >> page->mapping = NULL; >> + page->private = 0; >> __ClearPageSlab(page); >> mm_account_reclaimed_pages(pages); >> unaccount_slab(slab, order, s); >> >> >> >> But I am not sure if that is all. 
Maybe the patch below on top is needed to find all violators >> and still keep the system running. I also would like to hear from others on whether page->private >> should be reset or not before free_pages_prepare(). >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index cbf758e27aa2..9058f94b0667 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1430,6 +1430,8 @@ __always_inline bool free_pages_prepare(struct page *page, >> >> page_cpupid_reset_last(page); >> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >> + VM_WARN_ON_ONCE(page->private); >> + page->private = 0; >> reset_page_owner(page, order); >> page_table_check_free(page, order); >> pgalloc_tag_sub(page, 1 << order); >> >> >> -- >> Best Regards, >> Yan, Zi > > I tested your patch. The VM_WARN_ON_ONCE caught another violator - TTM > (GPU memory manager): Thanks. As a fix, I think we could combine the two patches above into one and remove the VM_WARN_ON_ONCE() or just send the second one without VM_WARN_ON_ONCE(). I can send a separate patch later to fix all users that do not reset ->private and include VM_WARN_ON_ONCE(). WDYT? 
> ------------[ cut here ]------------ > WARNING: mm/page_alloc.c:1433 at __free_pages_ok+0xe1e/0x12c0, > CPU#16: gnome-shell/5841 > Modules linked in: overlay uinput rfcomm snd_seq_dummy snd_hrtimer > xt_mark xt_cgroup xt_MASQUERADE ip6t_REJECT ipt_REJECT nft_compat > nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr uhid bnep sunrpc amd_atl > intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib > mt76_connac_lib btusb mt76 btmtk btrtl btbcm btintel vfat edac_mce_amd > spd5118 bluetooth fat snd_hda_codec_atihdmi asus_ec_sensors mac80211 > snd_hda_codec_hdmi kvm_amd snd_hda_intel uvcvideo snd_usb_audio > snd_hda_codec uvc videobuf2_vmalloc kvm videobuf2_memops snd_hda_core > joydev videobuf2_v4l2 snd_intel_dspcfg videobuf2_common > snd_usbmidi_lib videodev snd_intel_sdw_acpi snd_ump irqbypass > snd_hwdep asus_nb_wmi mc snd_rawmidi rapl snd_seq asus_wmi cfg80211 > sparse_keymap snd_seq_device platform_profile wmi_bmof pcspkr snd_pcm > snd_timer rfkill igc snd > libarc4 i2c_piix4 soundcore k10temp i2c_smbus gpio_amdpt > gpio_generic nfnetlink zram lz4hc_compress lz4_compress amdgpu amdxcp > i2c_algo_bit drm_ttm_helper ttm drm_exec drm_panel_backlight_quirks > gpu_sched drm_suballoc_helper nvme video nvme_core drm_buddy > ghash_clmulni_intel drm_display_helper nvme_keyring nvme_auth cec > sp5100_tco hkdf wmi uas usb_storage fuse ntsync i2c_dev > CPU: 16 UID: 1000 PID: 5841 Comm: gnome-shell Tainted: G W > 6.19.0-rc8-f14faaf3a1fb-with-fix-reset-private-when-freeing+ #82 > PREEMPT(lazy) > Tainted: [W]=WARN > Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING > WIFI, BIOS 3602 11/13/2025 > RIP: 0010:__free_pages_ok+0xe1e/0x12c0 > Code: ef 48 89 c6 e8 f3 59 ff ff 83 44 24 20 01 49 ba 00 00 00 00 00 > fc ff df e9 71 fe ff ff 41 c7 45 30 ff ff ff ff e9 f5 f4 
ff ff <0f> 0b > e9 73 f5 ff ff e8 86 4c 0e 00 e9 02 fb ff ff 48 c7 44 24 30 > RSP: 0018:ffffc9000e0cf878 EFLAGS: 00010206 > RAX: dffffc0000000000 RBX: 0000000000000f80 RCX: 1ffffd40028c6000 > RDX: 1ffffd40028c6005 RSI: 0000000000000004 RDI: ffffea0014630038 > RBP: ffffea0014630028 R08: ffffffff9e58e2de R09: 1ffffd40028c6006 > R10: fffff940028c6007 R11: fffff940028c6007 R12: ffffffffa27376d8 > R13: ffffea0014630000 R14: ffff889054e559c0 R15: 0000000000000000 > FS: 00007f510f914000(0000) GS:ffff8890317a8000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00005607eaf70168 CR3: 00000001dfd6a000 CR4: 0000000000f50ef0 > PKRU: 55555554 > Call Trace: > <TASK> > ttm_pool_unmap_and_free+0x30c/0x520 [ttm] > ? dma_resv_iter_first_unlocked+0x2f9/0x470 > ttm_pool_free_range+0xef/0x160 [ttm] > ? __pfx_drm_gem_close_ioctl+0x10/0x10 > ttm_pool_free+0x70/0xe0 [ttm] > ? rcu_is_watching+0x15/0xe0 > ttm_tt_unpopulate+0xa2/0x2d0 [ttm] > ttm_bo_cleanup_memtype_use+0xec/0x200 [ttm] > ttm_bo_release+0x371/0xb00 [ttm] > ? __pfx_ttm_bo_release+0x10/0x10 [ttm] > ? drm_vma_node_revoke+0x1a/0x1e0 > ? local_clock+0x15/0x30 > ? __pfx_drm_gem_close_ioctl+0x10/0x10 > drm_gem_object_release_handle+0xcd/0x1f0 > drm_gem_handle_delete+0x6a/0xc0 > ? drm_dev_exit+0x35/0x50 > drm_ioctl_kernel+0x172/0x2e0 > ? __lock_release.isra.0+0x1a2/0x370 > ? __pfx_drm_ioctl_kernel+0x10/0x10 > drm_ioctl+0x571/0xb50 > ? __pfx_drm_gem_close_ioctl+0x10/0x10 > ? __pfx_drm_ioctl+0x10/0x10 > ? rcu_is_watching+0x15/0xe0 > ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 > ? trace_hardirqs_on+0x18/0x140 > ? lockdep_hardirqs_on+0x90/0x130 > ? __raw_spin_unlock_irqrestore+0x5d/0x80 > ? __raw_spin_unlock_irqrestore+0x46/0x80 > amdgpu_drm_ioctl+0xd3/0x190 [amdgpu] > __x64_sys_ioctl+0x13c/0x1d0 > ? syscall_trace_enter+0x15c/0x2a0 > do_syscall_64+0x9c/0x4e0 > ? __lock_release.isra.0+0x1a2/0x370 > ? do_user_addr_fault+0x87a/0xf60 > ? fpregs_assert_state_consistent+0x8f/0x100 > ? 
trace_hardirqs_on_prepare+0x101/0x140 > ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 > ? irqentry_exit+0x99/0x600 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > RIP: 0033:0x7f5113af889d > Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 > 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 > 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 > RSP: 002b:00007fff83c100c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00005607ed127c50 RCX: 00007f5113af889d > RDX: 00007fff83c10150 RSI: 0000000040086409 RDI: 000000000000000e > RBP: 00007fff83c10110 R08: 00005607ead46d50 R09: 0000000000000000 > R10: 0000000000000031 R11: 0000000000000246 R12: 00007fff83c10150 > R13: 0000000040086409 R14: 000000000000000e R15: 00005607ead46d50 > </TASK> > irq event stamp: 5186663 > hardirqs last enabled at (5186669): [<ffffffff9dc9ce6e>] > __up_console_sem+0x7e/0x90 > hardirqs last disabled at (5186674): [<ffffffff9dc9ce53>] > __up_console_sem+0x63/0x90 > softirqs last enabled at (5186538): [<ffffffff9da5325b>] > handle_softirqs+0x54b/0x810 > softirqs last disabled at (5186531): [<ffffffff9da53654>] > __irq_exit_rcu+0x124/0x240 > ---[ end trace 0000000000000000 ]--- > > So there are more violators than just slub and shmem. > I also tested the post_alloc_hook() fix (clearing page->private for > all pages at allocation) - 1600+ iterations without crash. > Given multiple violators, maybe a defensive fix (either in > split_page() which is already in mm-unstable, or in post_alloc_hook()) > is the right approach, rather than hunting down each violator? > > -- > Best Regards, > Mike Gavrilov. -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages 2026-02-07 14:32 ` Zi Yan @ 2026-02-07 15:03 ` Mikhail Gavrilov 2026-02-07 15:06 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: Mikhail Gavrilov @ 2026-02-07 15:03 UTC (permalink / raw) To: Zi Yan Cc: linux-mm, akpm, chrisl, kasong, hughd, stable, David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb, vbabka, Kairui Song On Sat, Feb 7, 2026 at 7:32 PM Zi Yan <ziy@nvidia.com> wrote: > > Thanks. As a fix, I think we could combine the two patches above into one and remove > the VM_WARN_ON_ONCE() or just send the second one without VM_WARN_ON_ONCE(). > I can send a separate patch later to fix all users that do not reset ->private > and include VM_WARN_ON_ONCE(). > > WDYT? > Makes sense. Ship the quiet fix first for stable, then add VM_WARN_ON_ONCE separately to hunt down violators in mainline. I'd vote for option 2 (just free_pages_prepare without VM_WARN) - it's simpler and covers all cases. Will your patch include a revert of the split_page() fix that's already in mm-unstable, or should that be handled separately? -- Best Regards, Mike Gavrilov. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages 2026-02-07 15:03 ` Mikhail Gavrilov @ 2026-02-07 15:06 ` Zi Yan 2026-02-07 15:37 ` [PATCH v2] mm/page_alloc: clear page->private in free_pages_prepare() Mikhail Gavrilov 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-07 15:06 UTC (permalink / raw) To: Mikhail Gavrilov, akpm Cc: linux-mm, chrisl, kasong, hughd, stable, David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb, vbabka, Kairui Song On 7 Feb 2026, at 10:03, Mikhail Gavrilov wrote: > On Sat, Feb 7, 2026 at 7:32 PM Zi Yan <ziy@nvidia.com> wrote: >> >> Thanks. As a fix, I think we could combine the two patches above into one and remove >> the VM_WARN_ON_ONCE() or just send the second one without VM_WARN_ON_ONCE(). >> I can send a separate patch later to fix all users that do not reset ->private >> and include VM_WARN_ON_ONCE(). >> >> WDYT? >> > > Makes sense. Ship the quiet fix first for stable, then add > VM_WARN_ON_ONCE separately to hunt down violators in mainline. > I'd vote for option 2 (just free_pages_prepare without VM_WARN) - it's > simpler and covers all cases. Sounds good to me. > Will your patch include a revert of the split_page() fix that's > already in mm-unstable, or should that be handled separately? Hi Andrew, Can you drop this patch? Mikhail is going to send a different fix. Thanks. -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v2] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 15:06 ` Zi Yan @ 2026-02-07 15:37 ` Mikhail Gavrilov 2026-02-07 16:12 ` Zi Yan 2026-02-09 11:11 ` [PATCH v2] " Vlastimil Babka 0 siblings, 2 replies; 42+ messages in thread From: Mikhail Gavrilov @ 2026-02-07 15:37 UTC (permalink / raw) To: linux-mm Cc: akpm, vbabka, chrisl, kasong, hughd, ryncsn, ziy, Mikhail Gavrilov, stable Several subsystems (slub, shmem, ttm, etc.) use page->private but don't clear it before freeing pages. When these pages are later allocated as high-order pages and split via split_page(), tail pages retain stale page->private values. This causes a use-after-free in the swap subsystem. The swap code uses page->private to track swap count continuations, assuming freshly allocated pages have page->private == 0. When stale values are present, swap_count_continued() incorrectly assumes the continuation list is valid and iterates over uninitialized page->lru containing LIST_POISON values, causing a crash: KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107] RIP: 0010:__do_sys_swapoff+0x1151/0x1860 Fix this by clearing page->private in free_pages_prepare(), ensuring all freed pages have clean state regardless of previous use. 
Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound") Cc: stable@vger.kernel.org Suggested-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> --- mm/page_alloc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cbf758e27aa2..24ac34199f95 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1430,6 +1430,7 @@ __always_inline bool free_pages_prepare(struct page *page, page_cpupid_reset_last(page); page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; + page->private = 0; reset_page_owner(page, order); page_table_check_free(page, order); pgalloc_tag_sub(page, 1 << order); -- 2.53.0 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 15:37 ` [PATCH v2] mm/page_alloc: clear page->private in free_pages_prepare() Mikhail Gavrilov @ 2026-02-07 16:12 ` Zi Yan 2026-02-07 17:36 ` [PATCH v3] " Mikhail Gavrilov 2026-02-09 11:11 ` [PATCH v2] " Vlastimil Babka 1 sibling, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-07 16:12 UTC (permalink / raw) To: Mikhail Gavrilov Cc: linux-mm, akpm, vbabka, chrisl, kasong, hughd, ryncsn, stable, David Hildenbrand, Matthew Wilcox, mhocko, hannes, jackmanb, Suren Baghdasaryan +folks involved in the original conversation. On 7 Feb 2026, at 10:37, Mikhail Gavrilov wrote: > Several subsystems (slub, shmem, ttm, etc.) use page->private but don't > clear it before freeing pages. When these pages are later allocated as > high-order pages and split via split_page(), tail pages retain stale > page->private values. > > This causes a use-after-free in the swap subsystem. The swap code uses > page->private to track swap count continuations, assuming freshly > allocated pages have page->private == 0. When stale values are present, > swap_count_continued() incorrectly assumes the continuation list is valid > and iterates over uninitialized page->lru containing LIST_POISON values, > causing a crash: > > KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107] > RIP: 0010:__do_sys_swapoff+0x1151/0x1860 > > Fix this by clearing page->private in free_pages_prepare(), ensuring all > freed pages have clean state regardless of previous use. > > Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound") > Cc: stable@vger.kernel.org > Suggested-by: Zi Yan <ziy@nvidia.com> > Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> > --- > mm/page_alloc.c | 1 + > 1 file changed, 1 insertion(+) Hi Mikhail, Please include everyone was in the original email thread. 
Also, please use ./scripts/get_maintainer.pl to get the right people to cc. Thanks. Acked-by: Zi Yan <ziy@nvidia.com> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cbf758e27aa2..24ac34199f95 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1430,6 +1430,7 @@ __always_inline bool free_pages_prepare(struct page *page, > > page_cpupid_reset_last(page); > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > + page->private = 0; > reset_page_owner(page, order); > page_table_check_free(page, order); > pgalloc_tag_sub(page, 1 << order); > -- > 2.53.0 -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 16:12 ` Zi Yan @ 2026-02-07 17:36 ` Mikhail Gavrilov 2026-02-07 22:02 ` David Hildenbrand (Arm) 2026-02-09 19:46 ` David Hildenbrand (Arm) 0 siblings, 2 replies; 42+ messages in thread From: Mikhail Gavrilov @ 2026-02-07 17:36 UTC (permalink / raw) To: linux-mm Cc: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, david, willy, Mikhail Gavrilov Several subsystems (slub, shmem, ttm, etc.) use page->private but don't clear it before freeing pages. When these pages are later allocated as high-order pages and split via split_page(), tail pages retain stale page->private values. This causes a use-after-free in the swap subsystem. The swap code uses page->private to track swap count continuations, assuming freshly allocated pages have page->private == 0. When stale values are present, swap_count_continued() incorrectly assumes the continuation list is valid and iterates over uninitialized page->lru containing LIST_POISON values, causing a crash: KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107] RIP: 0010:__do_sys_swapoff+0x1151/0x1860 Fix this by clearing page->private in free_pages_prepare(), ensuring all freed pages have clean state regardless of previous use. 
Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound") Cc: stable@vger.kernel.org Suggested-by: Zi Yan <ziy@nvidia.com> Acked-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> --- mm/page_alloc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cbf758e27aa2..24ac34199f95 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1430,6 +1430,7 @@ __always_inline bool free_pages_prepare(struct page *page, page_cpupid_reset_last(page); page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; + page->private = 0; reset_page_owner(page, order); page_table_check_free(page, order); pgalloc_tag_sub(page, 1 << order); -- 2.53.0 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 17:36 ` [PATCH v3] " Mikhail Gavrilov @ 2026-02-07 22:02 ` David Hildenbrand (Arm) 2026-02-07 22:08 ` David Hildenbrand (Arm) 2026-02-07 23:00 ` Zi Yan 2026-02-09 19:46 ` David Hildenbrand (Arm) 1 sibling, 2 replies; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-07 22:02 UTC (permalink / raw) To: Mikhail Gavrilov, linux-mm Cc: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/7/26 18:36, Mikhail Gavrilov wrote: Thanks! > Several subsystems (slub, shmem, ttm, etc.) use page->private but don't > clear it before freeing pages. When these pages are later allocated as > high-order pages and split via split_page(), tail pages retain stale > page->private values. > > This causes a use-after-free in the swap subsystem. The swap code uses > page->private to track swap count continuations, assuming freshly > allocated pages have page->private == 0. When stale values are present, > swap_count_continued() incorrectly assumes the continuation list is valid > and iterates over uninitialized page->lru containing LIST_POISON values, > causing a crash: > > KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107] > RIP: 0010:__do_sys_swapoff+0x1151/0x1860 > > Fix this by clearing page->private in free_pages_prepare(), ensuring all > freed pages have clean state regardless of previous use. I could have sworn we discussed something like that already in the past. I recall that freeing pages with page->private set was allowed. Although I once wondered whether we should actually change that. 
> > Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound") > Cc: stable@vger.kernel.org > Suggested-by: Zi Yan <ziy@nvidia.com> > Acked-by: Zi Yan <ziy@nvidia.com> > Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> > --- Next time, please don't send patches as reply to another thread; that way it can easily get lost in a bigger thread. You want to get people's attention :) > mm/page_alloc.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cbf758e27aa2..24ac34199f95 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1430,6 +1430,7 @@ __always_inline bool free_pages_prepare(struct page *page, > > page_cpupid_reset_last(page); > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > + page->private = 0; Should we be using set_page_private()? It's a bit inconsistent :) I wonder, if it's really just the split_page() problem, why not handle it there, where we already iterate over all ("tail") pages? diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cbf758e27aa2..cbbcfdf3ed26 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3122,8 +3122,10 @@ void split_page(struct page *page, unsigned int order) VM_BUG_ON_PAGE(PageCompound(page), page); VM_BUG_ON_PAGE(!page_count(page), page); - for (i = 1; i < (1 << order); i++) + for (i = 1; i < (1 << order); i++) { set_page_refcounted(page + i); + set_page_private(page + i, 0); + } split_page_owner(page, order, 0); pgalloc_tag_split(page_folio(page), order, 0); split_page_memcg(page, order); But then I thought about "what does actually happen during a folio split". We had a check in __split_folio_to_order() that got removed in 4265d67e405a, for some undocumented reason (and the patch got merged with 0 tags :( ). I assume because with zone-device there was now a way to get ->private properly set. But we removed the safety check for all other folios. - /* - * page->private should not be set in tail pages.
Fix up and warn once - * if private is unexpectedly set. - */ - if (unlikely(new_folio->private)) { - VM_WARN_ON_ONCE_PAGE(true, new_head); - new_folio->private = NULL; - } I would have thought that we could have triggered that check easily before. Why didn't we? Who would have cleared the private field of tail pages? @Zi Yan, any idea why the folio splitting code wouldn't have revealed a similar problem? -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 22:02 ` David Hildenbrand (Arm) @ 2026-02-07 22:08 ` David Hildenbrand (Arm) 2026-02-09 11:17 ` Vlastimil Babka 2026-02-07 23:00 ` Zi Yan 1 sibling, 1 reply; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-07 22:08 UTC (permalink / raw) To: Mikhail Gavrilov, linux-mm Cc: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy > > - /* > - * page->private should not be set in tail pages. Fix up > and warn once > - * if private is unexpectedly set. > - */ > - if (unlikely(new_folio->private)) { > - VM_WARN_ON_ONCE_PAGE(true, new_head); > - new_folio->private = NULL; > - } BTW, I wonder whether we should bring that check back for non-device folios. -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 22:08 ` David Hildenbrand (Arm) @ 2026-02-09 11:17 ` Vlastimil Babka 2026-02-09 15:46 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 42+ messages in thread From: Vlastimil Babka @ 2026-02-09 11:17 UTC (permalink / raw) To: David Hildenbrand (Arm), Mikhail Gavrilov, linux-mm Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/7/26 23:08, David Hildenbrand (Arm) wrote: >> >> - /* >> - * page->private should not be set in tail pages. Fix up >> and warn once >> - * if private is unexpectedly set. >> - */ >> - if (unlikely(new_folio->private)) { >> - VM_WARN_ON_ONCE_PAGE(true, new_head); >> - new_folio->private = NULL; >> - } > > BTW, I wonder whether we should bring that check back for non-device folios. If the rule is now that when upon freeing in free_pages_prepare() we clear private in the head page and not tail pages (where we expect the owner of the page to do it), maybe that check for tail pages should be done in the is_check_pages_enabled() part of free_pages_prepare(). Or should the check be also in the split path because somebody can set a tail private between allocation and split? (and not just inherit it from a previous allocation that didn't clear it?). ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 11:17 ` Vlastimil Babka @ 2026-02-09 15:46 ` David Hildenbrand (Arm) 2026-02-09 16:00 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 15:46 UTC (permalink / raw) To: Vlastimil Babka, Mikhail Gavrilov, linux-mm Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/9/26 12:17, Vlastimil Babka wrote: > On 2/7/26 23:08, David Hildenbrand (Arm) wrote: >>> >>> - /* >>> - * page->private should not be set in tail pages. Fix up >>> and warn once >>> - * if private is unexpectedly set. >>> - */ >>> - if (unlikely(new_folio->private)) { >>> - VM_WARN_ON_ONCE_PAGE(true, new_head); >>> - new_folio->private = NULL; >>> - } >> >> BTW, I wonder whether we should bring that check back for non-device folios. > > If the rule is now that when upon freeing in free_pages_prepare() we clear > private in the head page and not tail pages (where we expect the owner of > the page to do it), maybe that check for tail pages should be done in the > is_check_pages_enabled() part of free_pages_prepare(). > > Or should the check be also in the split path because somebody can set a > tail private between allocation and split? (and not just inherit it from a > previous allocation that didn't clear it?). We ran into that check in the past, when folio->X overlayed page->private in a tail, and would actually have to be zeroed out. So it should be part of this splitting code I think. -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 15:46 ` David Hildenbrand (Arm) @ 2026-02-09 16:00 ` Zi Yan 2026-02-09 16:03 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-09 16:00 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Vlastimil Babka, Mikhail Gavrilov, linux-mm, akpm, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 9 Feb 2026, at 10:46, David Hildenbrand (Arm) wrote: > On 2/9/26 12:17, Vlastimil Babka wrote: >> On 2/7/26 23:08, David Hildenbrand (Arm) wrote: >>>> >>>> - /* >>>> - * page->private should not be set in tail pages. Fix up >>>> and warn once >>>> - * if private is unexpectedly set. >>>> - */ >>>> - if (unlikely(new_folio->private)) { >>>> - VM_WARN_ON_ONCE_PAGE(true, new_head); >>>> - new_folio->private = NULL; >>>> - } >>> >>> BTW, I wonder whether we should bring that check back for non-device folios. >> >> If the rule is now that when upon freeing in free_pages_prepare() we clear >> private in the head page and not tail pages (where we expect the owner of >> the page to do it), maybe that check for tail pages should be done in the >> is_check_pages_enabled() part of free_pages_prepare(). >> >> Or should the check be also in the split path because somebody can set a >> tail private between allocation and split? (and not just inherit it from a >> previous allocation that didn't clear it?). > > We ran into that check in the past, when folio->X overlayed page->private in a tail, and would actually have to be zeroed out. Currently, _mm_id (_mm_ids) overlaps with page->private. At split time, it should be MM_ID_DUMMY (0), so page->private should be 0 all time. > > So it should be part of this splitting code I think. It is still better to have the check and fix in place. Why do we want to skip device private folio? Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 16:00 ` Zi Yan @ 2026-02-09 16:03 ` David Hildenbrand (Arm) 2026-02-09 16:05 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 16:03 UTC (permalink / raw) To: Zi Yan Cc: Vlastimil Babka, Mikhail Gavrilov, linux-mm, akpm, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/9/26 17:00, Zi Yan wrote: > On 9 Feb 2026, at 10:46, David Hildenbrand (Arm) wrote: > >> On 2/9/26 12:17, Vlastimil Babka wrote: >>> >>> If the rule is now that when upon freeing in free_pages_prepare() we clear >>> private in the head page and not tail pages (where we expect the owner of >>> the page to do it), maybe that check for tail pages should be done in the >>> is_check_pages_enabled() part of free_pages_prepare(). >>> >>> Or should the check be also in the split path because somebody can set a >>> tail private between allocation and split? (and not just inherit it from a >>> previous allocation that didn't clear it?). >> >> We ran into that check in the past, when folio->X overlayed page->private in a tail, and would actually have to be zeroed out. > > Currently, _mm_id (_mm_ids) overlaps with page->private. At split time, > it should be MM_ID_DUMMY (0), so page->private should be 0 all time. Yes, it's designed like that; because that check here caught it during development :) > >> >> So it should be part of this splitting code I think. > > It is still better to have the check and fix in place. Why do we want to > skip device private folio? I don't understand the question, can you elaborate? I asked Balbir why the check was dropped in the first place. -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 16:03 ` David Hildenbrand (Arm) @ 2026-02-09 16:05 ` Zi Yan 2026-02-09 16:06 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-09 16:05 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Vlastimil Babka, Mikhail Gavrilov, linux-mm, akpm, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 9 Feb 2026, at 11:03, David Hildenbrand (Arm) wrote: > On 2/9/26 17:00, Zi Yan wrote: >> On 9 Feb 2026, at 10:46, David Hildenbrand (Arm) wrote: >> >>> On 2/9/26 12:17, Vlastimil Babka wrote: >>>> >>>> If the rule is now that when upon freeing in free_pages_prepare() we clear >>>> private in the head page and not tail pages (where we expect the owner of >>>> the page to do it), maybe that check for tail pages should be done in the >>>> is_check_pages_enabled() part of free_pages_prepare(). >>>> >>>> Or should the check be also in the split path because somebody can set a >>>> tail private between allocation and split? (and not just inherit it from a >>>> previous allocation that didn't clear it?). >>> >>> We ran into that check in the past, when folio->X overlayed page->private in a tail, and would actually have to be zeroed out. >> >> Currently, _mm_id (_mm_ids) overlaps with page->private. At split time, >> it should be MM_ID_DUMMY (0), so page->private should be 0 all time. > > Yes, it's designed like that; because that check here caught it during development :) > >> >>> >>> So it should be part of this splitting code I think. >> >> It is still better to have the check and fix in place. Why do we want to >> skip device private folio? > > I don't understand the question, can you elaborate? You said, “BTW, I wonder whether we should bring that check back for non-device folios.” I thought you know why device folio needs to keep ->private not cleared during split. 
> I asked Balbir why the check was dropped in the first place. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 16:05 ` Zi Yan @ 2026-02-09 16:06 ` David Hildenbrand (Arm) 2026-02-09 16:08 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 16:06 UTC (permalink / raw) To: Zi Yan Cc: Vlastimil Babka, Mikhail Gavrilov, linux-mm, akpm, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/9/26 17:05, Zi Yan wrote: > On 9 Feb 2026, at 11:03, David Hildenbrand (Arm) wrote: > >> On 2/9/26 17:00, Zi Yan wrote: >>> >>> >>> Currently, _mm_id (_mm_ids) overlaps with page->private. At split time, >>> it should be MM_ID_DUMMY (0), so page->private should be 0 all time. >> >> Yes, it's designed like that; because that check here caught it during development :) >> >>> >>> >>> It is still better to have the check and fix in place. Why do we want to >>> skip device private folio? >> >> I don't understand the question, can you elaborate? > > You said, > “BTW, I wonder whether we should bring that check back for non-device folios.” > > I thought you know why device folio needs to keep ->private not cleared during > split. Oh, I thought there was some overlay of ->private with zone-device special stuff. But I checked the structs and didn't spot it immediately. So I ended up asking Balbir as reply to his latest series that got merged. -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 16:06 ` David Hildenbrand (Arm) @ 2026-02-09 16:08 ` Zi Yan 0 siblings, 0 replies; 42+ messages in thread From: Zi Yan @ 2026-02-09 16:08 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Vlastimil Babka, Mikhail Gavrilov, linux-mm, akpm, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 9 Feb 2026, at 11:06, David Hildenbrand (Arm) wrote: > On 2/9/26 17:05, Zi Yan wrote: >> On 9 Feb 2026, at 11:03, David Hildenbrand (Arm) wrote: >> >>> On 2/9/26 17:00, Zi Yan wrote: >>>> >>>> >>>> Currently, _mm_id (_mm_ids) overlaps with page->private. At split time, >>>> it should be MM_ID_DUMMY (0), so page->private should be 0 all time. >>> >>> Yes, it's designed like that; because that check here caught it during development :) >>> >>>> >>>> >>>> It is still better to have the check and fix in place. Why do we want to >>>> skip device private folio? >>> >>> I don't understand the question, can you elaborate? >> >> You said, >> “BTW, I wonder whether we should bring that check back for non-device folios.” >> >> I thought you know why device folio needs to keep ->private not cleared during >> split. > > Oh, I thought there was some overlay of ->private with zone-device special stuff. But I checked the structs and didn't spot it immediately. So I ended up asking Balbir as reply to his latest series that got merged. Got it. We will bring it back once we hear back from Balbir. Thanks. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 22:02 ` David Hildenbrand (Arm) 2026-02-07 22:08 ` David Hildenbrand (Arm) @ 2026-02-07 23:00 ` Zi Yan 2026-02-09 16:16 ` David Hildenbrand (Arm) 1 sibling, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-07 23:00 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 7 Feb 2026, at 17:02, David Hildenbrand (Arm) wrote: > On 2/7/26 18:36, Mikhail Gavrilov wrote: > > Thanks! > >> Several subsystems (slub, shmem, ttm, etc.) use page->private but don't >> clear it before freeing pages. When these pages are later allocated as >> high-order pages and split via split_page(), tail pages retain stale >> page->private values. >> >> This causes a use-after-free in the swap subsystem. The swap code uses >> page->private to track swap count continuations, assuming freshly >> allocated pages have page->private == 0. When stale values are present, >> swap_count_continued() incorrectly assumes the continuation list is valid >> and iterates over uninitialized page->lru containing LIST_POISON values, >> causing a crash: >> >> KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107] >> RIP: 0010:__do_sys_swapoff+0x1151/0x1860 >> >> Fix this by clearing page->private in free_pages_prepare(), ensuring all >> freed pages have clean state regardless of previous use. > > I could have sworn we discussed something like that already in the past. This[1] is my discussion on this topic and I managed to convince people we should keep ->private zero on any pages. [1] https://lore.kernel.org/all/20250925085006.23684-1-zhongjinji@honor.com/ > > I recall that freeing pages with page->private set was allowed. Although > I once wondered whether we should actually change that. 
But if that is allowed, we can end up with tail page's private non zero, because that free page can merge with a lower PFN buddy and its ->private is not reset. See __free_one_page(). > >> >> Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound") >> Cc: stable@vger.kernel.org >> Suggested-by: Zi Yan <ziy@nvidia.com> >> Acked-by: Zi Yan <ziy@nvidia.com> >> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> >> --- > > Next time, please don't send patches as reply to another thread; that > way it can easily get lost in a bigger thread. > > You want to get people's attention :) > >> mm/page_alloc.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index cbf758e27aa2..24ac34199f95 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1430,6 +1430,7 @@ __always_inline bool free_pages_prepare(struct page *page, >> page_cpupid_reset_last(page); >> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >> + page->private = 0; > > Should we be using set_page_private()? It's a bit inconsistent :) > > I wonder, if it's really just the split_page() problem, why not > handle it there, where we already iterate over all ("tail") pages? > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cbf758e27aa2..cbbcfdf3ed26 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3122,8 +3122,10 @@ void split_page(struct page *page, unsigned int order) > VM_BUG_ON_PAGE(PageCompound(page), page); > VM_BUG_ON_PAGE(!page_count(page), page); > - for (i = 1; i < (1 << order); i++) > + for (i = 1; i < (1 << order); i++) { > set_page_refcounted(page + i); > + set_page_private(page + i, 0); > + } > split_page_owner(page, order, 0); > pgalloc_tag_split(page_folio(page), order, 0); > split_page_memcg(page, order); > > > But then I thought about "what does actually happen during a folio split".
> > We had a check in __split_folio_to_order() that got removed in 4265d67e405a, for some > undocumented reason (and the patch got merged with 0 tags :( ). I assume because with zone-device > there was now a way to get ->private properly set. But we removed the safety check for It was the end of last year and review traffic was heavy. No one had time to look at it. > all other folios. > > - /* > - * page->private should not be set in tail pages. Fix up and warn once > - * if private is unexpectedly set. > - */ > - if (unlikely(new_folio->private)) { > - VM_WARN_ON_ONCE_PAGE(true, new_head); > - new_folio->private = NULL; > - } > > > I would have thought that we could have triggered that check easily before. Why didn't we? > > Who would have cleared the private field of tail pages? > > @Zi Yan, any idea why the folio splitting code wouldn't have revealed a similar problem? > For the issue reported by Mikhail[2], the page comes from vmalloc(), so it will not be split. For other cases, a page/folio needs to be compound to be splittable and prep_compound_tail() sets all tail page's private to 0. So that check is not that useful. And the issue we are handling here is non compound high order page allocation. No one is clearing ->private for all pages right now. OK, I think we want to decide whether it is OK to have a page with set ->private at page free time. If no, we can get this patch in and add a VM_WARN_ON_ONCE(page->private) to catch all violators. If yes, we can use Mikhail's original patch, zeroing ->private in split_page() and add a comment on ->private: 1. for compound page allocation, prep_compound_tail() is responsible for resetting ->private; 2. for non compound high order page allocation, split_page() is responsible for resetting ->private. [2] https://lore.kernel.org/linux-mm/CABXGCsNqk6pOkocJ0ctcHssCvke2kqhzoR2BGf_Hh1hWPZATuA@mail.gmail.com/ -- Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 23:00 ` Zi Yan @ 2026-02-09 16:16 ` David Hildenbrand (Arm) 2026-02-09 16:20 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 16:16 UTC (permalink / raw) To: Zi Yan Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy >> I recall that freeing pages with page->private set was allowed. Although >> I once wondered whether we should actually change that. > > But if that is allowed, we can end up with tail page's private non zero, > because that free page can merge with a lower PFN buddy and its ->private > is not reset. See __free_one_page(). Right. Or someone could use page->private on tail pages and free non-zero ->private that way. [...] > > For the issue reported by Mikhail[2], the page comes from vmalloc(), so it will not be split. > For other cases, a page/folio needs to be compound to be splittable and prep_compound_tail() > sets all tail page's private to 0. So that check is not that useful. Thanks. > > And the issue we are handling here is non compound high order page allocation. No one is > clearing ->private for all pages right now. Right. > > OK, I think we want to decide whether it is OK to have a page with set ->private at > page free time. Right. And whether it is okay to have any tail->private be non-zero. > If no, we can get this patch in and add a VM_WARN_ON_ONCE(page->private) > to catch all violators. If yes, we can use Mikhail's original patch, zeroing ->private > in split_page() and add a comment on ->private: > > 1. for compound page allocation, prep_compound_tail() is responsible for resetting ->private; > 2. for non compound high order page allocation, split_page() is responsible for resetting ->private. Ideally, I guess, we would minimize the clearing of the ->private fields. 
If we could guarantee that *any* pages in the buddy have ->private clear, maybe prep_compound_tail() could stop clearing it (and check instead). So similar to what Vlasta said, maybe we want to (not check but actually clear): diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e4104973e22f..4960a36145fe 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1410,6 +1410,7 @@ __always_inline bool free_pages_prepare(struct page *page, } } (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; + set_page_private(page + i, 0); } } if (folio_test_anon(folio)) { And then try removing any other unnecessary clearing (like in prep_compound_tail()). -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 16:16 ` David Hildenbrand (Arm) @ 2026-02-09 16:20 ` David Hildenbrand (Arm) 2026-02-09 16:33 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 16:20 UTC (permalink / raw) To: Zi Yan Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/9/26 17:16, David Hildenbrand (Arm) wrote: >>> I recall that freeing pages with page->private set was allowed. Although >>> I once wondered whether we should actually change that. >> >> But if that is allowed, we can end up with tail page's private non zero, >> because that free page can merge with a lower PFN buddy and its ->private >> is not reset. See __free_one_page(). > > Right. Or someone could use page->private on tail pages and free non- > zero ->private that way. > > [...] > >> >> For the issue reported by Mikhail[2], the page comes from vmalloc(), >> so it will not be split. >> For other cases, a page/folio needs to be compound to be splittable >> and prep_compound_tail() >> sets all tail page's private to 0. So that check is not that useful. > > Thanks. > >> >> And the issue we are handling here is non compound high order page >> allocation. No one is >> clearing ->private for all pages right now. > > Right. > >> >> OK, I think we want to decide whether it is OK to have a page with set >> ->private at >> page free time. > > Right. And whether it is okay to have any tail->private be non-zero. > >> If no, we can get this patch in and add a VM_WARN_ON_ONCE(page->private) >> to catch all violators. If yes, we can use Mikhail's original patch, >> zeroing ->private >> in split_page() and add a comment on ->private: >> >> 1. for compound page allocation, prep_compound_tail() is responsible >> for resetting ->private; >> 2. 
for non compound high order page allocation, split_page() is >> responsible for resetting ->private. > > Ideally, I guess, we would minimize the clearing of the ->private fields. > > If we could guarantee that *any* pages in the buddy have ->private > clear, maybe > prep_compound_tail() could stop clearing it (and check instead). > > So similar to what Vlasta said, maybe we want to (not check but actually > clear): > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index e4104973e22f..4960a36145fe 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1410,6 +1410,7 @@ __always_inline bool free_pages_prepare(struct > page *page, > } > } > (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > + set_page_private(page + i, 0); > } > } Thinking again, maybe it is indeed better to rework the code to not allow freeing pages with ->private on any page. Then, we only have to zero it out where we actually used it and could check here that all ->private is 0. I guess that's a bit more work, and any temporary fix would likely just do. -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 16:20 ` David Hildenbrand (Arm) @ 2026-02-09 16:33 ` Zi Yan 2026-02-09 17:36 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-09 16:33 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 9 Feb 2026, at 11:20, David Hildenbrand (Arm) wrote: > On 2/9/26 17:16, David Hildenbrand (Arm) wrote: >>>> I recall that freeing pages with page->private set was allowed. Although >>>> I once wondered whether we should actually change that. >>> >>> But if that is allowed, we can end up with tail page's private non zero, >>> because that free page can merge with a lower PFN buddy and its ->private >>> is not reset. See __free_one_page(). >> >> Right. Or someone could use page->private on tail pages and free non- zero ->private that way. >> >> [...] >> >>> >>> For the issue reported by Mikhail[2], the page comes from vmalloc(), so it will not be split. >>> For other cases, a page/folio needs to be compound to be splittable and prep_compound_tail() >>> sets all tail page's private to 0. So that check is not that useful. >> >> Thanks. >> >>> >>> And the issue we are handling here is non compound high order page allocation. No one is >>> clearing ->private for all pages right now. >> >> Right. >> >>> >>> OK, I think we want to decide whether it is OK to have a page with set ->private at >>> page free time. >> >> Right. And whether it is okay to have any tail->private be non-zero. >> >>> If no, we can get this patch in and add a VM_WARN_ON_ONCE(page->private) >>> to catch all violators. If yes, we can use Mikhail's original patch, zeroing ->private >>> in split_page() and add a comment on ->private: >>> >>> 1. for compound page allocation, prep_compound_tail() is responsible for resetting ->private; >>> 2. 
for non compound high order page allocation, split_page() is responsible for resetting ->private. >> >> Ideally, I guess, we would minimize the clearing of the ->private fields. >> >> If we could guarantee that *any* pages in the buddy have ->private clear, maybe >> prep_compound_tail() could stop clearing it (and check instead). >> >> So similar to what Vlasta said, maybe we want to (not check but actually clear): >> >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index e4104973e22f..4960a36145fe 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1410,6 +1410,7 @@ __always_inline bool free_pages_prepare(struct page *page, >> } >> } >> (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >> + set_page_private(page + i, 0); >> } >> } > > Thinking again, maybe it is indeed better to rework the code to not allow freeing pages with ->private on any page. Then, we only have to zero it out where we actually used it and could check here that all > ->private is 0. > > I guess that's a bit more work, and any temporary fix would likely just do. I agree. Silently fixing non zero ->private just moves the work/responsibility from users to core mm. They could do better. :) We can have a patch or multiple patches to fix users that do not zero ->private when freeing a page, and then add the patch below. The hassle would be that catching them all, especially non-mm users, might not be easy, but we could merge the patch below (and obviously fixes) after next merge window is closed and let rc tests tell us the remaining ones. WDYT?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 24ac34199f95..0c5d117a251e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1411,6 +1411,7 @@ __always_inline bool free_pages_prepare(struct page *page, } } (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; + VM_WARN_ON_ONCE((page + i)->private); } } if (folio_test_anon(folio)) { @@ -1430,6 +1431,7 @@ __always_inline bool free_pages_prepare(struct page *page, page_cpupid_reset_last(page); page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; + VM_WARN_ON_ONCE(page->private); page->private = 0; reset_page_owner(page, order); page_table_check_free(page, order); Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 16:33 ` Zi Yan @ 2026-02-09 17:36 ` David Hildenbrand (Arm) 2026-02-09 17:44 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 17:36 UTC (permalink / raw) To: Zi Yan Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/9/26 17:33, Zi Yan wrote: > On 9 Feb 2026, at 11:20, David Hildenbrand (Arm) wrote: > >> On 2/9/26 17:16, David Hildenbrand (Arm) wrote: >>> >>> Right. Or someone could use page->private on tail pages and free non- zero ->private that way. >>> >>> [...] >>> >>> >>> Thanks. >>> >>> >>> Right. >>> >>> >>> Right. And whether it is okay to have any tail->private be non-zero. >>> >>> >>> Ideally, I guess, we would minimize the clearing of the ->private fields. >>> >>> If we could guarantee that *any* pages in the buddy have ->private clear, maybe >>> prep_compound_tail() could stop clearing it (and check instead). >>> >>> So similar to what Vlasta said, maybe we want to (not check but actually clear): >>> >>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index e4104973e22f..4960a36145fe 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -1410,6 +1410,7 @@ __always_inline bool free_pages_prepare(struct page *page, >>> } >>> } >>> (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>> + set_page_private(page + i, 0); >>> } >>> } >> >> Thinking again, maybe it is indeed better to rework the code to not allow freeing pages with ->private on any page. Then, we only have to zero it out where we actually used it and could check here that all >> ->private is 0. >> >> I guess that's a bit more work, and any temporary fix would likely just do. > > I agree. Silently fixing non zero ->private just moves the work/responsibility > from users to core mm. They could do better. 
:) > > We can have a patch or multiple patches to fix users do not zero ->private > when freeing a page and add the patch below. Do we know roughly which ones don't zero it out? > The hassle would be that > catching all, especially non mm users might not be easy, but we could merge > the patch below (and obviously fixes) after next merge window is closed and > let rc tests tell us the remaining one. WDYT? LGTM, then we can look into stopping to zero for compound pages. > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 24ac34199f95..0c5d117a251e 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1411,6 +1411,7 @@ __always_inline bool free_pages_prepare(struct page *page, > } > } > (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > + VM_WARN_ON_ONCE((page + i)->private); > } > } > if (folio_test_anon(folio)) { > @@ -1430,6 +1431,7 @@ __always_inline bool free_pages_prepare(struct page *page, > > page_cpupid_reset_last(page); > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > + VM_WARN_ON_ONCE(page->private); > page->private = 0; > reset_page_owner(page, order); > page_table_check_free(page, order); > > > Best Regards, > Yan, Zi -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 17:36 ` David Hildenbrand (Arm) @ 2026-02-09 17:44 ` Zi Yan 2026-02-09 19:39 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-09 17:44 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 9 Feb 2026, at 12:36, David Hildenbrand (Arm) wrote: > On 2/9/26 17:33, Zi Yan wrote: >> On 9 Feb 2026, at 11:20, David Hildenbrand (Arm) wrote: >> >>> On 2/9/26 17:16, David Hildenbrand (Arm) wrote: >>>> >>>> Right. Or someone could use page->private on tail pages and free non- zero ->private that way. >>>> >>>> [...] >>>> >>>> >>>> Thanks. >>>> >>>> >>>> Right. >>>> >>>> >>>> Right. And whether it is okay to have any tail->private be non-zero. >>>> >>>> >>>> Ideally, I guess, we would minimize the clearing of the ->private fields. >>>> >>>> If we could guarantee that *any* pages in the buddy have ->private clear, maybe >>>> prep_compound_tail() could stop clearing it (and check instead). >>>> >>>> So similar to what Vlasta said, maybe we want to (not check but actually clear): >>>> >>>> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>> index e4104973e22f..4960a36145fe 100644 >>>> --- a/mm/page_alloc.c >>>> +++ b/mm/page_alloc.c >>>> @@ -1410,6 +1410,7 @@ __always_inline bool free_pages_prepare(struct page *page, >>>> } >>>> } >>>> (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>>> + set_page_private(page + i, 0); >>>> } >>>> } >>> >>> Thinking again, maybe it is indeed better to rework the code to not allow freeing pages with ->private on any page. Then, we only have to zero it out where we actually used it and could check here that all >>> ->private is 0. >>> >>> I guess that's a bit more work, and any temporary fix would likely just do. >> >> I agree. 
Silently fixing non zero ->private just moves the work/responsibility >> from users to core mm. They could do better. :) >> >> We can have a patch or multiple patches to fix users do not zero ->private >> when freeing a page and add the patch below. > > Do we know roughly which ones don't zero it out? So far based on [1], I found: 1. shmem_swapin_folio() in mm/shmem.c does not zero ->swap.val (overlapping with private); 2. __free_slab() in mm/slub.c does not zero ->inuse, ->objects, ->frozen (overlapping with private). Mikhail found ttm_pool_unmap_and_free() in drivers/gpu/drm/ttm/ttm_pool.c does not zero ->private, which stores page order. [1] https://lore.kernel.org/all/CABXGCsNyt6DB=SX9JWD=-WK_BiHhbXaCPNV-GOM8GskKJVAn_A@mail.gmail.com/ > >> The hassle would be that >> catching all, especially non mm users might not be easy, but we could merge >> the patch below (and obviously fixes) after next merge window is closed and >> let rc tests tell us the remaining one. WDYT? > > LGTM, then we can look into stopping to zero for compound pages. > >> >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 24ac34199f95..0c5d117a251e 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1411,6 +1411,7 @@ __always_inline bool free_pages_prepare(struct page *page, >> } >> } >> (page + i)->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >> + VM_WARN_ON_ONCE((page + i)->private); >> } >> } >> if (folio_test_anon(folio)) { >> @@ -1430,6 +1431,7 @@ __always_inline bool free_pages_prepare(struct page *page, >> >> page_cpupid_reset_last(page); >> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >> + VM_WARN_ON_ONCE(page->private); >> page->private = 0; >> reset_page_owner(page, order); >> page_table_check_free(page, order); >> >> >> Best Regards, >> Yan, Zi > > > -- > Cheers, > > David Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
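[Editorial aside: the three offenders Zi Yan lists all keep their state in words that alias ->private inside struct page's unions — shmem's swp_entry_t, slub's packed counters, ttm's stored order. A compilable userspace sketch of that aliasing; the field names and bit widths below are illustrative only, not the kernel's actual struct page layout.]

```c
#include <assert.h>

/* Illustrative mock, NOT the kernel's struct page: several owners store
 * their data in the same union storage that a later owner reads back as
 * ->private. */
struct mock_page {
    unsigned long flags;
    union {
        unsigned long private;   /* generic owner-private word */
        unsigned long swap_val;  /* shmem/swap: swp_entry_t value */
        struct {                 /* slub-style packed counters */
            unsigned inuse   : 16;
            unsigned objects : 15;
            unsigned frozen  : 1;
        };
    };
};

/* What the proposed check looks for on free... */
int mock_private_is_stale(const struct mock_page *page)
{
    return page->private != 0;
}

/* ...and what the v3 fix does unconditionally in free_pages_prepare(). */
void mock_clear_private(struct mock_page *page)
{
    page->private = 0;
}
```

Because the members overlap, a slub-style user that frees without zeroing its counters hands the next owner a non-zero ->private even though it never touched that field by name.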
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 17:44 ` Zi Yan @ 2026-02-09 19:39 ` David Hildenbrand (Arm) 2026-02-09 19:42 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 19:39 UTC (permalink / raw) To: Zi Yan Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/9/26 18:44, Zi Yan wrote: > On 9 Feb 2026, at 12:36, David Hildenbrand (Arm) wrote: > >> On 2/9/26 17:33, Zi Yan wrote: >>> >>> >>> I agree. Silently fixing non zero ->private just moves the work/responsibility >>> from users to core mm. They could do better. :) >>> >>> We can have a patch or multiple patches to fix users do not zero ->private >>> when freeing a page and add the patch below. >> >> Do we know roughly which ones don't zero it out? > > So far based on [1], I found: > > 1. shmem_swapin_folio() in mm/shmem.c does not zero ->swap.val (overlapping > with private); > 2. __free_slab() in mm/slub.c does not zero ->inuse, ->objects, ->frozen > (overlapping with private). > > Mikhail found ttm_pool_unmap_and_free() in drivers/gpu/drm/ttm/ttm_pool.c > does not zero ->private, which stores page order. > Looks doable then :) Should we take v3 as a quick fix to backport then? -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 19:39 ` David Hildenbrand (Arm) @ 2026-02-09 19:42 ` Zi Yan 2026-02-10 1:20 ` Baolin Wang 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-09 19:42 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 9 Feb 2026, at 14:39, David Hildenbrand (Arm) wrote: > On 2/9/26 18:44, Zi Yan wrote: >> On 9 Feb 2026, at 12:36, David Hildenbrand (Arm) wrote: >> >>> On 2/9/26 17:33, Zi Yan wrote: >>>> >>>> >>>> I agree. Silently fixing non zero ->private just moves the work/responsibility >>>> from users to core mm. They could do better. :) >>>> >>>> We can have a patch or multiple patches to fix users do not zero ->private >>>> when freeing a page and add the patch below. >>> >>> Do we know roughly which ones don't zero it out? >> >> So far based on [1], I found: >> >> 1. shmem_swapin_folio() in mm/shmem.c does not zero ->swap.val (overlapping >> with private); >> 2. __free_slab() in mm/slub.c does not zero ->inuse, ->objects, ->frozen >> (overlapping with private). >> >> Mikhail found ttm_pool_unmap_and_free() in drivers/gpu/drm/ttm/ttm_pool.c >> does not zero ->private, which stores page order. >> > > Looks doable then :) Should we take v3 as a quick fix to backport then? Sounds good to me. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-09 19:42 ` Zi Yan @ 2026-02-10 1:20 ` Baolin Wang 2026-02-10 2:12 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: Baolin Wang @ 2026-02-10 1:20 UTC (permalink / raw) To: Zi Yan, David Hildenbrand (Arm) Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/10/26 3:42 AM, Zi Yan wrote: > On 9 Feb 2026, at 14:39, David Hildenbrand (Arm) wrote: > >> On 2/9/26 18:44, Zi Yan wrote: >>> On 9 Feb 2026, at 12:36, David Hildenbrand (Arm) wrote: >>> >>>> On 2/9/26 17:33, Zi Yan wrote: >>>>> >>>>> >>>>> I agree. Silently fixing non zero ->private just moves the work/responsibility >>>>> from users to core mm. They could do better. :) >>>>> >>>>> We can have a patch or multiple patches to fix users do not zero ->private >>>>> when freeing a page and add the patch below. >>>> >>>> Do we know roughly which ones don't zero it out? >>> >>> So far based on [1], I found: >>> >>> 1. shmem_swapin_folio() in mm/shmem.c does not zero ->swap.val (overlapping >>> with private); After Kairui’s series [1], the shmem part looks good to me. As we no longer skip the swapcache now, we shouldn’t clear the ->swap.val of a swapcache folio if failed to swap-in. [1]https://lore.kernel.org/all/20251219195751.61328-1-ryncsn@gmail.com/T/#mcba8a32e1021dc28ce1e824c9d042dca316a30d7 >>> 2. __free_slab() in mm/slub.c does not zero ->inuse, ->objects, ->frozen >>> (overlapping with private). >>> >>> Mikhail found ttm_pool_unmap_and_free() in drivers/gpu/drm/ttm/ttm_pool.c >>> does not zero ->private, which stores page order. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-10 1:20 ` Baolin Wang @ 2026-02-10 2:12 ` Zi Yan 2026-02-10 2:25 ` Baolin Wang 0 siblings, 1 reply; 42+ messages in thread From: Zi Yan @ 2026-02-10 2:12 UTC (permalink / raw) To: Baolin Wang Cc: David Hildenbrand (Arm), Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 9 Feb 2026, at 20:20, Baolin Wang wrote: > On 2/10/26 3:42 AM, Zi Yan wrote: >> On 9 Feb 2026, at 14:39, David Hildenbrand (Arm) wrote: >> >>> On 2/9/26 18:44, Zi Yan wrote: >>>> On 9 Feb 2026, at 12:36, David Hildenbrand (Arm) wrote: >>>> >>>>> On 2/9/26 17:33, Zi Yan wrote: >>>>>> >>>>>> >>>>>> I agree. Silently fixing non zero ->private just moves the work/responsibility >>>>>> from users to core mm. They could do better. :) >>>>>> >>>>>> We can have a patch or multiple patches to fix users do not zero ->private >>>>>> when freeing a page and add the patch below. >>>>> >>>>> Do we know roughly which ones don't zero it out? >>>> >>>> So far based on [1], I found: >>>> >>>> 1. shmem_swapin_folio() in mm/shmem.c does not zero ->swap.val (overlapping >>>> with private); > > After Kairui’s series [1], the shmem part looks good to me. As we no longer skip the swapcache now, we shouldn’t clear the ->swap.val of a swapcache folio if failed to swap-in. What do you mean by "after Kairui's series[1]"? Can you elaborate a little bit more? For the diff below, does the "folio_put(folio)" have different outcomes based on skip_swapcache? Only if skip_swapcache is true, "folio_put(folio)" frees the folio? Thanks. 
diff --git a/mm/shmem.c b/mm/shmem.c
index ec6c01378e9d..546e193ef993 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2437,8 +2437,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 failed_nolock:
 	if (skip_swapcache)
 		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
-	if (folio)
+	if (folio) {
+		folio->swap.val = 0;
 		folio_put(folio);
+	}
 	put_swap_device(si);
 
 	return error;

>
> [1]https://lore.kernel.org/all/20251219195751.61328-1-ryncsn@gmail.com/T/#mcba8a32e1021dc28ce1e824c9d042dca316a30d7
>
>>>> 2. __free_slab() in mm/slub.c does not zero ->inuse, ->objects, ->frozen
>>>> (overlapping with private).
>>>>
>>>> Mikhail found ttm_pool_unmap_and_free() in drivers/gpu/drm/ttm/ttm_pool.c
>>>> does not zero ->private, which stores page order.

--
Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-10 2:12 ` Zi Yan @ 2026-02-10 2:25 ` Baolin Wang 2026-02-10 2:32 ` Zi Yan 0 siblings, 1 reply; 42+ messages in thread From: Baolin Wang @ 2026-02-10 2:25 UTC (permalink / raw) To: Zi Yan Cc: David Hildenbrand (Arm), Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/10/26 10:12 AM, Zi Yan wrote: > On 9 Feb 2026, at 20:20, Baolin Wang wrote: > >> On 2/10/26 3:42 AM, Zi Yan wrote: >>> On 9 Feb 2026, at 14:39, David Hildenbrand (Arm) wrote: >>> >>>> On 2/9/26 18:44, Zi Yan wrote: >>>>> On 9 Feb 2026, at 12:36, David Hildenbrand (Arm) wrote: >>>>> >>>>>> On 2/9/26 17:33, Zi Yan wrote: >>>>>>> >>>>>>> >>>>>>> I agree. Silently fixing non zero ->private just moves the work/responsibility >>>>>>> from users to core mm. They could do better. :) >>>>>>> >>>>>>> We can have a patch or multiple patches to fix users do not zero ->private >>>>>>> when freeing a page and add the patch below. >>>>>> >>>>>> Do we know roughly which ones don't zero it out? >>>>> >>>>> So far based on [1], I found: >>>>> >>>>> 1. shmem_swapin_folio() in mm/shmem.c does not zero ->swap.val (overlapping >>>>> with private); >> >> After Kairui’s series [1], the shmem part looks good to me. As we no longer skip the swapcache now, we shouldn’t clear the ->swap.val of a swapcache folio if failed to swap-in. > > What do you mean by "after Kairui's series[1]"? Can you elaborate a little bit more? Sure. This patch [2] in Kairui's series will never skip the swapcache, which means the shmem folio we’re trying to swap-in must be in the swapcache. [2] https://lore.kernel.org/all/20251219195751.61328-1-ryncsn@gmail.com/T/#me242d9f77d2caa126124afd5a7731113e8f0346e > For the diff below, does the "folio_put(folio)" have different outcomes based on > skip_swapcache? Only if skip_swapcache is true, "folio_put(folio)" frees the folio? 
Please check the latest mm-stable branch. The skip_swapcache related logic has been removed by Kairui’s series [1]. > diff --git a/mm/shmem.c b/mm/shmem.c > index ec6c01378e9d..546e193ef993 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -2437,8 +2437,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index, > failed_nolock: > if (skip_swapcache) > swapcache_clear(si, folio->swap, folio_nr_pages(folio)); > - if (folio) > + if (folio) { > + folio->swap.val = 0; > folio_put(folio); > + } > put_swap_device(si); > > return error; Without Kairui's series, this change is incorrect. Yes, only if skip_swapcache is true, the "folio_put(folio)" frees the folio. Otherwise the folio is in the swapcache, and we will not free it. >> [1]https://lore.kernel.org/all/20251219195751.61328-1-ryncsn@gmail.com/T/#mcba8a32e1021dc28ce1e824c9d042dca316a30d7 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-10 2:25 ` Baolin Wang @ 2026-02-10 2:32 ` Zi Yan 0 siblings, 0 replies; 42+ messages in thread From: Zi Yan @ 2026-02-10 2:32 UTC (permalink / raw) To: Baolin Wang Cc: David Hildenbrand (Arm), Mikhail Gavrilov, linux-mm, akpm, vbabka, surenb, mhocko, jackmanb, hannes, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 9 Feb 2026, at 21:25, Baolin Wang wrote: > On 2/10/26 10:12 AM, Zi Yan wrote: >> On 9 Feb 2026, at 20:20, Baolin Wang wrote: >> >>> On 2/10/26 3:42 AM, Zi Yan wrote: >>>> On 9 Feb 2026, at 14:39, David Hildenbrand (Arm) wrote: >>>> >>>>> On 2/9/26 18:44, Zi Yan wrote: >>>>>> On 9 Feb 2026, at 12:36, David Hildenbrand (Arm) wrote: >>>>>> >>>>>>> On 2/9/26 17:33, Zi Yan wrote: >>>>>>>> >>>>>>>> >>>>>>>> I agree. Silently fixing non zero ->private just moves the work/responsibility >>>>>>>> from users to core mm. They could do better. :) >>>>>>>> >>>>>>>> We can have a patch or multiple patches to fix users do not zero ->private >>>>>>>> when freeing a page and add the patch below. >>>>>>> >>>>>>> Do we know roughly which ones don't zero it out? >>>>>> >>>>>> So far based on [1], I found: >>>>>> >>>>>> 1. shmem_swapin_folio() in mm/shmem.c does not zero ->swap.val (overlapping >>>>>> with private); >>> >>> After Kairui’s series [1], the shmem part looks good to me. As we no longer skip the swapcache now, we shouldn’t clear the ->swap.val of a swapcache folio if failed to swap-in. >> >> What do you mean by "after Kairui's series[1]"? Can you elaborate a little bit more? > > Sure. This patch [2] in Kairui's series will never skip the swapcache, which means the shmem folio we’re trying to swap-in must be in the swapcache. > > [2] https://lore.kernel.org/all/20251219195751.61328-1-ryncsn@gmail.com/T/#me242d9f77d2caa126124afd5a7731113e8f0346e > >> For the diff below, does the "folio_put(folio)" have different outcomes based on >> skip_swapcache? 
Only if skip_swapcache is true, "folio_put(folio)" frees the folio?
>
> Please check the latest mm-stable branch. The skip_swapcache related logic has been removed by Kairui’s series [1].
>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index ec6c01378e9d..546e193ef993 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2437,8 +2437,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>  failed_nolock:
>>  	if (skip_swapcache)
>>  		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
>> -	if (folio)
>> +	if (folio) {
>> +		folio->swap.val = 0;
>>  		folio_put(folio);
>> +	}
>>  	put_swap_device(si);
>>
>>  	return error;
>
> Without Kairui's series, this change is incorrect. Yes, only if skip_swapcache is true, the "folio_put(folio)" frees the folio. Otherwise the folio is in the swapcache, and we will not free it.

Got it. Thanks. I just realized that the above diff is on top of v6.19-rc7.
The fix to mm-new/mm-stable for shmem should be:

diff --git a/mm/shmem.c b/mm/shmem.c
index eaaeca8f6c39..a52eca656ade 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2447,8 +2447,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (folio)
 		folio_unlock(folio);
 failed_nolock:
-	if (folio)
+	if (folio) {
+		folio->swap.val = 0;
 		folio_put(folio);
+	}
 	put_swap_device(si);
 
 	return error;

Thank you for the explanation.

>
>>> [1]https://lore.kernel.org/all/20251219195751.61328-1-ryncsn@gmail.com/T/#mcba8a32e1021dc28ce1e824c9d042dca316a30d7

--
Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 42+ messages in thread
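[Editorial aside: the shape of the shmem cleanup being discussed, as a userspace mock. The mock_* names are invented for illustration; in the kernel, folio_put() only frees when the last reference is dropped, and ->swap.val shares storage with ->private, which is why zeroing it before the final put matters.]

```c
#include <assert.h>

struct mock_folio {
    unsigned long swap_val; /* shares storage with ->private in the kernel */
    int refcount;
    int freed;
};

/* Mock of the failed_nolock cleanup: zero the swap entry before the
 * final put, so a folio freed on this error path never reaches the
 * page allocator with a stale value in the ->private slot. */
void mock_failed_nolock(struct mock_folio *folio)
{
    if (folio) {
        folio->swap_val = 0;        /* the added folio->swap.val = 0; */
        if (--folio->refcount == 0) /* folio_put(folio) */
            folio->freed = 1;
    }
}
```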
* Re: [PATCH v3] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 17:36 ` [PATCH v3] " Mikhail Gavrilov 2026-02-07 22:02 ` David Hildenbrand (Arm) @ 2026-02-09 19:46 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 42+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-09 19:46 UTC (permalink / raw) To: Mikhail Gavrilov, linux-mm Cc: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy, npiggin, linux-kernel, kasong, hughd, chrisl, ryncsn, stable, willy On 2/7/26 18:36, Mikhail Gavrilov wrote: > Several subsystems (slub, shmem, ttm, etc.) use page->private but don't > clear it before freeing pages. When these pages are later allocated as > high-order pages and split via split_page(), tail pages retain stale > page->private values. > > This causes a use-after-free in the swap subsystem. The swap code uses > page->private to track swap count continuations, assuming freshly > allocated pages have page->private == 0. When stale values are present, > swap_count_continued() incorrectly assumes the continuation list is valid > and iterates over uninitialized page->lru containing LIST_POISON values, > causing a crash: > > KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107] > RIP: 0010:__do_sys_swapoff+0x1151/0x1860 > > Fix this by clearing page->private in free_pages_prepare(), ensuring all > freed pages have clean state regardless of previous use. > > Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound") > Cc: stable@vger.kernel.org > Suggested-by: Zi Yan <ziy@nvidia.com> > Acked-by: Zi Yan <ziy@nvidia.com> > Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> > --- Okay, let's do this as a fix, and cleanup the page->private handling separately. Thanks! Acked-by: David Hildenbrand (Arm) <david@kernel.org> -- Cheers, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2] mm/page_alloc: clear page->private in free_pages_prepare() 2026-02-07 15:37 ` [PATCH v2] mm/page_alloc: clear page->private in free_pages_prepare() Mikhail Gavrilov 2026-02-07 16:12 ` Zi Yan @ 2026-02-09 11:11 ` Vlastimil Babka 1 sibling, 0 replies; 42+ messages in thread From: Vlastimil Babka @ 2026-02-09 11:11 UTC (permalink / raw) To: Mikhail Gavrilov, linux-mm Cc: akpm, chrisl, kasong, hughd, ryncsn, ziy, stable On 2/7/26 16:37, Mikhail Gavrilov wrote: > Several subsystems (slub, shmem, ttm, etc.) use page->private but don't > clear it before freeing pages. When these pages are later allocated as > high-order pages and split via split_page(), tail pages retain stale > page->private values. > > This causes a use-after-free in the swap subsystem. The swap code uses > page->private to track swap count continuations, assuming freshly > allocated pages have page->private == 0. When stale values are present, > swap_count_continued() incorrectly assumes the continuation list is valid > and iterates over uninitialized page->lru containing LIST_POISON values, > causing a crash: > > KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107] > RIP: 0010:__do_sys_swapoff+0x1151/0x1860 > > Fix this by clearing page->private in free_pages_prepare(), ensuring all > freed pages have clean state regardless of previous use. > > Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound") > Cc: stable@vger.kernel.org > Suggested-by: Zi Yan <ziy@nvidia.com> > Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Seems cheap enough and robust to just do it. Could be different for tail pages (although we do modify them too anyway). 
Reviewed-by: Vlastimil Babka <vbabka@suse.cz> > --- > mm/page_alloc.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cbf758e27aa2..24ac34199f95 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1430,6 +1430,7 @@ __always_inline bool free_pages_prepare(struct page *page, > > page_cpupid_reset_last(page); > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > + page->private = 0; > reset_page_owner(page, order); > page_table_check_free(page, order); > pgalloc_tag_sub(page, 1 << order); ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages 2026-02-06 18:08 ` Zi Yan 2026-02-06 18:21 ` Mikhail Gavrilov @ 2026-02-06 18:24 ` Kairui Song 1 sibling, 0 replies; 42+ messages in thread From: Kairui Song @ 2026-02-06 18:24 UTC (permalink / raw) To: Zi Yan Cc: Mikhail Gavrilov, linux-mm, akpm, vbabka, chrisl, hughd, stable, David Hildenbrand, surenb, Matthew Wilcox, mhocko, hannes, jackmanb On Sat, Feb 7, 2026 at 2:08 AM Zi Yan <ziy@nvidia.com> wrote: > > +willy, david, and others included in Andrew’s mm-commit email. > > On 6 Feb 2026, at 12:40, Mikhail Gavrilov wrote: > > > When vmalloc allocates high-order pages and splits them via split_page(), > > tail pages may retain stale page->private values from previous use by the > > buddy allocator. > > Do you have a reproducer for this issue? Last time I checked page->private This patch is from previous discussion: https://lore.kernel.org/linux-mm/CABXGCsO3XcXt5GDae7d74ynC6P6G2gLw3ZrwAYvSQ3PwP0mGXA@mail.gmail.com/ > usage, I find users clears ->private before free a page. I wonder which one > I was missing. The comment above page_private() does say ->private can > be used on tail pages. If pages are freed with non-zero private in > tail pages, we need to either correct the violating user or clear > all pages ->private in post_alloc_hook() in addition to the head one. > Clearing ->private in split_page() looks like a hack instead of a fix. It looks odd to me too. That bug starts with vmalloc dropping __GFP_COMP in commit 3b8000ae185c, because with __GFP_COMP, the allocator does clean the ->private of tail pages on allocation with prep_compound_page. Without __GFP_COMP, these ->private fields are left as it is. ^ permalink raw reply [flat|nested] 42+ messages in thread
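[Editorial aside: Kairui's observation — that __GFP_COMP used to mask stale ->private because compound-page prep walks every tail page at allocation time, while the non-compound path after commit 3b8000ae185c does not — can be sketched like this. The mock helpers below are heavily simplified; the real prep_compound_tail() and split_page() do much more than touch ->private.]

```c
#include <assert.h>

struct mock_page {
    unsigned long private;
};

/* With __GFP_COMP, allocation-time compound prep visits each tail page
 * and, among other things, clears ->private. */
void mock_prep_compound(struct mock_page *head, unsigned int order)
{
    for (unsigned int i = 1; i < (1u << order); i++)
        head[i].private = 0;
}

/* Without __GFP_COMP (vmalloc after 3b8000ae185c), split_page() left
 * tail ->private untouched, so whatever the previous owner stored
 * there leaked into the new allocation. */
void mock_split_page(struct mock_page *head, unsigned int order)
{
    (void)head;
    (void)order; /* refcount handling elided; ->private is not cleared */
}
```

This is why the bug only surfaced once vmalloc stopped using compound pages: the clearing that prep_compound_page() did as a side effect simply stopped happening.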