From: Zi Yan <ziy@nvidia.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, chrisl@kernel.org,
kasong@tencent.com, hughd@google.com, stable@vger.kernel.org,
David Hildenbrand <david@kernel.org>,
surenb@google.com, Matthew Wilcox <willy@infradead.org>,
mhocko@suse.com, hannes@cmpxchg.org, jackmanb@google.com,
vbabka@suse.cz, Kairui Song <ryncsn@gmail.com>
Subject: Re: [PATCH] mm/page_alloc: clear page->private in split_page() for tail pages
Date: Sat, 07 Feb 2026 09:32:12 -0500
Message-ID: <247E7FE9-E089-43D1-882B-81C7134C2FFE@nvidia.com>
In-Reply-To: <CABXGCsNyt6DB=SX9JWD=-WK_BiHhbXaCPNV-GOM8GskKJVAn_A@mail.gmail.com>

On 7 Feb 2026, at 9:25, Mikhail Gavrilov wrote:

> On Sat, Feb 7, 2026 at 8:28 AM Zi Yan <ziy@nvidia.com> wrote:
>>
>> OK, it seems that both slub and shmem do not reset ->private when freeing
>> pages/folios. And a tail page's private is not zero because, when a page
>> with a non-zero private is freed and gets merged into a lower buddy, its
>> private is never set to 0 on that code path.
>>
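
To spell that out with a user-space toy (an illustration of the idea only, not
the real __free_one_page() code): when the freed page merges into its lower
buddy, only the new head of the combined free block gets updated, so a stale
->private on the page that was just freed survives the merge and can later be
handed out again by a split.

/*
 * Toy model -- NOT kernel code.  pages[2] stands for a page that is freed
 * while still carrying a stale ->private (e.g. left behind by slab or the
 * shmem swap path); pages[0..1] is its already-free lower buddy.
 */
#include <stdio.h>

struct toy_page {
	unsigned long private;
};

int main(void)
{
	struct toy_page pages[4] = { {0}, {0}, {42}, {0} };

	/*
	 * Merge: the combined block starts at pages[0], so only the new head
	 * records the buddy order; nothing on this path touches
	 * pages[2].private.
	 */
	pages[0].private = 2;	/* roughly what set_buddy_order() does */

	printf("head ->private = %lu, interior ->private = %lu\n",
	       pages[0].private, pages[2].private);

	/*
	 * If the block is later split, pages[2] can become the first page of
	 * a new allocation with ->private still 42 -- which is what the
	 * warning in the second patch below is meant to catch.
	 */
	return 0;
}
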
>> The patch below seems to fix the issue: I am at iteration 104 and counting.
>> I also put a VM_BUG_ON(page->private) in free_pages_prepare() and it has not
>> triggered either.
>>
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index ec6c01378e9d..546e193ef993 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2437,8 +2437,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>  failed_nolock:
>>  	if (skip_swapcache)
>>  		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
>> -	if (folio)
>> +	if (folio) {
>> +		folio->swap.val = 0;
>>  		folio_put(folio);
>> +	}
>>  	put_swap_device(si);
>>
>>  	return error;
>> diff --git a/mm/slub.c b/mm/slub.c
>> index f77b7407c51b..2cdab6d66e1a 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3311,6 +3311,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
>>
>>  	__slab_clear_pfmemalloc(slab);
>>  	page->mapping = NULL;
>> +	page->private = 0;
>>  	__ClearPageSlab(page);
>>  	mm_account_reclaimed_pages(pages);
>>  	unaccount_slab(slab, order, s);
>>
>>
>>
>> But I am not sure that is all. Maybe the patch below (applied on top) is needed
>> to find all the violators while still keeping the system running. I would also
>> like to hear from others on whether page->private should be reset before
>> free_pages_prepare() or not.
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index cbf758e27aa2..9058f94b0667 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1430,6 +1430,8 @@ __always_inline bool free_pages_prepare(struct page *page,
>>
>>  	page_cpupid_reset_last(page);
>>  	page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
>> +	VM_WARN_ON_ONCE(page->private);
>> +	page->private = 0;
>>  	reset_page_owner(page, order);
>>  	page_table_check_free(page, order);
>>  	pgalloc_tag_sub(page, 1 << order);
>>
>>
>> --
>> Best Regards,
>> Yan, Zi
>
> I tested your patch. The VM_WARN_ON_ONCE caught another violator - TTM
> (GPU memory manager):

Thanks. As a fix, I think we could either combine the two patches above into one
and drop the VM_WARN_ON_ONCE(), or just send the second one without the
VM_WARN_ON_ONCE(). I can then send a separate patch later that fixes all the users
which do not reset ->private and includes the VM_WARN_ON_ONCE().

WDYT?

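For concreteness, the second option would be roughly the sketch below, i.e. the
free_pages_prepare() hunk quoted above with the warning dropped (untested in this
exact form):

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1430,6 +1430,7 @@ __always_inline bool free_pages_prepare(struct page *page,
 
 	page_cpupid_reset_last(page);
 	page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	page->private = 0;
 	reset_page_owner(page, order);
 	page_table_check_free(page, order);
 	pgalloc_tag_sub(page, 1 << order);
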
> ------------[ cut here ]------------
> WARNING: mm/page_alloc.c:1433 at __free_pages_ok+0xe1e/0x12c0,
> CPU#16: gnome-shell/5841
> Modules linked in: overlay uinput rfcomm snd_seq_dummy snd_hrtimer
> xt_mark xt_cgroup xt_MASQUERADE ip6t_REJECT ipt_REJECT nft_compat
> nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr uhid bnep sunrpc amd_atl
> intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib
> mt76_connac_lib btusb mt76 btmtk btrtl btbcm btintel vfat edac_mce_amd
> spd5118 bluetooth fat snd_hda_codec_atihdmi asus_ec_sensors mac80211
> snd_hda_codec_hdmi kvm_amd snd_hda_intel uvcvideo snd_usb_audio
> snd_hda_codec uvc videobuf2_vmalloc kvm videobuf2_memops snd_hda_core
> joydev videobuf2_v4l2 snd_intel_dspcfg videobuf2_common
> snd_usbmidi_lib videodev snd_intel_sdw_acpi snd_ump irqbypass
> snd_hwdep asus_nb_wmi mc snd_rawmidi rapl snd_seq asus_wmi cfg80211
> sparse_keymap snd_seq_device platform_profile wmi_bmof pcspkr snd_pcm
> snd_timer rfkill igc snd
> libarc4 i2c_piix4 soundcore k10temp i2c_smbus gpio_amdpt
> gpio_generic nfnetlink zram lz4hc_compress lz4_compress amdgpu amdxcp
> i2c_algo_bit drm_ttm_helper ttm drm_exec drm_panel_backlight_quirks
> gpu_sched drm_suballoc_helper nvme video nvme_core drm_buddy
> ghash_clmulni_intel drm_display_helper nvme_keyring nvme_auth cec
> sp5100_tco hkdf wmi uas usb_storage fuse ntsync i2c_dev
> CPU: 16 UID: 1000 PID: 5841 Comm: gnome-shell Tainted: G W
> 6.19.0-rc8-f14faaf3a1fb-with-fix-reset-private-when-freeing+ #82
> PREEMPT(lazy)
> Tainted: [W]=WARN
> Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING
> WIFI, BIOS 3602 11/13/2025
> RIP: 0010:__free_pages_ok+0xe1e/0x12c0
> Code: ef 48 89 c6 e8 f3 59 ff ff 83 44 24 20 01 49 ba 00 00 00 00 00
> fc ff df e9 71 fe ff ff 41 c7 45 30 ff ff ff ff e9 f5 f4 ff ff <0f> 0b
> e9 73 f5 ff ff e8 86 4c 0e 00 e9 02 fb ff ff 48 c7 44 24 30
> RSP: 0018:ffffc9000e0cf878 EFLAGS: 00010206
> RAX: dffffc0000000000 RBX: 0000000000000f80 RCX: 1ffffd40028c6000
> RDX: 1ffffd40028c6005 RSI: 0000000000000004 RDI: ffffea0014630038
> RBP: ffffea0014630028 R08: ffffffff9e58e2de R09: 1ffffd40028c6006
> R10: fffff940028c6007 R11: fffff940028c6007 R12: ffffffffa27376d8
> R13: ffffea0014630000 R14: ffff889054e559c0 R15: 0000000000000000
> FS: 00007f510f914000(0000) GS:ffff8890317a8000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005607eaf70168 CR3: 00000001dfd6a000 CR4: 0000000000f50ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ttm_pool_unmap_and_free+0x30c/0x520 [ttm]
> ? dma_resv_iter_first_unlocked+0x2f9/0x470
> ttm_pool_free_range+0xef/0x160 [ttm]
> ? __pfx_drm_gem_close_ioctl+0x10/0x10
> ttm_pool_free+0x70/0xe0 [ttm]
> ? rcu_is_watching+0x15/0xe0
> ttm_tt_unpopulate+0xa2/0x2d0 [ttm]
> ttm_bo_cleanup_memtype_use+0xec/0x200 [ttm]
> ttm_bo_release+0x371/0xb00 [ttm]
> ? __pfx_ttm_bo_release+0x10/0x10 [ttm]
> ? drm_vma_node_revoke+0x1a/0x1e0
> ? local_clock+0x15/0x30
> ? __pfx_drm_gem_close_ioctl+0x10/0x10
> drm_gem_object_release_handle+0xcd/0x1f0
> drm_gem_handle_delete+0x6a/0xc0
> ? drm_dev_exit+0x35/0x50
> drm_ioctl_kernel+0x172/0x2e0
> ? __lock_release.isra.0+0x1a2/0x370
> ? __pfx_drm_ioctl_kernel+0x10/0x10
> drm_ioctl+0x571/0xb50
> ? __pfx_drm_gem_close_ioctl+0x10/0x10
> ? __pfx_drm_ioctl+0x10/0x10
> ? rcu_is_watching+0x15/0xe0
> ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170
> ? trace_hardirqs_on+0x18/0x140
> ? lockdep_hardirqs_on+0x90/0x130
> ? __raw_spin_unlock_irqrestore+0x5d/0x80
> ? __raw_spin_unlock_irqrestore+0x46/0x80
> amdgpu_drm_ioctl+0xd3/0x190 [amdgpu]
> __x64_sys_ioctl+0x13c/0x1d0
> ? syscall_trace_enter+0x15c/0x2a0
> do_syscall_64+0x9c/0x4e0
> ? __lock_release.isra.0+0x1a2/0x370
> ? do_user_addr_fault+0x87a/0xf60
> ? fpregs_assert_state_consistent+0x8f/0x100
> ? trace_hardirqs_on_prepare+0x101/0x140
> ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170
> ? irqentry_exit+0x99/0x600
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f5113af889d
> Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00
> 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2
> 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
> RSP: 002b:00007fff83c100c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00005607ed127c50 RCX: 00007f5113af889d
> RDX: 00007fff83c10150 RSI: 0000000040086409 RDI: 000000000000000e
> RBP: 00007fff83c10110 R08: 00005607ead46d50 R09: 0000000000000000
> R10: 0000000000000031 R11: 0000000000000246 R12: 00007fff83c10150
> R13: 0000000040086409 R14: 000000000000000e R15: 00005607ead46d50
> </TASK>
> irq event stamp: 5186663
> hardirqs last enabled at (5186669): [<ffffffff9dc9ce6e>]
> __up_console_sem+0x7e/0x90
> hardirqs last disabled at (5186674): [<ffffffff9dc9ce53>]
> __up_console_sem+0x63/0x90
> softirqs last enabled at (5186538): [<ffffffff9da5325b>]
> handle_softirqs+0x54b/0x810
> softirqs last disabled at (5186531): [<ffffffff9da53654>]
> __irq_exit_rcu+0x124/0x240
> ---[ end trace 0000000000000000 ]---
>
> So there are more violators than just slub and shmem.
> I also tested the post_alloc_hook() fix (clearing page->private for
> all pages at allocation): 1600+ iterations without a crash.
> Given the multiple violators, maybe a defensive fix (either in
> split_page(), which is already in mm-unstable, or in post_alloc_hook())
> is the right approach, rather than hunting down each violator?
>
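(For reference, the allocation-side variant you tested would presumably boil down
to a loop like the sketch below in post_alloc_hook(); I have not checked the exact
placement there, so please treat it as pseudocode for that option rather than a
reviewed hunk.)

	unsigned long i;

	/* clear ->private on every page of the allocation, head and tails alike */
	for (i = 0; i < (1UL << order); i++)
		set_page_private(page + i, 0);
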
> --
> Best Regards,
> Mike Gavrilov.
--
Best Regards,
Yan, Zi