* [PATCH] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
From: Roman Gushchin @ 2024-10-21 16:48 UTC
To: Andrew Morton
Cc: linux-mm, Vlastimil Babka, linux-kernel, Roman Gushchin, stable,
Hugh Dickins, Matthew Wilcox
Syzbot reported [1] a bad page state problem caused by a page
being freed using free_page() still having a mlocked flag at
free_pages_prepare() stage:
BUG: Bad page state in process syz.0.15 pfn:1137bb
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb
flags: 0x400000000080000(mlocked|node=0|zone=1)
raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000
raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask
0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid
3004 (syz.0.15), ts 61546 608067, free_ts 61390082085
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
prep_new_page mm/page_alloc.c:1545 [inline]
get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457
__alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733
alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265
kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline]
kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
page last free pid 951 tgid 951 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1108 [inline]
free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638
vfree+0x181/0x2e0 mm/vmalloc.c:3361
delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310
worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391
kthread+0x2df/0x370 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
The problem was originally introduced by
commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final
clearance"): it was focused on handling pagecache
and anonymous memory and wasn't suitable for the lower level
get_page()/free_page() APIs used, for example, by KVM, as with
this reproducer.
Fix it by moving the mlocked flag clearance down to
free_pages_prepare().
The bug itself is fairly old and harmless (aside from generating these
warnings), so the stable backport is likely not justified.
Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
---
mm/page_alloc.c | 9 +++++++++
mm/swap.c | 14 --------------
2 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc55d39eb372..24200651ad92 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
bool skip_kasan_poison = should_skip_kasan_poison(page);
bool init = want_init_on_free();
bool compound = PageCompound(page);
+ struct folio *folio = page_folio(page);
VM_BUG_ON_PAGE(PageTail(page), page);
@@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
if (memcg_kmem_online() && PageMemcgKmem(page))
__memcg_kmem_uncharge_page(page, order);
+ if (unlikely(folio_test_mlocked(folio))) {
+ long nr_pages = folio_nr_pages(folio);
+
+ __folio_clear_mlocked(folio);
+ zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
+ count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
+ }
+
if (unlikely(PageHWPoison(page)) && !order) {
/* Do not let hwpoison pages hit pcplists/buddy */
reset_page_owner(page, order);
diff --git a/mm/swap.c b/mm/swap.c
index 835bdf324b76..7cd0f4719423 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
lruvec_del_folio(*lruvecp, folio);
__folio_clear_lru_flags(folio);
}
-
- /*
- * In rare cases, when truncation or holepunching raced with
- * munlock after VM_LOCKED was cleared, Mlocked may still be
- * found set here. This does not indicate a problem, unless
- * "unevictable_pgs_cleared" appears worryingly large.
- */
- if (unlikely(folio_test_mlocked(folio))) {
- long nr_pages = folio_nr_pages(folio);
-
- __folio_clear_mlocked(folio);
- zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
- count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
- }
}
/*
--
2.47.0.rc1.288.g06298d1525-goog
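
The effect of the patch above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: every toy_* name
and the "patched" switch are hypothetical and none of this is kernel
code. It models two free paths, one that passes through an LRU release
step (standing in for __page_cache_release()) and one that frees
directly (standing in for free_page()), and shows that only clearance
inside the common prepare step covers both.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's mlocked page flag. */
#define TOY_PG_MLOCKED (1u << 0)

struct toy_page {
	unsigned int flags;
	unsigned int order;	/* the page block spans 1 << order pages */
};

/* Hypothetical stand-ins for NR_MLOCK and UNEVICTABLE_PGCLEARED. */
struct toy_stats {
	long nr_mlock;
	long unevictable_pgcleared;
};

/* Clear a leftover mlocked flag and fix up both counters. */
static void toy_clear_mlocked(struct toy_page *page, struct toy_stats *st)
{
	long nr_pages = 1L << page->order;

	if (!(page->flags & TOY_PG_MLOCKED))
		return;
	page->flags &= ~TOY_PG_MLOCKED;
	st->nr_mlock -= nr_pages;
	st->unevictable_pgcleared += nr_pages;
}

/* Common free step: returns false to model a "Bad page state" report. */
static bool toy_free_pages_prepare(struct toy_page *page,
				   struct toy_stats *st, bool patched)
{
	if (patched)
		toy_clear_mlocked(page, st);	/* new location of the clearance */
	return !(page->flags & TOY_PG_MLOCKED);
}

/* LRU-style release: before the patch, clearance happened only here. */
static bool toy_folio_put(struct toy_page *page, struct toy_stats *st,
			  bool patched)
{
	if (!patched)
		toy_clear_mlocked(page, st);	/* old location of the clearance */
	return toy_free_pages_prepare(page, st, patched);
}

/* Direct free: never passes through the LRU-style release step. */
static bool toy_free_page(struct toy_page *page, struct toy_stats *st,
			  bool patched)
{
	return toy_free_pages_prepare(page, st, patched);
}

/* Walk through the three interesting cases; returns 0 when all hold. */
int toy_demo(void)
{
	struct toy_stats st = { .nr_mlock = 3, .unevictable_pgcleared = 0 };
	struct toy_page a = { .flags = TOY_PG_MLOCKED, .order = 0 };
	struct toy_page b = { .flags = TOY_PG_MLOCKED, .order = 0 };
	struct toy_page c = { .flags = TOY_PG_MLOCKED, .order = 0 };

	/* Unpatched, LRU path: the flag is cleared before the check. */
	assert(toy_folio_put(&a, &st, false));
	/* Unpatched, direct path: the flag survives and the check fails. */
	assert(!toy_free_page(&b, &st, false));
	/* Patched, direct path: the flag is cleared in the prepare step. */
	assert(toy_free_page(&c, &st, true));
	assert(st.nr_mlock == 1 && st.unevictable_pgcleared == 2);
	return 0;
}
```

Calling toy_demo() runs the three cases. The counter left at 1 belongs
to page b, whose flag was never cleared on the unpatched direct path.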
* Re: [PATCH] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
From: Vlastimil Babka @ 2024-10-21 17:01 UTC
To: Roman Gushchin, Andrew Morton
Cc: linux-mm, linux-kernel, stable, Hugh Dickins, Matthew Wilcox
On 10/21/24 18:48, Roman Gushchin wrote:
> Syzbot reported [1] a bad page state problem caused by a page
> being freed using free_page() still having a mlocked flag at
> free_pages_prepare() stage:
>
> BUG: Bad page state in process syz.0.15 pfn:1137bb
> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb
> flags: 0x400000000080000(mlocked|node=0|zone=1)
> raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000
> raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> page_owner tracks the page as allocated
> page last allocated via order 0, migratetype Unmovable, gfp_mask
> 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid
> 3004 (syz.0.15), ts 61546 608067, free_ts 61390082085
> set_page_owner include/linux/page_owner.h:32 [inline]
> post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
> prep_new_page mm/page_alloc.c:1545 [inline]
> get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457
> __alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733
> alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265
> kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
> kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
> kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline]
> kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:907 [inline]
> __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> page last free pid 951 tgid 951 stack trace:
> reset_page_owner include/linux/page_owner.h:25 [inline]
> free_pages_prepare mm/page_alloc.c:1108 [inline]
> free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638
> vfree+0x181/0x2e0 mm/vmalloc.c:3361
> delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282
> process_one_work kernel/workqueue.c:3229 [inline]
> process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310
> worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391
> kthread+0x2df/0x370 kernel/kthread.c:389
> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>
> The problem was originally introduced by
> commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final
> clearance"): it was focused on handling pagecache
> and anonymous memory and wasn't suitable for the lower level
> get_page()/free_page() APIs used, for example, by KVM, as with
> this reproducer.
Does that mean KVM is mlocking pages that are not pagecache nor anonymous,
thus not LRU? How and why (and since when) is that done?
> Fix it by moving the mlocked flag clearance down to
> free_pages_prepare().
>
> The bug itself is fairly old and harmless (aside from generating these
> warnings), so the stable backport is likely not justified.
But since there's a Cc: stable below, it will be backported :)
> Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
> Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: <stable@vger.kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> ---
> mm/page_alloc.c | 9 +++++++++
> mm/swap.c | 14 --------------
> 2 files changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bc55d39eb372..24200651ad92 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
> bool skip_kasan_poison = should_skip_kasan_poison(page);
> bool init = want_init_on_free();
> bool compound = PageCompound(page);
> + struct folio *folio = page_folio(page);
>
> VM_BUG_ON_PAGE(PageTail(page), page);
>
> @@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
> if (memcg_kmem_online() && PageMemcgKmem(page))
> __memcg_kmem_uncharge_page(page, order);
>
> + if (unlikely(folio_test_mlocked(folio))) {
> + long nr_pages = folio_nr_pages(folio);
> +
> + __folio_clear_mlocked(folio);
> + zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> + count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> + }
Why drop the useful comment?
> +
> if (unlikely(PageHWPoison(page)) && !order) {
> /* Do not let hwpoison pages hit pcplists/buddy */
> reset_page_owner(page, order);
> diff --git a/mm/swap.c b/mm/swap.c
> index 835bdf324b76..7cd0f4719423 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
> lruvec_del_folio(*lruvecp, folio);
> __folio_clear_lru_flags(folio);
> }
> -
> - /*
> - * In rare cases, when truncation or holepunching raced with
> - * munlock after VM_LOCKED was cleared, Mlocked may still be
> - * found set here. This does not indicate a problem, unless
> - * "unevictable_pgs_cleared" appears worryingly large.
> - */
> - if (unlikely(folio_test_mlocked(folio))) {
> - long nr_pages = folio_nr_pages(folio);
> -
> - __folio_clear_mlocked(folio);
> - zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> - count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> - }
> }
>
> /*
* Re: [PATCH] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
From: Roman Gushchin @ 2024-10-21 17:17 UTC
To: Vlastimil Babka
Cc: Andrew Morton, linux-mm, linux-kernel, stable, Hugh Dickins,
Matthew Wilcox
On Mon, Oct 21, 2024 at 07:01:59PM +0200, Vlastimil Babka wrote:
> On 10/21/24 18:48, Roman Gushchin wrote:
> > Syzbot reported [1] a bad page state problem caused by a page
> > being freed using free_page() still having a mlocked flag at
> > free_pages_prepare() stage:
> >
> > BUG: Bad page state in process syz.0.15 pfn:1137bb
> > page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb
> > flags: 0x400000000080000(mlocked|node=0|zone=1)
> > raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000
> > raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000
> > page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> > page_owner tracks the page as allocated
> > page last allocated via order 0, migratetype Unmovable, gfp_mask
> > 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid
> > 3004 (syz.0.15), ts 61546 608067, free_ts 61390082085
> > set_page_owner include/linux/page_owner.h:32 [inline]
> > post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
> > prep_new_page mm/page_alloc.c:1545 [inline]
> > get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457
> > __alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733
> > alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265
> > kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
> > kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
> > kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline]
> > kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542
> > vfs_ioctl fs/ioctl.c:51 [inline]
> > __do_sys_ioctl fs/ioctl.c:907 [inline]
> > __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > page last free pid 951 tgid 951 stack trace:
> > reset_page_owner include/linux/page_owner.h:25 [inline]
> > free_pages_prepare mm/page_alloc.c:1108 [inline]
> > free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638
> > vfree+0x181/0x2e0 mm/vmalloc.c:3361
> > delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282
> > process_one_work kernel/workqueue.c:3229 [inline]
> > process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310
> > worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391
> > kthread+0x2df/0x370 kernel/kthread.c:389
> > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> >
> > The problem was originally introduced by
> > commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final
> > clearance"): it was focused on handling pagecache
> > and anonymous memory and wasn't suitable for the lower level
> > get_page()/free_page() APIs used, for example, by KVM, as with
> > this reproducer.
>
> Does that mean KVM is mlocking pages that are not pagecache nor anonymous,
> thus not LRU? How and why (and since when) is that done?
KVM allows userspace to mmap and mlock several pages that it allocates
directly. Please take a look at the reproducer:
https://syzkaller.appspot.com/x/repro.c?x=1437939f980000
>
> > Fix it by moving the mlocked flag clearance down to
> > free_pages_prepare().
> >
> > The bug itself is fairly old and harmless (aside from generating these
> > warnings), so the stable backport is likely not justified.
>
> But since there's a Cc: stable below, it will be backported :)
My bad, I changed my mind at the last minute and added Cc: stable but
forgot to drop this sentence.
>
> > Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
> > Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: <stable@vger.kernel.org>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > ---
> > mm/page_alloc.c | 9 +++++++++
> > mm/swap.c | 14 --------------
> > 2 files changed, 9 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bc55d39eb372..24200651ad92 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
> > bool skip_kasan_poison = should_skip_kasan_poison(page);
> > bool init = want_init_on_free();
> > bool compound = PageCompound(page);
> > + struct folio *folio = page_folio(page);
> >
> > VM_BUG_ON_PAGE(PageTail(page), page);
> >
> > @@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
> > if (memcg_kmem_online() && PageMemcgKmem(page))
> > __memcg_kmem_uncharge_page(page, order);
> >
> > + if (unlikely(folio_test_mlocked(folio))) {
> > + long nr_pages = folio_nr_pages(folio);
> > +
> > + __folio_clear_mlocked(folio);
> > + zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> > + count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> > + }
>
> Why drop the useful comment?
Agreed. Sounds like I need to restore the comment, drop the "no stable
backport" recommendation and send v2.
Thank you for taking a look!
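
The answer above, that KVM lets userspace mmap and mlock pages it
allocates directly, can be sketched from the userspace side as follows.
This is a minimal illustration and not the syzkaller reproducer. It
assumes that the region exposed by mmap() on a vcpu fd includes
directly allocated pages such as the coalesced MMIO ring seen in the
allocation trace. The function name try_mlock_vcpu_mapping() is
hypothetical; the ioctls (KVM_CREATE_VM, KVM_CREATE_VCPU,
KVM_GET_VCPU_MMAP_SIZE) are the standard KVM API.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a vcpu's shared region and mlock() it.
 * Returns 0 if the mapping was locked, -1 if any step failed
 * (no /dev/kvm, no permission, RLIMIT_MEMLOCK too low, ...).
 */
int try_mlock_vcpu_mapping(void)
{
	int ret = -1;
	int vm_fd = -1, vcpu_fd = -1;
	int mmap_size;
	void *area;
	int kvm_fd = open("/dev/kvm", O_RDWR);

	if (kvm_fd < 0)
		return -1;

	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0UL);	/* 0: default machine type */
	if (vm_fd < 0)
		goto out;

	vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0UL);	/* vcpu id 0 */
	if (vcpu_fd < 0)
		goto out;

	/* Size of the region that mmap() on a vcpu fd exposes. */
	mmap_size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0UL);
	if (mmap_size <= 0)
		goto out;

	area = mmap(NULL, (size_t)mmap_size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, vcpu_fd, 0);
	if (area == MAP_FAILED)
		goto out;

	/* Locking the range faults the backing pages in under VM_LOCKED. */
	if (mlock(area, (size_t)mmap_size) == 0)
		ret = 0;

	munmap(area, (size_t)mmap_size);
out:
	if (vcpu_fd >= 0)
		close(vcpu_fd);
	if (vm_fd >= 0)
		close(vm_fd);
	close(kvm_fd);
	return ret;
}
```

On a machine without /dev/kvm the function simply returns -1. Whether
a particular kernel leaves the mlocked flag set on such pages when the
VM is torn down is what the thread above discusses; the sketch only
shows how userspace can reach that state.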