From: Miaohe Lin <linmiaohe@huawei.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>, Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 0/5] Remove some races around folio_test_hugetlb
Date: Mon, 4 Mar 2024 17:09:58 +0800
Message-ID: <a9b40ae3-6e2d-56bb-ba75-8cfd2ace4b33@huawei.com>
In-Reply-To: <20240301214712.2853147-1-willy@infradead.org>
On 2024/3/2 5:47, Matthew Wilcox (Oracle) wrote:
> Oscar and I have been exchanging a bit of email recently about the
> bug reported here:
> https://lore.kernel.org/all/ZXNhGsX32y19a2Xv@casper.infradead.org
Thanks for your patch.
>
> I've come to the conclusion that folio_test_hugetlb() is just too fragile
> as it can give both false positives and false negatives, as well as
> resulting in the above bug. With this patch series, it becomes a lot
> more robust. In the memory-failure case, we always hold the hugetlb_lock
> so it's perfectly reliable. In the compaction case, it's unreliable, but
> the failures are acceptable and we recheck after taking the hugetlb_lock.
I encountered a similar issue with the PageSwapCache check when doing memory-failure testing:
[66258.945079] page:00000000135e1205 refcount:1 mapcount:0 mapping:0000000000000000 index:0x9b pfn:0xa04e9a
[66258.949096] head:0000000038449724 order:9 entire_mapcount:1 nr_pages_mapped:0 pincount:0
[66258.949485] memcg:ffff95fb43379000
[66258.950334] anon flags: 0x6fffc00000a0068(uptodate|lru|head|mappedtodisk|swapbacked|node=1|zone=2|lastcpupid=0x3fff)
[66258.951212] page_type: 0xffffffff()
[66258.951882] raw: 06fffc0000000000 ffffc89628138001 dead000000000122 dead000000000400
[66258.952273] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[66258.952884] head: 06fffc00000a0068 ffffc896218a8008 ffffc89621680008 ffff95fb4349c439
[66258.953239] head: 0000000700000600 0000000000000000 00000001ffffffff ffff95fb43379000
[66258.953725] page dumped because: VM_BUG_ON_PAGE(PageTail(page))
[66258.954497] ------------[ cut here ]------------
[66258.954937] kernel BUG at include/linux/page-flags.h:313!
[66258.956502] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[66258.957001] CPU: 14 PID: 174237 Comm: page-types Kdump: loaded Not tainted 6.8.0-rc1-00162-gd162e170f118 #11
[66258.957001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[66258.958415] RIP: 0010:folio_flags.constprop.0+0x1c/0x50
[66258.958415] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 8b 57 08 48 89 f8 83 e2 01 74 12 48 c7 c6 a0 59 34 a7 48 89 c7 e8 b5 60 e8 ff 90 <0f> 0b 66 90 c3 cc cc cc cc f7 c7 ff 0f 00 00 75 1a 48 8b 17 83 e2
[66258.958415] RSP: 0018:ffffa0f38ae53e00 EFLAGS: 00000282
[66258.958415] RAX: 0000000000000033 RBX: 0000000000000000 RCX: ffff96031fd9c9c8
[66258.958415] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff96031fd9c9c0
[66258.958415] RBP: ffffc8962813a680 R08: ffffffffa7756f88 R09: 0000000000009ffb
[66258.962155] R10: 000000000000054a R11: ffffffffa7726fa0 R12: 06fffc0000000000
[66258.962155] R13: 0000000000000000 R14: 00007fff93bf1348 R15: 0000000000a04e9a
[66258.962155] FS: 00007f47cc5c4740(0000) GS:ffff96031fd80000(0000) knlGS:0000000000000000
[66258.962155] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66258.962155] CR2: 00007fff93c7b000 CR3: 0000000850c28000 CR4: 00000000000006f0
[66258.962155] Call Trace:
[66258.962155] <TASK>
[66258.965730] ? die+0x32/0x90
[66258.965730] ? do_trap+0xdf/0x110
[66258.965730] ? folio_flags.constprop.0+0x1c/0x50
[66258.965730] ? do_error_trap+0x8b/0x110
[66258.965730] ? folio_flags.constprop.0+0x1c/0x50
[66258.965730] ? folio_flags.constprop.0+0x1c/0x50
[66258.965730] ? exc_invalid_op+0x53/0x70
[66258.965730] ? folio_flags.constprop.0+0x1c/0x50
[66258.965730] ? asm_exc_invalid_op+0x1a/0x20
[66258.965730] ? folio_flags.constprop.0+0x1c/0x50
[66258.965730] stable_page_flags+0x210/0x940
[66258.965730] kpageflags_read+0x97/0xf0
[66258.965730] vfs_read+0xa0/0x370
[66258.965730] __x64_sys_pread64+0x90/0xc0
[66258.965730] do_syscall_64+0xcd/0x1e0
[66258.965730] entry_SYSCALL_64_after_hwframe+0x6f/0x77
[66258.965730] RIP: 0033:0x7f47cc31274a
[66258.969711] Code: 44 24 78 00 00 00 00 e9 2b f1 ff ff 0f 1f 40 00 f3 0f 1e fa 49 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5e c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[66258.969711] RSP: 002b:00007fff93af1298 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
[66258.969711] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f47cc31274a
[66258.969711] RDX: 0000000000000008 RSI: 00007fff93bf1340 RDI: 0000000000000004
[66258.969711] RBP: 00007fff93af12e0 R08: 0000000000000001 R09: 8100000000a04e99
[66258.969711] R10: 00000000050274d0 R11: 0000000000000246 R12: 00007fff93cf1588
[66258.972680] R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f47cc609040
[66258.972680] </TASK>
[66258.972680] Modules linked in: mce_inject hwpoison_inject
After debugging, I think the race below leads to the above panic:
CPU1                                            CPU2
kpageflags_read
 stable_page_flags
  PageSwapCache() check 4k page without page refcnt held
   folio_test_swapcache(page_folio(page));
    folio_test_swapbacked(folio) && /* page is swapbacked. */
                                                page is freed into buddy and merged into larger order.
                                                page is allocated as THP tail page.
    test_bit(PG_swapcache, folio_flags(folio, 0)); /* BUG_ON PageTail check in folio_flags. It's tail page now! */
So the PageSwapCache test is fragile too. Any thoughts on how to fix this 'similar' issue?
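To make the window easier to see, the interleaving above can be replayed on a single thread in a small userspace model. This is only a sketch: the model_* names, the two flag bits and the -1 return value are made up for illustration and are not kernel API, and the -1 stands in for the point where the PageTail assertion in folio_flags() fires. It shows the race, not a fix for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel API. */
#define MODEL_SWAPBACKED (1UL << 0)
#define MODEL_SWAPCACHE  (1UL << 1)

struct model_page {
	unsigned long flags;
	uintptr_t compound_head;	/* bit 0 set => tail page */
};

static bool model_page_tail(const struct model_page *page)
{
	return page->compound_head & 1;
}

/* CPU2: the page is freed, merged, and handed out again as a THP tail page. */
static void model_realloc_as_tail(struct model_page *page,
				  struct model_page *head)
{
	page->flags = 0;
	page->compound_head = (uintptr_t)head | 1;
}

/*
 * CPU1: two separate reads with no reference held in between.
 * Returns 1 or 0 for the swapcache bit, or -1 at the point where the
 * model's stand-in for the PageTail assertion would trip.
 */
static int model_test_swapcache(struct model_page *page,
				struct model_page *new_head)
{
	if (!(page->flags & MODEL_SWAPBACKED))
		return 0;
	if (new_head)			/* race window: CPU2 runs here */
		model_realloc_as_tail(page, new_head);
	if (model_page_tail(page))
		return -1;
	return !!(page->flags & MODEL_SWAPCACHE);
}

/* Replays the interleaving; 'race' selects whether CPU2 gets to run. */
int model_replay(bool race)
{
	struct model_page head = { 0, 0 };
	struct model_page page = { MODEL_SWAPBACKED, 0 };

	assert(!model_page_tail(&page));	/* starts out as an order-0 page */
	return model_test_swapcache(&page, race ? &head : NULL);
}
```

In this model, model_replay(false) takes the normal path and reports that the page is not in the swap cache, while model_replay(true) reaches the tail-page check after the page has been reused.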
Thanks.
>
> The cost of this reliability is that we now consume the word I recently
> freed in folio->page[1]. I think this is acceptable; we've still gained
> a completely reliable folio_test_hugetlb() (which we didn't have before
> I started messing around with the folio dtors). Non-hugetlb users
> can use large_id as a pointer to something else entirely, or even as a
> non-pointer, as long as they can guarantee it can't conflict (ie don't
> use it as a bitfield).
>
> So far, this is working for me. Some stress testing would be appreciated.
>
> Matthew Wilcox (Oracle) (5):
> hugetlb: Make folio_test_hugetlb safer to call
> hugetlb: Add hugetlb_pfn_folio
> memory-failure: Use hugetlb_pfn_folio
> memory-failure: Reorganise get_huge_page_for_hwpoison()
> compaction: Use hugetlb_pfn_folio in isolate_migratepages_block
>
> include/linux/hugetlb.h | 13 ++-----
> include/linux/mm.h | 8 -----
> include/linux/mm_types.h | 4 ++-
> include/linux/page-flags.h | 25 +++----------
> kernel/vmcore_info.c | 3 +-
> mm/compaction.c | 16 ++++-----
> mm/huge_memory.c | 10 ++----
> mm/hugetlb.c | 72 +++++++++++++++++++++++++++++---------
> mm/memory-failure.c | 14 +++++---
> 9 files changed, 87 insertions(+), 78 deletions(-)
>