* [PATCH] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
From: Ivan Orlov @ 2023-03-29 14:53 UTC
To: akpm
Cc: Ivan Orlov, linux-mm, linux-kernel, himadrispandya, skhan,
	linux-kernel-mentees, syzbot+9578faa5475acb35fa50

Syzkaller reported the following issue:

kernel BUG at mm/khugepaged.c:1823!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023
RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline]
RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233
Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89
RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093
RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80
RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1
RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3
R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000
R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000
FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693
 madvise_vma_behavior mm/madvise.c:1086 [inline]
 madvise_walk_vmas mm/madvise.c:1260 [inline]
 do_madvise+0x9e5/0x4680 mm/madvise.c:1439
 __do_sys_madvise mm/madvise.c:1452 [inline]
 __se_sys_madvise mm/madvise.c:1450 [inline]
 __x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The 'xas_store' call during page cache scanning can potentially
translate 'xas' into the error state (with the reproducer provided
by the syzkaller the error code is -ENOMEM). However, there are no
further checks after the 'xas_store', and the next call of 'xas_next'
at the start of the scanning cycle doesn't increase the xa_index,
and the issue occurs.

This patch will add the xarray state error checking after the
'xas_store' and the corresponding result error code.

Tested via syzbot.

Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
---
 mm/khugepaged.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 92e6f56a932d..4d9850d9ea7f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -55,6 +55,7 @@ enum scan_result {
 	SCAN_CGROUP_CHARGE_FAIL,
 	SCAN_TRUNCATED,
 	SCAN_PAGE_HAS_PRIVATE,
+	SCAN_STORE_FAILED,
 };
 
 #define CREATE_TRACE_POINTS
@@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 					goto xa_locked;
 				}
 				xas_store(&xas, hpage);
+				if (xas_error(&xas)) {
+					/* revert shmem_charge performed
+					 * in the previous condition
+					 */
+					mapping->nrpages--;
+					shmem_uncharge(mapping->host, 1);
+					result = SCAN_STORE_FAILED;
+					goto xa_locked;
+				}
 				nr_none++;
 				continue;
 			}
-- 
2.34.1

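For readers following along, the failure mode described in the commit message can be modelled in userspace. This is a minimal sketch, not kernel code: `struct cursor`, `cursor_store()`, `cursor_next()`, `cursor_error()` and `scan_range()` are hypothetical stand-ins invented for illustration. The only behaviour assumed is what the message states: a failed store leaves the cursor in an error state, and a cursor in that state does not advance.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an xarray cursor; NOT the kernel's xa_state. */
struct cursor {
	unsigned long index;	/* slot the cursor points at */
	int err;		/* 0, or a sticky negative errno */
};

/* Model store: when the allocation fails, the cursor latches -ENOMEM. */
static void cursor_store(struct cursor *c, bool alloc_ok)
{
	if (c->err)
		return;
	if (!alloc_ok)
		c->err = -ENOMEM;
}

/* Model advance: a cursor in the error state stays where it is. */
static void cursor_next(struct cursor *c)
{
	if (!c->err)
		c->index++;
}

static int cursor_error(const struct cursor *c)
{
	return c->err;
}

/*
 * Walk [start, end), storing into each slot; the store at fail_at fails.
 * With check_error == false the loop behaves like the unpatched code and
 * reports the first index at which the loop counter and the cursor
 * disagree. With check_error == true it bails out right after the failed
 * store, as the patch does.
 * Returns 0 on success, -ENOMEM on a handled failure, 1 on a desync.
 */
int scan_range(unsigned long start, unsigned long end,
	       unsigned long fail_at, bool check_error,
	       unsigned long *desync_index)
{
	struct cursor c = { .index = start, .err = 0 };
	unsigned long index;

	assert(desync_index != NULL);

	for (index = start; index < end; index++) {
		if (index != c.index) {
			*desync_index = index;
			return 1;
		}
		cursor_store(&c, index != fail_at);
		if (check_error && cursor_error(&c))
			return cursor_error(&c);
		cursor_next(&c);
	}
	return 0;
}
```

In this model, calling `scan_range(0, 4, 1, false, &d)` reports a desync one iteration after the failed store, while passing `true` for `check_error` returns the error at the point of failure instead.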
* Re: [PATCH] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
From: Andrew Morton @ 2023-03-29 20:37 UTC
To: Ivan Orlov
Cc: linux-mm, linux-kernel, himadrispandya, skhan, linux-kernel-mentees,
	syzbot+9578faa5475acb35fa50, Song Liu, Rik van Riel,
	Kirill A. Shutemov, Johannes Weiner

On Wed, 29 Mar 2023 18:53:30 +0400 Ivan Orlov <ivan.orlov0322@gmail.com> wrote:

> Syzkaller reported the following issue:
>
> kernel BUG at mm/khugepaged.c:1823!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023
> RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline]
> RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233
> Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89
> RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093
> RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80
> RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1
> RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3
> R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000
> R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000
> FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693
>  madvise_vma_behavior mm/madvise.c:1086 [inline]
>  madvise_walk_vmas mm/madvise.c:1260 [inline]
>  do_madvise+0x9e5/0x4680 mm/madvise.c:1439
>  __do_sys_madvise mm/madvise.c:1452 [inline]
>  __se_sys_madvise mm/madvise.c:1450 [inline]
>  __x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> The 'xas_store' call during page cache scanning can potentially
> translate 'xas' into the error state (with the reproducer provided
> by the syzkaller the error code is -ENOMEM). However, there are no
> further checks after the 'xas_store', and the next call of 'xas_next'
> at the start of the scanning cycle doesn't increase the xa_index,
> and the issue occurs.
>
> This patch will add the xarray state error checking after the
> 'xas_store' and the corresponding result error code.
>

Thanks.

We'll want a Fixes: for this to go with a cc:stable. So when did we
break this?

I'm thinking 99cb0dbd47a15 ("mm,thp: add read-only THP support for
(non-shmem) FS"), which did

+			if (!page) {
+				/*
+				 * Stop if extent has been truncated or
+				 * hole-punched, and is now completely
+				 * empty.
+				 */
+				if (index == start) {
+					if (!xas_next_entry(&xas, end - 1)) {
+						result = SCAN_TRUNCATED;
+						goto xa_locked;
+					}
+					xas_set(&xas, index);
+				}
+				if (!shmem_charge(mapping->host, 1)) {
+					result = SCAN_FAIL;
 					goto xa_locked;
 				}
-				xas_set(&xas, index);
+				xas_store(&xas, new_page);
+				nr_none++;
+				continue;
 			}

* Re: [PATCH] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
From: Andrew Morton @ 2023-03-29 21:53 UTC
To: Ivan Orlov
Cc: linux-mm, linux-kernel, himadrispandya, skhan, linux-kernel-mentees,
	syzbot+9578faa5475acb35fa50

On Wed, 29 Mar 2023 18:53:30 +0400 Ivan Orlov <ivan.orlov0322@gmail.com> wrote:

> Syzkaller reported the following issue:
>
> ...
>
> The 'xas_store' call during page cache scanning can potentially
> translate 'xas' into the error state (with the reproducer provided
> by the syzkaller the error code is -ENOMEM). However, there are no
> further checks after the 'xas_store', and the next call of 'xas_next'
> at the start of the scanning cycle doesn't increase the xa_index,
> and the issue occurs.
>
> This patch will add the xarray state error checking after the
> 'xas_store' and the corresponding result error code.
>
> Tested via syzbot.
>
> Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
> Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
> ---
>  mm/khugepaged.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 92e6f56a932d..4d9850d9ea7f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -55,6 +55,7 @@ enum scan_result {
>  	SCAN_CGROUP_CHARGE_FAIL,
>  	SCAN_TRUNCATED,
>  	SCAN_PAGE_HAS_PRIVATE,
> +	SCAN_STORE_FAILED,
>  };
>
>  #define CREATE_TRACE_POINTS
> @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  					goto xa_locked;
>  				}
>  				xas_store(&xas, hpage);
> +				if (xas_error(&xas)) {
> +					/* revert shmem_charge performed
> +					 * in the previous condition
> +					 */
> +					mapping->nrpages--;
> +					shmem_uncharge(mapping->host, 1);
> +					result = SCAN_STORE_FAILED;
> +					goto xa_locked;
> +				}
>  				nr_none++;
>  				continue;
>  			}

Needs this, I assume.

--- a/include/trace/events/huge_memory.h~mm-khugepaged-fix-kernel-bug-in-hpage_collapse_scan_file-fix
+++ a/include/trace/events/huge_memory.h
@@ -36,7 +36,8 @@
 	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
 	EM( SCAN_TRUNCATED,		"truncated")			\
-	EMe(SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
+	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
+	EMe(SCAN_STORE_FAILED,		"store_failed")
 
 #undef EM
 #undef EMe
_

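The follow-up hunk moves the `EMe()` marker to the new last entry of the list. A simplified userspace sketch of that kind of EM/EMe list is given here for orientation. The enum subset, the `MODEL_` names, `struct symbol` and `scan_result_name()` are assumptions made for illustration; the real header expands its list through tracing macros rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of result codes; the real enum scan_result is longer. */
enum scan_result_model {
	MODEL_SCAN_TRUNCATED,
	MODEL_SCAN_PAGE_HAS_PRIVATE,
	MODEL_SCAN_STORE_FAILED,
	MODEL_SCAN_RESULT_COUNT	/* sentinel, not a result code */
};

/* EM() is used for every entry except the last one, which uses EMe(). */
#define MODEL_SCAN_STATUS \
	EM( MODEL_SCAN_TRUNCATED,        "truncated")        \
	EM( MODEL_SCAN_PAGE_HAS_PRIVATE, "page_has_private") \
	EMe(MODEL_SCAN_STORE_FAILED,     "store_failed")

struct symbol {
	int value;
	const char *name;
};

/* Expand the list into table rows: EM() adds a trailing comma, EMe() does not. */
#define EM(a, b)  { a, b },
#define EMe(a, b) { a, b }
static const struct symbol scan_symbols[] = { MODEL_SCAN_STATUS };
#undef EM
#undef EMe

/* Fails to compile when a code is added to the enum but not to the list. */
static_assert(sizeof(scan_symbols) / sizeof(scan_symbols[0]) ==
	      (size_t)MODEL_SCAN_RESULT_COUNT,
	      "every result code needs a name in MODEL_SCAN_STATUS");

/* Return the printable name for a result code, or NULL if it has none. */
const char *scan_result_name(int value)
{
	size_t i;

	for (i = 0; i < sizeof(scan_symbols) / sizeof(scan_symbols[0]); i++)
		if (scan_symbols[i].value == value)
			return scan_symbols[i].name;
	return NULL;
}
```

The compile-time check is an addition of this sketch, not something the thread proposes; it only illustrates why an enum value without a matching list entry is worth catching early.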
* Re: [PATCH] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
From: Yang Shi @ 2023-03-30 0:14 UTC
To: Andrew Morton, Zach O'Keefe
Cc: Ivan Orlov, linux-mm, linux-kernel, himadrispandya, skhan,
	linux-kernel-mentees, syzbot+9578faa5475acb35fa50

On Wed, Mar 29, 2023 at 2:53 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 29 Mar 2023 18:53:30 +0400 Ivan Orlov <ivan.orlov0322@gmail.com> wrote:
>
> > Syzkaller reported the following issue:
> >
> > ...
> >
> > The 'xas_store' call during page cache scanning can potentially
> > translate 'xas' into the error state (with the reproducer provided
> > by the syzkaller the error code is -ENOMEM). However, there are no
> > further checks after the 'xas_store', and the next call of 'xas_next'
> > at the start of the scanning cycle doesn't increase the xa_index,
> > and the issue occurs.
> >
> > This patch will add the xarray state error checking after the
> > 'xas_store' and the corresponding result error code.
> >
> > Tested via syzbot.
> >
> > Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
> > Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
> > Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
> > ---
> >  mm/khugepaged.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 92e6f56a932d..4d9850d9ea7f 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -55,6 +55,7 @@ enum scan_result {
> >  	SCAN_CGROUP_CHARGE_FAIL,
> >  	SCAN_TRUNCATED,
> >  	SCAN_PAGE_HAS_PRIVATE,
> > +	SCAN_STORE_FAILED,
> >  };
> >
> >  #define CREATE_TRACE_POINTS
> > @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> >  					goto xa_locked;
> >  				}
> >  				xas_store(&xas, hpage);
> > +				if (xas_error(&xas)) {
> > +					/* revert shmem_charge performed
> > +					 * in the previous condition
> > +					 */
> > +					mapping->nrpages--;
> > +					shmem_uncharge(mapping->host, 1);
> > +					result = SCAN_STORE_FAILED;
> > +					goto xa_locked;
> > +				}
> >  				nr_none++;
> >  				continue;
> >  			}
>
> Needs this, I assume.
>
> --- a/include/trace/events/huge_memory.h~mm-khugepaged-fix-kernel-bug-in-hpage_collapse_scan_file-fix
> +++ a/include/trace/events/huge_memory.h
> @@ -36,7 +36,8 @@
>  	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
>  	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
>  	EM( SCAN_TRUNCATED,		"truncated")			\
> -	EMe(SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
> +	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
> +	EMe(SCAN_STORE_FAILED,		"store_failed")

I'm a little bit reluctant to make the error code list longer, can we
just return SCAN_FAIL? IIUC this issue should happen very rarely,
maybe not worth a new error code.

Basically the rollback approach makes sense to me. IIRC Zach was
looking into the same problem, loop him in. He may share some
thoughts.

>
>  #undef EM
>  #undef EMe
> _
>
>

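The rollback under discussion undoes a charge in two steps: the patch decrements `mapping->nrpages` and then calls `shmem_uncharge()`. A small userspace sketch of that charge, attempt, undo shape is given below. Everything in it is hypothetical: `struct acct`, `model_charge()`, `model_uncharge()` and `charge_and_store()` are invented for illustration, and the split of work between the two undo steps is an assumption read off the patch, not a description of the shmem implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical accounting state, loosely mirroring what the patch touches. */
struct acct {
	long nrpages;	/* pages accounted to the mapping */
	long charged;	/* pages charged against the quota */
	long quota;	/* upper bound for charged */
};

/* Model charge: bumps both counters, fails once the quota is exhausted. */
static bool model_charge(struct acct *a, long pages)
{
	if (a->charged + pages > a->quota)
		return false;
	a->charged += pages;
	a->nrpages += pages;
	return true;
}

/* Model uncharge: gives quota back only; the caller fixes up nrpages. */
static void model_uncharge(struct acct *a, long pages)
{
	assert(a->charged >= pages);
	a->charged -= pages;
}

/*
 * Charge one page, then attempt the store. If the store fails, undo both
 * halves of the charge so the counters return to their values on entry.
 * Returns 0 on success or a negative errno.
 */
int charge_and_store(struct acct *a, bool store_ok)
{
	if (!model_charge(a, 1))
		return -ENOSPC;
	if (!store_ok) {
		a->nrpages--;
		model_uncharge(a, 1);
		return -ENOMEM;
	}
	return 0;
}
```

Whether the failure is reported through a dedicated result code or a generic one, as suggested above, does not change this undo sequence; only the value returned to the caller differs.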
* Re: [PATCH] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
From: Zach O'Keefe @ 2023-03-30 0:58 UTC
To: Yang Shi
Cc: Andrew Morton, Ivan Orlov, linux-mm, linux-kernel, himadrispandya,
	skhan, linux-kernel-mentees, syzbot+9578faa5475acb35fa50

On Wed, Mar 29, 2023 at 5:14 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Mar 29, 2023 at 2:53 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 29 Mar 2023 18:53:30 +0400 Ivan Orlov <ivan.orlov0322@gmail.com> wrote:
> >
> > > Syzkaller reported the following issue:
> > >
> > > ...
> > >
> > > The 'xas_store' call during page cache scanning can potentially
> > > translate 'xas' into the error state (with the reproducer provided
> > > by the syzkaller the error code is -ENOMEM). However, there are no
> > > further checks after the 'xas_store', and the next call of 'xas_next'
> > > at the start of the scanning cycle doesn't increase the xa_index,
> > > and the issue occurs.
> > >
> > > This patch will add the xarray state error checking after the
> > > 'xas_store' and the corresponding result error code.
> > >
> > > Tested via syzbot.
> > >
> > > Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
> > > Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
> > > Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
> > > ---
> > >  mm/khugepaged.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 92e6f56a932d..4d9850d9ea7f 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -55,6 +55,7 @@ enum scan_result {
> > >  	SCAN_CGROUP_CHARGE_FAIL,
> > >  	SCAN_TRUNCATED,
> > >  	SCAN_PAGE_HAS_PRIVATE,
> > > +	SCAN_STORE_FAILED,
> > >  };
> > >
> > >  #define CREATE_TRACE_POINTS
> > > @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >  					goto xa_locked;
> > >  				}
> > >  				xas_store(&xas, hpage);
> > > +				if (xas_error(&xas)) {
> > > +					/* revert shmem_charge performed
> > > +					 * in the previous condition
> > > +					 */
> > > +					mapping->nrpages--;
> > > +					shmem_uncharge(mapping->host, 1);
> > > +					result = SCAN_STORE_FAILED;
> > > +					goto xa_locked;
> > > +				}
> > >  				nr_none++;
> > >  				continue;
> > >  			}
> >
> > Needs this, I assume.
> >
> > --- a/include/trace/events/huge_memory.h~mm-khugepaged-fix-kernel-bug-in-hpage_collapse_scan_file-fix
> > +++ a/include/trace/events/huge_memory.h
> > @@ -36,7 +36,8 @@
> >  	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
> >  	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
> >  	EM( SCAN_TRUNCATED,		"truncated")			\
> > -	EMe(SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
> > +	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
> > +	EMe(SCAN_STORE_FAILED,		"store_failed")
>
> I'm a little bit reluctant to make the error code list longer, can we
> just return SCAN_FAIL? IIUC this issue should happen very rarely,
> maybe not worth a new error code.
>
> Basically the rollback approach makes sense to me. IIRC Zach was
> looking into the same problem, loop him in. He may share some
> thoughts.

Thanks Yang, appreciate being brought into the loop. One of the things
I plan to do during paternity leave is update my email filters so I
don't miss things like this.

Coincidentally, Hugh also just brought this to my attention. Looks to
be the syzkaller report posted a few weeks ago[1].

Given there are two series munging this path right now (or were), I
was trying to find time to first review said series, then post a fix
on top, if necessary (or it could have been incorporated into David
Stevens' "mm/khugepaged: fix khugepaged+shmem races" series). But, I'm
perennially behind and haven't been able to find time to do those
reviews, and so my "fix" attempt has sat. Thanks, Ivan, for picking up
the slack.

So, I did test this patch with the syzbot reproducer, and everything
looked good :) Thank you.

I have similar reservations about making the error code list longer,
unless there is an opportunity to combine other failure sites under a
common umbrella. For example, I was debating if a SCAN_OOM error was
worthy of inclusion, which we could use in
__collapse_huge_page_swapin() on VM_FAULT_OOM. I personally went the
route of saying, "no, just use SCAN_FAIL".

There also ought to be some comments, somewhere (either in code, or
commit description) about why this is the only xas_store() site that
deserves special error handling. I was planning to suggest sprinkling
in a few VM_BUG_ON()'s after some of these sites, with a comment, just
in case the implementation of xarray changes and operations which
previously didn't require allocating memory now do so. At least to me,
it took work to sort it out, so I don't think it's obvious.

Now, as mentioned, I'm headed on paternity leave starting Friday,
until July 12. So, if there is a v2, I'm likely to miss it, and even
Cc'ing me isn't likely to get a response :) As such, feel free to have
my

Tested-by: Zach O'Keefe <zokeefe@google.com>

now, since I've validated it works.

My understanding is that no other callsites need attention, so I
believe this bug is "fixed" -- all that remains is dealing with the
error codes, comments, assertions, etc.

Thanks again, Ivan,

Best,
Zach

[1] https://lore.kernel.org/linux-mm/000000000000226a6105f6954b47@google.com/

> >
> >  #undef EM
> >  #undef EMe
> > _
> >
> >