* [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
From: Mike Kravetz @ 2017-03-29 21:08 UTC
To: Andrew Morton, linux-mm, linux-kernel
Cc: Dmitry Vyukov, Hillf Danton, Michal Hocko, Kirill A . Shutemov,
    Andrey Ryabinin, Naoya Horiguchi, Mike Kravetz

Resending because of typo in Andrew's e-mail when first sent

Changes to hugetlbfs reservation maps are a two-step process. The first
step is a call to region_chg to determine what needs to be changed, and
prepare that change. This should be followed by a call to region_add
to commit the change, or region_abort to abort the change.

The error path in hugetlb_reserve_pages called region_abort after a
failed call to region_chg. As a result, the adds_in_progress counter
in the reservation map is off by 1. This is caught by a VM_BUG_ON
in resv_map_release when the reservation map is freed.

syzkaller fuzzer found this bug, that resulted in the following:

 kernel BUG at mm/hugetlb.c:742!
 Call Trace:
  hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
  evict+0x481/0x920 fs/inode.c:553
  iput_final fs/inode.c:1515 [inline]
  iput+0x62b/0xa20 fs/inode.c:1542
  hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
  newseg+0x422/0xd30 ipc/shm.c:575
  ipcget_new ipc/util.c:285 [inline]
  ipcget+0x21e/0x580 ipc/util.c:639
  SYSC_shmget ipc/shm.c:673 [inline]
  SyS_shmget+0x158/0x230 ipc/shm.c:657
  entry_SYSCALL_64_fastpath+0x1f/0xc2
 RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
---
 mm/hugetlb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c7025c1..c65d45c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
 	return 0;
 out_err:
 	if (!vma || vma->vm_flags & VM_MAYSHARE)
-		region_abort(resv_map, from, to);
+		/* Don't call region_abort if region_chg failed */
+		if (chg >= 0)
+			region_abort(resv_map, from, to);
 	if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		kref_put(&resv_map->refs, resv_map_release);
 	return ret;
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
From: Andrew Morton @ 2017-03-29 21:17 UTC
To: Mike Kravetz
Cc: linux-mm, linux-kernel, Dmitry Vyukov, Hillf Danton, Michal Hocko,
    Kirill A . Shutemov, Andrey Ryabinin, Naoya Horiguchi

On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Resending because of typo in Andrew's e-mail when first sent
>
> Changes to hugetlbfs reservation maps are a two-step process. The first
> step is a call to region_chg to determine what needs to be changed, and
> prepare that change. This should be followed by a call to region_add
> to commit the change, or region_abort to abort the change.
>
> The error path in hugetlb_reserve_pages called region_abort after a
> failed call to region_chg. As a result, the adds_in_progress counter
> in the reservation map is off by 1. This is caught by a VM_BUG_ON
> in resv_map_release when the reservation map is freed.
>
> syzkaller fuzzer found this bug, that resulted in the following:

I'll change the above to

: syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
: that resulted in the following:

it's important, because this bug won't be triggered (at all easily, at
least) in real-world workloads.

> kernel BUG at mm/hugetlb.c:742!
> Call Trace:
>  hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
>  evict+0x481/0x920 fs/inode.c:553
>  iput_final fs/inode.c:1515 [inline]
>  iput+0x62b/0xa20 fs/inode.c:1542
>  hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
>  newseg+0x422/0xd30 ipc/shm.c:575
>  ipcget_new ipc/util.c:285 [inline]
>  ipcget+0x21e/0x580 ipc/util.c:639
>  SYSC_shmget ipc/shm.c:673 [inline]
>  SyS_shmget+0x158/0x230 ipc/shm.c:657
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
From: Dmitry Vyukov @ 2017-03-30 12:28 UTC
To: Andrew Morton
Cc: Mike Kravetz, linux-mm, LKML, Hillf Danton, Michal Hocko,
    Kirill A . Shutemov, Andrey Ryabinin, Naoya Horiguchi

On Wed, Mar 29, 2017 at 11:17 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> Resending because of typo in Andrew's e-mail when first sent
>>
>> Changes to hugetlbfs reservation maps are a two-step process. The first
>> step is a call to region_chg to determine what needs to be changed, and
>> prepare that change. This should be followed by a call to region_add
>> to commit the change, or region_abort to abort the change.
>>
>> The error path in hugetlb_reserve_pages called region_abort after a
>> failed call to region_chg. As a result, the adds_in_progress counter
>> in the reservation map is off by 1. This is caught by a VM_BUG_ON
>> in resv_map_release when the reservation map is freed.
>>
>> syzkaller fuzzer found this bug, that resulted in the following:
>
> I'll change the above to
>
> : syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
> : that resulted in the following:
>
> it's important, because this bug won't be triggered (at all easily, at
> least) in real-world workloads.

I wonder if memory-constrained cgroups make such bugs much easier to trigger.

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
From: Mike Kravetz @ 2017-03-30 20:20 UTC
To: Dmitry Vyukov, Andrew Morton
Cc: linux-mm, LKML, Hillf Danton, Michal Hocko, Kirill A . Shutemov,
    Andrey Ryabinin, Naoya Horiguchi

On 03/30/2017 05:28 AM, Dmitry Vyukov wrote:
> On Wed, Mar 29, 2017 at 11:17 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Wed, 29 Mar 2017 14:08:02 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>>>
>>> syzkaller fuzzer found this bug, that resulted in the following:
>>
>> I'll change the above to
>>
>> : syzkaller fuzzer (when using an injected kmalloc failure) found this bug,
>> : that resulted in the following:
>>
>> it's important, because this bug won't be triggered (at all easily, at
>> least) in real-world workloads.
>
> I wonder if memory-constrained cgroups make such bugs much easier to trigger.
>

I think you might expose some bugs with memory-constrained cgroups. However,
it is unlikely you could trigger this bug using that method. In this bug the
injected kmalloc failure was for a 32 byte allocation. My guess is that it
would be very very unlikely/lucky to have the allocations done by other
routines on the stack succeed, and have this 32 byte allocation fail.

-- 
Mike Kravetz

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
From: Vegard Nossum @ 2017-04-10 21:38 UTC
To: Mike Kravetz
Cc: Andrew Morton, Linux Memory Management List, LKML, Dmitry Vyukov,
    Hillf Danton, Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
    Naoya Horiguchi

On 29 March 2017 at 23:08, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> Changes to hugetlbfs reservation maps are a two-step process. The first
> step is a call to region_chg to determine what needs to be changed, and
> prepare that change. This should be followed by a call to region_add
> to commit the change, or region_abort to abort the change.
>
> The error path in hugetlb_reserve_pages called region_abort after a
> failed call to region_chg. As a result, the adds_in_progress counter
> in the reservation map is off by 1. This is caught by a VM_BUG_ON
> in resv_map_release when the reservation map is freed.
>
> syzkaller fuzzer found this bug, that resulted in the following:
>
> kernel BUG at mm/hugetlb.c:742!
> Call Trace:
>  hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
>  evict+0x481/0x920 fs/inode.c:553
>  iput_final fs/inode.c:1515 [inline]
>  iput+0x62b/0xa20 fs/inode.c:1542
>  hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
>  newseg+0x422/0xd30 ipc/shm.c:575
>  ipcget_new ipc/util.c:285 [inline]
>  ipcget+0x21e/0x580 ipc/util.c:639
>  SYSC_shmget ipc/shm.c:673 [inline]
>  SyS_shmget+0x158/0x230 ipc/shm.c:657
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742
>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
> ---
>  mm/hugetlb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c7025c1..c65d45c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	return 0;
>  out_err:
>  	if (!vma || vma->vm_flags & VM_MAYSHARE)
> -		region_abort(resv_map, from, to);
> +		/* Don't call region_abort if region_chg failed */
> +		if (chg >= 0)
> +			region_abort(resv_map, from, to);
>  	if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>  		kref_put(&resv_map->refs, resv_map_release);
>  	return ret;

Hi guys,

I'm running into this on latest linus/master:

kernel BUG at mm/hugetlb.c:742!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 3 PID: 20281 Comm: syz-executor0 Not tainted 4.11.0-rc6 #335
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff880064f30dc0 task.stack: ffff880065b38000
RIP: 0010:resv_map_release+0x1cb/0x270
RSP: 0018:ffff880065b3fc38 EFLAGS: 00010287
RAX: 0000000000010000 RBX: ffff88006b5fe418 RCX: ffffc90001b52000
RDX: 00000000000005de RSI: ffffffff8172026b RDI: ffff88006b5fe410
RBP: ffff880065b3fc78 R08: ffff880065b3f958 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff88006b5fe418 R14: ffff88006b5fe418 R15: ffff88006b5fe418
FS: 00007f21647c5700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000460750 CR3: 000000005d123000 CR4: 00000000000006e0
Call Trace:
 hugetlbfs_evict_inode+0x80/0xa0
 ? hugetlbfs_setattr+0x3c0/0x3c0
 evict+0x24a/0x620
 iput+0x48f/0x8c0
 dentry_unlink_inode+0x31f/0x4d0
 __dentry_kill+0x292/0x5e0
 dput+0x730/0x830
 __fput+0x438/0x720
 ____fput+0x1a/0x20
 task_work_run+0xfe/0x180
 exit_to_usermode_loop+0x133/0x150
 syscall_return_slowpath+0x184/0x1c0
 entry_SYSCALL_64_fastpath+0xab/0xad

To reproduce:

  mmap(0, 0x2000, 0, 0x40031, 0xffffffffffffffffULL, 0x8000000000000000ULL);

Curiously enough, it's the patch from this thread (i.e. commit
ff8c0c53c47530ffea82c22a0a6df6332b56c957) that introduces it,
according to git bisect. Reverting the commit from linus/master fixes
the problem.

Also found by syzkaller (no fault injections this time).


Vegard

* Re: [PATCH RESEND] mm/hugetlb: Don't call region_abort if region_chg fails
From: Mike Kravetz @ 2017-04-10 23:37 UTC
To: Vegard Nossum
Cc: Andrew Morton, Linux Memory Management List, LKML, Dmitry Vyukov,
    Hillf Danton, Michal Hocko, Kirill A . Shutemov, Andrey Ryabinin,
    Naoya Horiguchi

On 04/10/2017 02:38 PM, Vegard Nossum wrote:
> On 29 March 2017 at 23:08, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>> Changes to hugetlbfs reservation maps are a two-step process. The first
>> step is a call to region_chg to determine what needs to be changed, and
>> prepare that change. This should be followed by a call to region_add
>> to commit the change, or region_abort to abort the change.
>>
>> The error path in hugetlb_reserve_pages called region_abort after a
>> failed call to region_chg. As a result, the adds_in_progress counter
>> in the reservation map is off by 1. This is caught by a VM_BUG_ON
>> in resv_map_release when the reservation map is freed.
>>
>> syzkaller fuzzer found this bug, that resulted in the following:
>>
>> kernel BUG at mm/hugetlb.c:742!
>> Call Trace:
>>  hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
>>  evict+0x481/0x920 fs/inode.c:553
>>  iput_final fs/inode.c:1515 [inline]
>>  iput+0x62b/0xa20 fs/inode.c:1542
>>  hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
>>  newseg+0x422/0xd30 ipc/shm.c:575
>>  ipcget_new ipc/util.c:285 [inline]
>>  ipcget+0x21e/0x580 ipc/util.c:639
>>  SYSC_shmget ipc/shm.c:673 [inline]
>>  SyS_shmget+0x158/0x230 ipc/shm.c:657
>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>> RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742
>>
>> Reported-by: Dmitry Vyukov <dvyukov@google.com>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
>> ---
>>  mm/hugetlb.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index c7025c1..c65d45c 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4233,7 +4233,9 @@ int hugetlb_reserve_pages(struct inode *inode,
>>  	return 0;
>>  out_err:
>>  	if (!vma || vma->vm_flags & VM_MAYSHARE)
>> -		region_abort(resv_map, from, to);
>> +		/* Don't call region_abort if region_chg failed */
>> +		if (chg >= 0)
>> +			region_abort(resv_map, from, to);
>>  	if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>>  		kref_put(&resv_map->refs, resv_map_release);
>>  	return ret;
>
> Hi guys,
>
> I'm running into this on latest linus/master:
>
> kernel BUG at mm/hugetlb.c:742!
> invalid opcode: 0000 [#1] SMP KASAN
> CPU: 3 PID: 20281 Comm: syz-executor0 Not tainted 4.11.0-rc6 #335
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> task: ffff880064f30dc0 task.stack: ffff880065b38000
> RIP: 0010:resv_map_release+0x1cb/0x270
> RSP: 0018:ffff880065b3fc38 EFLAGS: 00010287
> RAX: 0000000000010000 RBX: ffff88006b5fe418 RCX: ffffc90001b52000
> RDX: 00000000000005de RSI: ffffffff8172026b RDI: ffff88006b5fe410
> RBP: ffff880065b3fc78 R08: ffff880065b3f958 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
> R13: ffff88006b5fe418 R14: ffff88006b5fe418 R15: ffff88006b5fe418
> FS: 00007f21647c5700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000460750 CR3: 000000005d123000 CR4: 00000000000006e0
> Call Trace:
>  hugetlbfs_evict_inode+0x80/0xa0
>  ? hugetlbfs_setattr+0x3c0/0x3c0
>  evict+0x24a/0x620
>  iput+0x48f/0x8c0
>  dentry_unlink_inode+0x31f/0x4d0
>  __dentry_kill+0x292/0x5e0
>  dput+0x730/0x830
>  __fput+0x438/0x720
>  ____fput+0x1a/0x20
>  task_work_run+0xfe/0x180
>  exit_to_usermode_loop+0x133/0x150
>  syscall_return_slowpath+0x184/0x1c0
>  entry_SYSCALL_64_fastpath+0xab/0xad
>
> To reproduce:
>
>   mmap(0, 0x2000, 0, 0x40031, 0xffffffffffffffffULL, 0x8000000000000000ULL);
>
> Curiously enough, it's the patch from this thread (i.e. commit
> ff8c0c53c47530ffea82c22a0a6df6332b56c957) that introduces it,
> according to git bisect. Reverting the commit from linus/master fixes
> the problem.

Thanks for finding this.

I do not think commit ff8c0c53 is the root cause of this BUG/issue. Due to
the very high offset (0x8000000000000000ULL) passed to mmap, there is some
overflow and/or truncation of values happening before getting to the
hugetlbfs reservation code. The routine hugetlb_reserve_pages() is passed
a negative page offset value (from=4398046511104, to=-4398046511103). Bad!!!

The routine region_chg() takes these values to determine how many
reservations are needed and calculates/returns a negative value. This
appears as an error. So, the code from commit ff8c0c53 prevents the call
to region_abort(), adds_in_progress does not get decremented and we hit
the BUG.

We should have never calculated and acted upon negative page offsets. It
was just 'lucky' that things appeared to work before this commit. I have
not yet determined all the things that could have gone wrong when passing
around these incorrect values.

I believe commit ff8c0c53 should remain. I will start working on a fix
to this overflow and/or truncation of page offsets.

-- 
Mike Kravetz

>
> Also found by syzkaller (no fault injections this time).
>
>
> Vegard