* [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
@ 2024-05-11 3:54 Miaohe Lin
2024-05-14 21:14 ` Andrew Morton
2024-05-17 7:03 ` kernel test robot
0 siblings, 2 replies; 8+ messages in thread
From: Miaohe Lin @ 2024-05-11 3:54 UTC (permalink / raw)
To: akpm; +Cc: shy828301, nao.horiguchi, xuyu, linmiaohe, linux-mm, linux-kernel
While running memory failure tests recently, I hit the following panic:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that the HWPoison flag is set on huge_zero_page
without increasing the page refcount. unpoison_memory() then drops a
reference unexpectedly, because the page looks like a successfully
hwpoisoned page, and this leads to VM_BUG_ON_PAGE(page_ref_count(page) == 0)
when huge_zero_page is released.
Fix this by marking huge_zero_page reserved so that unpoison_memory()
skips it. This also makes it consistent with the ZERO_PAGE case.
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 89f58c7603b2..a605bc0437cd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
 		__free_pages(zero_page, compound_order(zero_page));
 		goto retry;
 	}
+	__SetPageReserved(zero_page);
 	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
 
 	/* We take additional reference here. It will be put back by shrinker */
@@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
 		struct page *zero_page = xchg(&huge_zero_page, NULL);
 		BUG_ON(zero_page == NULL);
 		WRITE_ONCE(huge_zero_pfn, ~0UL);
+		__ClearPageReserved(zero_page);
 		__free_pages(zero_page, compound_order(zero_page));
 		return HPAGE_PMD_NR;
 	}
--
2.33.0
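The refcount imbalance described in the commit message above can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not kernel code: struct model_page and the model_* functions are hypothetical stand-ins for struct page, memory_failure(), unpoison_memory() and the shrinker's final put, and the page is assumed to start with a single reference.

```c
#include <stdbool.h>

/* Toy stand-in for struct page; only the fields this sketch needs. */
struct model_page {
	int refcount;
	bool hwpoison;
	bool reserved;
};

/* Models poisoning the huge zero page: the flag is set, no reference is taken. */
void model_poison(struct model_page *p)
{
	p->hwpoison = true;
}

/*
 * Models unpoisoning: reserved pages are skipped; otherwise the reference
 * that poisoning is assumed to have taken is dropped.
 */
void model_unpoison(struct model_page *p)
{
	if (p->reserved || !p->hwpoison)
		return;
	p->hwpoison = false;
	p->refcount--;
}

/*
 * Models the shrinker's final put. Returns false where the kernel would
 * trip VM_BUG_ON_PAGE(page_ref_count(page) == 0).
 */
bool model_shrinker_put(struct model_page *p)
{
	if (p->refcount == 0)
		return false;
	p->refcount--;
	return true;
}

/* Runs allocate -> poison -> unpoison -> shrink; true if the final put is balanced. */
bool model_scenario(bool mark_reserved)
{
	struct model_page p = {
		.refcount = 1,
		.hwpoison = false,
		.reserved = mark_reserved,
	};

	model_poison(&p);
	model_unpoison(&p);
	return model_shrinker_put(&p);
}
```

In this model, model_scenario(false) reports the unbalanced final put that corresponds to the VM_BUG_ON_PAGE() in the trace, while model_scenario(true) shows the reserved page passing through the poison/unpoison cycle with its reference intact.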
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-11 3:54 [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved Miaohe Lin
@ 2024-05-14 21:14 ` Andrew Morton
2024-05-14 21:28 ` Yang Shi
2024-05-17 7:03 ` kernel test robot
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2024-05-14 21:14 UTC (permalink / raw)
To: Miaohe Lin; +Cc: shy828301, nao.horiguchi, xuyu, linux-mm, linux-kernel
On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> When I did memory failure tests recently, below panic occurs:
>
> kernel BUG at include/linux/mm.h:1135!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
>
> ...
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
>  		__free_pages(zero_page, compound_order(zero_page));
>  		goto retry;
>  	}
> +	__SetPageReserved(zero_page);
>  	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
>
>  	/* We take additional reference here. It will be put back by shrinker */
> @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
>  		struct page *zero_page = xchg(&huge_zero_page, NULL);
>  		BUG_ON(zero_page == NULL);
>  		WRITE_ONCE(huge_zero_pfn, ~0UL);
> +		__ClearPageReserved(zero_page);
>  		__free_pages(zero_page, compound_order(zero_page));
>  		return HPAGE_PMD_NR;
>  	}

This causes a bit of a mess when staged ahead of mm-stable. So to
avoid disruption I staged it behind mm-stable. This means that when
the -stable maintainers try to merge it, they will ask for a fixed up
version for older kernels so you can please just send them this
version.

To facilitate this I added the below adjustment:

(btw, shouldn't get_huge_zero_page() and shrink_huge_zero_page_scan()
be renamed to *_folio_*?)

From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-huge_memory-mark-huge_zero_page-reserved-fix
Date: Tue May 14 01:53:37 PM PDT 2024

Update it for 5691753d73a2 ("mm: convert huge_zero_page to huge_zero_folio")

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/huge_memory.c~mm-huge_memory-mark-huge_zero_page-reserved-fix
+++ a/mm/huge_memory.c
@@ -212,7 +212,7 @@ retry:
 		folio_put(zero_folio);
 		goto retry;
 	}
-	__SetPageReserved(zero_page);
+	__folio_set_reserved(zero_folio);
 	WRITE_ONCE(huge_zero_pfn, folio_pfn(zero_folio));
 
 	/* We take additional reference here. It will be put back by shrinker */
@@ -265,7 +265,7 @@ static unsigned long shrink_huge_zero_pa
 		struct folio *zero_folio = xchg(&huge_zero_folio, NULL);
 		BUG_ON(zero_folio == NULL);
 		WRITE_ONCE(huge_zero_pfn, ~0UL);
-		__ClearPageReserved(zero_page);
+		__folio_clear_reserved(zero_folio);
 		folio_put(zero_folio);
 		return HPAGE_PMD_NR;
 	}
_
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-14 21:14 ` Andrew Morton
@ 2024-05-14 21:28 ` Yang Shi
2024-05-14 21:42 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Yang Shi @ 2024-05-14 21:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miaohe Lin, nao.horiguchi, xuyu, linux-mm, linux-kernel
On Tue, May 14, 2024 at 3:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> > When I did memory failure tests recently, below panic occurs:
> >
> > kernel BUG at include/linux/mm.h:1135!
> > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
> >
> > ...
> >
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
> >  		__free_pages(zero_page, compound_order(zero_page));
> >  		goto retry;
> >  	}
> > +	__SetPageReserved(zero_page);
> >  	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
> >
> >  	/* We take additional reference here. It will be put back by shrinker */
> > @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
> >  		struct page *zero_page = xchg(&huge_zero_page, NULL);
> >  		BUG_ON(zero_page == NULL);
> >  		WRITE_ONCE(huge_zero_pfn, ~0UL);
> > +		__ClearPageReserved(zero_page);
> >  		__free_pages(zero_page, compound_order(zero_page));
> >  		return HPAGE_PMD_NR;
> >  	}
>
> This causes a bit of a mess when staged ahead of mm-stable. So to
> avoid disruption I staged it behind mm-stable. This means that when
> the -stable maintainers try to merge it, they will ask for a fixed up
> version for older kernels so you can please just send them this
> version.

Can you please drop this from mm-unstable since both I and David
nack'ed a similar patch in another thread.
https://lore.kernel.org/linux-mm/20240511032801.1295023-1-linmiaohe@huawei.com/

Both patches actually do the same thing, just this one uses page, the
other one uses folio.

>
> To facilitate this I added the below adjustment:
>
> (btw, shouldn't get_huge_zero_page() and shrink_huge_zero_page_scan()
> be renamed to *_folio_*?)
>
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm-huge_memory-mark-huge_zero_page-reserved-fix
> Date: Tue May 14 01:53:37 PM PDT 2024
>
> Update it for 5691753d73a2 ("mm: convert huge_zero_page to huge_zero_folio")
>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
> Cc: Xu Yu <xuyu@linux.alibaba.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/mm/huge_memory.c~mm-huge_memory-mark-huge_zero_page-reserved-fix
> +++ a/mm/huge_memory.c
> @@ -212,7 +212,7 @@ retry:
>  		folio_put(zero_folio);
>  		goto retry;
>  	}
> -	__SetPageReserved(zero_page);
> +	__folio_set_reserved(zero_folio);
>  	WRITE_ONCE(huge_zero_pfn, folio_pfn(zero_folio));
>
>  	/* We take additional reference here. It will be put back by shrinker */
> @@ -265,7 +265,7 @@ static unsigned long shrink_huge_zero_pa
>  		struct folio *zero_folio = xchg(&huge_zero_folio, NULL);
>  		BUG_ON(zero_folio == NULL);
>  		WRITE_ONCE(huge_zero_pfn, ~0UL);
> -		__ClearPageReserved(zero_page);
> +		__folio_clear_reserved(zero_folio);
>  		folio_put(zero_folio);
>  		return HPAGE_PMD_NR;
>  	}
> _
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-14 21:28 ` Yang Shi
@ 2024-05-14 21:42 ` Andrew Morton
2024-05-14 21:55 ` Yang Shi
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2024-05-14 21:42 UTC (permalink / raw)
To: Yang Shi; +Cc: Miaohe Lin, nao.horiguchi, xuyu, linux-mm, linux-kernel
On Tue, 14 May 2024 15:28:12 -0600 Yang Shi <shy828301@gmail.com> wrote:
> On Tue, May 14, 2024 at 3:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> >
> > > When I did memory failure tests recently, below panic occurs:
> > >
> > > kernel BUG at include/linux/mm.h:1135!
> > > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
> > >
> > > ...
> > >
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
> > >  		__free_pages(zero_page, compound_order(zero_page));
> > >  		goto retry;
> > >  	}
> > > +	__SetPageReserved(zero_page);
> > >  	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
> > >
> > >  	/* We take additional reference here. It will be put back by shrinker */
> > > @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
> > >  		struct page *zero_page = xchg(&huge_zero_page, NULL);
> > >  		BUG_ON(zero_page == NULL);
> > >  		WRITE_ONCE(huge_zero_pfn, ~0UL);
> > > +		__ClearPageReserved(zero_page);
> > >  		__free_pages(zero_page, compound_order(zero_page));
> > >  		return HPAGE_PMD_NR;
> > >  	}
> >
> > This causes a bit of a mess when staged ahead of mm-stable. So to
> > avoid disruption I staged it behind mm-stable. This means that when
> > the -stable maintainers try to merge it, they will ask for a fixed up
> > version for older kernels so you can please just send them this
> > version.
>
> Can you please drop this from mm-unstable since both I and David
> nack'ed a similar patch in another thread.
> https://lore.kernel.org/linux-mm/20240511032801.1295023-1-linmiaohe@huawei.com/

That appears to link to the incorrect email thread?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-14 21:42 ` Andrew Morton
@ 2024-05-14 21:55 ` Yang Shi
2024-05-15 1:48 ` Miaohe Lin
0 siblings, 1 reply; 8+ messages in thread
From: Yang Shi @ 2024-05-14 21:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miaohe Lin, nao.horiguchi, xuyu, linux-mm, linux-kernel
On Tue, May 14, 2024 at 3:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 14 May 2024 15:28:12 -0600 Yang Shi <shy828301@gmail.com> wrote:
>
> > On Tue, May 14, 2024 at 3:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> > >
> > > > When I did memory failure tests recently, below panic occurs:
> > > >
> > > > kernel BUG at include/linux/mm.h:1135!
> > > > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
> > > >
> > > > ...
> > > >
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
> > > >  		__free_pages(zero_page, compound_order(zero_page));
> > > >  		goto retry;
> > > >  	}
> > > > +	__SetPageReserved(zero_page);
> > > >  	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
> > > >
> > > >  	/* We take additional reference here. It will be put back by shrinker */
> > > > @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
> > > >  		struct page *zero_page = xchg(&huge_zero_page, NULL);
> > > >  		BUG_ON(zero_page == NULL);
> > > >  		WRITE_ONCE(huge_zero_pfn, ~0UL);
> > > > +		__ClearPageReserved(zero_page);
> > > >  		__free_pages(zero_page, compound_order(zero_page));
> > > >  		return HPAGE_PMD_NR;
> > > >  	}
> > >
> > > This causes a bit of a mess when staged ahead of mm-stable. So to
> > > avoid disruption I staged it behind mm-stable. This means that when
> > > the -stable maintainers try to merge it, they will ask for a fixed up
> > > version for older kernels so you can please just send them this
> > > version.
> >
> > Can you please drop this from mm-unstable since both I and David
> > nack'ed a similar patch in another thread.
> > https://lore.kernel.org/linux-mm/20240511032801.1295023-1-linmiaohe@huawei.com/
>
> That appears to link to the incorrect email thread?

I meant that patch is actually same with this one. Just used folio
interface instead of page. I'm not sure why Miaohe posted two. Maybe
target to different version.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-14 21:55 ` Yang Shi
@ 2024-05-15 1:48 ` Miaohe Lin
0 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2024-05-15 1:48 UTC (permalink / raw)
To: Yang Shi, Andrew Morton; +Cc: nao.horiguchi, xuyu, linux-mm, linux-kernel
On 2024/5/15 5:55, Yang Shi wrote:
> On Tue, May 14, 2024 at 3:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> On Tue, 14 May 2024 15:28:12 -0600 Yang Shi <shy828301@gmail.com> wrote:
>>
>>> On Tue, May 14, 2024 at 3:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>>>
>>>> On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>>> When I did memory failure tests recently, below panic occurs:
>>>>>
>>>>> kernel BUG at include/linux/mm.h:1135!
>>>>> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>>>> CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
>>>>>
>>>>> ...
>>>>>
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
>>>>>  		__free_pages(zero_page, compound_order(zero_page));
>>>>>  		goto retry;
>>>>>  	}
>>>>> +	__SetPageReserved(zero_page);
>>>>>  	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
>>>>>
>>>>>  	/* We take additional reference here. It will be put back by shrinker */
>>>>> @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
>>>>>  		struct page *zero_page = xchg(&huge_zero_page, NULL);
>>>>>  		BUG_ON(zero_page == NULL);
>>>>>  		WRITE_ONCE(huge_zero_pfn, ~0UL);
>>>>> +		__ClearPageReserved(zero_page);
>>>>>  		__free_pages(zero_page, compound_order(zero_page));
>>>>>  		return HPAGE_PMD_NR;
>>>>>  	}
>>>>
>>>> This causes a bit of a mess when staged ahead of mm-stable. So to
>>>> avoid disruption I staged it behind mm-stable. This means that when
>>>> the -stable maintainers try to merge it, they will ask for a fixed up
>>>> version for older kernels so you can please just send them this
>>>> version.
>>>
>>> Can you please drop this from mm-unstable since both I and David
>>> nack'ed a similar patch in another thread.
>>> https://lore.kernel.org/linux-mm/20240511032801.1295023-1-linmiaohe@huawei.com/
>>
>> That appears to link to the incorrect email thread?
>
> I meant that patch is actually same with this one. Just used folio
> interface instead of page. I'm not sure why Miaohe posted two. Maybe
> target to different version.

Sorry for causing confusion. The two patches really do target different
branches: this one is for mainline and the other one is for mm-unstable.

Thanks both.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-11 3:54 [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved Miaohe Lin
2024-05-14 21:14 ` Andrew Morton
@ 2024-05-17 7:03 ` kernel test robot
2024-05-20 1:47 ` Miaohe Lin
1 sibling, 1 reply; 8+ messages in thread
From: kernel test robot @ 2024-05-17 7:03 UTC (permalink / raw)
To: Miaohe Lin
Cc: oe-lkp, lkp, linux-mm, akpm, shy828301, nao.horiguchi, xuyu, linmiaohe, linux-kernel, oliver.sang

Hello,

kernel test robot noticed "kernel_BUG_at_include/linux/page-flags.h" on:

commit: 8e6ff9c4aad2c677c53f70d9e193c35cbbafcb88 ("[PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved")
url: https://github.com/intel-lab-lkp/linux/commits/Miaohe-Lin/mm-huge_memory-mark-huge_zero_page-reserved/20240511-115840
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git cf87f46fd34d6c19283d9625a7822f20d90b64a4
patch link: https://lore.kernel.org/all/20240511035435.1477004-1-linmiaohe@huawei.com/
patch subject: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved

in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:

	runtime: 300s
	group: group-03
	nr_groups: 5

compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

+------------------------------------------+------------+------------+
|                                          | cf87f46fd3 | 8e6ff9c4aa |
+------------------------------------------+------------+------------+
| kernel_BUG_at_include/linux/page-flags.h | 0          | 11         |
| invalid_opcode:#[##]                     | 0          | 11         |
| RIP:get_huge_zero_page                   | 0          | 11         |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 11         |
+------------------------------------------+------------+------------+

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405171417.1bb0856a-lkp@intel.com

[ 272.633454][ T3838] ------------[ cut here ]------------
[ 272.634362][ T3838] kernel BUG at include/linux/page-flags.h:540!
[ 272.635422][ T3838] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 272.636518][ T3838] CPU: 0 PID: 3838 Comm: trinity-c2 Not tainted 6.9.0-rc7-00184-g8e6ff9c4aad2 #1
[ 272.638008][ T3838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 272.639707][ T3838] RIP: 0010:get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.650951][ T3838] Code: 04 02 00 00 00 65 ff 0d 0a 0b 75 7e 74 1e 65 48 ff 05 50 e5 74 7e e9 5c fe ff ff 48 c7 c6 a0 bb 52 83 48 89 df e8 64 89 f2 ff <0f> 0b 0f 1f 44 00 00 eb db 48 c7 c6 40 bc 52 83 48 89 df e8 4c 89
All code
========
   0:   04 02                   add    $0x2,%al
   2:   00 00                   add    %al,(%rax)
   4:   00 65 ff                add    %ah,-0x1(%rbp)
   7:   0d 0a 0b 75 7e          or     $0x7e750b0a,%eax
   c:   74 1e                   je     0x2c
   e:   65 48 ff 05 50 e5 74    incq   %gs:0x7e74e550(%rip)        # 0x7e74e566
  15:   7e
  16:   e9 5c fe ff ff          jmp    0xfffffffffffffe77
  1b:   48 c7 c6 a0 bb 52 83    mov    $0xffffffff8352bba0,%rsi
  22:   48 89 df                mov    %rbx,%rdi
  25:   e8 64 89 f2 ff          call   0xfffffffffff2898e
  2a:*  0f 0b                   ud2    <-- trapping instruction
  2c:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  31:   eb db                   jmp    0xe
  33:   48 c7 c6 40 bc 52 83    mov    $0xffffffff8352bc40,%rsi
  3a:   48 89 df                mov    %rbx,%rdi
  3d:   e8                      .byte 0xe8
  3e:   4c                      rex.WR
  3f:   89                      .byte 0x89

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   7:   eb db                   jmp    0xffffffffffffffe4
   9:   48 c7 c6 40 bc 52 83    mov    $0xffffffff8352bc40,%rsi
  10:   48 89 df                mov    %rbx,%rdi
  13:   e8                      .byte 0xe8
  14:   4c                      rex.WR
  15:   89                      .byte 0x89
[ 272.654159][ T3838] RSP: 0000:ffffc90001507988 EFLAGS: 00010246
[ 272.655196][ T3838] RAX: 0000000000000000 RBX: ffffea00068d8000 RCX: 0000000000000000
[ 272.656504][ T3838] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 272.657799][ T3838] RBP: ffff8881b94fdf08 R08: 0000000000000000 R09: 0000000000000000
[ 272.659142][ T3838] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00043fb100
[ 272.660533][ T3838] R13: ffff888109ecdb08 R14: ffffc90001507a70 R15: 0000000000100173
[ 272.661849][ T3838] FS: 0000000000000000(0000) GS:ffff8883af200000(0063) knlGS:00000000f7f1a280
[ 272.663303][ T3838] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 272.664430][ T3838] CR2: 00000000ff7fffff CR3: 00000001ba4ca000 CR4: 00000000000406b0
[ 272.665759][ T3838] Call Trace:
[ 272.666325][ T3838] <TASK>
[ 272.666824][ T3838] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447)
[ 272.667483][ T3838] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155)
[ 272.668253][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.669128][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.669985][ T3838] ? do_error_trap (arch/x86/kernel/traps.c:176)
[ 272.670793][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.671731][ T3838] ? handle_invalid_op (arch/x86/kernel/traps.c:214)
[ 272.672551][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.673411][ T3838] ? exc_invalid_op (arch/x86/kernel/traps.c:267)
[ 272.674224][ T3838] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 272.675064][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.679035][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.679922][ T3838] mm_get_huge_zero_page (mm/huge_memory.c:235 (discriminator 1))
[ 272.680764][ T3838] do_huge_pmd_anonymous_page (mm/huge_memory.c:1020)
[ 272.681694][ T3838] ? mt_find (include/linux/rcupdate.h:339 (discriminator 1) include/linux/rcupdate.h:814 (discriminator 1) lib/maple_tree.c:6954 (discriminator 1))
[ 272.682432][ T3838] __handle_mm_fault (mm/memory.c:5175 mm/memory.c:5412)
[ 272.683260][ T3838] ? handle_pte_fault (mm/memory.c:5352)
[ 272.684150][ T3838] ? find_vma (mm/mmap.c:1889)
[ 272.684891][ T3838] ? vma_link_file (mm/mmap.c:1889)
[ 272.685676][ T3838] ? handle_mm_fault (mm/memory.c:5576)
[ 272.686461][ T3838] handle_mm_fault (mm/memory.c:5466 mm/memory.c:5622)
[ 272.687265][ T3838] do_user_addr_fault (arch/x86/mm/fault.c:1384)
[ 272.688116][ T3838] exc_page_fault (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 arch/x86/mm/fault.c:1482 arch/x86/mm/fault.c:1532)
[ 272.689202][ T3838] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623)
[ 272.690523][ T3838] RIP: 0010:rep_movs_alternative (arch/x86/lib/copy_user_64.S:57)
[ 272.692228][ T3838] Code: 83 f9 08 73 25 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 <48> 8b 06 48 89 07 48 83 c6 08 48 83 c7 08 83 e9 08 74 db 83 f9 08
All code
========
   0:   83 f9 08                cmp    $0x8,%ecx
   3:   73 25                   jae    0x2a
   5:   85 c9                   test   %ecx,%ecx
   7:   74 0f                   je     0x18
   9:   8a 06                   mov    (%rsi),%al
   b:   88 07                   mov    %al,(%rdi)
   d:   48 ff c7                inc    %rdi
  10:   48 ff c6                inc    %rsi
  13:   48 ff c9                dec    %rcx
  16:   75 f1                   jne    0x9
  18:   c3                      ret
  19:   cc                      int3
  1a:   cc                      int3
  1b:   cc                      int3
  1c:   cc                      int3
  1d:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
  24:   00 00 00 00
  28:   66 90                   xchg   %ax,%ax
  2a:*  48 8b 06                mov    (%rsi),%rax    <-- trapping instruction
  2d:   48 89 07                mov    %rax,(%rdi)
  30:   48 83 c6 08             add    $0x8,%rsi
  34:   48 83 c7 08             add    $0x8,%rdi
  38:   83 e9 08                sub    $0x8,%ecx
  3b:   74 db                   je     0x18
  3d:   83 f9 08                cmp    $0x8,%ecx

Code starting with the faulting instruction
===========================================
   0:   48 8b 06                mov    (%rsi),%rax
   3:   48 89 07                mov    %rax,(%rdi)
   6:   48 83 c6 08             add    $0x8,%rsi
   a:   48 83 c7 08             add    $0x8,%rdi
   e:   83 e9 08                sub    $0x8,%ecx
  11:   74 db                   je     0xffffffffffffffee
  13:   83 f9 08                cmp    $0x8,%ecx

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240517/202405171417.1bb0856a-lkp@intel.com

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-17 7:03 ` kernel test robot
@ 2024-05-20 1:47 ` Miaohe Lin
0 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2024-05-20 1:47 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-mm, akpm, shy828301, nao.horiguchi, xuyu, linux-kernel
On 2024/5/17 15:03, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "kernel_BUG_at_include/linux/page-flags.h" on:
>
> commit: 8e6ff9c4aad2c677c53f70d9e193c35cbbafcb88 ("[PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved")
> url: https://github.com/intel-lab-lkp/linux/commits/Miaohe-Lin/mm-huge_memory-mark-huge_zero_page-reserved/20240511-115840
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git cf87f46fd34d6c19283d9625a7822f20d90b64a4
> patch link: https://lore.kernel.org/all/20240511035435.1477004-1-linmiaohe@huawei.com/
> patch subject: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
>
> in testcase: trinity
> version: trinity-i386-abe9de86-1_20230429
> with following parameters:
>
> 	runtime: 300s
> 	group: group-03
> 	nr_groups: 5
>
>
>
> compiler: gcc-13
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> +------------------------------------------+------------+------------+
> |                                          | cf87f46fd3 | 8e6ff9c4aa |
> +------------------------------------------+------------+------------+
> | kernel_BUG_at_include/linux/page-flags.h | 0          | 11         |
> | invalid_opcode:#[##]                     | 0          | 11         |
> | RIP:get_huge_zero_page                   | 0          | 11         |
> | Kernel_panic-not_syncing:Fatal_exception | 0          | 11         |
> +------------------------------------------+------------+------------+
>

Thanks for your report.

>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202405171417.1bb0856a-lkp@intel.com
>
>
> [ 272.633454][ T3838] ------------[ cut here ]------------
> [ 272.634362][ T3838] kernel BUG at include/linux/page-flags.h:540!
> [ 272.635422][ T3838] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
> [ 272.636518][ T3838] CPU: 0 PID: 3838 Comm: trinity-c2 Not tainted 6.9.0-rc7-00184-g8e6ff9c4aad2 #1
> [ 272.638008][ T3838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 272.639707][ T3838] RIP: 0010:get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))

I think the root cause is that PG_reserved is not allowed on compound
pages, so my original version of the patch violates that assumption.
But since PG_reserved is going to be removed anyway, I have dropped
this patch.

Thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
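The failure mode discussed in this reply can be sketched in user space. The snippet below is a model under stated assumptions, not the kernel's page-flags implementation: struct flag_page and model_set_reserved() are hypothetical names, and the check only imitates a policy under which the reserved flag is rejected on compound pages such as the huge zero page.

```c
#include <stdbool.h>

/* Toy stand-in for a page; "compound" marks a page that is part of a compound page. */
struct flag_page {
	bool compound;
	bool reserved;
};

/*
 * Imitates a "no compound pages" flag policy: the flag is only set on
 * order-0 pages. Returns false where a kernel built with page-flag
 * debug checks would hit a BUG instead of setting the flag.
 */
bool model_set_reserved(struct flag_page *p)
{
	if (p->compound)
		return false;
	p->reserved = true;
	return true;
}
```

Under this model, setting the flag on an ordinary page succeeds, while the same call on a compound page is refused, which mirrors the BUG reported at include/linux/page-flags.h:540.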