* [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
@ 2024-05-11 3:54 Miaohe Lin
2024-05-14 21:14 ` Andrew Morton
2024-05-17 7:03 ` kernel test robot
0 siblings, 2 replies; 8+ messages in thread
From: Miaohe Lin @ 2024-05-11 3:54 UTC (permalink / raw)
To: akpm; +Cc: shy828301, nao.horiguchi, xuyu, linmiaohe, linux-mm, linux-kernel
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that the HWPoison flag is set for huge_zero_page
without increasing the page refcnt. unpoison_memory() then decreases
the page refcnt unexpectedly, because the page looks like a successfully
hwpoisoned page. This leads to VM_BUG_ON_PAGE(page_ref_count(page) == 0)
when huge_zero_page is released.
Fix this issue by marking huge_zero_page reserved, so that unpoison_memory()
skips this page. This also makes it consistent with the ZERO_PAGE case.
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 89f58c7603b2..a605bc0437cd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
__free_pages(zero_page, compound_order(zero_page));
goto retry;
}
+ __SetPageReserved(zero_page);
WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
/* We take additional reference here. It will be put back by shrinker */
@@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
struct page *zero_page = xchg(&huge_zero_page, NULL);
BUG_ON(zero_page == NULL);
WRITE_ONCE(huge_zero_pfn, ~0UL);
+ __ClearPageReserved(zero_page);
__free_pages(zero_page, compound_order(zero_page));
return HPAGE_PMD_NR;
}
--
2.33.0
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-11 3:54 [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved Miaohe Lin
@ 2024-05-14 21:14 ` Andrew Morton
2024-05-14 21:28 ` Yang Shi
2024-05-17 7:03 ` kernel test robot
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2024-05-14 21:14 UTC (permalink / raw)
To: Miaohe Lin; +Cc: shy828301, nao.horiguchi, xuyu, linux-mm, linux-kernel
On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> When I did memory failure tests recently, below panic occurs:
>
> kernel BUG at include/linux/mm.h:1135!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
>
> ...
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
> __free_pages(zero_page, compound_order(zero_page));
> goto retry;
> }
> + __SetPageReserved(zero_page);
> WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
>
> /* We take additional reference here. It will be put back by shrinker */
> @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
> struct page *zero_page = xchg(&huge_zero_page, NULL);
> BUG_ON(zero_page == NULL);
> WRITE_ONCE(huge_zero_pfn, ~0UL);
> + __ClearPageReserved(zero_page);
> __free_pages(zero_page, compound_order(zero_page));
> return HPAGE_PMD_NR;
> }
This causes a bit of a mess when staged ahead of mm-stable. So to
avoid disruption I staged it behind mm-stable. This means that when
the -stable maintainers try to merge it, they will ask for a fixed up
version for older kernels so you can please just send them this
version.
To facilitate this I added the below adjustment:
(btw, shouldn't get_huge_zero_page() and shrink_huge_zero_page_scan()
be renamed to *_folio_*?)
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-huge_memory-mark-huge_zero_page-reserved-fix
Date: Tue May 14 01:53:37 PM PDT 2024
Update it for 5691753d73a2 ("mm: convert huge_zero_page to huge_zero_folio")
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-mark-huge_zero_page-reserved-fix
+++ a/mm/huge_memory.c
@@ -212,7 +212,7 @@ retry:
folio_put(zero_folio);
goto retry;
}
- __SetPageReserved(zero_page);
+ __folio_set_reserved(zero_folio);
WRITE_ONCE(huge_zero_pfn, folio_pfn(zero_folio));
/* We take additional reference here. It will be put back by shrinker */
@@ -265,7 +265,7 @@ static unsigned long shrink_huge_zero_pa
struct folio *zero_folio = xchg(&huge_zero_folio, NULL);
BUG_ON(zero_folio == NULL);
WRITE_ONCE(huge_zero_pfn, ~0UL);
- __ClearPageReserved(zero_page);
+ __folio_clear_reserved(zero_folio);
folio_put(zero_folio);
return HPAGE_PMD_NR;
}
_
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-14 21:14 ` Andrew Morton
@ 2024-05-14 21:28 ` Yang Shi
2024-05-14 21:42 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Yang Shi @ 2024-05-14 21:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miaohe Lin, nao.horiguchi, xuyu, linux-mm, linux-kernel
On Tue, May 14, 2024 at 3:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> > When I did memory failure tests recently, below panic occurs:
> >
> > kernel BUG at include/linux/mm.h:1135!
> > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
> >
> > ...
> >
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
> > __free_pages(zero_page, compound_order(zero_page));
> > goto retry;
> > }
> > + __SetPageReserved(zero_page);
> > WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
> >
> > /* We take additional reference here. It will be put back by shrinker */
> > @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
> > struct page *zero_page = xchg(&huge_zero_page, NULL);
> > BUG_ON(zero_page == NULL);
> > WRITE_ONCE(huge_zero_pfn, ~0UL);
> > + __ClearPageReserved(zero_page);
> > __free_pages(zero_page, compound_order(zero_page));
> > return HPAGE_PMD_NR;
> > }
>
> This causes a bit of a mess when staged ahead of mm-stable. So to
> avoid disruption I staged it behind mm-stable. This means that when
> the -stable maintainers try to merge it, they will ask for a fixed up
> version for older kernels so you can please just send them this
> version.
Can you please drop this from mm-unstable since both I and David
nack'ed a similar patch in another thread.
https://lore.kernel.org/linux-mm/20240511032801.1295023-1-linmiaohe@huawei.com/
Both patches actually do the same thing, just this one uses page, the
other one uses folio.
>
> To facilitate this I added the below adjustment:
>
> (btw, shouldn't get_huge_zero_page() and shrink_huge_zero_page_scan()
> be renamed to *_folio_*?)
>
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm-huge_memory-mark-huge_zero_page-reserved-fix
> Date: Tue May 14 01:53:37 PM PDT 2024
>
> Update it for 5691753d73a2 ("mm: convert huge_zero_page to huge_zero_folio")
>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
> Cc: Xu Yu <xuyu@linux.alibaba.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/huge_memory.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/mm/huge_memory.c~mm-huge_memory-mark-huge_zero_page-reserved-fix
> +++ a/mm/huge_memory.c
> @@ -212,7 +212,7 @@ retry:
> folio_put(zero_folio);
> goto retry;
> }
> - __SetPageReserved(zero_page);
> + __folio_set_reserved(zero_folio);
> WRITE_ONCE(huge_zero_pfn, folio_pfn(zero_folio));
>
> /* We take additional reference here. It will be put back by shrinker */
> @@ -265,7 +265,7 @@ static unsigned long shrink_huge_zero_pa
> struct folio *zero_folio = xchg(&huge_zero_folio, NULL);
> BUG_ON(zero_folio == NULL);
> WRITE_ONCE(huge_zero_pfn, ~0UL);
> - __ClearPageReserved(zero_page);
> + __folio_clear_reserved(zero_folio);
> folio_put(zero_folio);
> return HPAGE_PMD_NR;
> }
> _
>
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-14 21:28 ` Yang Shi
@ 2024-05-14 21:42 ` Andrew Morton
2024-05-14 21:55 ` Yang Shi
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2024-05-14 21:42 UTC (permalink / raw)
To: Yang Shi; +Cc: Miaohe Lin, nao.horiguchi, xuyu, linux-mm, linux-kernel
On Tue, 14 May 2024 15:28:12 -0600 Yang Shi <shy828301@gmail.com> wrote:
> On Tue, May 14, 2024 at 3:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> >
> > > When I did memory failure tests recently, below panic occurs:
> > >
> > > kernel BUG at include/linux/mm.h:1135!
> > > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
> > >
> > > ...
> > >
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
> > > __free_pages(zero_page, compound_order(zero_page));
> > > goto retry;
> > > }
> > > + __SetPageReserved(zero_page);
> > > WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
> > >
> > > /* We take additional reference here. It will be put back by shrinker */
> > > @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
> > > struct page *zero_page = xchg(&huge_zero_page, NULL);
> > > BUG_ON(zero_page == NULL);
> > > WRITE_ONCE(huge_zero_pfn, ~0UL);
> > > + __ClearPageReserved(zero_page);
> > > __free_pages(zero_page, compound_order(zero_page));
> > > return HPAGE_PMD_NR;
> > > }
> >
> > This causes a bit of a mess when staged ahead of mm-stable. So to
> > avoid disruption I staged it behind mm-stable. This means that when
> > the -stable maintainers try to merge it, they will ask for a fixed up
> > version for older kernels so you can please just send them this
> > version.
>
> Can you please drop this from mm-unstable since both I and David
> nack'ed a similar patch in another thread.
> https://lore.kernel.org/linux-mm/20240511032801.1295023-1-linmiaohe@huawei.com/
That appears to link to the incorrect email thread?
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-14 21:42 ` Andrew Morton
@ 2024-05-14 21:55 ` Yang Shi
2024-05-15 1:48 ` Miaohe Lin
0 siblings, 1 reply; 8+ messages in thread
From: Yang Shi @ 2024-05-14 21:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miaohe Lin, nao.horiguchi, xuyu, linux-mm, linux-kernel
On Tue, May 14, 2024 at 3:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 14 May 2024 15:28:12 -0600 Yang Shi <shy828301@gmail.com> wrote:
>
> > On Tue, May 14, 2024 at 3:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> > >
> > > > When I did memory failure tests recently, below panic occurs:
> > > >
> > > > kernel BUG at include/linux/mm.h:1135!
> > > > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
> > > >
> > > > ...
> > > >
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
> > > > __free_pages(zero_page, compound_order(zero_page));
> > > > goto retry;
> > > > }
> > > > + __SetPageReserved(zero_page);
> > > > WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
> > > >
> > > > /* We take additional reference here. It will be put back by shrinker */
> > > > @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
> > > > struct page *zero_page = xchg(&huge_zero_page, NULL);
> > > > BUG_ON(zero_page == NULL);
> > > > WRITE_ONCE(huge_zero_pfn, ~0UL);
> > > > + __ClearPageReserved(zero_page);
> > > > __free_pages(zero_page, compound_order(zero_page));
> > > > return HPAGE_PMD_NR;
> > > > }
> > >
> > > This causes a bit of a mess when staged ahead of mm-stable. So to
> > > avoid disruption I staged it behind mm-stable. This means that when
> > > the -stable maintainers try to merge it, they will ask for a fixed up
> > > version for older kernels so you can please just send them this
> > > version.
> >
> > Can you please drop this from mm-unstable since both I and David
> > nack'ed a similar patch in another thread.
> > https://lore.kernel.org/linux-mm/20240511032801.1295023-1-linmiaohe@huawei.com/
>
> That appears to link to the incorrect email thread?
I meant that patch is actually the same as this one. It just uses the folio
interface instead of page. I'm not sure why Miaohe posted two. Maybe
they target different versions.
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-14 21:55 ` Yang Shi
@ 2024-05-15 1:48 ` Miaohe Lin
0 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2024-05-15 1:48 UTC (permalink / raw)
To: Yang Shi, Andrew Morton; +Cc: nao.horiguchi, xuyu, linux-mm, linux-kernel
On 2024/5/15 5:55, Yang Shi wrote:
> On Tue, May 14, 2024 at 3:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> On Tue, 14 May 2024 15:28:12 -0600 Yang Shi <shy828301@gmail.com> wrote:
>>
>>> On Tue, May 14, 2024 at 3:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>>>
>>>> On Sat, 11 May 2024 11:54:35 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>>> When I did memory failure tests recently, below panic occurs:
>>>>>
>>>>> kernel BUG at include/linux/mm.h:1135!
>>>>> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>>>> CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
>>>>>
>>>>> ...
>>>>>
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -208,6 +208,7 @@ static bool get_huge_zero_page(void)
>>>>> __free_pages(zero_page, compound_order(zero_page));
>>>>> goto retry;
>>>>> }
>>>>> + __SetPageReserved(zero_page);
>>>>> WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
>>>>>
>>>>> /* We take additional reference here. It will be put back by shrinker */
>>>>> @@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
>>>>> struct page *zero_page = xchg(&huge_zero_page, NULL);
>>>>> BUG_ON(zero_page == NULL);
>>>>> WRITE_ONCE(huge_zero_pfn, ~0UL);
>>>>> + __ClearPageReserved(zero_page);
>>>>> __free_pages(zero_page, compound_order(zero_page));
>>>>> return HPAGE_PMD_NR;
>>>>> }
>>>>
>>>> This causes a bit of a mess when staged ahead of mm-stable. So to
>>>> avoid disruption I staged it behind mm-stable. This means that when
>>>> the -stable maintainers try to merge it, they will ask for a fixed up
>>>> version for older kernels so you can please just send them this
>>>> version.
>>>
>>> Can you please drop this from mm-unstable since both I and David
>>> nack'ed a similar patch in another thread.
>>> https://lore.kernel.org/linux-mm/20240511032801.1295023-1-linmiaohe@huawei.com/
>>
>> That appears to link to the incorrect email thread?
>
> I meant that patch is actually the same as this one. It just uses the folio
> interface instead of page. I'm not sure why Miaohe posted two. Maybe
> they target different versions.
Sorry for causing confusion. These two patches really do target different branches:
this patch is for mainline and the other one is for mm-unstable.
Thanks both.
.
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-11 3:54 [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved Miaohe Lin
2024-05-14 21:14 ` Andrew Morton
@ 2024-05-17 7:03 ` kernel test robot
2024-05-20 1:47 ` Miaohe Lin
1 sibling, 1 reply; 8+ messages in thread
From: kernel test robot @ 2024-05-17 7:03 UTC (permalink / raw)
To: Miaohe Lin
Cc: oe-lkp, lkp, linux-mm, akpm, shy828301, nao.horiguchi, xuyu,
linmiaohe, linux-kernel, oliver.sang
Hello,
kernel test robot noticed "kernel_BUG_at_include/linux/page-flags.h" on:
commit: 8e6ff9c4aad2c677c53f70d9e193c35cbbafcb88 ("[PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved")
url: https://github.com/intel-lab-lkp/linux/commits/Miaohe-Lin/mm-huge_memory-mark-huge_zero_page-reserved/20240511-115840
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git cf87f46fd34d6c19283d9625a7822f20d90b64a4
patch link: https://lore.kernel.org/all/20240511035435.1477004-1-linmiaohe@huawei.com/
patch subject: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:
runtime: 300s
group: group-03
nr_groups: 5
compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
+------------------------------------------+------------+------------+
| | cf87f46fd3 | 8e6ff9c4aa |
+------------------------------------------+------------+------------+
| kernel_BUG_at_include/linux/page-flags.h | 0 | 11 |
| invalid_opcode:#[##] | 0 | 11 |
| RIP:get_huge_zero_page | 0 | 11 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 11 |
+------------------------------------------+------------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405171417.1bb0856a-lkp@intel.com
[ 272.633454][ T3838] ------------[ cut here ]------------
[ 272.634362][ T3838] kernel BUG at include/linux/page-flags.h:540!
[ 272.635422][ T3838] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 272.636518][ T3838] CPU: 0 PID: 3838 Comm: trinity-c2 Not tainted 6.9.0-rc7-00184-g8e6ff9c4aad2 #1
[ 272.638008][ T3838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 272.639707][ T3838] RIP: 0010:get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.650951][ T3838] Code: 04 02 00 00 00 65 ff 0d 0a 0b 75 7e 74 1e 65 48 ff 05 50 e5 74 7e e9 5c fe ff ff 48 c7 c6 a0 bb 52 83 48 89 df e8 64 89 f2 ff <0f> 0b 0f 1f 44 00 00 eb db 48 c7 c6 40 bc 52 83 48 89 df e8 4c 89
All code
========
0: 04 02 add $0x2,%al
2: 00 00 add %al,(%rax)
4: 00 65 ff add %ah,-0x1(%rbp)
7: 0d 0a 0b 75 7e or $0x7e750b0a,%eax
c: 74 1e je 0x2c
e: 65 48 ff 05 50 e5 74 incq %gs:0x7e74e550(%rip) # 0x7e74e566
15: 7e
16: e9 5c fe ff ff jmp 0xfffffffffffffe77
1b: 48 c7 c6 a0 bb 52 83 mov $0xffffffff8352bba0,%rsi
22: 48 89 df mov %rbx,%rdi
25: e8 64 89 f2 ff call 0xfffffffffff2898e
2a:* 0f 0b ud2 <-- trapping instruction
2c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
31: eb db jmp 0xe
33: 48 c7 c6 40 bc 52 83 mov $0xffffffff8352bc40,%rsi
3a: 48 89 df mov %rbx,%rdi
3d: e8 .byte 0xe8
3e: 4c rex.WR
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
7: eb db jmp 0xffffffffffffffe4
9: 48 c7 c6 40 bc 52 83 mov $0xffffffff8352bc40,%rsi
10: 48 89 df mov %rbx,%rdi
13: e8 .byte 0xe8
14: 4c rex.WR
15: 89 .byte 0x89
[ 272.654159][ T3838] RSP: 0000:ffffc90001507988 EFLAGS: 00010246
[ 272.655196][ T3838] RAX: 0000000000000000 RBX: ffffea00068d8000 RCX: 0000000000000000
[ 272.656504][ T3838] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 272.657799][ T3838] RBP: ffff8881b94fdf08 R08: 0000000000000000 R09: 0000000000000000
[ 272.659142][ T3838] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00043fb100
[ 272.660533][ T3838] R13: ffff888109ecdb08 R14: ffffc90001507a70 R15: 0000000000100173
[ 272.661849][ T3838] FS: 0000000000000000(0000) GS:ffff8883af200000(0063) knlGS:00000000f7f1a280
[ 272.663303][ T3838] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 272.664430][ T3838] CR2: 00000000ff7fffff CR3: 00000001ba4ca000 CR4: 00000000000406b0
[ 272.665759][ T3838] Call Trace:
[ 272.666325][ T3838] <TASK>
[ 272.666824][ T3838] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447)
[ 272.667483][ T3838] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155)
[ 272.668253][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.669128][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.669985][ T3838] ? do_error_trap (arch/x86/kernel/traps.c:176)
[ 272.670793][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.671731][ T3838] ? handle_invalid_op (arch/x86/kernel/traps.c:214)
[ 272.672551][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.673411][ T3838] ? exc_invalid_op (arch/x86/kernel/traps.c:267)
[ 272.674224][ T3838] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 272.675064][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.679035][ T3838] ? get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
[ 272.679922][ T3838] mm_get_huge_zero_page (mm/huge_memory.c:235 (discriminator 1))
[ 272.680764][ T3838] do_huge_pmd_anonymous_page (mm/huge_memory.c:1020)
[ 272.681694][ T3838] ? mt_find (include/linux/rcupdate.h:339 (discriminator 1) include/linux/rcupdate.h:814 (discriminator 1) lib/maple_tree.c:6954 (discriminator 1))
[ 272.682432][ T3838] __handle_mm_fault (mm/memory.c:5175 mm/memory.c:5412)
[ 272.683260][ T3838] ? handle_pte_fault (mm/memory.c:5352)
[ 272.684150][ T3838] ? find_vma (mm/mmap.c:1889)
[ 272.684891][ T3838] ? vma_link_file (mm/mmap.c:1889)
[ 272.685676][ T3838] ? handle_mm_fault (mm/memory.c:5576)
[ 272.686461][ T3838] handle_mm_fault (mm/memory.c:5466 mm/memory.c:5622)
[ 272.687265][ T3838] do_user_addr_fault (arch/x86/mm/fault.c:1384)
[ 272.688116][ T3838] exc_page_fault (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 arch/x86/mm/fault.c:1482 arch/x86/mm/fault.c:1532)
[ 272.689202][ T3838] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623)
[ 272.690523][ T3838] RIP: 0010:rep_movs_alternative (arch/x86/lib/copy_user_64.S:57)
[ 272.692228][ T3838] Code: 83 f9 08 73 25 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 <48> 8b 06 48 89 07 48 83 c6 08 48 83 c7 08 83 e9 08 74 db 83 f9 08
All code
========
0: 83 f9 08 cmp $0x8,%ecx
3: 73 25 jae 0x2a
5: 85 c9 test %ecx,%ecx
7: 74 0f je 0x18
9: 8a 06 mov (%rsi),%al
b: 88 07 mov %al,(%rdi)
d: 48 ff c7 inc %rdi
10: 48 ff c6 inc %rsi
13: 48 ff c9 dec %rcx
16: 75 f1 jne 0x9
18: c3 ret
19: cc int3
1a: cc int3
1b: cc int3
1c: cc int3
1d: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
24: 00 00 00 00
28: 66 90 xchg %ax,%ax
2a:* 48 8b 06 mov (%rsi),%rax <-- trapping instruction
2d: 48 89 07 mov %rax,(%rdi)
30: 48 83 c6 08 add $0x8,%rsi
34: 48 83 c7 08 add $0x8,%rdi
38: 83 e9 08 sub $0x8,%ecx
3b: 74 db je 0x18
3d: 83 f9 08 cmp $0x8,%ecx
Code starting with the faulting instruction
===========================================
0: 48 8b 06 mov (%rsi),%rax
3: 48 89 07 mov %rax,(%rdi)
6: 48 83 c6 08 add $0x8,%rsi
a: 48 83 c7 08 add $0x8,%rdi
e: 83 e9 08 sub $0x8,%ecx
11: 74 db je 0xffffffffffffffee
13: 83 f9 08 cmp $0x8,%ecx
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240517/202405171417.1bb0856a-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
2024-05-17 7:03 ` kernel test robot
@ 2024-05-20 1:47 ` Miaohe Lin
0 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2024-05-20 1:47 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-mm, akpm, shy828301, nao.horiguchi, xuyu,
linux-kernel
On 2024/5/17 15:03, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "kernel_BUG_at_include/linux/page-flags.h" on:
>
> commit: 8e6ff9c4aad2c677c53f70d9e193c35cbbafcb88 ("[PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved")
> url: https://github.com/intel-lab-lkp/linux/commits/Miaohe-Lin/mm-huge_memory-mark-huge_zero_page-reserved/20240511-115840
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git cf87f46fd34d6c19283d9625a7822f20d90b64a4
> patch link: https://lore.kernel.org/all/20240511035435.1477004-1-linmiaohe@huawei.com/
> patch subject: [PATCH -rc7] mm/huge_memory: mark huge_zero_page reserved
>
> in testcase: trinity
> version: trinity-i386-abe9de86-1_20230429
> with following parameters:
>
> runtime: 300s
> group: group-03
> nr_groups: 5
>
>
>
> compiler: gcc-13
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> +------------------------------------------+------------+------------+
> | | cf87f46fd3 | 8e6ff9c4aa |
> +------------------------------------------+------------+------------+
> | kernel_BUG_at_include/linux/page-flags.h | 0 | 11 |
> | invalid_opcode:#[##] | 0 | 11 |
> | RIP:get_huge_zero_page | 0 | 11 |
> | Kernel_panic-not_syncing:Fatal_exception | 0 | 11 |
> +------------------------------------------+------------+------------+
>
Thanks for your report.
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202405171417.1bb0856a-lkp@intel.com
>
>
> [ 272.633454][ T3838] ------------[ cut here ]------------
> [ 272.634362][ T3838] kernel BUG at include/linux/page-flags.h:540!
> [ 272.635422][ T3838] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
> [ 272.636518][ T3838] CPU: 0 PID: 3838 Comm: trinity-c2 Not tainted 6.9.0-rc7-00184-g8e6ff9c4aad2 #1
> [ 272.638008][ T3838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 272.639707][ T3838] RIP: 0010:get_huge_zero_page (include/linux/page-flags.h:540 (discriminator 1) mm/huge_memory.c:211 (discriminator 1))
I think the root cause is that PG_reserved is not allowed on compound pages, so my original version of the patch breaks that assumption.
But since PG_reserved is to be removed anyway, I have dropped this patch.
Thanks.
.