* unhandlable pages
@ 2022-03-21 17:17 Luck, Tony
2022-03-21 17:21 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Luck, Tony @ 2022-03-21 17:17 UTC (permalink / raw)
To: linux-mm; +Cc: shy828301, naoya.horiguchi, mike.kravetz, bp, linmiaohe, akpm
Validation folks are seeing this on a v5.16 kernel. I don't
see any changes in v5.17 that look like they address it.
Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
Some debugging shows this is an anon page (expected ... that's the
type of page where the error was injected). They see shake_page()
called three times, but it doesn't change anything, so the page
is reported as unhandlable.
-Tony
* Re: unhandlable pages
2022-03-21 17:17 unhandlable pages Luck, Tony
@ 2022-03-21 17:21 ` Matthew Wilcox
2022-03-21 17:28 ` Luck, Tony
0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2022-03-21 17:21 UTC (permalink / raw)
To: Luck, Tony
Cc: linux-mm, shy828301, naoya.horiguchi, mike.kravetz, bp, linmiaohe, akpm
On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
> Validation folks are seeing this on a v5.16 kernel. I don't
> see any changes in v5.17 that look like they address it.
>
> Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
> Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
> Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
> Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
>
> Some debugging shows this is an anon page (expected ... that's the
> type of page where the error was injected). They see shake_page()
> called three times, but it doesn't change anything, so the page
> is reported as unhandlable.
Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
the bottom bit set with the rest of the page->mapping pointing to its
anon_vma. Why do you think it's an anon page?
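
The check described above can be sketched in user-space C. This is only an illustration, not kernel code: it assumes the anon marker is the lowest bit of page->mapping (the kernel names it PAGE_MAPPING_ANON), and the helper names mapping_is_anon() and mapping_to_anon_vma() are hypothetical stand-ins for the kernel's own accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: the lowest bit of page->mapping marks an anonymous page
 * (PAGE_MAPPING_ANON in the kernel headers). */
#define MAPPING_ANON_BIT ((uintptr_t)0x1)

/* Hypothetical stand-in for PageAnon(): true when the anon bit is set. */
bool mapping_is_anon(uintptr_t mapping)
{
    return (mapping & MAPPING_ANON_BIT) != 0;
}

/* Hypothetical helper: clear the anon bit to recover the anon_vma
 * address. Only meaningful when the mapping is anonymous. */
uintptr_t mapping_to_anon_vma(uintptr_t mapping)
{
    assert(mapping_is_anon(mapping));
    return mapping & ~MAPPING_ANON_BIT;
}
```

With the value from the dump, mapping_is_anon(0) is false: a NULL mapping has no anon bit set, so the page cannot pass as PageAnon.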
* Re: unhandlable pages
2022-03-21 17:21 ` Matthew Wilcox
@ 2022-03-21 17:28 ` Luck, Tony
2022-03-21 17:32 ` Matthew Wilcox
2022-03-21 21:07 ` Yang Shi
0 siblings, 2 replies; 7+ messages in thread
From: Luck, Tony @ 2022-03-21 17:28 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, shy828301, naoya.horiguchi, mike.kravetz, bp, linmiaohe, akpm
On Mon, Mar 21, 2022 at 05:21:05PM +0000, Matthew Wilcox wrote:
> On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
> > Validation folks are seeing this on a v5.16 kernel. I don't
> > see any changes in v5.17 that look like they address it.
> >
> > Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
> > Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
> > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
> > Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
> >
> > Some debugging shows this is an anon page (expected ... that's the
> > type of page where the error was injected). They see shake_page()
> > called three times, but it doesn't change anything, so the page
> > is reported as unhandlable.
>
> Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
> the bottom bit set with the rest of the page->mapping pointing to its
> anon_vma. Why do you think it's an anon page?
Sorry. I didn't do that decode ... just copied what was in the internal
report. If it isn't anon, then does that page dump give info on what
type the page is?
-Tony
* Re: unhandlable pages
2022-03-21 17:28 ` Luck, Tony
@ 2022-03-21 17:32 ` Matthew Wilcox
2022-03-21 21:07 ` Yang Shi
1 sibling, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2022-03-21 17:32 UTC (permalink / raw)
To: Luck, Tony
Cc: linux-mm, shy828301, naoya.horiguchi, mike.kravetz, bp, linmiaohe, akpm
On Mon, Mar 21, 2022 at 10:28:25AM -0700, Luck, Tony wrote:
> On Mon, Mar 21, 2022 at 05:21:05PM +0000, Matthew Wilcox wrote:
> > On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
> > > Validation folks are seeing this on a v5.16 kernel. I don't
> > > see any changes in v5.17 that look like they address it.
> > >
> > > Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
> > > Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
> > > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > > Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
> > > Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
> > >
> > > Some debugging shows this is an anon page (expected ... that's the
> > > type of page where the error was injected). They see shake_page()
> > > called three times, but it doesn't change anything, so the page
> > > is reported as unhandlable.
> >
> > Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
> > the bottom bit set with the rest of the page->mapping pointing to its
> > anon_vma. Why do you think it's an anon page?
>
> Sorry. I didn't do that decode ... just copied what was in the internal
> report. If it isn't anon, then does that page dump give info on what
> type the page is?
No (I would have volunteered that if it had a type!). It could be a
truncated page cache page. But it has the Reserved bit set, which makes
me think it belongs to a device driver or something?
* Re: unhandlable pages
2022-03-21 17:28 ` Luck, Tony
2022-03-21 17:32 ` Matthew Wilcox
@ 2022-03-21 21:07 ` Yang Shi
2022-03-21 21:19 ` Luck, Tony
2022-03-22 0:10 ` Jane Chu
1 sibling, 2 replies; 7+ messages in thread
From: Yang Shi @ 2022-03-21 21:07 UTC (permalink / raw)
To: Luck, Tony
Cc: Matthew Wilcox, Linux MM,
HORIGUCHI NAOYA(堀口 直也),
Mike Kravetz, bp, Miaohe Lin, Andrew Morton
On Mon, Mar 21, 2022 at 10:28 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Mon, Mar 21, 2022 at 05:21:05PM +0000, Matthew Wilcox wrote:
> > On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
> > > Validation folks are seeing this on a v5.16 kernel. I don't
> > > see any changes in v5.17 that look like they address it.
> > >
> > > Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
> > > Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
> > > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > > Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
> > > Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
> > >
> > > Some debugging shows this is an anon page (expected ... that's the
> > > type of page where the error was injected). They see shake_page()
> > > called three times, but it doesn't change anything, so the page
> > > is reported as unhandlable.
> >
> > Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
> > the bottom bit set with the rest of the page->mapping pointing to its
> > anon_vma. Why do you think it's an anon page?
>
> Sorry. I didn't do that decode ... just copied what was in the internal
> report. If it isn't anon, then does that page dump give info on what
> type the page is?
As Willy said, we can't tell what type the page is. Per the dumped
information, the page has:
- a refcount of 1, likely taken by the hwpoison code
- a mapcount of 0: it was already unmapped, and not by hwpoison, since
dump_page() is called before hwpoison unmaps anything
- a NULL mapping
- PG_reserved set and, apart from hwpoison itself, no other flag set
So all I can say is that it is very unlikely to be an anonymous page. It is
not slab either. Neither anonymous/page cache pages nor slab pages should
have the reserved flag set, so it must be some other type.
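
The flag reading above can be reproduced from the raw flags word in the dump. The sketch below is user-space C, not kernel code. The bit positions (slab = 9, reserved = 12, hwpoison = 23) and the choice to treat the low 24 bits as page flags are assumptions: they match the low bits 0x801000 of this dump, which the kernel printed as "reserved|hwpoison", but they depend on kernel version and configuration. The helper names flag_set() and other_flags() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: bit positions inferred from this v5.16 dump; they vary
 * with kernel version and config options. */
#define PG_SLAB_BIT      9
#define PG_RESERVED_BIT  12
#define PG_HWPOISON_BIT  23

/* ASSUMPTION: only the low 24 bits are page flags here; node, zone and
 * lastcpupid are packed into the upper bits. */
#define PAGE_FLAG_MASK ((1ULL << 24) - 1)

/* Hypothetical helper: test one bit of the raw flags word. */
bool flag_set(uint64_t flags, unsigned int bit)
{
    assert(bit < 64);
    return ((flags >> bit) & 1ULL) != 0;
}

/* Hypothetical helper: page-flag bits left over once reserved and
 * hwpoison are masked out. Zero means no other flag is set. */
uint64_t other_flags(uint64_t flags)
{
    uint64_t known = (1ULL << PG_RESERVED_BIT) | (1ULL << PG_HWPOISON_BIT);
    return flags & PAGE_FLAG_MASK & ~known;
}
```

Under those assumptions, the dumped value 0x57ffffc0801000 has reserved and hwpoison set, slab clear, and nothing else in the low 24 bits.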
>
> -Tony
* RE: unhandlable pages
2022-03-21 21:07 ` Yang Shi
@ 2022-03-21 21:19 ` Luck, Tony
2022-03-22 0:10 ` Jane Chu
1 sibling, 0 replies; 7+ messages in thread
From: Luck, Tony @ 2022-03-21 21:19 UTC (permalink / raw)
To: Yang Shi
Cc: Matthew Wilcox, Linux MM,
HORIGUCHI NAOYA(堀口 直也),
Mike Kravetz, bp, Miaohe Lin, Andrew Morton
> So all I can say is that it is very unlikely to be an anonymous page. It is
> not slab either. Neither anonymous/page cache pages nor slab pages should
> have the reserved flag set, so it must be some other type.
Thanks to both of you.
Will get folks to dig deeper into which page they injected the UC error into.
That may take a bit as this test runs for a while injecting and recovering
until it hits this unhandlable page case.
-Tony
* Re: unhandlable pages
2022-03-21 21:07 ` Yang Shi
2022-03-21 21:19 ` Luck, Tony
@ 2022-03-22 0:10 ` Jane Chu
1 sibling, 0 replies; 7+ messages in thread
From: Jane Chu @ 2022-03-22 0:10 UTC (permalink / raw)
To: Yang Shi, Luck, Tony
Cc: Matthew Wilcox, Linux MM,
HORIGUCHI NAOYA(堀口 直也),
Mike Kravetz, bp, Miaohe Lin, Andrew Morton
On 3/21/2022 2:07 PM, Yang Shi wrote:
> On Mon, Mar 21, 2022 at 10:28 AM Luck, Tony <tony.luck@intel.com> wrote:
>>
>> On Mon, Mar 21, 2022 at 05:21:05PM +0000, Matthew Wilcox wrote:
>>> On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
>>>> Validation folks are seeing this on a v5.16 kernel. I don't
>>>> see any changes in v5.17 that look like they address it.
>>>>
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
>>>>
>>>> Some debugging shows this is an anon page (expected ... that's the
>>>> type of page where the error was injected). They see shake_page()
>>>> called three times, but it doesn't change anything, so the page
>>>> is reported as unhandlable.
>>>
>>> Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
>>> the bottom bit set with the rest of the page->mapping pointing to its
>>> anon_vma. Why do you think it's an anon page?
>>
>> Sorry. I didn't do that decode ... just copied what was in the internal
>> report. If it isn't anon, then does that page dump give info on what
>> type the page is?
>
> As Willy said, we can't tell what type the page is. Per the dumped
> information, the page has:
> - a refcount of 1, likely taken by the hwpoison code
> - a mapcount of 0: it was already unmapped, and not by hwpoison, since
> dump_page() is called before hwpoison unmaps anything
> - a NULL mapping
> - PG_reserved set and, apart from hwpoison itself, no other flag set
>
> So all I can say is that it is very unlikely to be an anonymous page. It is
> not slab either. Neither anonymous/page cache pages nor slab pages should
> have the reserved flag set, so it must be some other type.
Tony,
I agree with Matthew; it might have been a device page.
We have pfn:0x195cda44, so does /proc/iomem give any clue?
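
One way to do that lookup, as a rough user-space C sketch: convert the pfn to a physical address and test it against each range line of /proc/iomem. It assumes 4 KiB pages (a shift of 12, as on x86-64) and the usual "start-end : name" hex layout of /proc/iomem lines; the function names pfn_to_phys() and iomem_line_contains() are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: 4 KiB pages, so a pfn becomes a physical address by
 * shifting left 12 bits. */
#define PAGE_SHIFT_ASSUMED 12

/* Hypothetical helper: physical address of the first byte of a page. */
uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << PAGE_SHIFT_ASSUMED;
}

/* Hypothetical helper: parse one "start-end : name" line (hex, no 0x
 * prefix, optional leading indentation) and report whether phys lies
 * inside the inclusive range. Returns false for unparsable lines. */
bool iomem_line_contains(const char *line, uint64_t phys)
{
    uint64_t start, end;

    assert(line != NULL);
    if (sscanf(line, " %" SCNx64 "-%" SCNx64, &start, &end) != 2)
        return false;
    return phys >= start && phys <= end;
}
```

For this report, pfn_to_phys(0x195cda44) gives 0x195cda44000. Each line read from /proc/iomem can then be passed to iomem_line_contains(); the file needs to be read as root, because unprivileged reads show zeroed addresses. Nested (indented) entries narrow down the owner of the range.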
thanks,
-jane
>
>>
>> -Tony
>