* unhandlable pages
From: Luck, Tony @ 2022-03-21 17:17 UTC
To: linux-mm; +Cc: shy828301, naoya.horiguchi, mike.kravetz, bp, linmiaohe, akpm

Validation folks are seeing this on a v5.16 kernel. I don't
see any changes in v5.17 that look like they address it.

Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored

Some debugging shows this is an anon page (expected ... that's the
type of page where the error was injected). They see shake_page()
called three times, but it doesn't change anything, so the page
is reported as unhandlable.

-Tony
* Re: unhandlable pages
From: Matthew Wilcox @ 2022-03-21 17:21 UTC
To: Luck, Tony
Cc: linux-mm, shy828301, naoya.horiguchi, mike.kravetz, bp, linmiaohe, akpm

On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
> Validation folks are seeing this on a v5.16 kernel. I don't
> see any changes in v5.17 that look like they address it.
>
> Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
> Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
> Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
> Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
>
> Some debugging shows this is an anon page (expected ... that's the
> type of page where the error was injected). They see shake_page()
> called three times, but it doesn't change anything, so the page
> is reported as unhandlable.

Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
the bottom bit set, with the rest of page->mapping pointing to the
anon_vma. Why do you think it's an anon page?
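The check Matthew describes can be sketched in a few lines. This is an illustration in Python, not kernel code: the two constants mirror the kernel's PAGE_MAPPING_ANON and PAGE_MAPPING_MOVABLE low-bit tags, while the function names and the sample pointer values are hypothetical and chosen only for the example.

```python
# Low-bit tags the kernel stores in page->mapping (values as in
# include/linux/page-flags.h); everything else here is illustrative.
PAGE_MAPPING_ANON = 0x1
PAGE_MAPPING_MOVABLE = 0x2
PAGE_MAPPING_FLAGS = PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE


def page_anon(mapping: int) -> bool:
    """Mimic PageAnon(): true when the bottom bit of page->mapping is set."""
    return (mapping & PAGE_MAPPING_ANON) != 0


def classify_mapping(mapping: int) -> str:
    """Give a rough label for a raw page->mapping value (hypothetical helper)."""
    if mapping == 0:
        return "no mapping"
    low = mapping & PAGE_MAPPING_FLAGS
    if low == PAGE_MAPPING_FLAGS:
        return "ksm"          # both tag bits set; PageAnon() is also true
    if low == PAGE_MAPPING_ANON:
        return "anon"         # remaining bits point to the anon_vma
    if low == PAGE_MAPPING_MOVABLE:
        return "movable"
    return "address_space"    # untagged pointer, e.g. page cache


# The dump in this thread shows mapping:0000000000000000.
print(classify_mapping(0x0), page_anon(0x0))
# A made-up tagged pointer, for contrast.
print(classify_mapping(0xFFFF888012345671), page_anon(0xFFFF888012345671))
```

With a NULL mapping the bottom bit is clear, so the page in the dump cannot satisfy the anon test regardless of what the internal report said.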
* Re: unhandlable pages
From: Luck, Tony @ 2022-03-21 17:28 UTC
To: Matthew Wilcox
Cc: linux-mm, shy828301, naoya.horiguchi, mike.kravetz, bp, linmiaohe, akpm

On Mon, Mar 21, 2022 at 05:21:05PM +0000, Matthew Wilcox wrote:
> On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
> > Validation folks are seeing this on a v5.16 kernel. I don't
> > see any changes in v5.17 that look like they address it.
> >
> > Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
> > Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
> > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
> > Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
> >
> > Some debugging shows this is an anon page (expected ... that's the
> > type of page where the error was injected). They see shake_page()
> > called three times, but it doesn't change anything, so the page
> > is reported as unhandlable.
>
> Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
> the bottom bit set, with the rest of page->mapping pointing to the
> anon_vma. Why do you think it's an anon page?

Sorry. I didn't do that decode ... just copied what was in the internal
report. If it isn't anon, then does that page dump give info on what
type the page is?

-Tony
* Re: unhandlable pages
From: Matthew Wilcox @ 2022-03-21 17:32 UTC
To: Luck, Tony
Cc: linux-mm, shy828301, naoya.horiguchi, mike.kravetz, bp, linmiaohe, akpm

On Mon, Mar 21, 2022 at 10:28:25AM -0700, Luck, Tony wrote:
> On Mon, Mar 21, 2022 at 05:21:05PM +0000, Matthew Wilcox wrote:
> > On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
> > > Validation folks are seeing this on a v5.16 kernel. I don't
> > > see any changes in v5.17 that look like they address it.
> > >
> > > Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
> > > Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
> > > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > > Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
> > > Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
> > >
> > > Some debugging shows this is an anon page (expected ... that's the
> > > type of page where the error was injected). They see shake_page()
> > > called three times, but it doesn't change anything, so the page
> > > is reported as unhandlable.
> >
> > Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
> > the bottom bit set, with the rest of page->mapping pointing to the
> > anon_vma. Why do you think it's an anon page?
>
> Sorry. I didn't do that decode ... just copied what was in the internal
> report. If it isn't anon, then does that page dump give info on what
> type the page is?

No (I would have volunteered that if it had a type!) It could be a
truncated page cache page. But it has the Reserved bit set, which
makes me think it's a device driver or something?
* Re: unhandlable pages
From: Yang Shi @ 2022-03-21 21:07 UTC
To: Luck, Tony
Cc: Matthew Wilcox, Linux MM, HORIGUCHI NAOYA(堀口 直也), Mike Kravetz, bp, Miaohe Lin, Andrew Morton

On Mon, Mar 21, 2022 at 10:28 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Mon, Mar 21, 2022 at 05:21:05PM +0000, Matthew Wilcox wrote:
> > On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
> > > Validation folks are seeing this on a v5.16 kernel. I don't
> > > see any changes in v5.17 that look like they address it.
> > >
> > > Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
> > > Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
> > > Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > > Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
> > > Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
> > >
> > > Some debugging shows this is an anon page (expected ... that's the
> > > type of page where the error was injected). They see shake_page()
> > > called three times, but it doesn't change anything, so the page
> > > is reported as unhandlable.
> >
> > Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
> > the bottom bit set, with the rest of page->mapping pointing to the
> > anon_vma. Why do you think it's an anon page?
>
> Sorry. I didn't do that decode ... just copied what was in the internal
> report. If it isn't anon, then does that page dump give info on what
> type the page is?

As Willy said, we can't tell what type the page is. Per the dumped
information, the page has:
  - 1 refcount, likely taken by hwpoison
  - 0 mapcount: unmapped, and not unmapped by hwpoison, since
    dump_page() is called before that
  - NULL mapping
  - PG_reserved set and, apart from hwpoison, no other flag set

So I can only say it is very unlikely to be an anonymous page. It is
not slab either. And neither anonymous/page cache pages nor slab pages
should have the reserved flag set, so it should be some other type.

>
> -Tony
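The flags word from the dump can be decoded by hand to confirm the reading above. The sketch below is only an illustration: the bit positions for reserved (12) and hwpoison (23) and the node/zone/lastcpupid field layout are assumptions inferred from this one dump so that the result matches the string the kernel printed; they depend on kernel configuration and are not a general layout. The names FLAG_BITS, FIELDS and decode_flags are hypothetical.

```python
FLAGS = 0x57FFFFC0801000  # "flags:" value from the dump in this thread

# Assumption: bit positions chosen to match the names the kernel printed.
FLAG_BITS = {12: "reserved", 23: "hwpoison"}

# Assumption: (shift, width) of the packed fields, inferred from this dump.
FIELDS = {"node": (54, 10), "zone": (51, 3), "lastcpupid": (30, 21)}


def decode_flags(flags: int):
    """Return (flag names, field values, unnamed low bits) for a flags word."""
    names = [name for bit, name in sorted(FLAG_BITS.items())
             if flags & (1 << bit)]
    fields = {key: (flags >> shift) & ((1 << width) - 1)
              for key, (shift, width) in FIELDS.items()}
    # Bits below the lowest packed field that are set but have no name here.
    lowest_shift = min(shift for shift, _ in FIELDS.values())
    named_mask = sum(1 << bit for bit in FLAG_BITS)
    unnamed = flags & ((1 << lowest_shift) - 1) & ~named_mask
    return names, fields, unnamed


names, fields, unnamed = decode_flags(FLAGS)
print("|".join(names))
print({key: hex(value) for key, value in fields.items()})
print("unnamed low bits:", hex(unnamed))
```

Under these assumptions the only page-flag bits set are reserved and hwpoison, with no LRU, slab, swapbacked or similar bits that would hint at a page type.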
* RE: unhandlable pages
From: Luck, Tony @ 2022-03-21 21:19 UTC
To: Yang Shi
Cc: Matthew Wilcox, Linux MM, HORIGUCHI NAOYA(堀口 直也), Mike Kravetz, bp, Miaohe Lin, Andrew Morton

> So I can only say it is very unlikely to be an anonymous page. It is
> not slab either. And neither anonymous/page cache pages nor slab pages
> should have the reserved flag set, so it should be some other type.

Thanks to both of you. I will get folks to dig deeper into which page
they injected the UC error into. That may take a bit, as this test runs
for a while, injecting and recovering, until it hits this unhandlable
page case.

-Tony
* Re: unhandlable pages
From: Jane Chu @ 2022-03-22 0:10 UTC
To: Yang Shi, Luck, Tony
Cc: Matthew Wilcox, Linux MM, HORIGUCHI NAOYA(堀口 直也), Mike Kravetz, bp, Miaohe Lin, Andrew Morton

On 3/21/2022 2:07 PM, Yang Shi wrote:
> On Mon, Mar 21, 2022 at 10:28 AM Luck, Tony <tony.luck@intel.com> wrote:
>>
>> On Mon, Mar 21, 2022 at 05:21:05PM +0000, Matthew Wilcox wrote:
>>> On Mon, Mar 21, 2022 at 10:17:29AM -0700, Luck, Tony wrote:
>>>> Validation folks are seeing this on a v5.16 kernel. I don't
>>>> see any changes in v5.17 that look like they address it.
>>>>
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: page:00000000696b0b6a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x195cda44
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0057ffffc0801000 ffff6ea817369108 ffff6ea817369108 0000000000000000
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: page dumped because: hwpoison: unhandlable page
>>>> Mar 04 14:05:05 JF5300-07B181T kernel: Memory failure: 0x195cda44: recovery action for unknown page: Ignored
>>>>
>>>> Some debugging shows this is an anon page (expected ... that's the
>>>> type of page where the error was injected). They see shake_page()
>>>> called three times, but it doesn't change anything, so the page
>>>> is reported as unhandlable.
>>>
>>> Uhm, that's not PageAnon. page->mapping is NULL, and anon pages have
>>> the bottom bit set, with the rest of page->mapping pointing to the
>>> anon_vma. Why do you think it's an anon page?
>>
>> Sorry. I didn't do that decode ... just copied what was in the internal
>> report. If it isn't anon, then does that page dump give info on what
>> type the page is?
>
> As Willy said, we can't tell what type the page is. Per the dumped
> information, the page has:
>   - 1 refcount, likely taken by hwpoison
>   - 0 mapcount: unmapped, and not unmapped by hwpoison, since
>     dump_page() is called before that
>   - NULL mapping
>   - PG_reserved set and, apart from hwpoison, no other flag set
>
> So I can only say it is very unlikely to be an anonymous page. It is
> not slab either. And neither anonymous/page cache pages nor slab pages
> should have the reserved flag set, so it should be some other type.

Tony,

I agree with Matthew; it might have been a device page. We have
pfn:0x195cda44, so does /proc/iomem give any clue?

thanks,
-jane

>
>>
>> -Tony
>
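Jane's suggestion amounts to converting the pfn to a physical address and finding which /proc/iomem ranges contain it. The sketch below assumes 4 KiB pages (PAGE_SHIFT of 12) and two spaces of indentation per nesting level in /proc/iomem. The function names are hypothetical, and SAMPLE_IOMEM is invented text for illustration only, not output from the affected machine. On a real system the file usually has to be read as root, otherwise the addresses are shown as zeros.

```python
import re

PAGE_SHIFT = 12  # assumption: 4 KiB pages

# One /proc/iomem entry: optional indent, "start-end : name" in hex.
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def pfn_to_phys(pfn: int) -> int:
    """Physical address of the first byte of the page frame."""
    return pfn << PAGE_SHIFT


def find_ranges(iomem_text: str, phys: int):
    """Return (nesting depth, name) for every range that contains phys."""
    hits = []
    for line in iomem_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        indent, start, end, name = match.groups()
        if int(start, 16) <= phys <= int(end, 16):
            hits.append((len(indent) // 2, name.strip()))
    return hits


# Invented sample, NOT taken from the machine in this thread.
SAMPLE_IOMEM = """\
00001000-0009ffff : System RAM
100000000-1ffffffffff : System RAM
  19500000000-195ffffffff : Hypothetical driver region
"""

phys = pfn_to_phys(0x195CDA44)
print(hex(phys))
print(find_ranges(SAMPLE_IOMEM, phys))
# On the real system: find_ranges(open("/proc/iomem").read(), phys)
```

The innermost matching entry is the interesting one, since it names whatever claimed that part of the address space.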
Thread overview: 7+ messages
2022-03-21 17:17 unhandlable pages Luck, Tony
2022-03-21 17:21 ` Matthew Wilcox
2022-03-21 17:28   ` Luck, Tony
2022-03-21 17:32     ` Matthew Wilcox
2022-03-21 21:07     ` Yang Shi
2022-03-21 21:19       ` Luck, Tony
2022-03-22  0:10       ` Jane Chu