* [bug/bisected] I see "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)" every boot time
@ 2023-07-15 9:24 Mikhail Gavrilov
2023-07-16 2:41 ` Hugh Dickins
2023-07-16 3:12 ` Bagas Sanjaya
0 siblings, 2 replies; 5+ messages in thread
From: Mikhail Gavrilov @ 2023-07-15 9:24 UTC
To: Linux Memory Management List, Linux List Kernel Mailing, hughd
[-- Attachment #1: Type: text/plain, Size: 1605 bytes --]
Hi,
Is it OK that I see "mm/pgtable-generic.c:53: bad pmd
(____ptrval____)(8000000100077061)" on every boot?
Unfortunately, bisect couldn't say which of these commits
# possible first bad commit:
[be872f83bf571f4f9a0ac25e2c9c36e905a36619] mm/pagewalk:
walk_pte_range() allow for pte_offset_map()
# possible first bad commit:
[7780d04046a2288ab85d88bedacc60fa4fad9971] mm/pagewalkers:
ACTION_AGAIN if pte_offset_map_lock() fails
# possible first bad commit:
[2798bbe75b9c2752b46d292e5c2a49f49da36418] mm/page_vma_mapped:
pte_offset_map_nolock() not pte_lockptr()
# possible first bad commit:
[90f43b0a13cddb09e2686f4d976751c0a9b8b197] mm/page_vma_mapped:
reformat map_pte() with less indentation
# possible first bad commit:
[45fe85e9811ede2d65b21724cae50d6a0563e452] mm/page_vma_mapped: delete
bogosity in page_vma_mapped_walk()
# possible first bad commit:
[65747aaf42b7db6acb8e57a2b8e9959928f404dd] mm/filemap: allow
pte_offset_map_lock() to fail
# possible first bad commit:
[0d940a9b270b9220dcff74d8e9123c9788365751] mm/pgtable: allow
pte_offset_map[_lock]() to fail
is definitely the first bad one, because the machine on which I was doing
the bisection is unbootable on these commits.
I hope Hugh Dickins can figure out what's going on here. He is the
author of these commits.
All my machines are based on the AMD platform: two 7950X and one 5900HX.
It seems that this message is harmless to the system, but
I can't judge whether it is a bug or not.
From the user's side it looks like a regression, because on commit
46c475bd676bb05077c8a38b37f175552f035406 this message was absent.
--
Best Regards,
Mike Gavrilov.
[-- Attachment #2: bisect-log-bad-pmd.txt --]
[-- Type: text/plain, Size: 3840 bytes --]
git bisect start
# status: waiting for both good and bad commits
# good: [6aeadf7896bff4ca230702daba8788455e6b866e] Merge tag 'docs-arm64-move' of git://git.lwn.net/linux
git bisect good 6aeadf7896bff4ca230702daba8788455e6b866e
# status: waiting for bad commit, 1 good commit known
# bad: [3a8a670eeeaa40d87bd38a587438952741980c18] Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 3a8a670eeeaa40d87bd38a587438952741980c18
# good: [0c3d6fd4b89c1a6393283249cdd0bd484ad8f2e5] tools: ynl: improve the direct-include header guard logic
git bisect good 0c3d6fd4b89c1a6393283249cdd0bd484ad8f2e5
# bad: [84fccbba93103b22044617e419ba20e1403b4a65] Merge tag 'spi-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
git bisect bad 84fccbba93103b22044617e419ba20e1403b4a65
# bad: [833dfc0090b3f8017ddac82d818b2d8e5ceb61db] mm: compaction: mark kcompactd_run() and kcompactd_stop() __meminit
git bisect bad 833dfc0090b3f8017ddac82d818b2d8e5ceb61db
# good: [3ecdeb0f876e91c4a7129ba2ba5baa530aa6c4f9] swap: remove __swp_swapcount()
git bisect good 3ecdeb0f876e91c4a7129ba2ba5baa530aa6c4f9
# good: [975ca3986bec8ebd6d8b45f4a7f77c730e424ac4] x86: allow get_locked_pte() to fail
git bisect good 975ca3986bec8ebd6d8b45f4a7f77c730e424ac4
# bad: [a92cbb82c8d375d47fbaf0e1ad3fd4074a7cb156] perf/core: allow pte_offset_map() to fail
git bisect bad a92cbb82c8d375d47fbaf0e1ad3fd4074a7cb156
# bad: [6ec1905f6ec7f9f79ca3eaeaf04584b4dcddd743] mm/hmm: retry if pte_offset_map() fails
git bisect bad 6ec1905f6ec7f9f79ca3eaeaf04584b4dcddd743
# good: [46c475bd676bb05077c8a38b37f175552f035406] mm/pgtable: kmap_local_page() instead of kmap_atomic()
git bisect good 46c475bd676bb05077c8a38b37f175552f035406
# skip: [2798bbe75b9c2752b46d292e5c2a49f49da36418] mm/page_vma_mapped: pte_offset_map_nolock() not pte_lockptr()
git bisect skip 2798bbe75b9c2752b46d292e5c2a49f49da36418
# bad: [be872f83bf571f4f9a0ac25e2c9c36e905a36619] mm/pagewalk: walk_pte_range() allow for pte_offset_map()
git bisect bad be872f83bf571f4f9a0ac25e2c9c36e905a36619
# skip: [45fe85e9811ede2d65b21724cae50d6a0563e452] mm/page_vma_mapped: delete bogosity in page_vma_mapped_walk()
git bisect skip 45fe85e9811ede2d65b21724cae50d6a0563e452
# skip: [65747aaf42b7db6acb8e57a2b8e9959928f404dd] mm/filemap: allow pte_offset_map_lock() to fail
git bisect skip 65747aaf42b7db6acb8e57a2b8e9959928f404dd
# skip: [90f43b0a13cddb09e2686f4d976751c0a9b8b197] mm/page_vma_mapped: reformat map_pte() with less indentation
git bisect skip 90f43b0a13cddb09e2686f4d976751c0a9b8b197
# skip: [0d940a9b270b9220dcff74d8e9123c9788365751] mm/pgtable: allow pte_offset_map[_lock]() to fail
git bisect skip 0d940a9b270b9220dcff74d8e9123c9788365751
# skip: [7780d04046a2288ab85d88bedacc60fa4fad9971] mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails
git bisect skip 7780d04046a2288ab85d88bedacc60fa4fad9971
# only skipped commits left to test
# possible first bad commit: [be872f83bf571f4f9a0ac25e2c9c36e905a36619] mm/pagewalk: walk_pte_range() allow for pte_offset_map()
# possible first bad commit: [7780d04046a2288ab85d88bedacc60fa4fad9971] mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails
# possible first bad commit: [2798bbe75b9c2752b46d292e5c2a49f49da36418] mm/page_vma_mapped: pte_offset_map_nolock() not pte_lockptr()
# possible first bad commit: [90f43b0a13cddb09e2686f4d976751c0a9b8b197] mm/page_vma_mapped: reformat map_pte() with less indentation
# possible first bad commit: [45fe85e9811ede2d65b21724cae50d6a0563e452] mm/page_vma_mapped: delete bogosity in page_vma_mapped_walk()
# possible first bad commit: [65747aaf42b7db6acb8e57a2b8e9959928f404dd] mm/filemap: allow pte_offset_map_lock() to fail
# possible first bad commit: [0d940a9b270b9220dcff74d8e9123c9788365751] mm/pgtable: allow pte_offset_map[_lock]() to fail
[-- Attachment #3: dmesg.zip --]
[-- Type: application/zip, Size: 62748 bytes --]
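The bisect log above ends with "only skipped commits left to test" because every remaining candidate had been skipped as unbootable. As a rough sketch of how such a run could be scripted: `git bisect run` treats exit status 0 as good, 125 as "skip this commit", and any other status from 1 to 127 as bad. The helper below only maps one observation onto those exit codes. The name `bisect_verdict`, its two arguments, and the idea of saving the kernel log to a file are assumptions made for illustration; nothing in the report says the bisection was done this way.

```shell
# Hypothetical helper: map one test boot onto git-bisect-run exit codes.
#   $1 = "booted" or anything else (how the test kernel behaved)
#   $2 = path to a kernel log saved from that boot
bisect_verdict() {
    state=$1
    log=$2
    if [ "$state" != "booted" ]; then
        return 125              # kernel did not boot: commit cannot be tested, skip
    fi
    if [ ! -r "$log" ]; then
        return 125              # no log to inspect: treat as untestable too
    fi
    if grep -q 'bad pmd' "$log"; then
        return 1                # message present: bad
    fi
    return 0                    # message absent: good
}
```

A wrapper script ending in a call to this function could be handed to `git bisect run`. When every remaining candidate returns 125, bisect can only report a list of possible first bad commits, which is the outcome shown in the log.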
* Re: [bug/bisected] I see "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)" every boot time
2023-07-15 9:24 [bug/bisected] I see "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)" every boot time Mikhail Gavrilov
@ 2023-07-16 2:41 ` Hugh Dickins
2023-07-16 9:53 ` Mikhail Gavrilov
2023-07-16 3:12 ` Bagas Sanjaya
1 sibling, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2023-07-16 2:41 UTC
To: Mikhail Gavrilov
Cc: Linux Memory Management List, Linux List Kernel Mailing, hughd
On Sat, 15 Jul 2023, Mikhail Gavrilov wrote:
> Hi,
> Is it OK that I see "mm/pgtable-generic.c:53: bad pmd
> (____ptrval____)(8000000100077061)" on every boot?
Many thanks for reporting, Mike. No, I wouldn't call that ok at all.
Though I've more research to do before I can tell how much it matters.
> Unfortunately, bisect couldn't say which of these commits
> # possible first bad commit:
> [be872f83bf571f4f9a0ac25e2c9c36e905a36619] mm/pagewalk:
> walk_pte_range() allow for pte_offset_map()
> # possible first bad commit:
> [7780d04046a2288ab85d88bedacc60fa4fad9971] mm/pagewalkers:
> ACTION_AGAIN if pte_offset_map_lock() fails
> # possible first bad commit:
> [2798bbe75b9c2752b46d292e5c2a49f49da36418] mm/page_vma_mapped:
> pte_offset_map_nolock() not pte_lockptr()
> # possible first bad commit:
> [90f43b0a13cddb09e2686f4d976751c0a9b8b197] mm/page_vma_mapped:
> reformat map_pte() with less indentation
> # possible first bad commit:
> [45fe85e9811ede2d65b21724cae50d6a0563e452] mm/page_vma_mapped: delete
> bogosity in page_vma_mapped_walk()
> # possible first bad commit:
> [65747aaf42b7db6acb8e57a2b8e9959928f404dd] mm/filemap: allow
> pte_offset_map_lock() to fail
> # possible first bad commit:
> [0d940a9b270b9220dcff74d8e9123c9788365751] mm/pgtable: allow
> pte_offset_map[_lock]() to fail
> is definitely the first bad one, because the machine on which I was doing
> the bisection is unbootable on these commits.
> I hope Hugh Dickins can figure out what's going on here. He is the
> author of these commits.
And thanks for the patient bisecting. Yes, it will be 0d940a9b270b
which introduced the unexpected problem, then be872f83bf5 which fixed
the unbootability aspect (that's right, isn't it? with be872f83bf5 in,
your machine booted ok? but in between it was unbootable).
Very useful info, since it narrowed the symptom down to users of
that pagewalker, before it was allowing for NULL from pte_offset_map()
(we were not expecting ever to hit a bad pmd in normal circumstances).
I have now been able to reproduce such a message, by setting
CONFIG_EFI_PGT_DUMP=y - am I guessing correctly that you have that?
For now, I recommend that you leave CONFIG_EFI_PGT_DUMP unset.
I wonder how many other people have it set, but have not yet noticed
this "bad pmd" message you are reporting.
The problem comes from a confluence of surprises: the pagewalker
now makes an exception for init_mm, but efi_mm is another odd case;
and espfix sets up pmd entries in an unconventional way, which happens
to fit the "bad pmd" criterion; then the efi_mm pgt dump discovers them.
I'm not rushing to judgment on where and what the right fix will be,
that needs some reflection. And perhaps more urgent than that, is that
I got not one but 12 such messages (with 4 processors): that's another
surprise, I would have expected the condition to be cleared after the
first message (but that clearing to ruin the running of Win16 binaries).
More will follow, at lower priority; but if I'm wrong about you having
CONFIG_EFI_PGT_DUMP=y, and unsetting it hiding the issue, please speak up.
Thanks,
Hugh
>
> All my machines are based on the AMD platform: two 7950X and one 5900HX.
>
> It seems that this message is harmless to the system, but
> I can't judge whether it is a bug or not.
> From the user's side it looks like a regression, because on commit
> 46c475bd676bb05077c8a38b37f175552f035406 this message was absent.
>
> --
> Best Regards,
> Mike Gavrilov.
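Hugh's reply above says the espfix pmd entries happen to fit the "bad pmd" criterion. As an illustration, the value from the subject line, 8000000100077061, can be split into x86-64 page-table bits. The bit positions used here (present 0x001, writable 0x002, user 0x004, accessed 0x020, dirty 0x040, NX at bit 63, frame number in bits 12 to 51) are architectural. The form of the check is an assumption modelled on the x86 `pmd_bad()` helper and is not quoted from the thread, so it should be verified against the kernel source for the version in question. The value is split into two 32-bit halves so the arithmetic stays within signed 64-bit shell integers.

```shell
# Decode the pmd value from the report (x86-64 bit layout).
pmd_hex=8000000100077061
hi_hex=${pmd_hex%????????}          # upper 32 bits as hex text
lo_hex=${pmd_hex#????????}          # lower 32 bits as hex text
hi=$((0x$hi_hex))
lo=$((0x$lo_hex))

# Architectural flag bits in the low 12 bits of an entry.
P_PRESENT=$((0x001))
P_RW=$((0x002))
P_USER=$((0x004))
P_ACCESSED=$((0x020))
P_DIRTY=$((0x040))

low_flags=$((lo & 0xfff))                        # flag bits 0..11
nx=$(( (hi >> 31) & 1 ))                         # bit 63 of the full value
pfn=$(( ((hi & 0xfffff) << 20) | (lo >> 12) ))   # bits 12..51

# Assumed shape of the check: ignore USER and ACCESSED, then require the
# remaining low flags to equal a kernel page-table entry (present, RW, dirty).
kernpg_table=$((P_PRESENT | P_RW | P_ACCESSED | P_DIRTY))
seen=$((low_flags & ~(P_USER | P_ACCESSED)))
want=$((kernpg_table & ~P_ACCESSED))
if [ "$seen" -ne "$want" ]; then is_bad=1; else is_bad=0; fi

printf 'low flags 0x%03x  NX %d  pfn 0x%x  seen 0x%02x  want 0x%02x  bad=%d\n' \
    "$low_flags" "$nx" "$pfn" "$seen" "$want" "$is_bad"
```

Under these assumptions the low flags are present, accessed and dirty with the writable bit clear, so the comparison fails on that one bit. The sketch looks only at the low 12 flag bits and reports NX separately; whether the real check also trips on NX is left to the kernel source.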
* Re: [bug/bisected] I see "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)" every boot time
2023-07-15 9:24 [bug/bisected] I see "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)" every boot time Mikhail Gavrilov
2023-07-16 2:41 ` Hugh Dickins
@ 2023-07-16 3:12 ` Bagas Sanjaya
2023-07-16 11:23 ` Bagas Sanjaya
1 sibling, 1 reply; 5+ messages in thread
From: Bagas Sanjaya @ 2023-07-16 3:12 UTC
To: Mikhail Gavrilov, Linux Memory Management List,
Linux Kernel Mailing List, hughd
Cc: Linux Regressions
[-- Attachment #1: Type: text/plain, Size: 2084 bytes --]
On Sat, Jul 15, 2023 at 02:24:59PM +0500, Mikhail Gavrilov wrote:
> Hi,
> Is it OK that I see "mm/pgtable-generic.c:53: bad pmd
> (____ptrval____)(8000000100077061)" on every boot?
> Unfortunately, bisect couldn't say which of these commits
> # possible first bad commit:
> [be872f83bf571f4f9a0ac25e2c9c36e905a36619] mm/pagewalk:
> walk_pte_range() allow for pte_offset_map()
> # possible first bad commit:
> [7780d04046a2288ab85d88bedacc60fa4fad9971] mm/pagewalkers:
> ACTION_AGAIN if pte_offset_map_lock() fails
> # possible first bad commit:
> [2798bbe75b9c2752b46d292e5c2a49f49da36418] mm/page_vma_mapped:
> pte_offset_map_nolock() not pte_lockptr()
> # possible first bad commit:
> [90f43b0a13cddb09e2686f4d976751c0a9b8b197] mm/page_vma_mapped:
> reformat map_pte() with less indentation
> # possible first bad commit:
> [45fe85e9811ede2d65b21724cae50d6a0563e452] mm/page_vma_mapped: delete
> bogosity in page_vma_mapped_walk()
> # possible first bad commit:
> [65747aaf42b7db6acb8e57a2b8e9959928f404dd] mm/filemap: allow
> pte_offset_map_lock() to fail
> # possible first bad commit:
> [0d940a9b270b9220dcff74d8e9123c9788365751] mm/pgtable: allow
> pte_offset_map[_lock]() to fail
> is definitely the first bad one, because the machine on which I was doing
> the bisection is unbootable on these commits.
> I hope Hugh Dickins can figure out what's going on here. He is the
> author of these commits.
>
> All my machines are based on the AMD platform: two 7950X and one 5900HX.
>
> It seems that this message is harmless to the system, but
> I can't judge whether it is a bug or not.
> From the user's side it looks like a regression, because on commit
> 46c475bd676bb05077c8a38b37f175552f035406 this message was absent.
What are you doing on your system that leads into this regression?
Anyway, I'm adding this regression to be tracked with regzbot:
#regzbot ^introduced: 0d940a9b270b92
#regzbot title: undescribed regression due to allowing failing pte_offset_map[_lock]()
Thanks.
--
An old man doll... just what I always wanted! - Clara
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
* Re: [bug/bisected] I see "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)" every boot time
2023-07-16 2:41 ` Hugh Dickins
@ 2023-07-16 9:53 ` Mikhail Gavrilov
0 siblings, 0 replies; 5+ messages in thread
From: Mikhail Gavrilov @ 2023-07-16 9:53 UTC
To: Hugh Dickins; +Cc: Linux Memory Management List, Linux List Kernel Mailing
On Sun, Jul 16, 2023 at 7:42 AM Hugh Dickins <hughd@google.com> wrote:
>
> And thanks for the patient bisecting. Yes, it will be 0d940a9b270b
> which introduced the unexpected problem, then be872f83bf5 which fixed
> the unbootability aspect (that's right, isn't it? with be872f83bf5 in,
> your machine booted ok? but in between it was unbootable).
Absolutely right.
> Very useful info, since it narrowed the symptom down to users of
> that pagewalker, before it was allowing for NULL from pte_offset_map()
> (we were not expecting ever to hit a bad pmd in normal circumstances).
>
> I have now been able to reproduce such a message, by setting
> CONFIG_EFI_PGT_DUMP=y - am I guessing correctly that you have that?
Yes.
$ cat .config | grep CONFIG_EFI_PGT_DUMP
CONFIG_EFI_PGT_DUMP=y
But the Fedora distro has had this option set to "y" since 2016.
https://src.fedoraproject.org/rpms/kernel/blob/1b7eeb80190501aaf226e90e8f58f994cfc3efe0/f/kernel-x86_64-debug.config#_1293
commit 1b7eeb80190501aaf226e90e8f58f994cfc3efe0
Author: Laura Abbott <labbott@fedoraproject.org>
Date: Thu Nov 10 10:16:25 2016 -0800
Change method of configuration generation
The existing method of managing configuration files gets unwieldy.
Changing individual lines in text files gets difficult without
manual organization. Switch to a method of configuration generation
that's inspired from the method used inside Red Hat. Each configuration
option gets its own file which are then combined to form the
configuration files. This makes confirming what's actually enabled much
easier.
> For now, I recommend that you leave CONFIG_EFI_PGT_DUMP unset.
> I wonder how many other people have it set, but have not yet noticed
> this "bad pmd" message you are reporting.
>
> The problem comes from a confluence of surprises: the pagewalker
> now makes an exception for init_mm, but efi_mm is another odd case;
> and espfix sets up pmd entries in an unconventional way, which happens
> to fit the "bad pmd" criterion; then the efi_mm pgt dump discovers them.
>
> I'm not rushing to judgment on where and what the right fix will be,
> that needs some reflection. And perhaps more urgent than that, is that
> I got not one but 12 such messages (with 4 processors): that's another
> surprise, I would have expected the condition to be cleared after the
> first message (but that clearing to ruin the running of Win16 binaries).
>
> More will follow, at lower priority; but if I'm wrong about you having
> CONFIG_EFI_PGT_DUMP=y, and unsetting it hiding the issue, please speak up.
I confirm that after unsetting CONFIG_EFI_PGT_DUMP the "bad pmd" message
no longer appears.
--
Best Regards,
Mike Gavrilov.
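The check and the workaround discussed in this message (whether CONFIG_EFI_PGT_DUMP is set, and whether the message goes away once it is unset) can be sketched as a small helper. It relies on the usual kernel `.config` conventions, where an enabled option appears as `OPTION=value` and a disabled one as `# OPTION is not set`. The function name `kconfig_state` is made up for this example.

```shell
# Hypothetical helper: report how a Kconfig option is set in a config file.
#   $1 = option name, e.g. CONFIG_EFI_PGT_DUMP
#   $2 = path to the config file
# Prints the value ("y", "m", ...), "n" if explicitly not set, or "absent".
kconfig_state() {
    option=$1
    file=$2
    # Anchor the match so CONFIG_EFI does not match CONFIG_EFI_PGT_DUMP.
    line=$(grep -E "^(# )?${option}(=| is not set)" "$file" | head -n 1)
    case $line in
        "${option}="*)            echo "${line#*=}" ;;
        "# ${option} is not set") echo "n" ;;
        *)                        echo "absent" ;;
    esac
}
```

For a kernel built from source this could be run as `kconfig_state CONFIG_EFI_PGT_DUMP .config`; many distributions also install the configuration of the running kernel as `/boot/config-$(uname -r)`, though that path is not guaranteed. After rebooting with the option unset, `dmesg | grep -c 'bad pmd'` should print 0 if the message is gone.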
* Re: [bug/bisected] I see "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)" every boot time
2023-07-16 3:12 ` Bagas Sanjaya
@ 2023-07-16 11:23 ` Bagas Sanjaya
0 siblings, 0 replies; 5+ messages in thread
From: Bagas Sanjaya @ 2023-07-16 11:23 UTC
To: Mikhail Gavrilov, Linux Memory Management List,
Linux Kernel Mailing List, hughd
Cc: Linux Regressions, Laura Abbott
[-- Attachment #1: Type: text/plain, Size: 521 bytes --]
On Sun, Jul 16, 2023 at 10:12:13AM +0700, Bagas Sanjaya wrote:
> #regzbot ^introduced: 0d940a9b270b92
> #regzbot title: undescribed regression due to allowing failing pte_offset_map[_lock]()
>
Updating entry title (see [1] for why):
#regzbot title: CONFIG_EFI_PGT_DUMP regression due to allowing failing pte_offset_map[_lock]()
Thanks.
[1]: https://lore.kernel.org/linux-mm/CABXGCsMaMgcPskMHPL+E=cOf9YMyaSnxg2dMa2jWO7qbjZGkjQ@mail.gmail.com/
--
An old man doll... just what I always wanted! - Clara
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]