* 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
@ 2023-08-31 22:45 Mikhail Gavrilov
2023-08-31 23:35 ` Bagas Sanjaya
2023-09-01 7:29 ` Hugh Dickins
0 siblings, 2 replies; 9+ messages in thread
From: Mikhail Gavrilov @ 2023-08-31 22:45 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Linux List Kernel Mailing, Linux Memory Management List
[-- Attachment #1: Type: text/plain, Size: 4869 bytes --]
Hi,
A new release cycle, and another regression.
Yesterday, after another kernel update in Fedora Rawhide, my system stopped booting.
Today, thanks to git bisect, I found that this is the offending commit:
❯ git bisect bad
a349d72fd9efc87c8fd1d16d3164752d84a7275b is the first bad commit
commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
Author: Hugh Dickins <hughd@google.com>
Date: Tue Jul 11 21:30:40 2023 -0700
mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
Patch series "mm: free retracted page table by RCU", v3.
Some mmap_lock avoidance i.e. latency reduction. Initially just for the
case of collapsing shmem or file pages to THPs: the usefulness of
MADV_COLLAPSE on shmem is being limited by that mmap_write_lock it
currently requires.
Likely to be relied upon later in other contexts e.g. freeing of empty
page tables (but that's not work I'm doing). mmap_write_lock avoidance
when collapsing to anon THPs? Perhaps, but again that's not work I've
done: a quick attempt was not as easy as the shmem/file case.
These changes (though of course not these exact patches) have been in
Google's data centre kernel for three years now: we do rely upon them.
This patch (of 13):
Before putting them to use (several commits later), add rcu_read_lock() to
pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a
separate commit, since it risks exposing imbalances: prior commits have
fixed all the known imbalances, but we may find some have been missed.
Link: https://lkml.kernel.org/r/7cd843a9-aa80-14f-5eb2-33427363c20@google.com
Link: https://lkml.kernel.org/r/d3b01da5-2a6-833c-6681-67a3e024a16f@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zack Rusin <zackr@vmware.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
include/linux/pgtable.h | 4 ++--
mm/pgtable-generic.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
It looks like the hang happens so early that, when I boot into a
working kernel and run "journalctl -b -1", I see the log of the
kernel that was booted before the problematic one, not the log of
the problematic kernel itself.
Therefore, I apologize that I can't provide the kernel logs.
I can provide only photos taken when the backtrace appears on my monitor:
Here we are waiting: https://ibb.co/5xmm0BH
And then I see the backtrace: https://ibb.co/TLLGFNP
Unfortunately, I can't revert commit
a349d72fd9efc87c8fd1d16d3164752d84a7275b to test more recent builds,
because the revert conflicts.
My hardware: https://linux-hardware.org/?probe=dd5735f315
I have also attached the kernel build config and the full bisect log.
--
Best Regards,
Mike Gavrilov.
[-- Attachment #2: .config.zip --]
[-- Type: application/zip, Size: 64077 bytes --]
[-- Attachment #3: git-bisect-system-not-boot.zip --]
[-- Type: application/zip, Size: 1875 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
2023-08-31 22:45 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting Mikhail Gavrilov
@ 2023-08-31 23:35 ` Bagas Sanjaya
2023-09-01 7:29 ` Hugh Dickins
1 sibling, 0 replies; 9+ messages in thread
From: Bagas Sanjaya @ 2023-08-31 23:35 UTC (permalink / raw)
To: Mikhail Gavrilov, Hugh Dickins, Andrew Morton
Cc: Linux List Kernel Mailing, Linux Memory Management List,
Linux Regressions
[-- Attachment #1: Type: text/plain, Size: 3140 bytes --]
On Fri, Sep 01, 2023 at 03:45:28AM +0500, Mikhail Gavrilov wrote:
> Hi,
> A new release cycle, and another regression.
> Yesterday, after another kernel update in Fedora Rawhide, my system stopped booting.
> Today, thanks to git bisect, I found that this is the offending commit:
>
> ❯ git bisect bad
> a349d72fd9efc87c8fd1d16d3164752d84a7275b is the first bad commit
> commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
> Author: Hugh Dickins <hughd@google.com>
> Date: Tue Jul 11 21:30:40 2023 -0700
>
> mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
>
> Patch series "mm: free retracted page table by RCU", v3.
>
> Some mmap_lock avoidance i.e. latency reduction. Initially just for the
> case of collapsing shmem or file pages to THPs: the usefulness of
> MADV_COLLAPSE on shmem is being limited by that mmap_write_lock it
> currently requires.
>
> Likely to be relied upon later in other contexts e.g. freeing of empty
> page tables (but that's not work I'm doing). mmap_write_lock avoidance
> when collapsing to anon THPs? Perhaps, but again that's not work I've
> done: a quick attempt was not as easy as the shmem/file case.
>
> These changes (though of course not these exact patches) have been in
> Google's data centre kernel for three years now: we do rely upon them.
>
>
> This patch (of 13):
>
> Before putting them to use (several commits later), add rcu_read_lock() to
> pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a
> separate commit, since it risks exposing imbalances: prior commits have
> fixed all the known imbalances, but we may find some have been missed.
>
> Link: https://lkml.kernel.org/r/7cd843a9-aa80-14f-5eb2-33427363c20@google.com
> Link: https://lkml.kernel.org/r/d3b01da5-2a6-833c-6681-67a3e024a16f@google.com
> Signed-off-by: Hugh Dickins <hughd@google.com>
> <long cc list omitted>...
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> include/linux/pgtable.h | 4 ++--
> mm/pgtable-generic.c | 4 ++--
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> It looks like the hang happens so early that, when I boot into a
> working kernel and run "journalctl -b -1", I see the log of the
> kernel that was booted before the problematic one, not the log of
> the problematic kernel itself.
> Therefore, I apologize that I can't provide the kernel logs.
> I can provide only photos taken when the backtrace appears on my monitor:
> Here we are waiting: https://ibb.co/5xmm0BH
> And then I see the backtrace: https://ibb.co/TLLGFNP
>
> Unfortunately, I can't revert commit
> a349d72fd9efc87c8fd1d16d3164752d84a7275b to test more recent builds,
> because the revert conflicts.
>
> My hardware: https://linux-hardware.org/?probe=dd5735f315
> I have also attached the kernel build config and the full bisect log.
>
Thanks for the regression report. I'm adding it to regzbot:
#regzbot ^introduced: a349d72fd9efc8
#regzbot title: rcu_read_{lock,unlock}() causes unbootable system with backtrace
--
An old man doll... just what I always wanted! - Clara
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
2023-08-31 22:45 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting Mikhail Gavrilov
2023-08-31 23:35 ` Bagas Sanjaya
@ 2023-09-01 7:29 ` Hugh Dickins
2023-09-01 8:45 ` Mikhail Gavrilov
1 sibling, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-09-01 7:29 UTC (permalink / raw)
To: Mikhail Gavrilov
Cc: Hugh Dickins, Andrew Morton, Bagas Sanjaya, linux-kernel,
linux-mm, regressions
[-- Attachment #1: Type: text/plain, Size: 5948 bytes --]
On Fri, 1 Sep 2023, Mikhail Gavrilov wrote:
> Hi,
> A new release cycle, and another regression.
> Yesterday, after another kernel update in Fedora Rawhide, my system stopped booting.
Many thanks for reporting, Mike: I'm sorry that it never showed up
while in linux-next, leaving you to be the one to hit it again.
> Today, thanks to git bisect, I found that this is the offending commit:
>
> ❯ git bisect bad
> a349d72fd9efc87c8fd1d16d3164752d84a7275b is the first bad commit
> commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
> Author: Hugh Dickins <hughd@google.com>
> Date: Tue Jul 11 21:30:40 2023 -0700
>
> mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
...
> Before putting them to use (several commits later), add rcu_read_lock() to
> pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a
> separate commit, since it risks exposing imbalances: prior commits have
> fixed all the known imbalances, but we may find some have been missed.
I assume that it is such an imbalance - somewhere omitting to
pte_unmap() after a pte_offset_map(); but I cannot see where.
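To make the suspected failure mode concrete, here is a minimal user-space
sketch (not kernel code) that models the RCU read-side nesting depth as a
plain counter. The names rcu_nesting, model_pte_offset_map() and
model_pte_unmap() are hypothetical stand-ins chosen for illustration; the
real accounting happens inside rcu_read_lock() and rcu_read_unlock(). It
shows the two ways a map/unmap pair can be unbalanced.

```c
#include <assert.h>

/* Hypothetical model of the RCU read-side nesting depth of one task. */
static int rcu_nesting;

/* Stand-in for pte_offset_map(): enters a read-side critical section. */
void model_pte_offset_map(void)
{
	rcu_nesting++;
}

/* Stand-in for pte_unmap(): leaves a read-side critical section. */
void model_pte_unmap(void)
{
	rcu_nesting--;
}

/* Balanced use: every map is paired with exactly one unmap. */
int balanced_walk(void)
{
	rcu_nesting = 0;
	model_pte_offset_map();
	model_pte_unmap();
	return rcu_nesting;
}

/* Imbalance 1: the unmap is forgotten, so the section is never left. */
int leaky_walk(void)
{
	rcu_nesting = 0;
	model_pte_offset_map();
	return rcu_nesting;
}

/* Imbalance 2: an unmap without a matching map underflows the depth. */
int extra_unmap_walk(void)
{
	rcu_nesting = 0;
	model_pte_unmap();
	return rcu_nesting;
}
```

In this model balanced_walk() returns 0, leaky_walk() returns 1 and
extra_unmap_walk() returns -1. Either non-zero result means the depth no
longer reflects what the code is actually doing.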
> It looks like the hang happens so early that, when I boot into a
> working kernel and run "journalctl -b -1", I see the log of the
> kernel that was booted before the problematic one, not the log of
> the problematic kernel itself.
> Therefore, I apologize that I can't provide the kernel logs.
> I can provide only photos taken when the backtrace appears on my monitor:
> Here we are waiting: https://ibb.co/5xmm0BH
> And then I see the backtrace: https://ibb.co/TLLGFNP
>
> Unfortunately, I can't revert commit
> a349d72fd9efc87c8fd1d16d3164752d84a7275b to test more recent builds,
> because the revert conflicts.
>
> My hardware: https://linux-hardware.org/?probe=dd5735f315
> I have also attached the kernel build config and the full bisect log.
Thanks for all the info, which has helped in several ways. The only
thing I can do is to offer you a debug (and then keep running) patch -
suitable for the config you showed there, not for anyone else's config.
I've never used stackdepot before, but I've tried this out in good and
bad cases, and expect it to work for you, shedding light on where it is
going wrong - the machine should boot up fine, and in dmesg you'll find one
stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.
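The idea behind such a debug patch can be sketched in user space as
follows. This is only a simplified model under stated assumptions: a
call-site string stands in for the saved stack trace, and the CPU 0
restriction and the half-hour cutoff of the patch below are left out.
All function and variable names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tracker state, standing in for the saved stack handle. */
static const char *pending_site;	/* call site of the last unpaired map */
static const char *reported_site;	/* first offender; reported only once */

/* Forget everything, so that each scenario starts clean. */
void tracker_reset(void)
{
	pending_site = NULL;
	reported_site = NULL;
}

/* Map side: complain if the previous map was never unmapped. */
void tracked_map(const char *site)
{
	if (reported_site)
		return;			/* one report is enough */
	if (pending_site) {
		reported_site = pending_site;	/* that caller forgot to unmap */
		return;
	}
	pending_site = site;
}

/* Unmap side: simply clears the record and never complains. */
void tracked_unmap(void)
{
	pending_site = NULL;
}

/* Returns 1 if `site` was reported as mapped but never unmapped. */
int tracker_reported(const char *site)
{
	return reported_site != NULL && strcmp(reported_site, site) == 0;
}

/* Returns 1 if anything at all was reported. */
int tracker_has_report(void)
{
	return reported_site != NULL;
}
```

Two tracked_map() calls in a row with no tracked_unmap() in between make
the tracker report the first call site. Note one property of this design:
because the unmap side only clears the record, a surplus tracked_unmap()
with no preceding map goes unnoticed, so the tracker can only catch a
missing unmap, not an extra one.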
To apply on top of a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and
rcu_read_unlock()s"), the bad end point of your bisection; but if you
prefer, I can provide a version to go on top of whatever later Linus
commit suits you.
Patch not for general consumption, just for Mike's debugging:
please report back the stacktrace shown - thanks!
Hugh
---
include/linux/pgtable.h | 5 +----
mm/memory.c | 1 +
mm/mremap.c | 1 +
mm/pgtable-generic.c | 40 ++++++++++++++++++++++++++++++++++++++--
4 files changed, 41 insertions(+), 6 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5134edcec668..131392f1c33e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -106,10 +106,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
 {
 	return pte_offset_kernel(pmd, address);
 }
-static inline void pte_unmap(pte_t *pte)
-{
-	rcu_read_unlock();
-}
+void pte_unmap(pte_t *pte);
 #endif
 
 /* Find an entry in the second-level page table.. */
diff --git a/mm/memory.c b/mm/memory.c
index 44d11812a88f..b1ee8ab51978 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1033,6 +1033,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		ret = -ENOMEM;
 		goto out;
 	}
+	pte_unmap(NULL); /* avoid warning when knowingly nested */
 	src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
 	if (!src_pte) {
 		pte_unmap_unlock(dst_pte, dst_ptl);
diff --git a/mm/mremap.c b/mm/mremap.c
index 11e06e4ab33b..56d981add487 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -175,6 +175,7 @@ static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		err = -EAGAIN;
 		goto out;
 	}
+	pte_unmap(NULL); /* avoid warning when knowingly nested */
 	new_pte = pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl);
 	if (!new_pte) {
 		pte_unmap_unlock(old_pte, old_ptl);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 400e5a045848..87cbdc73beda 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -232,11 +232,47 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#include <linux/stacktrace.h>
+#include <linux/stackdepot.h>
+#include <linux/timekeeping.h>
+
+static depot_stack_handle_t depot_stack;
+
+static void pte_map(void)
+{
+	static bool done = false;
+	unsigned long entries[16];
+	unsigned int nr_entries;
+
+	/* rcu_read_lock(); */
+	if (raw_smp_processor_id() != 0 || done)
+		return;
+	if (depot_stack) {
+		pr_warn("WARNING: pte_map was not pte_unmapped:\n");
+		stack_depot_print(depot_stack);
+		pr_warn("End of pte_map warning.\n");
+		done = true;
+		return;
+	}
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+	depot_stack = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
+	if (ktime_get_seconds() > 1800) /* give up after half an hour */
+		done = true;
+}
+
+void pte_unmap(pte_t *pte)
+{
+	/* rcu_read_unlock(); */
+	if (raw_smp_processor_id() != 0)
+		return;
+	depot_stack = 0;
+}
+
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
 	pmd_t pmdval;
 
-	rcu_read_lock();
+	pte_map();
 	pmdval = pmdp_get_lockless(pmd);
 	if (pmdvalp)
 		*pmdvalp = pmdval;
@@ -250,7 +286,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	}
 	return __pte_map(&pmdval, addr);
 nomap:
-	rcu_read_unlock();
+	pte_unmap(NULL);
 	return NULL;
 }
 
--
2.35.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
2023-09-01 7:29 ` Hugh Dickins
@ 2023-09-01 8:45 ` Mikhail Gavrilov
2023-09-01 9:08 ` Hugh Dickins
0 siblings, 1 reply; 9+ messages in thread
From: Mikhail Gavrilov @ 2023-09-01 8:45 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Bagas Sanjaya, linux-kernel, linux-mm, regressions
[-- Attachment #1: Type: text/plain, Size: 1539 bytes --]
On Fri, Sep 1, 2023 at 12:29 PM Hugh Dickins <hughd@google.com> wrote:
>
>
> Thanks for all the info, which has helped in several ways. The only
> thing I can do is to offer you a debug (and then keep running) patch -
> suitable for the config you showed there, not for anyone else's config.
>
> I've never used stackdepot before, but I've tried this out in good and
> bad cases, and expect it to work for you, shedding light on where it is
> going wrong - the machine should boot up fine, and in dmesg you'll find one
> stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.
>
> To apply on top of a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and
> rcu_read_unlock()s"), the bad end point of your bisection; but if you
> prefer, I can provide a version to go on top of whatever later Linus
> commit suits you.
>
> Patch not for general consumption, just for Mike's debugging:
> please report back the stacktrace shown - thanks!
>
Thanks for digging into the problem.
With the attached patch, the kernel fails to build (FTBFS) at commit a349d72fd9ef.
LD [M] drivers/gpu/drm/amd/amdgpu/amdgpu.o
MODPOST Module.symvers
ERROR: modpost: "pte_unmap" [arch/x86/kvm/kvm.ko] undefined!
ERROR: modpost: "pte_unmap" [drivers/vfio/vfio_iommu_type1.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:144: Module.symvers] Error 1
make[1]: *** [/home/mikhail/packaging-work/git/linux/Makefile:1984: modpost] Error 2
make: *** [Makefile:234: __sub-make] Error 2
--
Best Regards,
Mike Gavrilov.
[-- Attachment #2: build-log.zip --]
[-- Type: application/zip, Size: 130549 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
2023-09-01 8:45 ` Mikhail Gavrilov
@ 2023-09-01 9:08 ` Hugh Dickins
2023-09-01 12:17 ` Mikhail Gavrilov
0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-09-01 9:08 UTC (permalink / raw)
To: Mikhail Gavrilov
Cc: Hugh Dickins, Andrew Morton, Bagas Sanjaya, linux-kernel,
linux-mm, regressions
[-- Attachment #1: Type: text/plain, Size: 4952 bytes --]
On Fri, 1 Sep 2023, Mikhail Gavrilov wrote:
> On Fri, Sep 1, 2023 at 12:29 PM Hugh Dickins <hughd@google.com> wrote:
> >
> >
> > Thanks for all the info, which has helped in several ways. The only
> > thing I can do is to offer you a debug (and then keep running) patch -
> > suitable for the config you showed there, not for anyone else's config.
> >
> > I've never used stackdepot before, but I've tried this out in good and
> > bad cases, and expect it to work for you, shedding light on where it is
> > going wrong - the machine should boot up fine, and in dmesg you'll find one
> > stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.
> >
> > To apply on top of a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and
> > rcu_read_unlock()s"), the bad end point of your bisection; but if you
> > prefer, I can provide a version to go on top of whatever later Linus
> > commit suits you.
> >
> > Patch not for general consumption, just for Mike's debugging:
> > please report back the stacktrace shown - thanks!
> >
>
> Thanks for digging into the problem.
> With the attached patch, the kernel fails to build (FTBFS) at commit a349d72fd9ef.
>
>
> LD [M] drivers/gpu/drm/amd/amdgpu/amdgpu.o
> MODPOST Module.symvers
> ERROR: modpost: "pte_unmap" [arch/x86/kvm/kvm.ko] undefined!
> ERROR: modpost: "pte_unmap" [drivers/vfio/vfio_iommu_type1.ko] undefined!
> make[2]: *** [scripts/Makefile.modpost:144: Module.symvers] Error 1
> make[1]: *** [/home/mikhail/packaging-work/git/linux/Makefile:1984: modpost] Error 2
> make: *** [Makefile:234: __sub-make] Error 2
Sorry about that: please try this instead, which adds EXPORT_SYMBOL(pte_unmap).
---
include/linux/pgtable.h | 5 +----
mm/memory.c | 1 +
mm/mremap.c | 1 +
mm/pgtable-generic.c | 41 +++++++++++++++++++++++++++++++++++++++--
4 files changed, 42 insertions(+), 6 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5134edcec668..131392f1c33e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -106,10 +106,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
 {
 	return pte_offset_kernel(pmd, address);
 }
-static inline void pte_unmap(pte_t *pte)
-{
-	rcu_read_unlock();
-}
+void pte_unmap(pte_t *pte);
 #endif
 
 /* Find an entry in the second-level page table.. */
diff --git a/mm/memory.c b/mm/memory.c
index 44d11812a88f..b1ee8ab51978 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1033,6 +1033,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		ret = -ENOMEM;
 		goto out;
 	}
+	pte_unmap(NULL); /* avoid warning when knowingly nested */
 	src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
 	if (!src_pte) {
 		pte_unmap_unlock(dst_pte, dst_ptl);
diff --git a/mm/mremap.c b/mm/mremap.c
index 11e06e4ab33b..56d981add487 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -175,6 +175,7 @@ static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		err = -EAGAIN;
 		goto out;
 	}
+	pte_unmap(NULL); /* avoid warning when knowingly nested */
 	new_pte = pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl);
 	if (!new_pte) {
 		pte_unmap_unlock(old_pte, old_ptl);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 400e5a045848..958ee5cf91b1 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -232,11 +232,48 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#include <linux/stacktrace.h>
+#include <linux/stackdepot.h>
+#include <linux/timekeeping.h>
+
+static depot_stack_handle_t depot_stack;
+
+static void pte_map(void)
+{
+	static bool done = false;
+	unsigned long entries[16];
+	unsigned int nr_entries;
+
+	/* rcu_read_lock(); */
+	if (raw_smp_processor_id() != 0 || done)
+		return;
+	if (depot_stack) {
+		pr_warn("WARNING: pte_map was not pte_unmapped:\n");
+		stack_depot_print(depot_stack);
+		pr_warn("End of pte_map warning.\n");
+		done = true;
+		return;
+	}
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+	depot_stack = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
+	if (ktime_get_seconds() > 1800) /* give up after half an hour */
+		done = true;
+}
+
+void pte_unmap(pte_t *pte)
+{
+	/* rcu_read_unlock(); */
+	if (raw_smp_processor_id() != 0)
+		return;
+	depot_stack = 0;
+}
+EXPORT_SYMBOL(pte_unmap);
+
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
 	pmd_t pmdval;
 
-	rcu_read_lock();
+	pte_map();
 	pmdval = pmdp_get_lockless(pmd);
 	if (pmdvalp)
 		*pmdvalp = pmdval;
@@ -250,7 +287,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	}
 	return __pte_map(&pmdval, addr);
 nomap:
-	rcu_read_unlock();
+	pte_unmap(NULL);
 	return NULL;
 }
 
--
2.35.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
2023-09-01 9:08 ` Hugh Dickins
@ 2023-09-01 12:17 ` Mikhail Gavrilov
2023-09-01 22:48 ` Hugh Dickins
0 siblings, 1 reply; 9+ messages in thread
From: Mikhail Gavrilov @ 2023-09-01 12:17 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Bagas Sanjaya, linux-kernel, linux-mm, regressions
[-- Attachment #1: Type: text/plain, Size: 687 bytes --]
On Fri, Sep 1, 2023 at 2:08 PM Hugh Dickins <hughd@google.com> wrote:
>
>
> Sorry about that: please try this instead, which adds EXPORT_SYMBOL(pte_unmap).
>
Thanks, now I have a working kernel built at commit a349d72fd9ef.
> I've never used stackdepot before, but I've tried this out in good and
> bad cases, and expect it to work for you, shedding light on where it is
> going wrong - the machine should boot up fine, and in dmesg you'll find one
> stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.
Interesting, I checked twice but I didn't find any entry with
"pte_map" in the kernel log after applying your patch.
--
Best Regards,
Mike Gavrilov.
[-- Attachment #2: dmesg.zip --]
[-- Type: application/zip, Size: 47546 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
2023-09-01 12:17 ` Mikhail Gavrilov
@ 2023-09-01 22:48 ` Hugh Dickins
2023-09-02 9:51 ` Mikhail Gavrilov
0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2023-09-01 22:48 UTC (permalink / raw)
To: Mikhail Gavrilov
Cc: Hugh Dickins, Andrew Morton, Bagas Sanjaya, linux-kernel,
linux-mm, regressions
[-- Attachment #1: Type: text/plain, Size: 2437 bytes --]
On Fri, 1 Sep 2023, Mikhail Gavrilov wrote:
> On Fri, Sep 1, 2023 at 2:08 PM Hugh Dickins <hughd@google.com> wrote:
> >
> >
> > Sorry about that: please try this instead, which adds EXPORT_SYMBOL(pte_unmap).
> >
>
> Thanks, now I have a working kernel built at commit a349d72fd9ef.
>
> > I've never used stackdepot before, but I've tried this out in good and
> > bad cases, and expect it to work for you, shedding light on where it is
> > going wrong - the machine should boot up fine, and in dmesg you'll find one
> > stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.
>
> Interesting, I checked twice but I didn't find any entry with
> "pte_map" in the kernel log after applying your patch.
That was very disappointing: I found it hard to explain, but was thinking
of sending you a similar patch, doing the same check on all your 32 CPUs -
maybe the stall being on CPU 0 in your photo was accidental.
But now I think I have the shameful answer (which studying your dmesg,
and the 82328 jiffies at 86 seconds in your photo, did help me towards).
That mm/pagewalk fix I put into 6.5 has a grievous oversight (and a
video of your failing 6.6 bootup would likely have shown a WARN_ON_ONCE
from the underflow in __rcu_read_unlock()).
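The oversight can be illustrated with a small user-space model. The patch
further down shows only the unmap side of walk_pte_range(); this sketch
ASSUMES that the map side already takes the non-RCU path for init_mm or
for addresses at or above TASK_SIZE, which is what would make the old
unmap condition one pte_unmap() too many. MODEL_TASK_SIZE, walk_one() and
the helper names are hypothetical, and the depth counter merely stands in
for the real RCU nesting accounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user/kernel address boundary, standing in for TASK_SIZE. */
#define MODEL_TASK_SIZE 0x1000UL

/*
 * ASSUMPTION (not shown in this thread): the map side skips the
 * RCU-taking pte_offset_map() for init_mm or for kernel addresses.
 */
static bool map_takes_rcu(bool is_init_mm, unsigned long addr)
{
	return !is_init_mm && addr < MODEL_TASK_SIZE;
}

/* Unmap condition before the fix: looks only at the mm. */
static bool old_unmap_cond(bool is_init_mm, unsigned long addr)
{
	(void)addr;
	return !is_init_mm;
}

/* Unmap condition after the fix: mirrors the map side. */
static bool new_unmap_cond(bool is_init_mm, unsigned long addr)
{
	return !is_init_mm && addr < MODEL_TASK_SIZE;
}

/* Returns the modelled nesting depth after walking one pte range. */
int walk_one(bool is_init_mm, unsigned long addr, bool fixed)
{
	int rcu_nesting = 0;

	if (map_takes_rcu(is_init_mm, addr))
		rcu_nesting++;		/* models pte_offset_map() */
	/* walk_pte_range_inner() would run here */
	if (fixed ? new_unmap_cond(is_init_mm, addr)
		  : old_unmap_cond(is_init_mm, addr))
		rcu_nesting--;		/* models pte_unmap() */
	return rcu_nesting;
}
```

Under these assumptions, walking a kernel address in a non-init mm with
the old condition, walk_one(false, 0x2000, false), ends at depth -1, while
the fixed condition, walk_one(false, 0x2000, true), ends at 0. User
addresses and init_mm stay balanced either way.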
Please revert the debug patch I sent yesterday (or earlier today), please
try booting with this one on top of a349d72fd9ef; and if that's successful,
then please go back to your original Rawhide tree and apply this on top of
that, to confirm that boots to a working system too - thanks.
With my apologies,
[PATCH] mm/pagewalk: fix bootstopping regression from extra pte_unmap()
[ Commit message yet to be written: it's actually something to go to
6.5 stable, to correct i386 CONFIG_HIGHPTE there - though we know of
no case where it is actually hit. ]
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/pagewalk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 2022333805d3..9e7d0276c38a 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -58,7 +58,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			pte = pte_offset_map(pmd, addr);
 		if (pte) {
 			err = walk_pte_range_inner(pte, addr, end, walk);
-			if (walk->mm != &init_mm)
+			if (walk->mm != &init_mm && addr < TASK_SIZE)
 				pte_unmap(pte);
 		}
 	} else {
--
2.35.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
2023-09-01 22:48 ` Hugh Dickins
@ 2023-09-02 9:51 ` Mikhail Gavrilov
2023-09-02 15:50 ` Hugh Dickins
0 siblings, 1 reply; 9+ messages in thread
From: Mikhail Gavrilov @ 2023-09-02 9:51 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Bagas Sanjaya, linux-kernel, linux-mm, regressions
On Sat, Sep 2, 2023 at 3:48 AM Hugh Dickins <hughd@google.com> wrote:
> That was very disappointing: I found it hard to explain, but was thinking
> of sending you a similar patch, doing the same check on all your 32 CPUs -
> maybe the stall being on CPU 0 in your photo was accidental.
>
> But now I think I have the shameful answer (which studying your dmesg,
> and the 82328 jiffies at 86 seconds in your photo, did help me towards).
>
> That mm/pagewalk fix I put into 6.5 has a grievous oversight (and a
> video of your failing 6.6 bootup would likely have shown a WARN_ON_ONCE
> from the underflow in __rcu_read_unlock()).
>
> Please revert the debug patch I sent yesterday (or earlier today), please
> try booting with this one on top of a349d72fd9ef; and if that's successful,
> then please go back to your original Rawhide tree and apply this on top of
> that, to confirm that boots to a working system too - thanks.
>
> With my apologies,
>
> [PATCH] mm/pagewalk: fix bootstopping regression from extra pte_unmap()
>
> [ Commit message yet to be written: it's actually something to go to
> 6.5 stable, to correct i386 CONFIG_HIGHPTE there - though we know of
> no case where it is actually hit. ]
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> mm/pagewalk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 2022333805d3..9e7d0276c38a 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -58,7 +58,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  			pte = pte_offset_map(pmd, addr);
>  		if (pte) {
>  			err = walk_pte_range_inner(pte, addr, end, walk);
> -			if (walk->mm != &init_mm)
> +			if (walk->mm != &init_mm && addr < TASK_SIZE)
>  				pte_unmap(pte);
>  		}
>  	} else {
> --
> 2.35.3
Great, this is the right patch.
Both the a349d72fd9ef build and the latest Rawhide (now at 99d99825fc07)
work fine after applying this patch.
Thanks a lot.
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
--
Best Regards,
Mike Gavrilov.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
2023-09-02 9:51 ` Mikhail Gavrilov
@ 2023-09-02 15:50 ` Hugh Dickins
0 siblings, 0 replies; 9+ messages in thread
From: Hugh Dickins @ 2023-09-02 15:50 UTC (permalink / raw)
To: Mikhail Gavrilov
Cc: Hugh Dickins, Andrew Morton, Bagas Sanjaya, linux-kernel,
linux-mm, regressions
On Sat, 2 Sep 2023, Mikhail Gavrilov wrote:
>
> Great, this is the right patch.
> Both the a349d72fd9ef build and the latest Rawhide (now at 99d99825fc07)
> work fine after applying this patch.
> Thanks a lot.
> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Great, thanks so much Mike: and Linus already took it into his tree:
ee40d543e97d mm/pagewalk: fix bootstopping regression from extra pte_unmap()
Hugh
^ permalink raw reply [flat|nested] 9+ messages in thread