* [PATCH v2] mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
From: John Hubbard @ 2023-07-01 1:04 UTC
To: Andrew Morton
Cc: LKML, linux-mm, John Hubbard, James Houghton, Muchun Song,
Adrian Hunter, Al Viro, Alex Williamson, Alexander Potapenko,
Alexander Shishkin, Andrey Konovalov, Andrey Ryabinin,
Christian Brauner, Christoph Hellwig, Daniel Vetter, Dave Airlie,
Dimitri Sivanich, Dmitry Vyukov, Ian Rogers, Jason Gunthorpe,
Jiri Olsa, Johannes Weiner, Kirill A . Shutemov, Lorenzo Stoakes,
Mark Rutland, Matthew Wilcox, Miaohe Lin, Michal Hocko,
Mike Kravetz, Mike Rapoport, Namhyung Kim, Naoya Horiguchi,
Oleksandr Tyshchenko, Pavel Tatashin, Roman Gushchin,
Ryan Roberts, SeongJae Park, Shakeel Butt, Uladzislau Rezki,
Vincenzo Frascino, Yu Zhao
The following crash happens for me when running the -mm selftests
(below). Specifically, it happens while running the uffd-stress
subtests:
kernel BUG at mm/hugetlb.c:7249!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109
Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
RIP: 0010:huge_pte_alloc+0x12c/0x1a0
...
Call Trace:
<TASK>
? __die_body+0x63/0xb0
? die+0x9f/0xc0
? do_trap+0xab/0x180
? huge_pte_alloc+0x12c/0x1a0
? do_error_trap+0xc6/0x110
? huge_pte_alloc+0x12c/0x1a0
? handle_invalid_op+0x2c/0x40
? huge_pte_alloc+0x12c/0x1a0
? exc_invalid_op+0x33/0x50
? asm_exc_invalid_op+0x16/0x20
? __pfx_put_prev_task_idle+0x10/0x10
? huge_pte_alloc+0x12c/0x1a0
hugetlb_fault+0x1a3/0x1120
? finish_task_switch+0xb3/0x2a0
? lock_is_held_type+0xdb/0x150
handle_mm_fault+0xb8a/0xd40
? find_vma+0x5d/0xa0
do_user_addr_fault+0x257/0x5d0
exc_page_fault+0x7b/0x1f0
asm_exc_page_fault+0x22/0x30

That happens because a BUG() statement in huge_pte_alloc() attempts to
check that a pte, if present, is a hugetlb pte, but it does so in a
non-lockless-safe manner that leads to a false BUG() report.

We got here due to a couple of bugs, each of which by itself was not
quite enough to cause a problem:

First of all, before commit c33c794828f2 ("mm: ptep_get() conversion"),
the BUG() statement in huge_pte_alloc() was itself fragile: it relied
upon compiler behavior to only read the pte once, despite using it twice
in the same conditional.

Next, commit c33c794828f2 ("mm: ptep_get() conversion") broke that
delicate situation, by causing all direct pte reads to be done via
READ_ONCE(). And so READ_ONCE() got called twice within the same BUG()
conditional, leading to comparing (potentially, occasionally) different
versions of the pte, and thus to false BUG() reports.

Fix this by taking a single snapshot of the pte before using it in the
BUG() conditional.
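
For readers who want to see the failure mode in isolation, here is a
minimal userspace sketch of the two-reads-versus-one-snapshot pattern.
It is not kernel code: fake_slot, fake_read(), FAKE_PRESENT, FAKE_HUGE
and the other names are hypothetical stand-ins, and the "concurrent
writer" is scripted so that the outcome is deterministic. It assumes
the racing update clears a present huge entry between the two loads,
which is one way the two predicates can end up looking at different
values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; real pte layouts are arch-specific. */
#define FAKE_PRESENT (UINT64_C(1) << 0)
#define FAKE_HUGE    (UINT64_C(1) << 1)

/* A fake page-table slot plus one scripted "concurrent" update. */
struct fake_slot {
	uint64_t val;		/* current entry contents */
	uint64_t next_val;	/* value the racing writer installs */
	bool race_pending;	/* true until the racing write has landed */
};

/*
 * Stand-in for a READ_ONCE()-style load. To model a race, the scripted
 * writer's update lands right after the first load completes.
 */
static uint64_t fake_read(struct fake_slot *s)
{
	uint64_t v = s->val;

	if (s->race_pending) {
		s->val = s->next_val;
		s->race_pending = false;
	}
	return v;
}

static bool fake_present(uint64_t v) { return (v & FAKE_PRESENT) != 0; }
static bool fake_huge(uint64_t v)    { return (v & FAKE_HUGE) != 0; }

/* Old pattern: two independent loads may observe two different values. */
static bool check_two_reads(struct fake_slot *s)
{
	return fake_present(fake_read(s)) && !fake_huge(fake_read(s));
}

/* Fixed pattern: one snapshot, both predicates applied to the same value. */
static bool check_one_snapshot(struct fake_slot *s)
{
	uint64_t v = fake_read(s);

	return fake_present(v) && !fake_huge(v);
}

/* A present, huge entry that a racing writer is about to clear to zero. */
static struct fake_slot make_racy_slot(void)
{
	struct fake_slot s = { FAKE_PRESENT | FAKE_HUGE, 0, true };

	return s;
}
```

With a slot from make_racy_slot(), check_two_reads() reports a
violation even though neither value the slot ever held is "present but
not huge", while check_one_snapshot() does not. That is the same shape
as the false BUG() report described above.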

Now, that commit is only partially to blame here, but people doing
bisections will invariably land there, so this will help them find a fix
for a real crash. And also, the previous behavior was unlikely to ever
expose this bug--it was fragile, yet not actually broken.

So that's why I chose this commit for the Fixes tag, rather than the
commit that created the original BUG() statement.

Fixes: c33c794828f2 ("mm: ptep_get() conversion")
Acked-by: James Houghton <jthoughton@google.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
Changes since v1:
Added Acked-by's.
Fixed as per Ryan Roberts (thanks!): changed to ptep_get_lockless().
mm/hugetlb.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bce28cca73a1..64a3239b6407 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7246,7 +7246,12 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 				pte = (pte_t *)pmd_alloc(mm, pud, addr);
 		}
 	}
-	BUG_ON(pte && pte_present(ptep_get(pte)) && !pte_huge(ptep_get(pte)));
+
+	if (pte) {
+		pte_t pteval = ptep_get_lockless(pte);
+
+		BUG_ON(pte_present(pteval) && !pte_huge(pteval));
+	}
 
 	return pte;
 }
base-commit: bf1fa6f15553df04f2bdd06190ccd5f388ab0777
--
2.41.0
* Re: [PATCH v2] mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
From: Ryan Roberts @ 2023-07-03 8:53 UTC
To: John Hubbard, Andrew Morton
Cc: LKML, linux-mm, James Houghton, Muchun Song, Adrian Hunter,
Al Viro, Alex Williamson, Alexander Potapenko,
Alexander Shishkin, Andrey Konovalov, Andrey Ryabinin,
Christian Brauner, Christoph Hellwig, Daniel Vetter, Dave Airlie,
Dimitri Sivanich, Dmitry Vyukov, Ian Rogers, Jason Gunthorpe,
Jiri Olsa, Johannes Weiner, Kirill A . Shutemov, Lorenzo Stoakes,
Mark Rutland, Matthew Wilcox, Miaohe Lin, Michal Hocko,
Mike Kravetz, Mike Rapoport, Namhyung Kim, Naoya Horiguchi,
Oleksandr Tyshchenko, Pavel Tatashin, Roman Gushchin,
SeongJae Park, Shakeel Butt, Uladzislau Rezki, Vincenzo Frascino,
Yu Zhao
On 01/07/2023 02:04, John Hubbard wrote:
> The following crash happens for me when running the -mm selftests
> (below). Specifically, it happens while running the uffd-stress
> subtests:
>
> kernel BUG at mm/hugetlb.c:7249!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109
> Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
> RIP: 0010:huge_pte_alloc+0x12c/0x1a0
> ...
> Call Trace:
> <TASK>
> ? __die_body+0x63/0xb0
> ? die+0x9f/0xc0
> ? do_trap+0xab/0x180
> ? huge_pte_alloc+0x12c/0x1a0
> ? do_error_trap+0xc6/0x110
> ? huge_pte_alloc+0x12c/0x1a0
> ? handle_invalid_op+0x2c/0x40
> ? huge_pte_alloc+0x12c/0x1a0
> ? exc_invalid_op+0x33/0x50
> ? asm_exc_invalid_op+0x16/0x20
> ? __pfx_put_prev_task_idle+0x10/0x10
> ? huge_pte_alloc+0x12c/0x1a0
> hugetlb_fault+0x1a3/0x1120
> ? finish_task_switch+0xb3/0x2a0
> ? lock_is_held_type+0xdb/0x150
> handle_mm_fault+0xb8a/0xd40
> ? find_vma+0x5d/0xa0
> do_user_addr_fault+0x257/0x5d0
> exc_page_fault+0x7b/0x1f0
> asm_exc_page_fault+0x22/0x30
>
> That happens because a BUG() statement in huge_pte_alloc() attempts to
> check that a pte, if present, is a hugetlb pte, but it does so in a
> non-lockless-safe manner that leads to a false BUG() report.
>
> We got here due to a couple of bugs, each of which by itself was not
> quite enough to cause a problem:
>
> First of all, before commit c33c794828f2 ("mm: ptep_get() conversion"),
> the BUG() statement in huge_pte_alloc() was itself fragile: it relied
> upon compiler behavior to only read the pte once, despite using it twice
> in the same conditional.
>
> Next, commit c33c794828f2 ("mm: ptep_get() conversion") broke that
> delicate situation, by causing all direct pte reads to be done via
> READ_ONCE(). And so READ_ONCE() got called twice within the same BUG()
> conditional, leading to comparing (potentially, occasionally) different
> versions of the pte, and thus to false BUG() reports.
>
> Fix this by taking a single snapshot of the pte before using it in the
> BUG conditional.
>
> Now, that commit is only partially to blame here, but people doing
> bisections will invariably land there, so this will help them find a fix
> for a real crash. And also, the previous behavior was unlikely to ever
> expose this bug--it was fragile, yet not actually broken.
>
> So that's why I chose this commit for the Fixes tag, rather than the
> commit that created the original BUG() statement.
>
> Fixes: c33c794828f2 ("mm: ptep_get() conversion")
> Acked-by: James Houghton <jthoughton@google.com>
> Acked-by: Muchun Song <songmuchun@bytedance.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrey Konovalov <andreyknvl@gmail.com>
> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Ian Rogers <irogers@google.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Mike Rapoport (IBM) <rppt@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>
> Changes since v1:
>
> Added Acked-by's.
>
> Fixed as per Ryan Roberts (thanks!): changed to ptep_get_lockless().
>
>
> mm/hugetlb.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bce28cca73a1..64a3239b6407 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7246,7 +7246,12 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  		}
>  	}
> -	BUG_ON(pte && pte_present(ptep_get(pte)) && !pte_huge(ptep_get(pte)));
> +
> +	if (pte) {
> +		pte_t pteval = ptep_get_lockless(pte);
> +
> +		BUG_ON(pte_present(pteval) && !pte_huge(pteval));
> +	}
>  
>  	return pte;
>  }
>
> base-commit: bf1fa6f15553df04f2bdd06190ccd5f388ab0777