From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Zi Yan <zi.yan@cs.rutgers.edu>,
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
Date: Tue, 30 Apr 2024 14:31:38 +0100 [thread overview]
Message-ID: <20240430133138.732088-1-ryan.roberts@arm.com> (raw)
__split_huge_pmd_locked() can be called for a present THP, devmap or
(non-present) migration entry. It calls pmdp_invalidate()
unconditionally on the pmdp and only determines if it is present or not
based on the returned old pmd.
But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
pmd_present() calls to return true - even for a swap pmd. Therefore any
lockless pgtable walker could see the migration entry pmd in this state
and start interpretting the fields (e.g. pmd_pfn()) as if it were
present, leading to BadThings (TM). GUP-fast appears to be one such
lockless pgtable walker.
While the obvious fix is for core-mm to avoid such calls for non-present
pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
this case either), all other arches that implement pmd_mkinvalid() do it
in such a way that it is robust to being called with a non-present pmd.
So it is simpler and safer to make arm64 robust too. This approach means
we can even add tests to debug_vm_pgtable.c to validate the required
behaviour.
This is a theoretical bug found during code review. I don't have any
test case to trigger it in practice.
Cc: stable@vger.kernel.org
Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
Hi all,
v1 of this fix [1] took the approach of fixing core-mm to never call
pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
suffers this problem; all other arches are robust. So his suggestion was to
instead make arm64 robust in the same way and add tests to validate it. Despite
my stated reservations in the context of the v1 discussion, having thought on it
for a bit, I now agree with Zi Yan. Hence this post.
Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
remove it from there and have this go in through the arm64 tree? Assuming there
is agreement that this approach is right one.
This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
[1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
Thanks,
Ryan
arch/arm64/include/asm/pgtable.h | 12 +++++--
mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
2 files changed, 71 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..7d580271a46d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
static inline pmd_t pmd_mkinvalid(pmd_t pmd)
{
- pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
- pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
+ /*
+ * If not valid then either we are already present-invalid or we are
+ * not-present (i.e. none or swap entry). We must not convert
+ * not-present to present-invalid. Unbelievably, the core-mm may call
+ * pmd_mkinvalid() for a swap entry and all other arches can handle it.
+ */
+ if (pmd_valid(pmd)) {
+ pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
+ pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
+ }
return pmd;
}
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 65c19025da3d..7e9c387d06b0 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
#endif /* CONFIG_HUGETLB_PAGE */
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
+static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
+{
+ unsigned long max_swap_offset;
+ swp_entry_t swp_set, swp_clear, swp_convert;
+ pmd_t pmd_set, pmd_clear;
+
+ /*
+ * See generic_max_swapfile_size(): probe the maximum offset, then
+ * create swap entry will all possible bits set and a swap entry will
+ * all bits clear.
+ */
+ max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
+ swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
+ swp_clear = swp_entry(0, 0);
+
+ /* Convert to pmd. */
+ pmd_set = swp_entry_to_pmd(swp_set);
+ pmd_clear = swp_entry_to_pmd(swp_clear);
+
+ /*
+ * Sanity check that the pmds are not-present, not-huge and swap entry
+ * is recoverable without corruption.
+ */
+ WARN_ON(pmd_present(pmd_set));
+ WARN_ON(pmd_trans_huge(pmd_set));
+ swp_convert = pmd_to_swp_entry(pmd_set);
+ WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
+ WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
+ WARN_ON(pmd_present(pmd_clear));
+ WARN_ON(pmd_trans_huge(pmd_clear));
+ swp_convert = pmd_to_swp_entry(pmd_clear);
+ WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
+ WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
+
+ /* Now invalidate the pmd. */
+ pmd_set = pmd_mkinvalid(pmd_set);
+ pmd_clear = pmd_mkinvalid(pmd_clear);
+
+ /*
+ * Since its a swap pmd, invalidation should effectively be a noop and
+ * the checks we already did should give the same answer. Check the
+ * invalidation didn't corrupt any fields.
+ */
+ WARN_ON(pmd_present(pmd_set));
+ WARN_ON(pmd_trans_huge(pmd_set));
+ swp_convert = pmd_to_swp_entry(pmd_set);
+ WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
+ WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
+ WARN_ON(pmd_present(pmd_clear));
+ WARN_ON(pmd_trans_huge(pmd_clear));
+ swp_convert = pmd_to_swp_entry(pmd_clear);
+ WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
+ WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
+}
+#else
+static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
+#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
+
static void __init pmd_thp_tests(struct pgtable_debug_args *args)
{
pmd_t pmd;
@@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
#endif /* __HAVE_ARCH_PMDP_INVALIDATE */
+
+ swp_pmd_mkinvalid_tests(args);
}
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
--
2.25.1
next reply other threads:[~2024-04-30 13:31 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-30 13:31 Ryan Roberts [this message]
2024-04-30 13:55 ` Will Deacon
2024-04-30 14:04 ` Ryan Roberts
2024-04-30 16:23 ` Catalin Marinas
2024-04-30 16:25 ` Ryan Roberts
2024-04-30 15:00 ` Zi Yan
2024-04-30 17:57 ` Catalin Marinas
2024-05-01 8:05 ` Ryan Roberts
2024-05-01 10:04 ` Catalin Marinas
2024-05-01 10:13 ` Ryan Roberts
2024-05-02 18:00 ` Catalin Marinas
2024-05-01 11:35 ` Ryan Roberts
2024-05-01 11:38 ` Ryan Roberts
2024-05-01 12:07 ` Zi Yan
2024-05-01 12:58 ` Ryan Roberts
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240430133138.732088-1-ryan.roberts@arm.com \
--to=ryan.roberts@arm.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@kernel.org \
--cc=anshuman.khandual@arm.com \
--cc=catalin.marinas@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mark.rutland@arm.com \
--cc=stable@vger.kernel.org \
--cc=will@kernel.org \
--cc=zi.yan@cs.rutgers.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox