Message-ID: <9fb15315-6317-4bf4-a736-a8a44288b0c2@arm.com>
Date: Wed, 1 May 2024 12:35:35 +0100
From: Ryan Roberts
Subject: Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton, Zi Yan, "Aneesh Kumar K.V"
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20240430133138.732088-1-ryan.roberts@arm.com>
In-Reply-To: <20240430133138.732088-1-ryan.roberts@arm.com>

Zi Yan, I'm hoping you might have some input on the below...

On 30/04/2024 14:31, Ryan Roberts wrote:
> __split_huge_pmd_locked() can be called for a present THP, devmap or
> (non-present) migration entry. It calls pmdp_invalidate()
> unconditionally on the pmdp and only determines if it is present or not
> based on the returned old pmd.
>
> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
> pmd_present() calls to return true - even for a swap pmd. Therefore any
> lockless pgtable walker could see the migration entry pmd in this state
> and start interpreting the fields (e.g. pmd_pfn()) as if it were
> present, leading to BadThings (TM). GUP-fast appears to be one such
> lockless pgtable walker.
>
> While the obvious fix is for core-mm to avoid such calls for non-present
> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
> this case either), all other arches that implement pmd_mkinvalid() do it
> in such a way that it is robust to being called with a non-present pmd.

OK, the plot thickens; the tests I wrote to check that pmd_mkinvalid() is safe
for swap entries fail on x86_64. See below...

> So it is simpler and safer to make arm64 robust too. This approach means
> we can even add tests to debug_vm_pgtable.c to validate the required
> behaviour.
>
> This is a theoretical bug found during code review. I don't have any
> test case to trigger it in practice.
>
> Cc: stable@vger.kernel.org
> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
> Signed-off-by: Ryan Roberts
> ---
>
> Hi all,
>
> v1 of this fix [1] took the approach of fixing core-mm to never call
> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
> suffers this problem; all other arches are robust. So his suggestion was to
> instead make arm64 robust in the same way and add tests to validate it. Despite
> my stated reservations in the context of the v1 discussion, having thought on it
> for a bit, I now agree with Zi Yan. Hence this post.
>
> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
> remove it from there and have this go in through the arm64 tree? Assuming there
> is agreement that this approach is the right one.
>
> This applies on top of v6.9-rc5.
> Passes all the mm selftests on arm64.
>
> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>
> Thanks,
> Ryan
>
>
>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..7d580271a46d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>
>  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>  {
> -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> +	/*
> +	 * If not valid then either we are already present-invalid or we are
> +	 * not-present (i.e. none or swap entry). We must not convert
> +	 * not-present to present-invalid. Unbelievably, the core-mm may call
> +	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
> +	 */
> +	if (pmd_valid(pmd)) {
> +		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
> +		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
> +	}
>
>  	return pmd;
>  }
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 65c19025da3d..7e9c387d06b0 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>  #endif /* CONFIG_HUGETLB_PAGE */
>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
> +{

Printing various values at different locations in this function for debug:

> +	unsigned long max_swap_offset;
> +	swp_entry_t swp_set, swp_clear, swp_convert;
> +	pmd_t pmd_set, pmd_clear;
> +
> +	/*
> +	 * See generic_max_swapfile_size(): probe the maximum offset, then
> +	 * create a swap entry with all possible bits set and a swap entry with
> +	 * all bits clear.
> +	 */
> +	max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
> +	swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
> +	swp_clear = swp_entry(0, 0);
> +
> +	/* Convert to pmd. */
> +	pmd_set = swp_entry_to_pmd(swp_set);
> +	pmd_clear = swp_entry_to_pmd(swp_clear);

[ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00

> +
> +	/*
> +	 * Sanity check that the pmds are not-present, not-huge and swap entry
> +	 * is recoverable without corruption.
> +	 */
> +	WARN_ON(pmd_present(pmd_set));
> +	WARN_ON(pmd_trans_huge(pmd_set));
> +	swp_convert = pmd_to_swp_entry(pmd_set);
> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
> +	WARN_ON(pmd_present(pmd_clear));
> +	WARN_ON(pmd_trans_huge(pmd_clear));
> +	swp_convert = pmd_to_swp_entry(pmd_clear);
> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
> +
> +	/* Now invalidate the pmd. */
> +	pmd_set = pmd_mkinvalid(pmd_set);
> +	pmd_clear = pmd_mkinvalid(pmd_clear);

[ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00

> +
> +	/*
> +	 * Since it's a swap pmd, invalidation should effectively be a noop and
> +	 * the checks we already did should give the same answer. Check the
> +	 * invalidation didn't corrupt any fields.
> +	 */
> +	WARN_ON(pmd_present(pmd_set));
> +	WARN_ON(pmd_trans_huge(pmd_set));
> +	swp_convert = pmd_to_swp_entry(pmd_set);

[ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)

> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
> +	WARN_ON(pmd_present(pmd_clear));
> +	WARN_ON(pmd_trans_huge(pmd_clear));
> +	swp_convert = pmd_to_swp_entry(pmd_clear);

[ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)

> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));

This line fails on x86_64.

The logs show that the offset is indeed being corrupted by pmd_mkinvalid();
0 -> 0xff8. I think this is due to x86's pmd_mkinvalid() assuming the pmd is
present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.

So does this take us full circle?
Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a
non-present entry? If so, then I guess we should remove the arm64 fix from
for-next/fixes.

> +}
> +#else
> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
> +
>  static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>  {
>  	pmd_t pmd;
> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>  	WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>  	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>  #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
> +
> +	swp_pmd_mkinvalid_tests(args);
>  }
>
>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> -- 
> 2.25.1
>
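
For readers following the arm64 half of the discussion, the difference between the old and new pmd_mkinvalid() can be modelled in a few lines of user-space C. This is only a rough sketch under stated assumptions, not kernel code: the toy_* names, the 64-bit toy_pmd_t, and the two bit positions are hypothetical stand-ins chosen for illustration, and "present" is simplified to "valid or present-invalid". It does not attempt to model the x86_64 corruption reported above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: the type and bit positions below are illustrative assumptions. */
typedef uint64_t toy_pmd_t;

#define TOY_PMD_VALID           (UINT64_C(1) << 0)
#define TOY_PMD_PRESENT_INVALID (UINT64_C(1) << 59)

/* "Valid" means the hardware would walk this entry. */
static inline bool toy_pmd_valid(toy_pmd_t pmd)
{
	return (pmd & TOY_PMD_VALID) != 0;
}

/* Simplified "present": either valid, or invalidated but still software-present. */
static inline bool toy_pmd_present(toy_pmd_t pmd)
{
	return (pmd & (TOY_PMD_VALID | TOY_PMD_PRESENT_INVALID)) != 0;
}

/* Old behaviour: always sets present-invalid, even for a swap (not-present) entry. */
static inline toy_pmd_t toy_mkinvalid_unconditional(toy_pmd_t pmd)
{
	pmd |= TOY_PMD_PRESENT_INVALID;
	pmd &= ~TOY_PMD_VALID;
	return pmd;
}

/* Patched behaviour: only a valid entry is converted; anything else is left untouched. */
static inline toy_pmd_t toy_mkinvalid_guarded(toy_pmd_t pmd)
{
	if (toy_pmd_valid(pmd)) {
		pmd |= TOY_PMD_PRESENT_INVALID;
		pmd &= ~TOY_PMD_VALID;
	}
	return pmd;
}
```

With a toy swap entry such as `UINT64_C(0x1234) << 12` (both flag bits clear), the unconditional variant returns a value for which toy_pmd_present() is true, which is the hazard described for lockless walkers. The guarded variant returns the entry unchanged, while a toy huge entry with TOY_PMD_VALID set still ends up present but not valid.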