From: Ryan Roberts
Date: Wed, 1 May 2024 13:58:24 +0100
Subject: Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
To: Zi Yan
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual, Andrew Morton, "Aneesh Kumar K.V", David Hildenbrand, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Message-ID: <2a1b4275-39c5-4c9e-830f-cde16e81c12c@arm.com>
In-Reply-To: <4DDEF271-9DDE-4D24-9F0C-13046CE78C6C@nvidia.com>
References: <20240430133138.732088-1-ryan.roberts@arm.com> <9fb15315-6317-4bf4-a736-a8a44288b0c2@arm.com> <0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com> <4DDEF271-9DDE-4D24-9F0C-13046CE78C6C@nvidia.com>

On 01/05/2024 13:07, Zi Yan wrote:
> On 1 May 2024, at 7:38, Ryan Roberts wrote:
>
>> Pulling in David, who may be able to advise...
>>
>>
>> On 01/05/2024 12:35, Ryan Roberts wrote:
>>> Zi Yan, I'm hoping you might have some input on the below...
>>>
>>>
>>> On 30/04/2024 14:31, Ryan Roberts wrote:
>>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>>> (non-present) migration entry.
>>>> It calls pmdp_invalidate() unconditionally on the pmdp and only
>>>> determines if it is present or not based on the returned old pmd.
>>>>
>>>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>>>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>>>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>>>> lockless pgtable walker could see the migration entry pmd in this state
>>>> and start interpreting the fields (e.g. pmd_pfn()) as if it were
>>>> present, leading to BadThings (TM). GUP-fast appears to be one such
>>>> lockless pgtable walker.
>>>>
>>>> While the obvious fix is for core-mm to avoid such calls for non-present
>>>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>>>> this case either), all other arches that implement pmd_mkinvalid() do it
>>>> in such a way that it is robust to being called with a non-present pmd.
>>>
>>> OK, the plot thickens: the tests I wrote to check that pmd_mkinvalid() is safe for swap entries fail on x86_64. See below...
>>>
>>>> So it is simpler and safer to make arm64 robust too. This approach means
>>>> we can even add tests to debug_vm_pgtable.c to validate the required
>>>> behaviour.
>>>>
>>>> This is a theoretical bug found during code review. I don't have any
>>>> test case to trigger it in practice.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>>>> Signed-off-by: Ryan Roberts
>>>> ---
>>>>
>>>> Hi all,
>>>>
>>>> v1 of this fix [1] took the approach of fixing core-mm to never call
>>>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>>>> suffers this problem; all other arches are robust. So his suggestion was to
>>>> instead make arm64 robust in the same way and add tests to validate it. Despite
>>>> my stated reservations in the context of the v1 discussion, having thought on it
>>>> for a bit, I now agree with Zi Yan. Hence this post.
>>>>
>>>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>>>> remove it from there and have this go in through the arm64 tree? Assuming there
>>>> is agreement that this approach is the right one.
>>>>
>>>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>>>  arch/arm64/include/asm/pgtable.h | 12 +++++--
>>>>  mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
>>>>  2 files changed, 71 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index afdd56d26ad7..7d580271a46d 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>>
>>>>  static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>>>  {
>>>> -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>>> -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>>> +	/*
>>>> +	 * If not valid then either we are already present-invalid or we are
>>>> +	 * not-present (i.e. none or swap entry). We must not convert
>>>> +	 * not-present to present-invalid. Unbelievably, the core-mm may call
>>>> +	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>>>> +	 */
>>>> +	if (pmd_valid(pmd)) {
>>>> +		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>>> +		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>>> +	}
>>>>
>>>>  	return pmd;
>>>>  }
>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>>> index 65c19025da3d..7e9c387d06b0 100644
>>>> --- a/mm/debug_vm_pgtable.c
>>>> +++ b/mm/debug_vm_pgtable.c
>>>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>>>>  #endif /* CONFIG_HUGETLB_PAGE */
>>>>
>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
>>>> +{
>>>
>>> Printing various values at different locations in this function for debug:
>>>
>>>> +	unsigned long max_swap_offset;
>>>> +	swp_entry_t swp_set, swp_clear, swp_convert;
>>>> +	pmd_t pmd_set, pmd_clear;
>>>> +
>>>> +	/*
>>>> +	 * See generic_max_swapfile_size(): probe the maximum offset, then
>>>> +	 * create a swap entry with all possible bits set and a swap entry with
>>>> +	 * all bits clear.
>>>> +	 */
>>>> +	max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
>>>> +	swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
>>>> +	swp_clear = swp_entry(0, 0);
>>>> +
>>>> +	/* Convert to pmd. */
>>>> +	pmd_set = swp_entry_to_pmd(swp_set);
>>>> +	pmd_clear = swp_entry_to_pmd(swp_clear);
>>>
>>> [ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
>>>
>>>> +
>>>> +	/*
>>>> +	 * Sanity check that the pmds are not-present, not-huge and swap entry
>>>> +	 * is recoverable without corruption.
>>>> +	 */
>>>> +	WARN_ON(pmd_present(pmd_set));
>>>> +	WARN_ON(pmd_trans_huge(pmd_set));
>>>> +	swp_convert = pmd_to_swp_entry(pmd_set);
>>>> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>>> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>>> +	WARN_ON(pmd_present(pmd_clear));
>>>> +	WARN_ON(pmd_trans_huge(pmd_clear));
>>>> +	swp_convert = pmd_to_swp_entry(pmd_clear);
>>>> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>>> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>>> +
>>>> +	/* Now invalidate the pmd. */
>>>> +	pmd_set = pmd_mkinvalid(pmd_set);
>>>> +	pmd_clear = pmd_mkinvalid(pmd_clear);
>>>
>>> [ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
>>>
>>>> +
>>>> +	/*
>>>> +	 * Since it's a swap pmd, invalidation should effectively be a noop and
>>>> +	 * the checks we already did should give the same answer. Check the
>>>> +	 * invalidation didn't corrupt any fields.
>>>> +	 */
>>>> +	WARN_ON(pmd_present(pmd_set));
>>>> +	WARN_ON(pmd_trans_huge(pmd_set));
>>>> +	swp_convert = pmd_to_swp_entry(pmd_set);
>>>
>>> [ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
>>>
>>>> +	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>>> +	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>>> +	WARN_ON(pmd_present(pmd_clear));
>>>> +	WARN_ON(pmd_trans_huge(pmd_clear));
>>>> +	swp_convert = pmd_to_swp_entry(pmd_clear);
>>>
>>> [ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
>>>
>>>> +	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>>> +	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>>
>>> This line fails on x86_64.
>>>
>>> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
>>>
>>> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
>>>
>>> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.
>
> If x86_64's pmd_mkinvalid() also corrupts swap entries, yes, your original fix
> is better. I will dig into the x86 code more to figure out what goes wrong.
> Last time, I only checked PAGE_* bits in these pmd|pte_* operations.
> Sorry for the misinformation.

No worries. I'll make the amendments we originally agreed on for the original fix and resend.

>
>>>
>>>> +}
>>>> +#else
>>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
>>>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>>> +
>>>>  static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>>  {
>>>>  	pmd_t pmd;
>>>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>>  	WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>>  	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>>  #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
>>>> +
>>>> +	swp_pmd_mkinvalid_tests(args);
>>>>  }
>>>>
>>>>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>> --
>>>> 2.25.1
>>>>
>>>
>
>
> --
> Best Regards,
> Yan, Zi
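
For anyone following the thread without the arm64 headers to hand, here is a rough userspace sketch of the failure mode the v2 commit message describes: an unconditional "make invalid" turns a non-present (swap) entry into one that looks present, while a guarded version leaves it alone. This is not kernel code. The bit positions and every `model_`-prefixed name are made up for illustration and do not match the real arm64 encoding, and it does not attempt to model the x86_64 offset corruption seen in the logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only; not the arm64 encoding. */
typedef uint64_t model_pmd_t;
#define MODEL_VALID           (UINT64_C(1) << 0)  /* hardware-valid bit */
#define MODEL_PRESENT_INVALID (UINT64_C(1) << 59) /* software "present but invalid" bit */

static bool model_pmd_valid(model_pmd_t pmd)
{
	return (pmd & MODEL_VALID) != 0;
}

/* "Present" means valid, or deliberately invalidated while still mapped. */
static bool model_pmd_present(model_pmd_t pmd)
{
	return (pmd & (MODEL_VALID | MODEL_PRESENT_INVALID)) != 0;
}

/* Unconditional variant: sets the software bit even on a swap entry. */
static model_pmd_t model_mkinvalid_unguarded(model_pmd_t pmd)
{
	pmd |= MODEL_PRESENT_INVALID;
	pmd &= ~MODEL_VALID;
	return pmd;
}

/* Guarded variant: only touches entries that are currently valid. */
static model_pmd_t model_mkinvalid_guarded(model_pmd_t pmd)
{
	if (model_pmd_valid(pmd)) {
		pmd |= MODEL_PRESENT_INVALID;
		pmd &= ~MODEL_VALID;
	}
	return pmd;
}

/* Exercises both variants on a swap-like and a mapped-like entry. */
static void model_selfcheck(void)
{
	model_pmd_t swap = UINT64_C(0x1234) << 8;                 /* neither flag set */
	model_pmd_t huge = (UINT64_C(0xabc) << 12) | MODEL_VALID; /* valid mapping */

	assert(!model_pmd_present(swap));
	/* The unguarded helper makes the swap entry look present. */
	assert(model_pmd_present(model_mkinvalid_unguarded(swap)));
	/* The guarded helper leaves the swap entry bit-for-bit unchanged. */
	assert(model_mkinvalid_guarded(swap) == swap);
	/* A valid entry still becomes present-but-invalid. */
	assert(model_pmd_present(model_mkinvalid_guarded(huge)));
	assert(!model_pmd_valid(model_mkinvalid_guarded(huge)));
}
```

Calling `model_selfcheck()` from a small `main()` runs the checks. The guard only helps when an arch can tell valid from not-present using bits that a swap entry never sets, which is exactly the assumption baked into this toy layout.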