From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FA2EC4829A for ; Tue, 13 Feb 2024 14:02:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A07C66B008A; Tue, 13 Feb 2024 09:02:14 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9B79F6B008C; Tue, 13 Feb 2024 09:02:14 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8579E6B0092; Tue, 13 Feb 2024 09:02:14 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 757666B008A for ; Tue, 13 Feb 2024 09:02:14 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id CD19A1A0BD8 for ; Tue, 13 Feb 2024 14:02:13 +0000 (UTC) X-FDA: 81786944946.06.9192048 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf15.hostedemail.com (Postfix) with ESMTP id 2F2DBA002E for ; Tue, 13 Feb 2024 14:02:07 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf15.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707832928; a=rsa-sha256; cv=none; b=7MnnHaFZrZX0Xhf32MviYjqR0RVHHSt8YFQM9FQlcWpezXlN9kkI15qNrbrb2e51AZWY1Q qokU+GskxEGu5HLrjazZwNqJsjFjdkpRKtrBzMGu91JE0PBePh8jnPBr6nlUtwLg9Lywst zO9lEfkQ2QcnoSBf62QG3T6ok4T0RYI= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf15.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707832928; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ksvCjU88mP7dd1yRAZYR72/eJTQHWcVerDS423jXgyw=; b=GSmTOEv/kgisFyNecnn9Whb0iwlnmZtkw2NGlO4GxfYoNIbFLfDG/7JTrghZMGnHbSdEOZ r3iXnNXQxSUKauKF0Hd8KS7cGSOEX42qbIZAZA45LIOEGr4FMUMBsG3EA5zUy5wz818wDq 0pBw7Aaaybevmdn40w7cG8ZO7WXJ0gw= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 18234DA7; Tue, 13 Feb 2024 06:02:48 -0800 (PST) Received: from [10.1.36.184] (XHFQ2J9959.cambridge.arm.com [10.1.36.184]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3D7B53F762; Tue, 13 Feb 2024 06:02:03 -0800 (PST) Message-ID: <3de2130b-9f0f-4a11-ac06-7bf814de641c@arm.com> Date: Tue, 13 Feb 2024 14:02:01 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings Content-Language: en-GB To: David Hildenbrand , Ard Biesheuvel Cc: Mark Rutland , Catalin Marinas , Will Deacon , Marc Zyngier , James Morse , Andrey Ryabinin , Andrew Morton , Matthew Wilcox , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi , Nicholas Piggin , Christophe Leroy , "Aneesh Kumar K.V" , "Naveen N. Rao" , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , linux-arm-kernel@lists.infradead.org, x86@kernel.org, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20240202080756.1453939-1-ryan.roberts@arm.com> <20240202080756.1453939-20-ryan.roberts@arm.com> <64395ae4-3a7d-45dd-8f1d-ea6b232829c5@arm.com> <41499621-482f-455b-9f68-b43ea8052557@redhat.com> <1d302d7a-50ab-4ab4-b049-75ed4a71a87d@arm.com> <99e2a92c-f2a2-4e1e-8ce2-08caae2cb7e4@redhat.com> <64b872bd-4b12-4dbd-b043-1ad11aeaa19a@redhat.com> From: Ryan Roberts In-Reply-To: <64b872bd-4b12-4dbd-b043-1ad11aeaa19a@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 2F2DBA002E X-Stat-Signature: do1x4duuo7x8kyowyazq99f9gkosr3yj X-Rspam-User: X-HE-Tag: 1707832927-851829 X-HE-Meta: U2FsdGVkX18D7t735XbVktoRl37luIIs+Wksoekke45IClMtxnRp6pcvFQ/skersu/lQ5EiH1pjNe/tNMVhHujUbPRfjSGj4wRMNevAh4dhwTqtUHxVdfY0NeJuzAlk7Tklcz9bjLcbGCcUstRh68JJzJ/zGsgqRWUIS3XK7OruTKD+6AE8DcVu7ZfsBqbwcp6fz7CVPdZvL9hEta8MMIvpNa9iO7cxoPLwgyi0lybFX8Ybi97DIF/g2UpUxY0+ZLgJZ02IWKrfq5lLC0faRSBrZrds7lTaEjFfZV/T19SJf4E/tphcCUxgyRVnvwvaw7BTSD5d7640L/dS+sj3yMCt+L/+Dda7eR4kv1LmFWdHwjoFsC+iCRvzK3iV4Ho2TH166o54Grhe5npUERi3Kshqb0+HUduF+3z2ihT0f2CtmO2hf36LpVNn1Ci4h1QLA5oPmR3Z/kO3A+LIypjDaH8DxQQ9j4qR3y27EeMcazRQVZwBQW/yD/Qc1uIaZ2w6qra8XLuj/+s55FRQkwNmdxZx8Rj50kEYwY9jIppMeTwGJNcPn1Am5kzBzzegetc5WMLuVXuSP17Jrq0i5IkbmJKydOvmsadOZVyzctz9SaA3RAXxjioslDEY463l/4QBvpq/h0kMNal6ZNSyUYzBY/mCj3hOscmu8UX5F1PRjw5I005/oKCiTvZUwR4BWWp2XahNwUuQ8ZkhlqjToXFCC0nt1zNoJFE6hZQDkGUKtRzC8jmAbyBzHpg3QvR7ZPnek1aj7+RFPFXimr0r8Zp6mKUcplNtFiqM2KwYZxGZ0nFlycA13xcqT0d4aVtj9Cr6cdu0WJ9155onpG2wHKoXmpeVIYxFjppTLcmtF8EdSt0wNEPhSouU5ODgRq/b/VY+SatecU6Llorlj1mC88zdVIodYi0pxUS4fgMpMZCnfkgeIJCcgjFdP2l5KLrkj3iWYSgC3vi0O4A6hghSFEpw /nzUEYsn LzKD+nHc13h4ZPwuvxf2W8d8pLAu6JrkYxDUQ+zxowLjJiaNsyf1VFx9y3p9WpE2D+FZ2ypoJmWRoSN3qGQl8Adnr1tJVfSEXHzPZXLiRKF9qWj42CC3oe8gxahIW9u5HzdgL0HYwmpJBr0n1ym290hdoUpier7ocEhzhoa8LreZodChYh7rxUGtYaM9dxQ+zrSFmKfdv/kLSxTGpWM3Ngbw4oGYAokPcDwEHYd/v5W3D3p0WecZqRivJ45ciIW2xaO2P6R3582tItDtCXHx4rsQrzySCM/qp42ymNbGppNSICrgiLSiVoPhNhwrx5A6m7VIp6mH2o1SRwFE7MSir2alC51rtsx6rUchYYITmlYEYbzbUwMqUMK8g8RtqK6ibZt6LqKKhOxclyWzL6ZBR/HcVg3P8bgSNwtee X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 13/02/2024 13:45, David Hildenbrand wrote: > On 13.02.24 14:33, Ard Biesheuvel wrote: >> On Tue, 13 Feb 2024 at 14:21, Ryan Roberts wrote: >>> >>> On 13/02/2024 13:13, David Hildenbrand wrote: >>>> On 13.02.24 14:06, Ryan Roberts wrote: >>>>> On 13/02/2024 12:19, David Hildenbrand wrote: >>>>>> On 13.02.24 13:06, Ryan Roberts wrote: >>>>>>> On 12/02/2024 20:38, Ryan Roberts wrote: >>>>>>>> [...] >>>>>>>> >>>>>>>>>>>> +static inline bool mm_is_user(struct mm_struct *mm) >>>>>>>>>>>> +{ >>>>>>>>>>>> +    /* >>>>>>>>>>>> +     * Don't attempt to apply the contig bit to kernel mappings, >>>>>>>>>>>> because >>>>>>>>>>>> +     * dynamically adding/removing the contig bit can cause page >>>>>>>>>>>> faults. >>>>>>>>>>>> +     * These racing faults are ok for user space, since they get >>>>>>>>>>>> serialized >>>>>>>>>>>> +     * on the PTL. But kernel mappings can't tolerate faults. >>>>>>>>>>>> +     */ >>>>>>>>>>>> +    return mm != &init_mm; >>>>>>>>>>>> +} >>>>>>>>>>> >>>>>>>>>>> We also have the efi_mm as a non-user mm, though I don't think we >>>>>>>>>>> manipulate >>>>>>>>>>> that while it is live, and I'm not sure if that needs any special >>>>>>>>>>> handling. >>>>>>>>>> >>>>>>>>>> Well we never need this function in the hot (order-0 folio) path, so I >>>>>>>>>> think I >>>>>>>>>> could add a check for efi_mm here with performance implication. It's >>>>>>>>>> probably >>>>>>>>>> safest to explicitly exclude it? What do you think? >>>>>>>>> >>>>>>>>> Oops: This should have read "I think I could add a check for efi_mm here >>>>>>>>> *without* performance implication" >>>>>>>> >>>>>>>> It turns out that efi_mm is only defined when CONFIG_EFI is enabled I >>>>>>>> can do >>>>>>>> this: >>>>>>>> >>>>>>>> return mm != &init_mm && (!IS_ENABLED(CONFIG_EFI) || mm != &efi_mm); >>>>>>>> >>>>>>>> Is that acceptable? This is my preference, but nothing else outside of efi >>>>>>>> references this symbol currently. >>>>>>>> >>>>>>>> Or perhaps I can convince myself that its safe to treat efi_mm like >>>>>>>> userspace. >>>>>>>> There are a couple of things that need to be garanteed for it to be safe: >>>>>>>> >>>>>>>>      - The PFNs of present ptes either need to have an associated struct >>>>>>>> page or >>>>>>>>        need to have the PTE_SPECIAL bit set (either pte_mkspecial() or >>>>>>>>        pte_mkdevmap()) >>>>>>>> >>>>>>>>      - Live mappings must either be static (no changes that could cause >>>>>>>> fold/unfold >>>>>>>>        while live) or the system must be able to tolerate a temporary fault >>>>>>>> >>>>>>>> Mark suggests efi_mm is not manipulated while live, so that meets the >>>>>>>> latter >>>>>>>> requirement, but I'm not sure about the former? >>>>>>> >>>>>>> I've gone through all the efi code, and conclude that, as Mark suggests, the >>>>>>> mappings are indeed static. And additionally, the ptes are populated >>>>>>> using only >>>>>>> the _private_ ptep API, so there is no issue here. As just discussed with >>>>>>> Mark, >>>>>>> my prefereence is to not make any changes to code, and just add a comment >>>>>>> describing why efi_mm is safe. >>>>>>> >>>>>>> Details: >>>>>>> >>>>>>> * Registered with ptdump >>>>>>>        * ptep_get_lockless() >>>>>>> * efi_create_mapping -> create_pgd_mapping … -> init_pte: >>>>>>>        * __ptep_get() >>>>>>>        * __set_pte() >>>>>>> * efi_memattr_apply_permissions -> efi_set_mapping_permissions … -> >>>>>>> set_permissions >>>>>>>        * __ptep_get() >>>>>>>        * __set_pte() >>>>>> >>>>>> Sound good. We could add some VM_WARN_ON if we ever get the efi_mm via the >>>>>> "official" APIs. >>>>> >>>>> We could, but that would lead to the same linkage issue, which I'm trying to >>>>> avoid in the first place: >>>>> >>>>> VM_WARN_ON(IS_ENABLED(CONFIG_EFI) && mm == efi_mm); >>>>> >>>>> This creates new source code dependencies, which I would rather avoid if >>>>> possible. >>>> >>>> Just a thought, you could have a is_efi_mm() function that abstracts all that. >>>> >>>> diff --git a/include/linux/efi.h b/include/linux/efi.h >>>> index c74f47711f0b..152f5fa66a2a 100644 >>>> --- a/include/linux/efi.h >>>> +++ b/include/linux/efi.h >>>> @@ -692,6 +692,15 @@ extern struct efi { >>>> >>>>   extern struct mm_struct efi_mm; >>>> >>>> +static inline void is_efi_mm(struct mm_struct *mm) >>>> +{ >>>> +#ifdef CONFIG_EFI >>>> +       return mm == &efi_mm; >>>> +#else >>>> +       return false; >>>> +#endif >>>> +} >>>> + >>>>   static inline int >>>>   efi_guidcmp (efi_guid_t left, efi_guid_t right) >>>>   { >>>> >>>> >>> >>> That would definitely work, but in that case, I might as well just check for it >>> in mm_is_user() (and personally I would change the name to mm_is_efi()): >>> >>> >>> static inline bool mm_is_user(struct mm_struct *mm) >>> { >>>          return mm != &init_mm && !mm_is_efi(mm); >>> } >>> >>> Any objections? >>> >> >> Any reason not to use IS_ENABLED(CONFIG_EFI) in the above? The extern >> declaration is visible to the compiler, and any references should >> disappear before the linker could notice that efi_mm does not exist. >> > > Sure, as long as the linker is happy why not. I'll let Ryan mess with that :) I'm not sure if you are suggesting dropping the mm_is_efi() helper and just use IS_ENABLED(CONFIG_EFI) in mm_is_user() to guard efi_mm, or if you are suggesting using IS_ENABLED(CONFIG_EFI) in mm_is_efi() instead of the ifdefery? The former was what I did initially; It works great, but I didn't like that I was introducing a new code dependecy between efi and arm64 (nothing else outside of efi references efi_mm). So then concluded that it is safe to not worry about efi_mm (thanks for your confirmation). But then David wanted a VM_WARN check, which reintroduces the code dependency. So he suggested the mm_is_efi() helper to hide that... This is all starting to feel circular... Since I've jsut updated the code to do it David's way, I propose leaving it as is since nobody has strong feelings. > >> In any case, feel free to add >> >> Acked-by: Ard Biesheuvel Great thanks! > > Thanks for the review. >