Date: Mon, 13 Feb 2023 16:09:47 -0800
From: Deepak Gupta
To: Rick Edgecombe
Cc: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang, "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, Yu-cheng Yu
Subject: Re: [PATCH v5 19/39] mm: Fixup places that call pte_mkwrite() directly
Message-ID: <20230214000947.GB4016181@debug.ba.rivosinc.com>
References: <20230119212317.8324-1-rick.p.edgecombe@intel.com> <20230119212317.8324-20-rick.p.edgecombe@intel.com>
In-Reply-To: <20230119212317.8324-20-rick.p.edgecombe@intel.com>

Since I have a general question about the outcome of the discussion on how to handle `pte_mkwrite`, I am top-posting.

Yesterday I posted patches targeting the RISC-V zisslpcfi extension:
https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/

Because the extensions are similar, the patches are similar too, and one of the shared pieces is an update to `maybe_mkwrite`. On my patch #11, dhildenb asked me to look at how x86 approaches this, so that the core-mm approach fits multiple architectures, together with the need to update `pte_mkwrite` to consume VMA flags.

In the x86 CET patch series, I see that the locations where `pte_mkwrite` is invoked are updated to check for a shadow stack VMA, rather than `pte_mkwrite` itself being updated to consume VMA flags.
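To check that I am reading the two options the same way, here is a toy user-space model (not kernel code). `pte_t` is a plain integer, every bit value is made up, and `pte_mkwrite_vma()` is a hypothetical name for a `pte_mkwrite` variant that consumes VMA flags; only the shape of the `is_shstk_write()` mask test is taken from this patch.

```c
#include <stdbool.h>

/*
 * Toy model, not kernel code: pte_t is a plain integer and every bit
 * value below is a made-up stand-in for the real definitions.
 */
typedef unsigned long pte_t;

#define VM_WRITE        0x2UL
#define VM_SHADOW_STACK 0x100UL	/* made-up value for the arch VMA bit */

#define PTE_WRITE       0x1UL	/* made-up "regular writable" encoding */
#define PTE_SHSTK       0x2UL	/* made-up shadow stack encoding */

static inline pte_t pte_mkwrite(pte_t pte)       { return pte | PTE_WRITE; }
static inline pte_t pte_mkwrite_shstk(pte_t pte) { return pte | PTE_SHSTK; }

/* Same mask test as is_shstk_write() in the patch. */
static inline bool is_shstk_write(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHADOW_STACK | VM_WRITE)) ==
	       (VM_SHADOW_STACK | VM_WRITE);
}

/* Option 1 (this series): each call site checks the VMA flags itself. */
static inline pte_t call_site_mkwrite(pte_t entry, unsigned long vm_flags)
{
	if (is_shstk_write(vm_flags))
		entry = pte_mkwrite_shstk(entry);
	else if (vm_flags & VM_WRITE)
		entry = pte_mkwrite(entry);
	return entry;
}

/*
 * Option 2 (hypothetical helper): the mkwrite helper consumes the VMA
 * flags. Callers still gate on VM_WRITE, but no longer need to know
 * about shadow stacks.
 */
static inline pte_t pte_mkwrite_vma(pte_t entry, unsigned long vm_flags)
{
	if (is_shstk_write(vm_flags))
		return pte_mkwrite_shstk(entry);
	return pte_mkwrite(entry);
}
```

With option 1 every caller repeats the shadow stack branch; with option 2 callers keep their existing `VM_WRITE` check and the shadow stack decision lives in one place.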
Please let me know whether my understanding is correct and whether that is the current direction (i.e., updating the call sites that invoke `pte_mkwrite` to check the VMA).

That said, as I mentioned in my patch series, the x86, ARM and now RISC-V implementations of shadow stack and indirect branch tracking are similar, so it would be good if we could collaborate and come up with common bits.

Rest inline.

On Thu, Jan 19, 2023 at 01:22:57PM -0800, Rick Edgecombe wrote:
>From: Yu-cheng Yu
>
>The x86 Control-flow Enforcement Technology (CET) feature includes a new
>type of memory called shadow stack. This shadow stack memory has some
>unusual properties, which requires some core mm changes to function
>properly.
>
>With the introduction of shadow stack memory there are two ways a pte can
>be writable: regular writable memory and shadow stack memory.
>
>In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite()
>or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases
>where a PTE is made writable. However, there are places where pte_mkwrite()
>is called directly and the logic should now also create a shadow stack PTE
>in the case of a shadow stack VMA.
>
>- do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE
>  directly and call pte_mkwrite(). Teach it about pte_mkwrite_shstk()
>
>- When userfaultfd is creating a PTE after userspace handles the fault
>  it calls pte_mkwrite() directly. Teach it about pte_mkwrite_shstk()
>
>To make the code cleaner, introduce is_shstk_write() which simplifies
>checking for VM_WRITE | VM_SHADOW_STACK together.
>
>In other cases where pte_mkwrite() is called directly, the VMA will not
>be VM_SHADOW_STACK, and so shadow stack memory should not be created.
> - In the case of pte_savedwrite(), shadow stack VMA's are excluded.
> - In the case of the "dirty_accountable" optimization in mprotect(),
>   shadow stack VMA's won't be VM_SHARED, so it is not necessary.
>
>Tested-by: Pengfei Xu
>Tested-by: John Allen
>Signed-off-by: Yu-cheng Yu
>Co-developed-by: Rick Edgecombe
>Signed-off-by: Rick Edgecombe
>Cc: Kees Cook
>---
>
>v5:
> - Fix typo in commit log
>
>v3:
> - Restore do_anonymous_page() that accidetally moved commits (Kirill)
> - Open code maybe_mkwrite() cases from v2, so the behavior doesn't change
>   to mark that non-writable PTEs dirty. (Nadav)
>
>v2:
> - Updated commit log with comment's from Dave Hansen
> - Dave also suggested (I understood) to maybe tweak vm_get_page_prot()
>   to avoid having to call maybe_mkwrite(). After playing around with
>   this I opted to *not* do this. Shadow stack memory memory is
>   effectively writable, so having the default permissions be writable
>   ended up mapping the zero page as writable and other surprises. So
>   creating shadow stack memory needs to be done with manual logic
>   like pte_mkwrite().
> - Drop change in change_pte_range() because it couldn't actually trigger
>   for shadow stack VMAs.
> - Clarify reasoning for skipped cases of pte_mkwrite().
>
>Yu-cheng v25:
> - Apply same changes to do_huge_pmd_numa_page() as to do_numa_page().
>
> arch/x86/include/asm/pgtable.h |  3 +++
> arch/x86/mm/pgtable.c          |  6 ++++++
> include/linux/pgtable.h        |  7 +++++++
> mm/memory.c                    |  5 ++++-
> mm/migrate_device.c            |  4 +++-
> mm/userfaultfd.c               | 10 +++++++---
> 6 files changed, 30 insertions(+), 5 deletions(-)
>
>diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
>index 45b1a8f058fe..87d3068734ec 100644
>--- a/arch/x86/include/asm/pgtable.h
>+++ b/arch/x86/include/asm/pgtable.h
>@@ -951,6 +951,9 @@ static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
> }
> #endif /* CONFIG_PAGE_TABLE_ISOLATION */
>
>+#define is_shstk_write is_shstk_write
>+extern bool is_shstk_write(unsigned long vm_flags);
>+
> #endif /* __ASSEMBLY__ */
>
>
>diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>index e4f499eb0f29..d103945ba502 100644
>--- a/arch/x86/mm/pgtable.c
>+++ b/arch/x86/mm/pgtable.c
>@@ -880,3 +880,9 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>
> #endif /* CONFIG_X86_64 */
> #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
>+
>+bool is_shstk_write(unsigned long vm_flags)
>+{
>+	return (vm_flags & (VM_SHADOW_STACK | VM_WRITE)) ==
>+	       (VM_SHADOW_STACK | VM_WRITE);
>+}

Could we name this function something along the lines of `is_shadow_stack_vma`? The reason is that we are really checking a VMA property here.

Also, could we move this into common code? Common code could then call `arch_is_shadow_stack_vma`, and each architecture could implement its own shadow stack encoding. I see that x86 uses one of the arch-specific VMA bits, while the current RISC-V implementation encodes a shadow stack VMA as one with only `VM_WRITE` set.

Please see patches #11 and #12 in the series I posted (URL at the top of this e-mail).
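A rough sketch of what I have in mind, written as a toy user-space program rather than real kernel code. The flag values, the per-arch helper names and the `SKETCH_*` switches are all made up for illustration; the `#define name name` / `#ifndef name` override idiom is the same one this patch uses for `is_shstk_write`. The RISC-V helper reflects my reading of the encoding described above (only `VM_WRITE` set among the permission bits).

```c
#include <stdbool.h>

/* Made-up flag values; the real VM_* bits are defined in include/linux/mm.h. */
#define VM_READ         0x1UL
#define VM_WRITE        0x2UL
#define VM_EXEC         0x4UL
#define VM_SHADOW_STACK 0x100UL	/* stands in for an arch-specific VMA bit */

/* x86-style encoding: dedicated VMA bit plus VM_WRITE. */
static inline bool x86_is_shadow_stack_vma(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHADOW_STACK | VM_WRITE)) ==
	       (VM_SHADOW_STACK | VM_WRITE);
}

/* RISC-V-style encoding: VM_WRITE is the only permission bit set. */
static inline bool riscv_is_shadow_stack_vma(unsigned long vm_flags)
{
	return (vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) == VM_WRITE;
}

/* "arch header": an architecture opts in by defining the hook name. */
#ifndef SKETCH_NO_SHSTK
#ifdef SKETCH_RISCV
#define arch_is_shadow_stack_vma riscv_is_shadow_stack_vma
#else
#define arch_is_shadow_stack_vma x86_is_shadow_stack_vma
#endif
#endif

/* "common header": fallback for architectures without shadow stacks. */
#ifndef arch_is_shadow_stack_vma
static inline bool arch_is_shadow_stack_vma(unsigned long vm_flags)
{
	(void)vm_flags;
	return false;
}
#endif

/* Common helper that core-mm call sites would use instead of is_shstk_write(). */
static inline bool is_shadow_stack_vma(unsigned long vm_flags)
{
	return arch_is_shadow_stack_vma(vm_flags);
}
```

Building with `-DSKETCH_RISCV` switches the common helper to the RISC-V-style encoding and `-DSKETCH_NO_SHSTK` selects the always-false fallback, so the core-mm call sites stay the same regardless of how an architecture encodes shadow stack VMAs.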
>diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>index 14a820a45a37..49ce1f055242 100644
>--- a/include/linux/pgtable.h
>+++ b/include/linux/pgtable.h
>@@ -1578,6 +1578,13 @@ static inline bool arch_has_pfn_modify_check(void)
> }
> #endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */
>
>+#ifndef is_shstk_write
>+static inline bool is_shstk_write(unsigned long vm_flags)
>+{
>+	return false;
>+}
>+#endif
>+
> /*
>  * Architecture PAGE_KERNEL_* fallbacks
>  *
>diff --git a/mm/memory.c b/mm/memory.c
>index aad226daf41b..5e5107232a26 100644
>--- a/mm/memory.c
>+++ b/mm/memory.c
>@@ -4088,7 +4088,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>
> 	entry = mk_pte(page, vma->vm_page_prot);
> 	entry = pte_sw_mkyoung(entry);
>-	if (vma->vm_flags & VM_WRITE)
>+
>+	if (is_shstk_write(vma->vm_flags))
>+		entry = pte_mkwrite_shstk(pte_mkdirty(entry));
>+	else if (vma->vm_flags & VM_WRITE)
> 		entry = pte_mkwrite(pte_mkdirty(entry));
>
> 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
>diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>index 721b2365dbca..53d417683e01 100644
>--- a/mm/migrate_device.c
>+++ b/mm/migrate_device.c
>@@ -645,7 +645,9 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
> 			goto abort;
> 		}
> 		entry = mk_pte(page, vma->vm_page_prot);
>-		if (vma->vm_flags & VM_WRITE)
>+		if (is_shstk_write(vma->vm_flags))
>+			entry = pte_mkwrite_shstk(pte_mkdirty(entry));
>+		else if (vma->vm_flags & VM_WRITE)
> 			entry = pte_mkwrite(pte_mkdirty(entry));
> 	}
>
>diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
>index 0499907b6f1a..832f0250ca61 100644
>--- a/mm/userfaultfd.c
>+++ b/mm/userfaultfd.c
>@@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> 	int ret;
> 	pte_t _dst_pte, *dst_pte;
> 	bool writable = dst_vma->vm_flags & VM_WRITE;
>+	bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK;
> 	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> 	bool page_in_cache = page_mapping(page);
>
> 	spinlock_t *ptl;
>@@ -84,9 +85,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> 		writable = false;
> 	}
>
>-	if (writable)
>-		_dst_pte = pte_mkwrite(_dst_pte);
>-	else
>+	if (writable) {
>+		if (shstk)
>+			_dst_pte = pte_mkwrite_shstk(_dst_pte);
>+		else
>+			_dst_pte = pte_mkwrite(_dst_pte);
>+	} else
> 		/*
> 		 * We need this to make sure write bit removed; as mk_pte()
> 		 * could return a pte with write bit set.
>-- 
>2.17.1
>