From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <311e3201-4dde-9f45-2cea-1a570b85179c@redhat.com>
Date: Thu, 16 Feb 2023 16:36:35 +0100
Subject: Re: [PATCH v1] sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty on 64bit
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, sparclinux@vger.kernel.org, Andrew Morton, "David S. Miller", Peter Xu, Hev, Anatoly Pugachev, Raghavendra K T, Thorsten Leemhuis, Mike Kravetz, "Kirill A. Shutemov", Juergen Gross
References: <20221212130213.136267-1-david@redhat.com>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To: <20221212130213.136267-1-david@redhat.com>

Ping. We are seeing more issues [1] with that handling that require
workarounds.

@Andrew, if David is unavailable for sparc64 activity, can we route this
through your tree?

[1] https://lkml.kernel.org/r/20230216153059.256739-1-peterx@redhat.com

On 12.12.22 14:02, David Hildenbrand wrote:
> On sparc64, there is no HW modified bit, therefore, SW tracks via a SW
> bit if the PTE is dirty via pte_mkdirty(). However, pte_mkdirty()
> currently also unconditionally sets the HW writable bit, which is wrong.
>
> pte_mkdirty() is not supposed to make a PTE actually writable, unless the
> SW writable bit (pte_write()) indicates that the PTE is not
> write-protected. Fortunately, sparc64 also defines a SW writable bit.
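
To make the problem described above concrete, here is a small user-space
model in C. It is only a sketch: the bit values and the names SW_DIRTY,
SW_WRITE, HW_WRITE, old_mkdirty() and check_old_model() are made up for
illustration and are not the sparc64 definitions. It mimics what the
current pte_mkdirty() does, namely OR-ing in the HW writable bit together
with the SW dirty bit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical bit positions, for illustration only. The real sparc64
 * masks (_PAGE_MODIFIED_4U/4V, _PAGE_WRITE_4U/4V, _PAGE_W_4U/4V) differ.
 */
#define SW_DIRTY (UINT64_C(1) << 0) /* models the SW dirty bit    */
#define SW_WRITE (UINT64_C(1) << 1) /* models the SW writable bit */
#define HW_WRITE (UINT64_C(1) << 2) /* models the HW writable bit */

/* Models the current pte_mkdirty(): HW writable is set unconditionally. */
uint64_t old_mkdirty(uint64_t pte)
{
	return pte | SW_DIRTY | HW_WRITE;
}

/* A write-protected PTE (no SW_WRITE) becomes HW-writable in this model. */
void check_old_model(void)
{
	uint64_t pte = old_mkdirty(0);

	assert(pte & SW_DIRTY);
	assert(!(pte & SW_WRITE));
	assert(pte & HW_WRITE);
}
```

Calling check_old_model() from a main() passes all three assertions in this
model, i.e. the PTE ends up HW-writable although the SW writable bit was
never set.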
>
> For example, this already turned into a problem in the context of
> THP splitting as documented in commit 624a2c94f5b7 ("Partly revert "mm/thp:
> carry over dirty bit when thp splits on pmd"") and might be an issue during
> page migration in mm/migrate.c:remove_migration_pte() as well where we:
> 	if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
> 		pte = pte_mkdirty(pte);
>
> But more generally, anything like:
> 	maybe_mkwrite(pte_mkdirty(pte), vma)
> code is broken on sparc64, because it will unconditionally set the HW
> writable bit even if the SW writable bit is not set.
>
> Simple reproducer that will result in a writable PTE after ptrace
> access, to highlight the problem and as an easy way to verify if it has
> been fixed:
>
> --------------------------------------------------------------------------
> #include <errno.h>
> #include <fcntl.h>
> #include <signal.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> static void signal_handler(int sig)
> {
> 	if (sig == SIGSEGV)
> 		printf("[PASS] SIGSEGV generated\n");
> 	else
> 		printf("[FAIL] wrong signal generated\n");
> 	exit(0);
> }
>
> int main(void)
> {
> 	size_t pagesize = getpagesize();
> 	char data = 1;
> 	off_t offs;
> 	int mem_fd;
> 	char *map;
> 	int ret;
>
> 	mem_fd = open("/proc/self/mem", O_RDWR);
> 	if (mem_fd < 0) {
> 		fprintf(stderr, "open(/proc/self/mem) failed: %d\n", errno);
> 		return 1;
> 	}
>
> 	map = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE|MAP_ANON, -1 ,0);
> 	if (map == MAP_FAILED) {
> 		fprintf(stderr, "mmap() failed: %d\n", errno);
> 		return 1;
> 	}
>
> 	printf("original: %x\n", *map);
>
> 	/* debug access */
> 	offs = lseek(mem_fd, (uintptr_t) map, SEEK_SET);
> 	ret = write(mem_fd, &data, 1);
> 	if (ret != 1) {
> 		fprintf(stderr, "pwrite(/proc/self/mem) failed with %d: %d\n", ret, errno);
> 		return 1;
> 	}
> 	if (*map != data) {
> 		fprintf(stderr, "pwrite(/proc/self/mem) not visible\n");
> 		return 1;
> 	}
>
> 	printf("ptrace: %x\n", *map);
>
> 	/* Install signal handler. */
> 	if (signal(SIGSEGV, signal_handler) == SIG_ERR) {
> 		fprintf(stderr, "signal() failed\n");
> 		return 1;
> 	}
>
> 	/* Ordinary access. */
> 	*map = 2;
>
> 	printf("access: %x\n", *map);
>
> 	printf("[FAIL] SIGSEGV not generated\n");
>
> 	return 0;
> }
> --------------------------------------------------------------------------
>
> Without this commit (sun4u in QEMU):
> 	# ./reproducer
> 	original: 0
> 	ptrace: 1
> 	access: 2
> 	[FAIL] SIGSEGV not generated
>
> Let's fix this by setting the HW writable bit only if both the SW dirty
> bit and the SW writable bit are set. This matches, for example, how
> s390x handles pte_mkwrite() and pte_mkdirty() -- except, that they have
> to clear the _PAGE_PROTECT bit.
>
> We have to move pte_dirty() and pte_write() up. The code patching
> mechanism and handling constants > 22bit is a bit special on sparc64.
>
> With this commit (sun4u in QEMU):
> 	# ./reproducer
> 	original: 0
> 	ptrace: 1
> 	[PASS] SIGSEGV generated
>
> This handling seems to have been in place forever.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: Andrew Morton
> Cc: "David S. Miller"
> Cc: Peter Xu
> Cc: Hev
> Cc: Anatoly Pugachev
> Cc: Raghavendra K T
> Cc: Thorsten Leemhuis
> Cc: Mike Kravetz
> Cc: "Kirill A. Shutemov"
> Cc: Juergen Gross
> Signed-off-by: David Hildenbrand
> ---
>
> Only tested under QEMU with sun4u, as I cannot seem to get sun4v running
> in QEMU. Survives a simple debian 10 boot.
>
> This also tackles what's documented in:
> 	https://lkml.kernel.org/r/20221125185857.3110155-1-peterx@redhat.com
> and once loongarch also has been fixed, we might be able to remove all
> that special-casing.
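
The rule from the quoted commit message (set the HW writable bit only if
both the SW dirty bit and the SW writable bit are set) can be modelled in
plain user-space C as follows. This is a sketch under stated assumptions,
not the kernel code: the bit values and the names SW_DIRTY, SW_WRITE,
HW_WRITE, mkhwwrite(), fixed_mkdirty(), fixed_mkwrite() and
check_fixed_model() are hypothetical, and the real helpers use the
sun4u/sun4v code patching shown in the diff instead of plain masks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SW_DIRTY (UINT64_C(1) << 0) /* models the SW dirty bit    */
#define SW_WRITE (UINT64_C(1) << 1) /* models the SW writable bit */
#define HW_WRITE (UINT64_C(1) << 2) /* models the HW writable bit */

/* Models __pte_mkhwwrite(): set the HW writable bit. */
uint64_t mkhwwrite(uint64_t pte)
{
	return pte | HW_WRITE;
}

/* Models the patched pte_mkdirty(): HW writable only if SW writable. */
uint64_t fixed_mkdirty(uint64_t pte)
{
	pte |= SW_DIRTY;
	return (pte & SW_WRITE) ? mkhwwrite(pte) : pte;
}

/* Models the patched pte_mkwrite(): HW writable only if SW dirty. */
uint64_t fixed_mkwrite(uint64_t pte)
{
	pte |= SW_WRITE;
	return (pte & SW_DIRTY) ? mkhwwrite(pte) : pte;
}

/* Either helper alone never sets HW_WRITE; both together do, in any order. */
void check_fixed_model(void)
{
	assert(!(fixed_mkdirty(0) & HW_WRITE));
	assert(!(fixed_mkwrite(0) & HW_WRITE));
	assert(fixed_mkwrite(fixed_mkdirty(0)) & HW_WRITE);
	assert(fixed_mkdirty(fixed_mkwrite(0)) & HW_WRITE);
}
```

In this model, a dirty but write-protected PTE stays without the HW
writable bit, which is what the reproducer checks for via the SIGSEGV.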
>
> ---
>  arch/sparc/include/asm/pgtable_64.h | 117 ++++++++++++++++------------
>  1 file changed, 67 insertions(+), 50 deletions(-)
>
> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
> index 3bc9736bddb1..7f2e57747563 100644
> --- a/arch/sparc/include/asm/pgtable_64.h
> +++ b/arch/sparc/include/asm/pgtable_64.h
> @@ -354,6 +354,42 @@ static inline pgprot_t pgprot_noncached(pgprot_t prot)
>   */
>  #define pgprot_noncached pgprot_noncached
>
> +static inline unsigned long pte_dirty(pte_t pte)
> +{
> +	unsigned long mask;
> +
> +	__asm__ __volatile__(
> +	"\n661: mov %1, %0\n"
> +	" nop\n"
> +	" .section .sun4v_2insn_patch, \"ax\"\n"
> +	" .word 661b\n"
> +	" sethi %%uhi(%2), %0\n"
> +	" sllx %0, 32, %0\n"
> +	" .previous\n"
> +	: "=r" (mask)
> +	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
> +
> +	return (pte_val(pte) & mask);
> +}
> +
> +static inline unsigned long pte_write(pte_t pte)
> +{
> +	unsigned long mask;
> +
> +	__asm__ __volatile__(
> +	"\n661: mov %1, %0\n"
> +	" nop\n"
> +	" .section .sun4v_2insn_patch, \"ax\"\n"
> +	" .word 661b\n"
> +	" sethi %%uhi(%2), %0\n"
> +	" sllx %0, 32, %0\n"
> +	" .previous\n"
> +	: "=r" (mask)
> +	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
> +
> +	return (pte_val(pte) & mask);
> +}
> +
>  #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
>  pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
>  #define arch_make_huge_pte arch_make_huge_pte
> @@ -415,28 +451,44 @@ static inline bool is_hugetlb_pte(pte_t pte)
>  }
>  #endif
>
> -static inline pte_t pte_mkdirty(pte_t pte)
> +static inline pte_t __pte_mkhwwrite(pte_t pte)
>  {
> -	unsigned long val = pte_val(pte), tmp;
> +	unsigned long val = pte_val(pte);
>
> +	/*
> +	 * Note: we only want to set the HW writable bit if the SW writable bit
> +	 * and the SW dirty bit are set.
> +	 */
>  	__asm__ __volatile__(
> -	"\n661: or %0, %3, %0\n"
> +	"\n661: or %0, %2, %0\n"
>  	" nop\n"
> -	"\n662: nop\n"
> +	" .section .sun4v_1insn_patch, \"ax\"\n"
> +	" .word 661b\n"
> +	" or %0, %3, %0\n"
> +	" .previous\n"
> +	: "=r" (val)
> +	: "0" (val), "i" (_PAGE_W_4U), "i" (_PAGE_W_4V));
> +
> +	return __pte(val);
> +}
> +
> +static inline pte_t pte_mkdirty(pte_t pte)
> +{
> +	unsigned long val = pte_val(pte), mask;
> +
> +	__asm__ __volatile__(
> +	"\n661: mov %1, %0\n"
>  	" nop\n"
>  	" .section .sun4v_2insn_patch, \"ax\"\n"
>  	" .word 661b\n"
> -	" sethi %%uhi(%4), %1\n"
> -	" sllx %1, 32, %1\n"
> -	" .word 662b\n"
> -	" or %1, %%lo(%4), %1\n"
> -	" or %0, %1, %0\n"
> +	" sethi %%uhi(%2), %0\n"
> +	" sllx %0, 32, %0\n"
>  	" .previous\n"
> -	: "=r" (val), "=r" (tmp)
> -	: "0" (val), "i" (_PAGE_MODIFIED_4U | _PAGE_W_4U),
> -	  "i" (_PAGE_MODIFIED_4V | _PAGE_W_4V));
> +	: "=r" (mask)
> +	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
>
> -	return __pte(val);
> +	pte = __pte(val | mask);
> +	return pte_write(pte) ? __pte_mkhwwrite(pte) : pte;
>  }
>
>  static inline pte_t pte_mkclean(pte_t pte)
> @@ -478,7 +530,8 @@ static inline pte_t pte_mkwrite(pte_t pte)
>  	: "=r" (mask)
>  	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
>
> -	return __pte(val | mask);
> +	pte = __pte(val | mask);
> +	return pte_dirty(pte) ? __pte_mkhwwrite(pte) : pte;
>  }
>
>  static inline pte_t pte_wrprotect(pte_t pte)
> @@ -581,42 +634,6 @@ static inline unsigned long pte_young(pte_t pte)
>  	return (pte_val(pte) & mask);
>  }
>
> -static inline unsigned long pte_dirty(pte_t pte)
> -{
> -	unsigned long mask;
> -
> -	__asm__ __volatile__(
> -	"\n661: mov %1, %0\n"
> -	" nop\n"
> -	" .section .sun4v_2insn_patch, \"ax\"\n"
> -	" .word 661b\n"
> -	" sethi %%uhi(%2), %0\n"
> -	" sllx %0, 32, %0\n"
> -	" .previous\n"
> -	: "=r" (mask)
> -	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
> -
> -	return (pte_val(pte) & mask);
> -}
> -
> -static inline unsigned long pte_write(pte_t pte)
> -{
> -	unsigned long mask;
> -
> -	__asm__ __volatile__(
> -	"\n661: mov %1, %0\n"
> -	" nop\n"
> -	" .section .sun4v_2insn_patch, \"ax\"\n"
> -	" .word 661b\n"
> -	" sethi %%uhi(%2), %0\n"
> -	" sllx %0, 32, %0\n"
> -	" .previous\n"
> -	: "=r" (mask)
> -	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
> -
> -	return (pte_val(pte) & mask);
> -}
> -
>  static inline unsigned long pte_exec(pte_t pte)
>  {
>  	unsigned long mask;
>
> base-commit: 830b3c68c1fb1e9176028d02ef86f3cf76aa2476

-- 
Thanks,

David / dhildenb