Date: Tue, 31 Jan 2023 19:50:05 -0500
From: Peter Xu
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	sparclinux@vger.kernel.org, Andrew Morton, "David S. Miller", Hev,
	Anatoly Pugachev, Raghavendra K T, Thorsten Leemhuis, Mike Kravetz,
	"Kirill A. Shutemov", Juergen Gross
Subject: Re: [PATCH v1] sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty on 64bit
References: <20221212130213.136267-1-david@redhat.com> <671d9bbb-0f19-2710-00ef-47734085dddc@redhat.com>
In-Reply-To: <671d9bbb-0f19-2710-00ef-47734085dddc@redhat.com>

On Tue, Jan 31, 2023 at 09:47:01AM +0100, David Hildenbrand wrote:
> On 12.12.22 14:02, David Hildenbrand wrote:
> > On sparc64, there is no HW modified bit, therefore, SW tracks via a SW
> > bit if the PTE is dirty via pte_mkdirty(). However, pte_mkdirty()
> > currently also unconditionally sets the HW writable bit, which is wrong.
> >
> > pte_mkdirty() is not supposed to make a PTE actually writable, unless the
> > SW writable bit (pte_write()) indicates that the PTE is not
> > write-protected. Fortunately, sparc64 also defines a SW writable bit.
> >
> > For example, this already turned into a problem in the context of
> > THP splitting as documented in commit 624a2c94f5b7 ("Partly revert "mm/thp:
> > carry over dirty bit when thp splits on pmd"") and might be an issue during
> > page migration in mm/migrate.c:remove_migration_pte() as well where we:
> > 	if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
> > 		pte = pte_mkdirty(pte);
> >
> > But more general, anything like:
> > 	maybe_mkwrite(pte_mkdirty(pte), vma)
> > code is broken on sparc64, because it will unconditionally set the HW
> > writable bit even if the SW writable bit is not set.
> >
> > Simple reproducer that will result in a writable PTE after ptrace
> > access, to highlight the problem and as an easy way to verify if it has
> > been fixed:
> >
> > --------------------------------------------------------------------------
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <stdint.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> > #include <errno.h>
> > #include <signal.h>
> > #include <sys/mman.h>
> >
> > static void signal_handler(int sig)
> > {
> > 	if (sig == SIGSEGV)
> > 		printf("[PASS] SIGSEGV generated\n");
> > 	else
> > 		printf("[FAIL] wrong signal generated\n");
> > 	exit(0);
> > }
> >
> > int main(void)
> > {
> > 	size_t pagesize = getpagesize();
> > 	char data = 1;
> > 	off_t offs;
> > 	int mem_fd;
> > 	char *map;
> > 	int ret;
> >
> > 	mem_fd = open("/proc/self/mem", O_RDWR);
> > 	if (mem_fd < 0) {
> > 		fprintf(stderr, "open(/proc/self/mem) failed: %d\n", errno);
> > 		return 1;
> > 	}
> >
> > 	map = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE|MAP_ANON, -1 ,0);
> > 	if (map == MAP_FAILED) {
> > 		fprintf(stderr, "mmap() failed: %d\n", errno);
> > 		return 1;
> > 	}
> >
> > 	printf("original: %x\n", *map);
> >
> > 	/* debug access */
> > 	offs = lseek(mem_fd, (uintptr_t) map, SEEK_SET);
> > 	ret = write(mem_fd, &data, 1);
> > 	if (ret != 1) {
> > 		fprintf(stderr, "pwrite(/proc/self/mem) failed with %d: %d\n", ret, errno);
> > 		return 1;
> > 	}
> > 	if (*map != data) {
> > 		fprintf(stderr, "pwrite(/proc/self/mem) not visible\n");
> > 		return 1;
> > 	}
> >
> > 	printf("ptrace: %x\n", *map);
> >
> > 	/* Install signal handler. */
> > 	if (signal(SIGSEGV, signal_handler) == SIG_ERR) {
> > 		fprintf(stderr, "signal() failed\n");
> > 		return 1;
> > 	}
> >
> > 	/* Ordinary access. */
> > 	*map = 2;
> >
> > 	printf("access: %x\n", *map);
> >
> > 	printf("[FAIL] SIGSEGV not generated\n");
> >
> > 	return 0;
> > }
> > --------------------------------------------------------------------------
> >
> > Without this commit (sun4u in QEMU):
> > 	# ./reproducer
> > 	original: 0
> > 	ptrace: 1
> > 	access: 2
> > 	[FAIL] SIGSEGV not generated
> >
> > Let's fix this by setting the HW writable bit only if both, the SW dirty
> > bit and the SW writable bit are set. This matches, for example, how
> > s390x handles pte_mkwrite() and pte_mkdirty() -- except, that they have
> > to clear the _PAGE_PROTECT bit.
> >
> > We have to move pte_dirty() and pte_write() up. The code patching
> > mechanism and handling constants > 22bit is a bit special on sparc64.
> >
> > With this commit (sun4u in QEMU):
> > 	# ./reproducer
> > 	original: 0
> > 	ptrace: 1
> > 	[PASS] SIGSEGV generated
> >
> > This handling seems to have been in place forever.
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Cc: Andrew Morton
> > Cc: "David S. Miller"
> > Cc: Peter Xu
> > Cc: Hev
> > Cc: Anatoly Pugachev
> > Cc: Raghavendra K T
> > Cc: Thorsten Leemhuis
> > Cc: Mike Kravetz
> > Cc: "Kirill A. Shutemov"
> > Cc: Juergen Gross
> > Signed-off-by: David Hildenbrand
> > ---
>
> Ping

I agree with David that the current sparc64 impl of pte_mkdirty is
suspicious.
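
To make the concern concrete, here is a minimal userspace sketch of the
rule the patch description states: the HW writable bit should only be set
when both the SW dirty bit and the SW writable bit are set. This is not
the sparc64 code. The bit values, the model_pte_t type and all helper
names (mkdirty_old, mkdirty_new, mkwrite_new, hw_writable, check_model)
are made up for illustration, and the symmetric handling in mkwrite_new()
is my assumption based on the s390x comparison in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only (not the sparc64 values). */
#define MODEL_SW_DIRTY	(1u << 0)	/* software-tracked dirty bit */
#define MODEL_SW_WRITE	(1u << 1)	/* software writable bit, cf. pte_write() */
#define MODEL_HW_WRITE	(1u << 2)	/* bit the MMU would check on a store */

typedef uint32_t model_pte_t;

/* Behaviour described as broken: dirtying always grants HW write access. */
static model_pte_t mkdirty_old(model_pte_t pte)
{
	return pte | MODEL_SW_DIRTY | MODEL_HW_WRITE;
}

/* Proposed rule: HW writable only if SW dirty and SW writable are both set. */
static model_pte_t mkdirty_new(model_pte_t pte)
{
	pte |= MODEL_SW_DIRTY;
	if (pte & MODEL_SW_WRITE)
		pte |= MODEL_HW_WRITE;
	return pte;
}

/* Assumed counterpart: making a PTE writable applies the same condition. */
static model_pte_t mkwrite_new(model_pte_t pte)
{
	pte |= MODEL_SW_WRITE;
	if (pte & MODEL_SW_DIRTY)
		pte |= MODEL_HW_WRITE;
	return pte;
}

static bool hw_writable(model_pte_t pte)
{
	return (pte & MODEL_HW_WRITE) != 0;
}

/* Self-check covering the cases discussed in this thread. */
void check_model(void)
{
	model_pte_t ro = 0;	/* write-protected, clean PTE */

	/* Old rule: a write-protected PTE becomes HW writable once dirtied. */
	assert(hw_writable(mkdirty_old(ro)));

	/* New rule: dirtying alone must not grant HW write access. */
	assert(!hw_writable(mkdirty_new(ro)));

	/* New rule: SW writable plus dirty yields HW writable, in any order. */
	assert(hw_writable(mkdirty_new(mkwrite_new(ro))));
	assert(hw_writable(mkwrite_new(mkdirty_new(ro))));
}
```

Calling check_model() from a small main() exercises all four cases. In
this model, the old rule is what would let a plain store succeed after the
/proc/self/mem write in the reproducer, while the new rule keeps the PTE
write-protected until something explicitly makes it writable.
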
What David mentioned about page migration above is correct, and there is
another recent report of it from Nick here:

https://lore.kernel.org/all/CADyTPEzsvdRC15+Z5T3oryofwRYqHmHzwqRmJKJoHB3d7Tdayw@mail.gmail.com/

If this patch is correct (which I cannot judge, as I know little about
sparc64) and can be merged, it'll be the cleanest solution compared to
what I provided here:

https://lore.kernel.org/all/Y9bvwz4FIOQ+D8c4@x1n/

I assume it will also fix cases like the attached reproducer, where the
write bit is wrongly applied with FOLL_FORCE, so it fixes more than the
migration issue.

I plan to keep posting the fix I referenced above for the breakage,
because it is still the safest option so far, but that can change if
someone from sparc64 can have a look at this patch and ack it.

Thanks,

--
Peter Xu