From: Nadav Amit
To: David Hildenbrand
Cc: LKML, linux-mm, Linus Torvalds, Andrew Morton, Dave Hansen, Andrea Arcangeli, Peter Xu, Yang Shi, Hugh Dickins, Mel Gorman
Subject: Re: [PATCH v2] mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection
Date: Fri, 10 Jun 2022 11:42:06 -0700
Message-Id: <5DFB7262-6E32-4984-A346-B7DE5040B12F@gmail.com>
In-Reply-To: <20220610181436.84713-1-david@redhat.com>
References: <20220610181436.84713-1-david@redhat.com>

On Jun 10, 2022, at 11:14 AM, David Hildenbrand wrote:

> Similar to our MM_CP_DIRTY_ACCT handling for shared, writable mappings, we
> can try mapping anonymous pages writable if they are exclusive,
> the PTE is already dirty, and no special handling applies. Mapping the
> PTE writable is essentially the same thing the write fault handler would do
> in this case.
>
> Special handling is required for uffd-wp and softdirty tracking, so take
> care of that properly. Also, leave PROT_NONE handling alone for now;
> in the future, we could similarly extend the logic in do_numa_page() or
> use pte_mk_savedwrite() here. Note that we'll now also check for uffd-wp in
> case of VM_SHARED -- which is harmless and prepares for uffd-wp support for
> shmem.
>
> While this improves mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE)
> performance, it should also be a valuable optimization for uffd-wp, when
> un-protecting.
>
> Applying the same logic to PMDs (anonymous THP, anonymous hugetlb) is
> probably not worth the trouble, but could similarly be added if there is
> demand.
>
> Results of a simple microbenchmark on my Ryzen 9 3900X, comparing the new
> optimization (avoiding write faults) during mprotect() with softdirty
> tracking, where we require a write fault.
>
> Running 1000 iterations each
>
> ==========================================================
> Measuring memset() of 4096 bytes
> First write access:
> Min: 169 ns, Max: 8997 ns, Avg: 830 ns
> Second write access:
> Min: 80 ns, Max: 251 ns, Avg: 168 ns
> Write access after mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE):
> Min: 180 ns, Max: 290 ns, Avg: 190 ns
> Write access after clearing softdirty:
> Min: 451 ns, Max: 1774 ns, Avg: 470 ns
> -> mprotect = 1.131 * second [avg]
> -> mprotect = 0.404 * softdirty [avg]
> ----------------------------------------------------------
> Measuring single byte access per page of 4096 bytes
> First write access:
> Min: 761 ns, Max: 1152 ns, Avg: 784 ns
> Second write access:
> Min: 130 ns, Max: 181 ns, Avg: 137 ns
> Write access after mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE):
> Min: 150 ns, Max: 1553 ns, Avg: 155 ns
> Write access after clearing softdirty:
> Min: 169 ns, Max: 1783 ns, Avg: 432 ns
> -> mprotect = 1.131 * second [avg]
> -> mprotect = 0.359 * softdirty [avg]
> ==========================================================
> Measuring memset() of 16384 bytes
> First write access:
> Min: 1594 ns, Max: 3497 ns, Avg: 2143 ns
> Second write access:
> Min: 250 ns, Max: 381 ns, Avg: 260 ns
> Write access after mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE):
> Min: 290 ns, Max: 1643 ns, Avg: 300 ns
> Write access after clearing softdirty:
> Min: 1242 ns, Max: 8987 ns, Avg: 1297 ns
> -> mprotect = 1.154 * second [avg]
> -> mprotect = 0.231 * softdirty [avg]
> ----------------------------------------------------------
> Measuring single byte access per page of 16384 bytes
> First write access:
> Min: 1953 ns, Max: 2945 ns, Avg: 2008 ns
> Second write access:
> Min: 130 ns, Max: 912 ns, Avg: 142 ns
> Write access after mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE):
> Min: 160 ns, Max: 240 ns, Avg: 166 ns
> Write access after clearing softdirty:
> Min: 1112 ns, Max: 1513 ns, Avg: 1126 ns
> -> mprotect = 1.169 * second [avg]
> -> mprotect = 0.147 * softdirty [avg]
> ==========================================================
> Measuring memset() of 65536 bytes
> First write access:
> Min: 7524 ns, Max: 15650 ns, Avg: 7680 ns
> Second write access:
> Min: 251 ns, Max: 1323 ns, Avg: 648 ns
> Write access after mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE):
> Min: 270 ns, Max: 1282 ns, Avg: 736 ns
> Write access after clearing softdirty:
> Min: 4558 ns, Max: 12524 ns, Avg: 4623 ns
> -> mprotect = 1.136 * second [avg]
> -> mprotect = 0.159 * softdirty [avg]
> ----------------------------------------------------------
> Measuring single byte access per page of 65536 bytes
> First write access:
> Min: 7083 ns, Max: 9027 ns, Avg: 7241 ns
> Second write access:
> Min: 140 ns, Max: 201 ns, Avg: 156 ns
> Write access after mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE):
> Min: 190 ns, Max: 451 ns, Avg: 197 ns
> Write access after clearing softdirty:
> Min: 3707 ns, Max: 5119 ns, Avg: 3958 ns
> -> mprotect = 1.263 * second [avg]
> -> mprotect = 0.050 * softdirty [avg]
> ==========================================================
> Measuring memset() of 524288 bytes
> First write access:
> Min: 58470 ns, Max: 87754 ns, Avg: 59353 ns
> Second write access:
> Min: 5180 ns, Max: 6863 ns, Avg: 5318 ns
> Write access after mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE):
> Min: 5871 ns, Max: 9358 ns, Avg: 6028 ns
> Write access after clearing softdirty:
> Min: 35797 ns, Max: 41338 ns, Avg: 36710 ns
> -> mprotect = 1.134 * second [avg]
> -> mprotect = 0.164 * softdirty [avg]
> ----------------------------------------------------------
> Measuring single byte access per page of 524288 bytes
> First write access:
> Min: 53751 ns, Max: 59431 ns, Avg: 54506 ns
> Second write access:
> Min: 781 ns, Max: 2194 ns, Avg: 1123 ns
> Write access after mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE):
> Min: 161 ns, Max: 1282 ns, Avg: 622 ns
> Write access after clearing softdirty:
> Min: 30888 ns, Max: 34565 ns, Avg: 31229 ns
> -> mprotect = 0.554 * second [avg]
> -> mprotect = 0.020 * softdirty [avg]
>
> Cc: Linus Torvalds
> Cc: Andrew Morton
> Cc: Nadav Amit
> Cc: Dave Hansen
> Cc: Andrea Arcangeli
> Cc: Peter Xu
> Cc: Yang Shi
> Cc: Hugh Dickins
> Cc: Mel Gorman
> Signed-off-by: David Hildenbrand
> ---
>
> v1 -> v2:
> * Rebased on v5.19-rc1
> * Rerun benchmark
> * Fix minor spelling issues in subject+description
> * Drop IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) check
> * Move pte_write() check into caller
>
> ---
>  mm/mprotect.c | 67 ++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 55 insertions(+), 12 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ba5592655ee3..728772bf41c7 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -38,6 +38,45 @@
>  
>  #include "internal.h"
>  
> +static inline bool can_change_pte_writable(struct vm_area_struct *vma,
> +                                           unsigned long addr, pte_t pte,
> +                                           unsigned long cp_flags)
> +{
> +        struct page *page;
> +
> +        if ((vma->vm_flags & VM_SHARED) && !(cp_flags & MM_CP_DIRTY_ACCT))
> +                /*
> +                 * MM_CP_DIRTY_ACCT is only expressive for shared mappings;
> +                 * without MM_CP_DIRTY_ACCT, there is nothing to do.
> +                 */
> +                return false;
> +
> +        if (pte_protnone(pte) || !pte_dirty(pte))
> +                return false;
> +
> +        /* Do we need write faults for softdirty tracking? */
> +        if ((vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
> +                return false;
> +
> +        /* Do we need write faults for uffd-wp tracking? */
> +        if (userfaultfd_pte_wp(vma, pte))
> +                return false;
> +
> +        if (!(vma->vm_flags & VM_SHARED)) {
> +                /*
> +                 * We can only special-case on exclusive anonymous pages,
> +                 * because we know that our write-fault handler similarly would
> +                 * map them writable without any additional checks while holding
> +                 * the PT lock.
> +                 */
> +                page = vm_normal_page(vma, addr, pte);
> +                if (!page || !PageAnon(page) || !PageAnonExclusive(page))
> +                        return false;
> +        }
> +
> +        return true;
> +}
> +

Looks good in general. Just wondering (out loud) whether it makes more
sense to do all the vm_flags and cp_flags related checks in one of the
callers (mprotect_fixup()?) and propagate whether to try to
write-unprotect in cp_flags (e.g., by introducing a new
MM_CP_TRY_WRITE_UNPROTECT).
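
Something along the lines of the following completely untested sketch,
just to illustrate the idea. The MM_CP_TRY_WRITE_UNPROTECT name, its bit
value and the exact placement in mprotect_fixup() are made up here, and I
am assuming the caller gathers the flags into a local cp_flags before
calling change_protection():

/* internal.h: hypothetical new cp_flags bit (next free bit assumed) */
#define MM_CP_TRY_WRITE_UNPROTECT	(1UL << 4)

/* mprotect_fixup(): decide once per VMA, next to the MM_CP_DIRTY_ACCT logic */
	if ((newflags & VM_WRITE) && !(newflags & VM_SHARED))
		cp_flags |= MM_CP_TRY_WRITE_UNPROTECT;

/*
 * can_change_pte_writable() would then be left with only the per-PTE
 * checks for the private/anon case, for example:
 */
static inline bool can_change_pte_writable(struct vm_area_struct *vma,
					   unsigned long addr, pte_t pte)
{
	struct page *page;

	if (pte_protnone(pte) || !pte_dirty(pte))
		return false;

	/* Write faults are still needed for softdirty and uffd-wp tracking. */
	if ((vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
		return false;
	if (userfaultfd_pte_wp(vma, pte))
		return false;

	/* Only exclusive anonymous pages can be mapped writable right away. */
	page = vm_normal_page(vma, addr, pte);
	return page && PageAnon(page) && PageAnonExclusive(page);
}

That way change_pte_range() would only have to look at cp_flags
(MM_CP_DIRTY_ACCT or MM_CP_TRY_WRITE_UNPROTECT) before calling the helper
for the per-PTE part, instead of re-deriving the decision from vm_flags
for every PTE.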