From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED2A8C43334 for ; Thu, 21 Jul 2022 11:22:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 599748E0005; Thu, 21 Jul 2022 07:22:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 549408E0003; Thu, 21 Jul 2022 07:22:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3EA508E0005; Thu, 21 Jul 2022 07:22:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 31A258E0003 for ; Thu, 21 Jul 2022 07:22:05 -0400 (EDT) Received: from smtpin31.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id C0F7C1A08CE for ; Thu, 21 Jul 2022 07:46:05 +0000 (UTC) X-FDA: 79710323490.31.B35AAC0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf22.hostedemail.com (Postfix) with ESMTP id 47237C001D for ; Thu, 21 Jul 2022 07:46:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1658389564; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=16t32vV12xwBpAcbt3NrjlFAHE0uEzXQweHiCxuyK+s=; b=OAQKGLGGQt1zqTvoFeUSRI3U28DWXhR1IkYc4j6v55Rq8pcWxsvE9+l9Gq5Hww4/m2vpIO i8ZzK4suTqewiVJqfN/wrrU+SDZNGeAR5ZhTxaqUSyw3Zn0UnnfE4nQXAMRxPmO/VeYZb5 ZmkpqXyz0tgekrY4zj0M9rfwokGBIvg= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-282-2jQ1TzvCNpiIc6h7C0CPvQ-1; Thu, 21 Jul 2022 03:46:03 -0400 X-MC-Unique: 2jQ1TzvCNpiIc6h7C0CPvQ-1 Received: by mail-wm1-f69.google.com with SMTP id q19-20020a7bce93000000b003a3264f3de9so351397wmj.3 for ; Thu, 21 Jul 2022 00:46:03 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:cc:references:from:organization:subject :in-reply-to:content-transfer-encoding; bh=16t32vV12xwBpAcbt3NrjlFAHE0uEzXQweHiCxuyK+s=; b=NU0CDzdFDArn3Agrn4Cw0suODyg1xaDP5+cAkbHT+XqvGiNjAww9cc1nkpnIU9cvNM emQZiHO5GtuV5WnIuzhHTTT86IK8bOBlVuEojMsgVz5giXe8eOP/Ji+LaNNiIpMZYAEa c+K8Ubo9XjmdgAhhAQX2qeH/fhMSIxo0vPQETv2zeMXGEuwJ/mNHDO4lTZQwkHMIrCAo ryUJSLyGB8GrcXYXHUXrViZEeqHXiP4W1iITx1OjalqgOgNxoirjDi1NVCXF6CqB8rJZ bihMPT/9mW8vxrL4AIO2xiOG8o8AUm3FXilQGBg/40ob+4SlcTOQjhMglOh9++4Z3Dat Rc4g== X-Gm-Message-State: AJIora9GMjsDk/9+WTetDvYGrD3/V/JX8Nr/ZGf1UgwPL/xi0omxWzZq VymIqm06MPdyW1eOayjAJ+MnPL5BbzT8G8o0lJc2ftTZRRxX7RaYWFsQ4AjoiEz48rAXELzYuAc kRms50fGDSak= X-Received: by 2002:a5d:47a1:0:b0:21d:a50e:b2d7 with SMTP id 1-20020a5d47a1000000b0021da50eb2d7mr34599100wrb.58.1658389562064; Thu, 21 Jul 2022 00:46:02 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vAviz4+n+ekfx1YTmywN135cIVAx52CImT2vFjv9x9M2WJivSkHASe1fQ3OazcUhsDnEKhmA== X-Received: by 2002:a5d:47a1:0:b0:21d:a50e:b2d7 with SMTP id 1-20020a5d47a1000000b0021da50eb2d7mr34599015wrb.58.1658389560772; Thu, 21 Jul 2022 00:46:00 -0700 (PDT) Received: from ?IPV6:2003:cb:c707:e000:25d3:15fa:4c8b:7e8d? (p200300cbc707e00025d315fa4c8b7e8d.dip0.t-ipconnect.de. [2003:cb:c707:e000:25d3:15fa:4c8b:7e8d]) by smtp.gmail.com with ESMTPSA id k22-20020a5d5256000000b0021d6a520ce9sm1095513wrc.47.2022.07.21.00.45.59 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 21 Jul 2022 00:46:00 -0700 (PDT) Message-ID: <7a7377c8-1852-88d3-a768-ed1a6eac0c0c@redhat.com> Date: Thu, 21 Jul 2022 09:45:59 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 To: Nadav Amit Cc: Linux MM , LKML , Andrew Morton , Mike Rapoport , Axel Rasmussen , Andrea Arcangeli , Andrew Cooper , Andy Lutomirski , Dave Hansen , Peter Xu , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin References: <20220718120212.3180-1-namit@vmware.com> <20220718120212.3180-4-namit@vmware.com> <23a9d678-487e-5940-4cde-dc53d920fb48@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [RFC PATCH 03/14] mm/mprotect: allow exclusive anon pages to be writable In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=OAQKGLGG; spf=none (imf22.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1658389565; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=16t32vV12xwBpAcbt3NrjlFAHE0uEzXQweHiCxuyK+s=; b=qIjeK9imxxbckC3EytXGFn7r/uK3uh7FPRNvfcJOnkb7Ogg7yV0wH8CPI4iiIFOBV0tFGJ kfJsvLHpaz5XN73CKdNHWY9B2n7L1YYSayQN/mfYpkH1Jt32h5VWXzOoO41KAwoqjr8JJj gnFrGZpc1XivTkUlPpA9sHTWh8J3g3k= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1658389565; a=rsa-sha256; cv=none; b=0bu8XYeU3maqiQ34alsb6zRQougOYN8mKc06KBjClccoAKm0LIVYSaxss3rGXmUlOplo8t HJbECmzo8C1CS50MOmQbss889wULLtAXD2mG9NnHM4qNWAz/2fRYewru0qb5A9UD9m918o uG7nCwfT/wFL1PwCZyP5W7iHE4WTuJc= X-Rspam-User: X-Rspamd-Queue-Id: 47237C001D Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=OAQKGLGG; spf=none (imf22.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Stat-Signature: t4ut4bnjwjmr53o7bnc8s7qgxkmwupg6 X-Rspamd-Server: rspam07 X-HE-Tag: 1658389565-796595 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 20.07.22 19:25, Nadav Amit wrote: > On Jul 20, 2022, at 8:19 AM, David Hildenbrand wrote: > >> On 18.07.22 14:02, Nadav Amit wrote: >>> From: Nadav Amit >>> >>> Anonymous pages might have the dirty bit clear, but this should not >>> prevent mprotect from making them writable if they are exclusive. >>> Therefore, skip the test whether the page is dirty in this case. >>> >>> Cc: Andrea Arcangeli >>> Cc: Andrew Cooper >>> Cc: Andrew Morton >>> Cc: Andy Lutomirski >>> Cc: Dave Hansen >>> Cc: David Hildenbrand >>> Cc: Peter Xu >>> Cc: Peter Zijlstra >>> Cc: Thomas Gleixner >>> Cc: Will Deacon >>> Cc: Yu Zhao >>> Cc: Nick Piggin >>> Signed-off-by: Nadav Amit >>> --- >>> mm/mprotect.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/mm/mprotect.c b/mm/mprotect.c >>> index 34c2dfb68c42..da5b9bf8204f 100644 >>> --- a/mm/mprotect.c >>> +++ b/mm/mprotect.c >>> @@ -45,7 +45,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma, >>> >>> VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte)); >>> >>> - if (pte_protnone(pte) || !pte_dirty(pte)) >>> + if (pte_protnone(pte)) >>> return false; >>> >>> /* Do we need write faults for softdirty tracking? */ >>> @@ -66,7 +66,8 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma, >>> page = vm_normal_page(vma, addr, pte); >>> if (!page || !PageAnon(page) || !PageAnonExclusive(page)) >>> return false; >>> - } >>> + } else if (!pte_dirty(pte)) >>> + return false; >>> >>> return true; >>> } >> >> When I wrote that code, I was wondering how often that would actually >> happen in practice -- and if we care about optimizing that. Do you have >> a gut feeling in which scenarios this would happen and if we care? >> >> If the page is in the swapcache and was swapped out, you'd be requiring >> a writeback even though nobody modified the page and possibly isn't >> going to do so in the near future. > > So here is my due diligence: I did not really encounter a scenario in which > it showed up. When I looked at your code, I assumed this was an oversight > and not a thoughtful decision. For me the issue is more of the discrepancy > between how a certain page is handled before and after it was pages out. > > The way that I see it, there is a tradeoff in the way dirty-bit should > be handled: > (1) Writable-clean PTEs introduce some non-negligible overhead. > (2) Marking a PTE dirty speculatively would require a write back. > > … But this tradeoff should not affect whether a PTE is writable, i.e., > mapping the PTE as writable-clean should not cause a writeback. In other > words, if you are concerned about unnecessary writebacks, which I think is a > fair concern, then do not set the dirty-bit. In a later patch I try to avoid > TLB flushes on clean-writable entries that are write-protected. > > So I do not think that the writeback you mentioned should be a real issue. > Yet if you think that using the fact that the page is not-dirty is a good > hueristics to avoid future TLB flushes (for P->NP; as I said there is a > solution for RW->RO), or if you are concerned about the cost of > vm_normal_page(), perhaps those are valid concerned (although I do not think > so). I think I now understand what you mean. I somehow remembered that some architectures set a PTE dirty when marking it writable, but I guess this is not true -- and setting it writable will keep it !dirty until really accessed (either by the HW or by a fault). [I'll do some more digging just to confirm] With that in mind, your patch makes sense, and I guess you'd want that as patch #1, because otherwise I fail to see how current patch #1 would even succeed in reaching the "pte_mkdirty" call -- if !pte_dirty() protects the code from running. > > -- > > [ Regarding (1): After some discussions with Peter and reading more code, I > thought at some point that perhaps avoiding having writable-clean PTE as > much as possible makes sense [*], since setting the dirty-bit costs ~550 > cycles and a page fault is not a lot more than 1000. But with all the > mitigations (and after adding IBRS for retbless) page-fault entry is kind of > expensive. I understand the reasoning for anonymous memory, but not for page cache pages. And for anonymous memory I think there are still cases where we don't want to do that (swapcache and MADV_FREE comes to mind) -- and IMHO some of the scenarios where we have clean anonymous memory at all, we just don't care about optimizing for setting the dirty bit faster on x86 ;) . So my gut feeling is to keep it as simple as possible. To me, it translates to setting the dirty bit only if we have clear indication that write access is likely next. Thanks Nadav for refreshing my memory :) ! -- Thanks, David / dhildenb