Subject: Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()
From: Nadav Amit
Date: Tue, 26 Oct 2021 10:44:15 -0700
To: Nadav Amit
Cc: Dave Hansen, Linux-MM, LKML, Andrea Arcangeli, Andrew Cooper,
 Andrew Morton, Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra,
 Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Message-Id: <29E7E8A4-C400-40A5-ACEC-F15C976DDEE0@gmail.com>
References: <20211021122112.592634-1-namit@vmware.com> <20211021122112.592634-3-namit@vmware.com>

> On Oct 26, 2021, at 9:47 AM, Nadav Amit wrote:
>
>> On Oct 26, 2021, at 9:06 AM, Dave Hansen wrote:
>>
>> On 10/21/21 5:21 AM, Nadav Amit wrote:
>>> The first TLB flush is only necessary to prevent the dirty bit (and,
>>> with lesser importance, the access bit) from changing while the PTE is
>>> modified. However, this is not necessary, as x86 CPUs set the
>>> dirty bit atomically with an additional check that the PTE is (still)
>>> present. One caveat is Intel's Knights Landing, which has a bug and
>>> does not do so.
>>
>> First, did I miss the check in this patch for X86_BUG_PTE_LEAK? I don't
>> see it anywhere.
>
> No, it is me who missed it. It should have been in pmdp_invalidate_ad():
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 3481b35cb4ec..f14f64cc17b5 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -780,6 +780,30 @@ int pmd_clear_huge(pmd_t *pmd)
>  	return 0;
>  }
>
> +/*
> + * pmdp_invalidate_ad() - prevents the access and dirty bits from being
> + * further updated by the CPU.
> + *
> + * Returns the original PMD.
> + *
> + * During an access to a page, x86 CPUs set the dirty and access bits
> + * atomically with an additional check of the present bit. Therefore, it
> + * is possible to avoid the TLB flush if we change the PTE atomically, as
> + * pmdp_establish() does.
> + *
> + * We do not make this optimization on certain CPUs that have a bug that
> + * violates this behavior (specifically Knights Landing).
> + */
> +pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
> +			 pmd_t *pmdp)
> +{
> +	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
> +
> +	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
> +		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> +	return old;
> +}
>
>>
>>> -	 * pmdp_invalidate() is required to make sure we don't miss
>>> -	 * dirty/young flags set by hardware.
>>
>> This got me thinking... In here:
>>
>>> https://lore.kernel.org/lkml/20160708001909.FB2443E2@viggo.jf.intel.com/
>>
>> I wrote:
>>
>>> These bits are truly "stray". In the case of the Dirty bit, the
>>> thread associated with the stray set was *not* allowed to write to
>>> the page. This means that we do not have to launder the bit(s); we
>>> can simply ignore them.
>>
>> Is the goal of your proposed patch here to ensure that the dirty bit is
>> not set at *all*? Or, is it to ensure that a dirty bit which we need to
>> *launder* is never set?
>
> At *all*.
>
> Err… I remembered from our previous discussions that the dirty bit cannot
> be set once the R/W bit is cleared atomically. But going back to the SDM,
> I see the (relatively new?) note:
>
> "If software on one logical processor writes to a page while software on
> another logical processor concurrently clears the R/W flag in the
> paging-structure entry that maps the page, execution on some processors
> may result in the entry's dirty flag being set (due to the write on the
> first logical processor) and the entry's R/W flag being clear (due to the
> update to the entry on the second logical processor). This will never
> occur on a processor that supports control-flow enforcement technology
> (CET)."
>
> So I guess that this optimization can only be enabled when CET is enabled.
>
> :(

I still wonder whether the SDM note applies to present-bit vs. dirty-bit
atomicity as well.

In AMD's APM I find:

"The processor never sets the Accessed bit or the Dirty bit for a not
present page (P = 0). The ordering of Accessed and Dirty bit updates
with respect to surrounding loads and stores is discussed below."

(The latter comment concerns ordering with respect to WC memory.)

I don't know if I am reading it too creatively...
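To keep the two atomicity questions apart, here is how I picture the
difference, as a rough sketch only (pmdp_establish(), pmd_wrprotect()
and pmd_mkinvalid() are the real kernel helpers; the annotations are
just my reading of the two manuals, not something either one spells
out in these terms):

	pmd_t old_rw, old_p;

	/*
	 * Case 1 - the SDM note above: the new entry only clears R/W but
	 * stays present. A write on another logical processor can still
	 * set the dirty bit after the xchg, unless the CPU supports CET.
	 */
	old_rw = pmdp_establish(vma, address, pmdp, pmd_wrprotect(*pmdp));

	/*
	 * Case 2 - what pmdp_invalidate_ad() does: the new entry clears
	 * the present bit. If the "no A/D updates for P=0" guarantee
	 * holds, nothing can dirty the entry once the xchg completes,
	 * and the TLB flush can be skipped (Knights Landing aside).
	 */
	old_p = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));

If it is the present bit rather than R/W that the hardware checks
atomically, then the CET note would not matter for this patch, since
pmdp_invalidate_ad() never leaves the entry present.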