Subject: Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()
From: Nadav Amit
Date: Tue, 26 Oct 2021 13:07:37 -0700
To: Dave Hansen
Cc: Linux-MM, LKML, Andrea Arcangeli, Andrew Cooper, Andrew Morton,
    Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra,
    Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, "x86@kernel.org"
Message-Id: <640A6374-A06B-4E20-BF5D-9A21CC85CB12@gmail.com>
In-Reply-To: <4f604380-a52b-660c-af82-541dbd7652e4@intel.com>
References: <20211021122112.592634-1-namit@vmware.com>
    <20211021122112.592634-3-namit@vmware.com>
    <29E7E8A4-C400-40A5-ACEC-F15C976DDEE0@gmail.com>
    <435f41f2-ffd4-0278-9f26-fbe2c2c7545c@intel.com>
    <8BC74789-FF33-403F-B5D7-19034CAC7EE6@gmail.com>
    <4f604380-a52b-660c-af82-541dbd7652e4@intel.com>

> On Oct 26, 2021, at 12:40 PM, Dave Hansen wrote:
>
> On 10/26/21 12:06 PM, Nadav Amit wrote:
>>
>> To make it very clear - consider the following scenario, in which
>> a volatile pointer p is mapped using a certain PTE, which is RW
>> (i.e., *p is writable):
>>
>>   CPU0                          CPU1
>>   ----                          ----
>>   x = *p
>>   [ PTE cached in TLB;
>>     PTE is not dirty ]
>>                                 clear_pte(PTE)
>>   *p = x
>>   [ needs to set dirty ]
>>
>> Note that there is no TLB flush in this scenario. The question
>> is whether the write access to *p would succeed, setting the
>> dirty bit on the clear, non-present entry.
>>
>> I was under the impression that the hardware AD-assist would
>> recheck the PTE atomically as it sets the dirty bit. But, as I
>> said, I am not sure anymore whether this is defined architecturally
>> (or at least would work in practice on all CPUs modulo the
>> Knights Landing thingy).
>
> Practically, at "x=*p", the thing that gets cached in the TLB will have
> Dirty=0.  At the "*p=x", the CPU will decide it needs to do a write,
> find the Dirty=0 entry and will entirely discard it.  In other words, it
> *acts* roughly like this:
>
>     x = *p
>     INVLPG(p)
>     *p = x;
>
> Where the INVLPG() and the "*p=x" are atomic.  So, there's no
> _practical_ problem with your scenario.  This specific behavior isn't
> architectural as far as I know, though.
>
> Although it's pretty much just academic, as for the architecture, are
> you getting hung up on the difference between the description of "Accessed":
>
>     Whenever the processor uses a paging-structure entry as part of
>     linear-address translation, it sets the accessed flag in that
>     entry
>
> and "Dirty:"
>
>     Whenever there is a write to a linear address, the processor
>     sets the dirty flag (if it is not already set) in the paging-
>     structure entry...
>
> Accessed says "as part of linear-address translation", which means that
> the address must have a translation.  But, the "Dirty" section doesn't
> say that.  It talks about "a write to a linear address" but not whether
> there is a linear address *translation* involved.
>
> If that's it, we could probably add a bit like:
>
>     In addition to setting the accessed flag, whenever there is a
>     write...
>
> before the dirty rules in the SDM.
>
> Or am I being dense and continuing to miss your point? :)

I think this time you got my question right. I was thrown off by the SDM
comment on RW permissions vs dirty that I mentioned before:

"If software on one logical processor writes to a page while software on
another logical processor concurrently clears the R/W flag in the
paging-structure entry that maps the page, execution on some processors may
result in the entry’s dirty flag being set (due to the write on the first
logical processor) and the entry’s R/W flag being clear (due to the update
to the entry on the second logical processor)."

I did not pay enough attention to the small differences that you mentioned
between accessed and dirty this time (although I did notice them before).

I do not think that the change that you offered to the SDM really clarifies
the situation. Setting the accessed flag is done as part of caching the PTE
in the TLB. The SDM change you propose does not clarify the atomicity of the
permission/PTE-validity check and the dirty-bit setting, or the fact that
the TLB entry is invalidated if the dirty bit needs to be set and is cached
as clear [I do not presume you would want the latter in the SDM, since it is
an implementation detail].

I just wonder how come the R/W-clearing and the P-clearing cause concurrent
dirty bit setting to behave differently. I am not a hardware guy, but I would
imagine they would be the same...
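
To make the behavior Dave describes easier to reason about, here is a toy Python sketch of it. This is only a model under stated assumptions, not a description of any real CPU: `ToyCPU`, `PTE`, `PageFault` and `run_scenario` are hypothetical names made up for illustration, and `clear_pte` simply mirrors the name used in the scenario above. The model is single-threaded, so the "discard the cached entry, re-walk, check and set dirty" step is trivially atomic; on real hardware that atomicity is exactly the part that is in question.

```python
from dataclasses import dataclass


class PageFault(Exception):
    """Raised when the toy page walk finds no usable translation."""


@dataclass
class PTE:
    present: bool = True
    writable: bool = True
    accessed: bool = False
    dirty: bool = False


class ToyCPU:
    """Hypothetical CPU with a private TLB over a shared page table."""

    def __init__(self, page_table):
        self.page_table = page_table  # shared: address -> PTE
        self.tlb = {}                 # private: address -> cached dirty bit

    def _walk(self, addr, write):
        # Assumption: validity/permission check and A/D update happen as
        # one step (trivially true here, since the model is single-threaded).
        pte = self.page_table[addr]
        if not pte.present or (write and not pte.writable):
            raise PageFault(addr)
        pte.accessed = True
        if write:
            pte.dirty = True
        self.tlb[addr] = pte.dirty

    def read(self, addr):
        if addr not in self.tlb:
            self._walk(addr, write=False)

    def write(self, addr):
        if self.tlb.get(addr) is not True:
            # Cached Dirty=0 (or no entry): discard it and re-walk,
            # i.e. behave like INVLPG(p) followed by the write.
            self.tlb.pop(addr, None)
            self._walk(addr, write=True)


def clear_pte(page_table, addr):
    """Clear the whole entry without flushing any TLB."""
    pte = page_table[addr]
    pte.present = pte.writable = pte.accessed = pte.dirty = False


def run_scenario():
    """Replay the CPU0/CPU1 interleaving from the scenario above."""
    page_table = {"p": PTE()}
    cpu0 = ToyCPU(page_table)
    cpu0.read("p")               # CPU0: x = *p, entry cached with Dirty=0
    clear_pte(page_table, "p")   # CPU1: clear_pte(PTE), no TLB flush
    try:
        cpu0.write("p")          # CPU0: *p = x, needs to set dirty
        faulted = False
    except PageFault:
        faulted = True
    return faulted, page_table["p"].dirty
```

Under these assumptions the write re-walks the page table, sees the cleared entry and faults, so the dirty bit is never set on the non-present PTE. The model says nothing about the R/W-clearing case quoted from the SDM, where some processors may behave differently.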