From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB5CBC433F5 for ; Fri, 11 Mar 2022 20:53:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E9D228D0002; Fri, 11 Mar 2022 15:53:34 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E4CAB8D0001; Fri, 11 Mar 2022 15:53:34 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CC7958D0002; Fri, 11 Mar 2022 15:53:34 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0205.hostedemail.com [216.40.44.205]) by kanga.kvack.org (Postfix) with ESMTP id BF1248D0001 for ; Fri, 11 Mar 2022 15:53:34 -0500 (EST) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 6D6AA181C9B8A for ; Fri, 11 Mar 2022 20:53:34 +0000 (UTC) X-FDA: 79233306348.16.6CF1B6A Received: from mail-pj1-f53.google.com (mail-pj1-f53.google.com [209.85.216.53]) by imf27.hostedemail.com (Postfix) with ESMTP id 05E5F40027 for ; Fri, 11 Mar 2022 20:53:33 +0000 (UTC) Received: by mail-pj1-f53.google.com with SMTP id kx6-20020a17090b228600b001bf859159bfso12126049pjb.1 for ; Fri, 11 Mar 2022 12:53:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=iHGGTD8PyTVJNiJh8Y1TqDmvdURAM0MJZFSp08G78Is=; b=OmeYL54C6iNXt9LA5vp8y8mnjIFX2umARAfHq156jOatkVuSmDuZIwkS7f2FW0qwaM 5ZcUlJrxBhu2aRiZalij+7oG7d8AjI8f/+p3yjZccOkt16ApsErI2vLxyuKS6DuL/2dF XewBS297splmdPZ0m9Tf4z1eJIcZx+4akKT6g2aj67G5AJJXej6AjUlZn8pE7Xix+uW9 +wHOlFU0fia6lsu9e/MAehDrINiQnCIHHF1csnbTAW/Bycm1C9QSB31Vecjarp+QCVOP HiORxMdyYLmTdnvP+r5VuzNjUvSiT1vhHOd/KScpx8we9rsWuGWTtNNy49EIk5zYOuNt Ra1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=iHGGTD8PyTVJNiJh8Y1TqDmvdURAM0MJZFSp08G78Is=; b=bRD4Y+4dCW9tDGWehUVjOdxVrqi1mrESL9lzcCLvH1kG9lYTnfB5BAYBaag3qyTuQ3 PAh/bEucrtk2ww3+oZx9ETRnOEGCNF3l0fbYBlpJbrwSZDwCAC8SqL/Z7wbJu5lPwk3r /FOE9K3Z9hzVYeYQj4qhelzUi2pzuKsUjDExKIEYrzaBwZzwZi8tGQk42F6DkatbxTqF at9HM/ms21hJcHv5dWmrzPyKNRqIuTvmzG47wy9DSMXhVc4MKr0QUAO/7zxEw6MjTT29 8vQzGwwdPrC8huRDPaGQaLF6tL83pO3c2YDTu6ILsKzjtpqbQ8U66sjB+PWg63MLswtm fG1Q== X-Gm-Message-State: AOAM532rMmIw9akwpY8AUoEvUp8fBFuWpRnNLYYkLUsSMxg2r2ZQ9tP9 Fm46hjWT/++CI/nqFYP3E+k= X-Google-Smtp-Source: ABdhPJxyNiCIk7FbxSXHe8ETZAAIY9TdBqDpTX658H+y+43oFI+tublXZJqo5GDPLxBqu4SZR0vqAw== X-Received: by 2002:a17:902:744c:b0:153:29de:55aa with SMTP id e12-20020a170902744c00b0015329de55aamr8245142plt.2.1647032012678; Fri, 11 Mar 2022 12:53:32 -0800 (PST) Received: from smtpclient.apple ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id s15-20020a63af4f000000b0037c8875108dsm9332340pgo.45.2022.03.11.12.53.30 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Mar 2022 12:53:31 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.60.0.1.1\)) Subject: Re: [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd() From: Nadav Amit In-Reply-To: Date: Fri, 11 Mar 2022 12:53:30 -0800 Cc: Linux-MM , Linux Kernel Mailing List , Andrew Morton , Andrea Arcangeli , Andrew Cooper , Andy Lutomirski , Dave Hansen , Peter Xu , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin , x86@kernel.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <20220311190749.338281-1-namit@vmware.com> <20220311190749.338281-6-namit@vmware.com> To: Dave Hansen X-Mailer: Apple Mail (2.3693.60.0.1.1) X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 05E5F40027 X-Rspam-User: Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=OmeYL54C; spf=pass (imf27.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.216.53 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Stat-Signature: r3rwiftarutti84jpnkjrg9uop6nmypg X-HE-Tag: 1647032013-726167 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > On Mar 11, 2022, at 12:41 PM, Dave Hansen = wrote: >=20 > On 3/11/22 11:07, Nadav Amit wrote: >> From: Nadav Amit >>=20 >> Calls to change_protection_range() on THP can trigger, at least on = x86, >> two TLB flushes for one page: one immediately, when pmdp_invalidate() = is >> called by change_huge_pmd(), and then another one later (that can be >> batched) when change_protection_range() finishes. >>=20 >> The first TLB flush is only necessary to prevent the dirty bit (and = with >> a lesser importance the access bit) from changing while the PTE is >> modified. However, this is not necessary as the x86 CPUs set the >> dirty-bit atomically with an additional check that the PTE is (still) >> present. One caveat is Intel's Knights Landing that has a bug and = does >> not do so. >=20 > First of all, thank you for your diligence here. This is a super > obscure issue. I think I put handling for it in the kernel and I'm = not > sure I would have even thought about this angle. >=20 > That said, I'm not sure this is all necessary. >=20 > Yes, the Dirty bit can get set unexpectedly in some PTEs. But, the > question is whether it is *VALUABLE* and needs to be preserved. The > current kernel code pretty much just lets the hardware set the Dirty = bit > and then ignores it. If it were valuable, ignoring it would have been = a > bad thing. We'd be losing data on today's kernels because the = hardware > told us about a write that happened but that the kernel ignored. >=20 > My mental model of what the microcode responsible for the erratum does > is something along these lines: >=20 > if (write) > pte |=3D _PAGE_DIRTY; > if (!pte_present(pte)) > #PF >=20 > The PTE is marked dirty, but the write never actually executes. The > thread that triggered the A/D setting *also* gets a fault. >=20 This makes perfect sense. I guess I misunderstood or forgot the erratum. But feel free to recheck. It would allow to remove the KNL check, and probably the first patch in this series. But I don=E2=80=99t think it = would allow to get rid of pmdp_invalidate_ad() since I do not fell comfortable just to use pmdp_establish() directly: I do not know about other architectures well enough to say that they have the same atomicity guarantees when it comes to A/D bits. > I'll double-check with some Intel folks to make sure I'm not missing > something. But, either way, I don't think we should be going to this > much trouble for the good ol' Xeon Phi. I doubt there are many still > around and I *REALLY* doubt they're running new kernels. >=20 > *If* we need this (and I'm not convinced we do), my first instinct = would > be to just do this instead: >=20 > clear_cpu_cap(c, X86_FEATURE_PSE); >=20 > on KNL systems. If anyone cares, they know where to find us. I think that it is not necessary and your understanding of the erratum is the right one. Let me know if you find it is not the case. BTW: Thanks for the quick response, and sorry for the time it took me to send v3.