From: Dave Hansen
To: Nadav Amit, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Nadav Amit, Andrea Arcangeli,
 Andrew Cooper, Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra,
 Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Date: Fri, 11 Mar 2022 12:41:41 -0800
Subject: Re: [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd()
References: <20220311190749.338281-1-namit@vmware.com>
 <20220311190749.338281-6-namit@vmware.com>
In-Reply-To: <20220311190749.338281-6-namit@vmware.com>

On 3/11/22 11:07, Nadav Amit wrote:
> From: Nadav Amit
>
> Calls to change_protection_range() on THP can trigger, at least on x86,
> two TLB flushes for one page: one immediately, when pmdp_invalidate() is
> called by change_huge_pmd(), and then another one later (that can be
> batched) when change_protection_range() finishes.
>
> The first TLB flush is only necessary to prevent the dirty bit (and with
> a lesser importance the access bit) from changing while the PTE is
> modified. However, this is not necessary as the x86 CPUs set the
> dirty-bit atomically with an additional check that the PTE is (still)
> present. One caveat is Intel's Knights Landing that has a bug and does
> not do so.

First of all, thank you for your diligence here.  This is a super
obscure issue.  I think I put handling for it in the kernel and I'm not
sure I would have even thought about this angle.

That said, I'm not sure this is all necessary.

Yes, the Dirty bit can get set unexpectedly in some PTEs.  But, the
question is whether it is *VALUABLE* and needs to be preserved.  The
current kernel code pretty much just lets the hardware set the Dirty bit
and then ignores it.  If it were valuable, ignoring it would have been a
bad thing.  We'd be losing data on today's kernels because the hardware
told us about a write that happened but that the kernel ignored.

My mental model of what the microcode responsible for the erratum does
is something along these lines:

	if (write)
		pte |= _PAGE_DIRTY;
	if (!pte_present(pte))
		#PF

The PTE is marked dirty, but the write never actually executes.  The
thread that triggered the A/D setting *also* gets a fault.

I'll double-check with some Intel folks to make sure I'm not missing
something.  But, either way, I don't think we should be going to this
much trouble for the good ol' Xeon Phi.  I doubt there are many still
around and I *REALLY* doubt they're running new kernels.

*If* we need this (and I'm not convinced we do), my first instinct would
be to just do this instead:

	clear_cpu_cap(c, X86_FEATURE_PSE);

on KNL systems.  If anyone cares, they know where to find us.
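
A minimal sketch of what that alternative might look like, assuming it
would hang off the existing model checks in the Intel CPU init path
(arch/x86/kernel/cpu/intel.c); the helper name and placement are
illustrative, and only the clear_cpu_cap(c, X86_FEATURE_PSE) call comes
from the suggestion above:

	/* Hypothetical placement: called from the Intel CPU init path. */
	static void knl_clear_pse(struct cpuinfo_x86 *c)
	{
		/*
		 * Knights Landing can set the Dirty bit without re-checking
		 * that the entry is still Present.  Rather than special-case
		 * that erratum in the PMD invalidation paths, clear PSE so
		 * this CPU is not treated as huge-page capable.
		 */
		if (c->x86_model == INTEL_FAM6_XEON_PHI_KNL)
			clear_cpu_cap(c, X86_FEATURE_PSE);
	}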