From: Barry Song <21cnbao@gmail.com>
Date: Wed, 8 Nov 2023 04:50:46 +0800
Subject: Re: [PATCH] arm64: mm: drop tlb flush operation when clearing the access bit
To: Will Deacon
Cc: "Yin, Fengwei", Baolin Wang, catalin.marinas@arm.com, akpm@linux-foundation.org, v-songbaohua@oppo.com, yuzhao@google.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
References: <44e32b0e-0e41-4055-bdb9-15bc7d47197c@intel.com> <20231107101221.GB18944@willie-the-truck>
In-Reply-To: <20231107101221.GB18944@willie-the-truck>

On Tue, Nov 7, 2023 at 6:12 PM Will Deacon wrote:
>
> On Wed, Oct 25, 2023 at 09:39:19AM +0800, Yin, Fengwei wrote:
> >
> > >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > >> index 0bd18de9fd97..2979d796ba9d 100644
> > >> --- a/arch/arm64/include/asm/pgtable.h
> > >> +++ b/arch/arm64/include/asm/pgtable.h
> > >> @@ -905,21 +905,22 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
> > >>  static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
> > >>                                          unsigned long address, pte_t *ptep)
> > >>  {
> > >> -       int young = ptep_test_and_clear_young(vma, address, ptep);
> > >> -
> > >> -       if (young) {
> > >> -               /*
> > >> -                * We can elide the trailing DSB here since the worst that can
> > >> -                * happen is that a CPU continues to use the young entry in its
> > >> -                * TLB and we mistakenly reclaim the associated page. The
> > >> -                * window for such an event is bounded by the next
> > >> -                * context-switch, which provides a DSB to complete the TLB
> > >> -                * invalidation.
> > >> -                */
> > >> -               flush_tlb_page_nosync(vma, address);
> > >> -       }
> > >> -
> > >> -       return young;
> > >> +       /*
> > >> +        * This comment is borrowed from x86, but applies equally to ARM64:
> > >> +        *
> > >> +        * Clearing the accessed bit without a TLB flush doesn't cause
> > >> +        * data corruption. [ It could cause incorrect page aging and
> > >> +        * the (mistaken) reclaim of hot pages, but the chance of that
> > >> +        * should be relatively low. ]
> > >> +        *
> > >> +        * So as a performance optimization don't flush the TLB when
> > >> +        * clearing the accessed bit, it will eventually be flushed by
> > >> +        * a context switch or a VM operation anyway. [ In the rare
> > >> +        * event of it not getting flushed for a long time the delay
> > >> +        * shouldn't really matter because there's no real memory
> > >> +        * pressure for swapout to react to. ]
> > >> +        */
> > >> +       return ptep_test_and_clear_young(vma, address, ptep);
> > >>  }
> > From https://lore.kernel.org/lkml/20181029105515.GD14127@arm.com/:
> >
> > This is blindly copied from x86 and isn't true for us: we don't invalidate
> > the TLB on context switch.
> > That means our window for keeping the stale
> > entries around is potentially much bigger and might not be a great idea.
>
> I completely agree.
>
> > My understanding is that arm64 doesn't do invalidate the TLB during
> > context switch. The flush_tlb_page_nosync() here + DSB during context
> > switch make sure the TLB is invalidated during context switch.
> > So we can't remove flush_tlb_page_nosync() here? Or something was changed
> > for arm64 (I have zero knowledge to TLB on arm64. So some obvious thing
> > may be missed)? Thanks.
>
> As you point out, we already elide the DSB here but I don't think we should
> remove the TLB invalidation entirely because then we lose the guarantee
> that the update ever becomes visible to the page-table walker.
>
> I'm surprised that the TLBI is showing up as a performance issue without
> the DSB present. Is it because we're walking over a large VA range and
> invalidating on a per-page basis? If so, we'd be better off batching them

No. In the LRU case there are thousands of pages on the LRU list. During
vmscan we rely on rmap to find their PTEs, then read and clear AF to work
out whether each page is young. So this is not a walk over one big VA range
that happens to contain those pages. There are simply too many pages, from
lots of processes, on the LRU to be scanned, and the lookup is done through
rmap.

> up and doing the invalidation at the end (which will be upgraded to a
> full-mm invalidation if the range is large enough).

The pages on the LRU can come from hundreds of different processes; they
do not belong to just one process. I guess one possibility is that the
hardware has a limited tlbi/nosync buffer and, once that buffer is full,
something similar to a DSB is done automatically by the hardware. So too
many TLBIs, even without a DSB, can still harm performance.

>
> Will

Thanks
Barry
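
To make the batching point concrete, here is a toy userspace model in C. It is only a sketch and not kernel code: struct lru_entry, tlbi_per_page() and tlbi_batched() are hypothetical names invented for this illustration, and it assumes that a range invalidation can only merge neighbouring LRU entries that belong to the same mm and are virtually adjacent. It counts how many invalidations each strategy would issue for a given scan order.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one page found via rmap during an LRU scan:
 * which address space maps it, and at which virtual page number.
 * This is NOT a kernel structure.
 */
struct lru_entry {
	int mm_id;          /* owning address space (process) */
	unsigned long vpn;  /* virtual page number of the mapping */
};

/* Per-page strategy: one invalidation for every young page scanned. */
size_t tlbi_per_page(const struct lru_entry *lru, size_t n)
{
	assert(lru != NULL || n == 0);
	return n;
}

/*
 * Range-batching strategy (assumption of this sketch): an entry joins
 * the previous invalidation only if it is in the same mm and directly
 * follows the previous entry in virtual address order.
 */
size_t tlbi_batched(const struct lru_entry *lru, size_t n)
{
	size_t ops = 0;

	assert(lru != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int extends = i > 0 &&
			      lru[i].mm_id == lru[i - 1].mm_id &&
			      lru[i].vpn == lru[i - 1].vpn + 1;

		if (!extends)
			ops++;  /* start a new invalidation */
	}
	return ops;
}
```

In this model, four virtually contiguous pages of one mm collapse into a single batched invalidation, whereas four LRU entries interleaved across different mms still need four. The second pattern is the one the rmap-driven scan tends to produce, which is why range batching would not be expected to help much there.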