From: Yu Zhao
Date: Wed, 25 Oct 2023 00:17:14 -0600
Subject: Re: [PATCH] arm64: mm: drop tlb flush operation when clearing the access bit
To: Alistair Popple
Cc: Baolin Wang, Barry Song <21cnbao@gmail.com>, catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org, v-songbaohua@oppo.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
In-Reply-To: <87bkcn1j5k.fsf@nvdebian.thelocal>

On Tue, Oct 24, 2023 at 9:21 PM Alistair Popple wrote:
>
> Baolin Wang writes:
>
> > On 10/25/2023 9:58 AM, Alistair Popple wrote:
> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >>> On Wed, Oct 25, 2023 at 9:18 AM Alistair Popple wrote:
> >>>>
> >>>> Barry Song <21cnbao@gmail.com> writes:
> >>>>
> >>>>> On Wed, Oct 25, 2023 at 7:16 AM Barry Song <21cnbao@gmail.com> wrote:
> >>>>>>
> >>>>>> On Tue, Oct 24, 2023 at 8:57 PM Baolin Wang
> >>>>>> wrote:
> >>
> >> [...]
> >>
> >>>>>> (A) Constant flush cost vs. (B) a very, very occasionally reclaimed
> >>>>>> hot page: B might be the correct choice.
> >>>>>
> >>>>> Plus, I doubt B is really going to happen, as after a page is promoted
> >>>>> to the head of the LRU list or to a new generation, it needs a long time
> >>>>> to slide back to the inactive list tail or to the candidate generation
> >>>>> of MGLRU. That time should be long enough for the TLB to be flushed. If
> >>>>> the page is really hot, the hardware will get a second, third, fourth,
> >>>>> etc. opportunity to set the access flag during the long time it takes
> >>>>> the page to move back to the tail, as a really hot page will be
> >>>>> accessed multiple times.
> >>>>
> >>>> This might not be true if you have external hardware sharing the page
> >>>> tables with software through either HMM or hardware-supported ATS,
> >>>> though.
> >>>>
> >>>> In those cases I think it's much more likely that hardware can still be
> >>>> accessing the page even after, say, a context switch on the CPU. So
> >>>> those pages will tend to get reclaimed even though hardware is still
> >>>> actively using them, which would be quite expensive and I guess could
> >>>> lead to thrashing as each page is reclaimed and then immediately
> >>>> faulted back in.
> >
> > That's possible, but the chance should be relatively low. At least on
> > x86, I have not heard of this issue.
>
> Personally I've never seen any x86 system that shares page tables with
> external devices, other than with HMM. More on that below.
>
> >>> I am not quite sure I got your point. Does the external hardware that
> >>> shares the CPU's page table have the ability to set the access flag in
> >>> page table entries by itself? If yes, I don't see how our approach will
> >>> hurt, as folio_referenced can notify the hardware driver and the driver
> >>> can flush its own TLB.
> >>> If no, I don't see it either, as the external
> >>> hardware can't set access flags; that means we have already ignored its
> >>> references and only know about the CPU's accesses, even in the current
> >>> mainline code. So we are not getting worse.
> >>>
> >>> So can the external hardware also see the CPU's TLB? Or is the CPU's TLB
> >>> flush also broadcast to the external hardware, so that it sees the
> >>> cleared access flag and can set the access flag in the page table when
> >>> it accesses the page? If this is the case, I feel what you said is true.
> >> Perhaps it would help if I gave a concrete example. Take, for example,
> >> the ARM SMMU. It has its own TLB. Invalidating this TLB is done in one
> >> of two ways depending on the specific HW implementation.
> >> If broadcast TLB maintenance (BTM) is supported, it will snoop CPU TLB
> >> invalidations. If BTM is not supported, it relies on SW to explicitly
> >> forward TLB invalidations via MMU notifiers.
> >
> > On our ARM64 hardware, we rely on BTM to maintain TLB coherency.
>
> Lucky you :-)
>
> The ARM64 SMMU architecture specification supports the possibility of
> both, as does the driver. Not that I think whether or not BTM is
> supported has much relevance to this issue.
>
> >> Now consider the case where some external device is accessing mappings
> >> via the SMMU. The access flag will be cached in the SMMU TLB. If we
> >> clear the access flag without a TLB invalidate, the access flag in the
> >> CPU page table will not get updated, because it's already set in the
> >> SMMU TLB.
> >> As an aside, access flag updates happen in one of two ways. If the SMMU
> >> HW supports hardware translation table updates (HTTU), then hardware
> >> will manage updating access/dirty flags as required. If this is not
> >> supported, then SW is relied on to update these flags, which in practice
> >> means taking a minor fault.
> >> But I don't think that is relevant here - in
> >> either case, without a TLB invalidate neither of those things will
> >> happen.
> >> I suppose drivers could implement the clear_flush_young() MMU notifier
> >> callback (none do at the moment AFAICT), but then won't that just lead
> >> to the opposite problem - that every page ever used by an external
> >> device remains active and unavailable for reclaim because the access
> >> flag never gets cleared? I suppose they could do the flush then, which
> >> would ensure
> >
> > Yes, I think so too. Perhaps the reason there is currently no problem is
> > that there are no actual use cases at the moment? At least in Alibaba's
> > fleet, the SMMU and MMU do not share page tables now.
>
> We have systems that do.

Just curious: do those systems run the Linux kernel? If so, are pages
shared with the SMMU pinned? If not, then how are I/O page faults handled
after pages are reclaimed?
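To make the SMMU argument above concrete, here is a toy model, a sketch only and not kernel code. Every name in it (Pte, Tlb, clear_young(), looks_young_at_next_scan()) is hypothetical, and it assumes a one-entry device TLB whose cached translation already carries the access flag (AF), so a TLB hit never rewrites the PTE. Under that assumption, clearing the AF without an invalidate makes a page the device keeps using look cold at the next aging scan, unless the entry happens to be evicted for some unrelated reason in the meantime.

```python
from dataclasses import dataclass


@dataclass
class Pte:
    """Toy page-table entry: only the hardware access flag (AF) is modelled."""
    young: bool = False


@dataclass
class Tlb:
    """Toy one-entry TLB (e.g. a device's SMMU TLB) for a single page.

    Assumption: a cached entry already carries AF=1, so a TLB hit never
    walks the page table and never sets the AF in the PTE again.
    """
    cached: bool = False

    def access(self, pte: Pte) -> None:
        if self.cached:
            return                # hit: the PTE is not touched
        pte.young = True          # miss: AF gets set (HTTU, or SW on a minor fault)
        self.cached = True        # translation (with AF=1) is now cached

    def invalidate(self) -> None:
        self.cached = False


def clear_young(pte: Pte, tlb: Tlb, flush: bool) -> bool:
    """Test-and-clear the AF, optionally followed by a TLB invalidate."""
    was_young = pte.young
    pte.young = False
    if flush:
        tlb.invalidate()
    return was_young


def looks_young_at_next_scan(flush: bool, evicted_meanwhile: bool = False,
                             accesses: int = 3) -> bool:
    """Does a page the device keeps using look young at the next aging scan?"""
    pte, tlb = Pte(), Tlb()
    tlb.access(pte)                   # device uses the page: AF=1, entry cached
    clear_young(pte, tlb, flush)      # aging pass clears the AF
    if evicted_meanwhile:
        tlb.invalidate()              # unrelated eviction/flush before reuse
    for _ in range(accesses):
        tlb.access(pte)               # device keeps using the page
    return pte.young                  # what the next aging pass observes


print(looks_young_at_next_scan(flush=True))    # True: page stays hot
print(looks_young_at_next_scan(flush=False))   # False: hot page looks cold
print(looks_young_at_next_scan(flush=False, evicted_meanwhile=True))  # True
```

The evicted_meanwhile knob stands in for Barry's point that a CPU TLB entry is likely to be dropped (e.g. on a context switch) long before the page reaches the tail of the LRU; Alistair's concern is that a device TLB may keep the entry across such CPU-side events.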