References: <20240529180510.2295118-1-jthoughton@google.com> <20240529180510.2295118-3-jthoughton@google.com>
From: David Matlack
Date: Fri, 31 May 2024 14:06:49 -0700
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
To: Yu Zhao
Cc: Oliver Upton, James Houghton, Andrew Morton, Paolo Bonzini, Albert Ou, Ankit Agrawal, Anup Patel, Atish Patra, Axel Rasmussen, Bibo Mao, Catalin Marinas, David Rientjes, Huacai Chen, James Morse, Jonathan Corbet, Marc Zyngier, Michael Ellerman, Nicholas Piggin, Palmer Dabbelt, Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts, Sean Christopherson, Shaoqin Huang, Shuah Khan, Suzuki K Poulose, Tianrui Zhao, Will Deacon, Zenghui Yu, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org,
 linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev

On Fri, May 31, 2024 at 1:31 PM Yu Zhao wrote:
>
> On Fri, May 31, 2024 at 1:24 AM Oliver Upton wrote:
> >
> > On Wed, May 29, 2024 at 03:03:21PM -0600, Yu Zhao wrote:
> > > On Wed, May 29, 2024 at 12:05 PM James Houghton wrote:
> > > >
> > > > Secondary MMUs are currently consulted for access/age information at
> > > > eviction time, but before then, we don't get accurate age information.
> > > > That is, pages that are mostly accessed through a secondary MMU (like
> > > > guest memory, used by KVM) will always just proceed down to the oldest
> > > > generation, and then at eviction time, if KVM reports the page to be
> > > > young, the page will be activated/promoted back to the youngest
> > > > generation.
> > >
> > > Correct, and as I explained offline, this is the only reasonable
> > > behavior if we can't locklessly walk secondary MMUs.
> > >
> > > Just for the record, the (crude) analogy I used was:
> > > Imagine a large room with many bills ($1, $5, $10, ...) on the floor,
> > > but you are only allowed to pick up 10 of them (and put them in your
> > > pocket). A smart move would be to survey the room *first and then*
> > > pick up the largest ones. But if you are carrying a 500 lbs backpack,
> > > you would just want to pick up whichever that's in front of you rather
> > > than walk the entire room.
> > >
> > > MGLRU should only scan (or lookaround) secondary MMUs if it can be
> > > done lockless. Otherwise, it should just fall back to the existing
> > > approach, which existed in previous versions but is removed in this
> > > version.
> >
> > Grabbing the MMU lock for write to scan sucks, no argument there. But
> > can you please be specific about the impact of read lock v. RCU in the
> > case of arm64? I had asked about this before and you never replied.
> >
> > My concern remains that adding support for software table walkers
> > outside of the MMU lock entirely requires more work than just deferring
> > the deallocation to an RCU callback. Walkers that previously assumed
> > 'exclusive' access while holding the MMU lock for write must now cope
> > with volatile PTEs.
> >
> > Yes, this problem already exists when hardware sets the AF, but the
> > lock-free walker implementation needs to be generic so it can be applied
> > for other PTE bits.
>
> Direct reclaim is multi-threaded and each reclaimer can take the mmu
> lock for read (testing the A-bit) or write (unmapping before paging
> out) on arm64. The fundamental problem of using the readers-writer
> lock in this case is priority inversion: the readers have lower
> priority than the writers, so ideally, we don't want the readers to
> block the writers at all.
>
> Using my previous (crude) analogy: putting the bill right in front of
> you (the writers) profits immediately whereas searching for the
> largest bill (the readers) can be futile.
>
> As I said earlier, I prefer we drop the arm64 support for now, but I
> will not object to taking the mmu lock for read when clearing the
> A-bit, as long as we fully understand the problem here and document it
> clearly.

FWIW, Google Cloud has been doing proactive reclaim and kstaled-based
aging (a Google-internal page aging daemon, for those outside of Google)
for many years on x86 VMs with the A-bit harvesting under the write-lock.
So I'm skeptical that making ARM64 lockless is necessary to allow
Secondary MMUs to participate in MGLRU aging with acceptable performance
for Cloud use cases. I don't even think it's necessary on x86, but it's a
simple enough change that we might as well just do it.

I suspect that under pathological conditions (host under intense memory
pressure and a high rate of reclaim occurring), making A-bit harvesting
lockless will perform better. But under such conditions VM performance
is likely going to suffer regardless. In a Cloud environment we deal
with that through other mechanisms to reduce the rate of reclaim and
make the host healthy.

For these reasons, I think there's value in giving users the option to
enable Secondary MMU participation in MGLRU aging even when A-bit
test/clearing is not done locklessly.
I believe offering that option was James' intent with the Kconfig.
Perhaps a default-off, writable module parameter would be better, to
avoid distros accidentally turning it on?

If and when there is a use case for optimizing VM performance under
pathological reclaim conditions on ARM, we can make it lockless then.