References: <20240529180510.2295118-1-jthoughton@google.com> <20240529180510.2295118-3-jthoughton@google.com>
From: James Houghton
Date: Mon, 3 Jun 2024 15:45:59 -0700
Subject: Re: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
To: Yu Zhao
Cc: Sean Christopherson, Andrew Morton, Paolo Bonzini, Albert Ou,
 Ankit Agrawal, Anup Patel, Atish Patra, Axel Rasmussen, Bibo Mao,
 Catalin Marinas, David Matlack, David Rientjes, Huacai Chen,
 James Morse, Jonathan Corbet, Marc Zyngier, Michael Ellerman,
 Nicholas Piggin, Oliver Upton, Palmer Dabbelt, Paul Walmsley,
 Raghavendra Rao Ananta, Ryan Roberts, Shaoqin Huang, Shuah Khan,
 Suzuki K Poulose, Tianrui Zhao, Will Deacon, Zenghui Yu,
 kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
 linux-mm@kvack.org, linux-riscv@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev

On Thu, May 30, 2024 at 11:06 PM Yu Zhao wrote:
>
> On Wed, May 29, 2024 at 7:08 PM James Houghton wrote:
> >
> > Hi Yu, Sean,
> >
> > Perhaps I "simplified" this bit of the series a little bit too
> > much.
> > Being able to opportunistically do aging with KVM (even without
> > setting the Kconfig) is valuable.
> >
> > IIUC, we have the following possibilities:
> > - v4: aging with KVM is done if the new Kconfig is set.
> > - v3: aging with KVM is always done.
>
> This is not true -- in v3, MGLRU only scans secondary MMUs if it can
> be done locklessly on x86. It uses a bitmap to imply this requirement.
>
> > - v2: aging with KVM is done when the architecture reports that it can
> > probably be done locklessly, set at KVM MMU init time.
>
> Not really -- it's only done if it can be done locklessly on both x86 and arm64.
>
> > - Another possibility?: aging with KVM is only done exactly when it
> > can be done locklessly (i.e., mmu_notifier_test/clear_young() called
> > such that it will not grab any locks).
>
> This is exactly the case for v2.

Thanks for clarifying; sorry for getting this wrong.

>
> > I like the v4 approach because:
> > 1. We can choose whether or not to do aging with KVM no matter what
> > architecture we're using (without requiring userspace be aware to
> > disable the feature at runtime with sysfs to avoid regressing
> > performance if they don't care about proactive reclaim).
> > 2. If we check the new feature bit (0x8) in sysfs, we can know for
> > sure if aging is meant to be working or not. The selftest changes I
> > made won't work properly unless there is a way to be sure that aging
> > is working with KVM.
>
> I'm not convinced, but it doesn't mean your point of view is invalid.
> If you fully understand the implications of your design choice and
> document them, I will not object.
>
> All optimizations in v2 were measured step by step.
> Even that bitmap,
> which might be considered overengineered, brought a readily
> measurable 4% improvement in memcached throughput on Altra Max
> swapping to Optane:
>
> Using the bitmap (64 KVM PTEs for each call)
> =============================================================================================================
> Type         Ops/sec    Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
> -------------------------------------------------------------------------------------------------------------
> Sets            0.00         ---         ---           ---          ---          ---            ---      0.00
> Gets      1012801.92   431436.92    14965.11       0.06246      0.04700      0.16700        4.31900  39635.83
> Waits           0.00         ---         ---           ---          ---          ---            ---       ---
> Totals    1012801.92   431436.92    14965.11       0.06246      0.04700      0.16700        4.31900  39635.83
>
>
> Not using the bitmap (1 KVM PTE for each call)
> =============================================================================================================
> Type         Ops/sec    Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
> -------------------------------------------------------------------------------------------------------------
> Sets            0.00         ---         ---           ---          ---          ---            ---      0.00
> Gets       968210.02   412443.85    14303.89       0.06517      0.04700      0.15900        7.42300  37890.74
> Waits           0.00         ---         ---           ---          ---          ---            ---       ---
> Totals     968210.02   412443.85    14303.89       0.06517      0.04700      0.15900        7.42300  37890.74
>
>
> FlameGraphs with bitmap (1.svg) and without bitmap (2.svg) attached.
>
> What I don't think is acceptable is simplifying those optimizations
> out without documenting your justifications (I would even call it a
> design change, rather than simplification, from v3 to v4).

I'll put back something similar to what you had before (like a
test_clear_young() with a "fast" parameter instead of "bitmap"). I
like the idea of having a new mmu notifier, like
fast_test_clear_young(), while leaving test_young() and clear_young()
unchanged (where "fast" means "prioritize speed over accuracy"). It
seems a little more straightforward that way.

>
> > For look-around at eviction time:
> > - v4: done if the main mm PTE was young and no MMU notifiers are subscribed.
> > - v2/v3: done if the main mm PTE was young or (the SPTE was young and
> > the MMU notifier was lockless/fast).
>
> The host and secondary MMUs are two *independent* cases, IMO:
> 1. look around the host MMU if the PTE mapping the folio under reclaim is young.
> 2. look around the secondary MMU if it can be done locklessly.
>
> So the v2/v3 behavior sounds a lot more reasonable to me.

I'll restore the v2/v3 behavior. I initially removed it because,
without batching, we (mostly) lose the spatial locality that, IIUC,
look-around is designed to exploit.

>
> Also a nit -- don't use 'else' in the following case (should_look_around()):
>
> if (foo)
>     return bar;
> else
>     do_something();

Oh, yes, sorry.
I wrote and rewrote should_look_around() quite a few times while
trying to figure out what made sense in a no-batching series. I'll
fix this.

>
> > I made this logic change as part of removing batching.
> >
> > I'd really appreciate guidance on what the correct thing to do is.
> >
> > In my mind, what would work great is: by default, do aging exactly
> > when KVM can do it locklessly, and then have a Kconfig to always have
> > MGLRU do aging with KVM if a user really cares about proactive
> > reclaim (when the feature bit is set). The selftest can check the
> > Kconfig + feature bit to know for sure if aging will be done.
>
> I still don't see how that Kconfig helps. Or why the new static branch
> isn't enough?

Without a special Kconfig, the feature bit just tells us that aging
with KVM is possible, not that it will necessarily be done. For the
selftest, it'd be good to know exactly when aging is being done or
not, so having a Kconfig like LRU_GEN_ALWAYS_WALK_SECONDARY_MMU would
help the selftest set the right expectations for aging.

The Kconfig would also allow a user to know that, no matter what,
we're going to get correct age data for VMs, even if, say, we're using
the shadow MMU. This is somewhat important for me/Google Cloud.

Is that reasonable? Maybe there's a better solution.
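
For illustration only, the two policies discussed in this thread can be
written down as a small standalone C sketch. This is not kernel code and
not what any version of the series implements: struct aging_cfg, the
helper names, and the LRU_GEN_SECONDARY_MMU_WALK macro are hypothetical.
Only the 0x8 feature bit value, the LRU_GEN_ALWAYS_WALK_SECONDARY_MMU
Kconfig name, and the v2/v3 look-around rule are taken from the text
above.

```c
#include <stdbool.h>

/* Hypothetical name for the sysfs feature bit (0x8 in the thread). */
#define LRU_GEN_SECONDARY_MMU_WALK 0x8u

/* Hypothetical inputs; these are not real kernel structures. */
struct aging_cfg {
	unsigned int enabled_bits;	/* feature bits as read back from sysfs */
	bool always_walk;		/* stand-in for LRU_GEN_ALWAYS_WALK_SECONDARY_MMU */
	bool notifier_is_lockless;	/* test/clear young would not take locks */
};

/*
 * Proposed default: age with KVM exactly when it can be done locklessly,
 * unless the Kconfig forces it. Nothing happens without the feature bit.
 */
static inline bool should_age_secondary_mmu(const struct aging_cfg *cfg)
{
	if (!(cfg->enabled_bits & LRU_GEN_SECONDARY_MMU_WALK))
		return false;
	if (cfg->always_walk)
		return true;
	return cfg->notifier_is_lockless;
}

/* What a selftest could rely on: feature bit plus Kconfig guarantees aging. */
static inline bool aging_is_guaranteed(const struct aging_cfg *cfg)
{
	return (cfg->enabled_bits & LRU_GEN_SECONDARY_MMU_WALK) &&
	       cfg->always_walk;
}

/*
 * v2/v3-style look-around: the host and secondary MMU cases are
 * independent. Written without 'else' after 'return', per the nit above.
 */
static inline bool should_look_around(bool pte_young, bool spte_young,
				      bool notifier_is_lockless)
{
	if (pte_young)
		return true;
	return spte_young && notifier_is_lockless;
}
```

With the feature bit set and the Kconfig off, should_age_secondary_mmu()
follows notifier_is_lockless, so a selftest cannot tell in advance
whether aging will happen; aging_is_guaranteed() is true only when both
the bit and the Kconfig are set, which is the case the selftest would
check for.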