Date: Thu, 23 Feb 2023 11:11:35 -0800
From: Sean Christopherson
To: Yu Zhao
Cc: Andrew Morton, Paolo Bonzini, Jonathan Corbet, Michael Larabel,
    kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
    linux-mm@google.com
Subject: Re: [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
References: <20230217041230.2417228-1-yuzhao@google.com> <20230217041230.2417228-6-yuzhao@google.com>

On Thu, Feb 23, 2023, Yu Zhao wrote:
> On Thu, Feb 23, 2023 at 10:43 AM Sean Christopherson wrote:
> >
> > On Thu, Feb 16, 2023, Yu Zhao wrote:
> > >   kswapd (MGLRU before)
> > >     100.00%  balance_pgdat
> > >     100.00%  shrink_node
> > >     100.00%  shrink_one
> > >      99.97%  try_to_shrink_lruvec
> > >      99.06%  evict_folios
> > >      97.41%  shrink_folio_list
> > >      31.33%  folio_referenced
> > >      31.06%  rmap_walk_file
> > >      30.89%  folio_referenced_one
> > >      20.83%  __mmu_notifier_clear_flush_young
> > >      20.54%  kvm_mmu_notifier_clear_flush_young
> > >   => 19.34%  _raw_write_lock
> > >
> > >   kswapd (MGLRU after)
> > >     100.00%  balance_pgdat
> > >     100.00%  shrink_node
> > >     100.00%  shrink_one
> > >      99.97%  try_to_shrink_lruvec
> > >      99.51%  evict_folios
> > >      71.70%  shrink_folio_list
> > >       7.08%  folio_referenced
> > >       6.78%  rmap_walk_file
> > >       6.72%  folio_referenced_one
> > >       5.60%  lru_gen_look_around
> > >   =>  1.53%  __mmu_notifier_test_clear_young
> >
> > Do you happen to know how much of the improvement is due to batching, and how
> > much is due to using a walkless walk?
>
> No. I have three benchmarks running at the moment:
> 1. Windows SQL server guest on x86 host,
> 2. Apache Spark guest on arm64 host, and
> 3. Memcached guest on ppc64 host.
>
> If you are really interested in that, I can reprioritize -- I need to
> stop 1) and use that machine to get the number for you.

After looking at the "MGLRU before" stack again, it's definitely worth getting
those numbers.  The "before" isn't just taking mmu_lock, it's taking mmu_lock for
write _and_ flushing remote TLBs on _every_ PTE.  I suspect the batching is a
tiny percentage of the overall win (might be larger with RETPOLINE and friends),
and that the bulk of the improvement comes from avoiding the insanity of
kvm_mmu_notifier_clear_flush_young().

Speaking of which, what would it take to drop mmu_notifier_clear_flush_young()
entirely?  I.e. why can MGLRU tolerate stale information but !MGLRU cannot?  If
we simply deleted mmu_notifier_clear_flush_young() and used
mmu_notifier_clear_young() instead, would anyone notice, let alone care?

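To make the batching-versus-lockless question concrete, here is a rough cost model in plain C. It is a sketch only, not code from the series: struct aging_cost and the three cost_*() helpers are hypothetical names, and the model takes the description above at face value (one write-lock acquisition and one remote flush per PTE in the "before" case) and assumes a batched call costs one lock acquisition per batch with no flush.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters for aging nr_ptes secondary-MMU PTEs. */
struct aging_cost {
	size_t write_locks;	/* times mmu_lock is taken for write */
	size_t tlb_flushes;	/* remote TLB flushes issued */
	size_t calls;		/* notifier invocations */
};

/* "Before": one call per PTE, each taking the lock for write and flushing. */
static struct aging_cost cost_per_pte_flush(size_t nr_ptes)
{
	struct aging_cost c = { nr_ptes, nr_ptes, nr_ptes };

	return c;
}

/* Batching only: one write-lock acquisition per batch, no flush (assumed). */
static struct aging_cost cost_batched_locked(size_t nr_ptes, size_t batch)
{
	size_t calls;

	assert(batch > 0);
	calls = (nr_ptes + batch - 1) / batch;	/* round up to whole batches */

	struct aging_cost c = { calls, 0, calls };

	return c;
}

/* Batching plus a lockless walk: same call count, no write lock at all. */
static struct aging_cost cost_batched_lockless(size_t nr_ptes, size_t batch)
{
	struct aging_cost c = cost_batched_locked(nr_ptes, batch);

	c.write_locks = 0;
	return c;
}
```

Under these assumptions, aging 512 PTEs in batches of 64 takes 512 write locks and 512 flushes per PTE-at-a-time, 8 write locks with batching alone, and none with a lockless walk. The counts say nothing about how expensive each lock or flush is, which is why real measurements are still needed.
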
> > > @@ -5699,6 +5797,9 @@ static ssize_t show_enabled(struct kobject *kobj, struct kobj_attribute *attr, c
> > >       if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG))
> > >               caps |= BIT(LRU_GEN_NONLEAF_YOUNG);
> > >
> > > +     if (kvm_arch_has_test_clear_young() && get_cap(LRU_GEN_SPTE_WALK))
> > > +             caps |= BIT(LRU_GEN_SPTE_WALK);
> >
> > As alluded to in patch 1, unless batching the walks even if KVM does _not_ support
> > a lockless walk is somehow _worse_ than using the existing mmu_notifier_clear_flush_young(),
> > I think batching the calls should be conditional only on LRU_GEN_SPTE_WALK.  Or
> > if we want to avoid batching when there are no mmu_notifier listeners, probe
> > mmu_notifiers.  But don't call into KVM directly.
>
> I'm not sure I fully understand. Let's present the problem on the MM
> side: assuming KVM supports lockless walks, batching can still be
> worse (very unlikely), because GFNs can exhibit no memory locality at
> all. So this option allows userspace to disable batching.

I'm asking the opposite.  Is there a scenario where batching+lock is worse than
!batching+lock?  If not, then don't make batching depend on lockless walks.

> I fully understand why you don't want MM to call into KVM directly. No
> acceptable ways to set up a clear interface between MM and KVM other
> than the MMU notifier?

There are several options I can think of, but before we go spend time designing
the best API, I'd rather figure out if we care in the first place.
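
For illustration only, the two gating policies discussed above can be contrasted in a small standalone sketch. Everything except the LRU_GEN_SPTE_WALK name is an assumption: the bit position is a placeholder, cap_enabled() stands in for get_cap(), and struct mm_probe with its two booleans stands in for kvm_arch_has_test_clear_young() and for whatever mechanism would probe mmu_notifier listeners.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1UL << (n))

/* Placeholder bit position; only the name comes from the quoted hunk. */
enum { LRU_GEN_SPTE_WALK = 3 };

/* Hypothetical probe results visible to MM. */
struct mm_probe {
	bool kvm_lockless;	/* stand-in for kvm_arch_has_test_clear_young() */
	bool has_listeners;	/* stand-in for probing mmu_notifier listeners */
};

/* Stand-in for get_cap(): is capability bit 'cap' set in the enabled mask? */
static bool cap_enabled(unsigned long enabled, unsigned int cap)
{
	assert(cap < 8 * sizeof(enabled));
	return (enabled & BIT(cap)) != 0;
}

/* As posted: batching is reported only if KVM offers the lockless hook. */
static unsigned long caps_as_posted(unsigned long enabled, struct mm_probe p)
{
	unsigned long caps = 0;

	if (p.kvm_lockless && cap_enabled(enabled, LRU_GEN_SPTE_WALK))
		caps |= BIT(LRU_GEN_SPTE_WALK);
	return caps;
}

/* As suggested: depend on the cap and on listeners, never on KVM itself. */
static unsigned long caps_as_suggested(unsigned long enabled, struct mm_probe p)
{
	unsigned long caps = 0;

	if (p.has_listeners && cap_enabled(enabled, LRU_GEN_SPTE_WALK))
		caps |= BIT(LRU_GEN_SPTE_WALK);
	return caps;
}
```

The two policies differ only when listeners exist but KVM lacks a lockless walk: the posted check leaves batching off, while the suggested one keeps it on. That is exactly the batching+lock case being asked about.
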