References: <20240611002145.2078921-1-jthoughton@google.com> <20240611002145.2078921-9-jthoughton@google.com>
In-Reply-To: <20240611002145.2078921-9-jthoughton@google.com>
From: Yu Zhao
Date: Fri, 5 Jul 2024 12:35:29 -0600
Subject: Re: [PATCH v5 8/9] mm: multi-gen LRU: Have secondary MMUs participate in aging
To: James Houghton
Cc: Andrew Morton, Paolo Bonzini, Ankit Agrawal, Axel Rasmussen,
	Catalin Marinas, David Matlack, David Rientjes, James Morse,
	Jonathan Corbet, Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta,
	Ryan Roberts, Sean Christopherson, Shaoqin Huang, Suzuki K Poulose,
	Wei Xu, Will Deacon, Zenghui Yu, kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

On Mon, Jun 10, 2024 at 6:22 PM James Houghton wrote:
>
> Secondary MMUs are currently consulted for access/age information at
> eviction time, but before then, we don't get accurate age information.
> That is, pages that are mostly accessed through a secondary MMU (like
> guest memory, used by KVM) will always just proceed down to the oldest
> generation, and then at eviction time, if KVM reports the page to be
> young, the page will be activated/promoted back to the youngest
> generation.
>
> The added feature bit (0x8), if disabled, will make MGLRU behave as if
> there are no secondary MMUs subscribed to MMU notifiers except at
> eviction time.
>
> Implement aging with the new mmu_notifier_test_clear_young_fast_only()
> notifier. For architectures that do not support this notifier, this
> becomes a no-op. For architectures that do implement it, it should be
> fast enough to make aging worth it.
>
> Suggested-by: Yu Zhao
> Signed-off-by: James Houghton
> ---
>
> Notes:
>     should_look_around() can sometimes use two notifiers now instead of one.
>
>     This simply comes from restricting myself from not changing
>     mmu_notifier_clear_young() to return more than just "young or not".
>
>     I could change mmu_notifier_clear_young() (and
>     mmu_notifier_test_young()) to return if it was fast or not. At that
>     point, I could just as well combine all the notifiers into one notifier,
>     like what was in v2 and v3.
>
>  Documentation/admin-guide/mm/multigen_lru.rst |   6 +-
>  include/linux/mmzone.h                        |   6 +-
>  mm/rmap.c                                     |   9 +-
>  mm/vmscan.c                                   | 185 ++++++++++++++----
>  4 files changed, 164 insertions(+), 42 deletions(-)

...

>  static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
>  			   struct mm_walk *args)
>  {
> @@ -3357,8 +3416,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
>  	struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
>  	DEFINE_MAX_SEQ(walk->lruvec);
>  	int old_gen, new_gen = lru_gen_from_seq(max_seq);
> +	struct mm_struct *mm = args->mm;
>
> -	pte = pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl);
> +	pte = pte_offset_map_nolock(mm, pmd, start & PMD_MASK, &ptl);
>  	if (!pte)
>  		return false;
>  	if (!spin_trylock(ptl)) {
> @@ -3376,11 +3436,12 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
>  		total++;
>  		walk->mm_stats[MM_LEAF_TOTAL]++;
>
> -		pfn = get_pte_pfn(ptent, args->vma, addr);
> +		pfn = get_pte_pfn(ptent, args->vma, addr, pgdat);
>  		if (pfn == -1)
>  			continue;
>
> -		if (!pte_young(ptent)) {
> +		if (!pte_young(ptent) &&
> +		    !lru_gen_notifier_test_young(mm, addr)) {
>  			walk->mm_stats[MM_LEAF_OLD]++;
>  			continue;
>  		}
> @@ -3389,8 +3450,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
>  		if (!folio)
>  			continue;
>
> -		if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
> -			VM_WARN_ON_ONCE(true);
> +		lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE);
> +		if (pte_young(ptent))
> +			ptep_test_and_clear_young(args->vma, addr, pte + i);
>
>  		young++;
>  		walk->mm_stats[MM_LEAF_YOUNG]++;

There are two ways to structure the test conditions in walk_pte_range():
1. a single pass into the MMU notifier (combine test/clear) which
   causes a cache miss from get_pfn_page() if the page is NOT young.
2. two passes into the MMU notifier (separate test/clear) if the page
   is young, which does NOT cause a cache miss if the page is NOT young.

v2 can batch up to 64 PTEs, i.e., it only goes into the MMU notifier
twice every 64 PTEs, and therefore the second option is a clear win.

But you are doing twice per PTE. So what's the rationale behind going
with the second option? Was the first option considered?

In addition, what about the non-lockless cases? Would this change make
them worse by grabbing the MMU lock twice per PTE?
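
To put rough numbers on the comparison above, here is a small userspace
sketch (not kernel code) that counts how often each scheme would enter the
MMU notifier. The helper names and the BATCH constant are hypothetical and
exist only for illustration. It assumes the primary PTE's accessed bit is
clear, so the notifier test runs for every PTE, and it models the batched
scheme pessimistically as one test plus one clear per batch of up to 64 PTEs.

```c
#include <assert.h>

/* Hypothetical batch size: v2 is described above as batching up to 64 PTEs. */
#define BATCH 64UL

/*
 * Per-PTE, two-pass scheme: one "test young" notifier call for every PTE,
 * plus one "clear young" call for each PTE that turned out to be young.
 */
static unsigned long notifier_calls_per_pte(unsigned long nr_ptes,
					    unsigned long nr_young)
{
	assert(nr_young <= nr_ptes);
	return nr_ptes + nr_young;
}

/*
 * Batched, two-pass scheme: at most one test call and one clear call per
 * batch of up to BATCH PTEs (worst case: every batch has a young PTE).
 */
static unsigned long notifier_calls_batched(unsigned long nr_ptes)
{
	unsigned long nr_batches = (nr_ptes + BATCH - 1) / BATCH;

	return 2 * nr_batches;
}
```

Under these assumptions, for a range of 512 PTEs that are all young,
notifier_calls_per_pte(512, 512) gives 1024 notifier entries, while
notifier_calls_batched(512) gives 16. If each entry also takes the MMU lock
(the non-lockless case), the same ratio applies to lock acquisitions.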