From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20220815071332.627393-1-yuzhao@google.com> <20220815071332.627393-8-yuzhao@google.com> <0F7CF2A7-F671-4196-B8FD-F35E9556391B@gmail.com>
From: Yu Zhao
Date: Thu, 1 Sep 2022 19:28:31 -0600
Subject: Re: [PATCH v14 07/14] mm: multi-gen LRU: exploit locality in rmap
To: Nadav Amit
Cc: Andrew Morton, Andi Kleen, Aneesh Kumar, Catalin Marinas, Dave Hansen,
 Hillf Danton, Jens Axboe, Johannes Weiner, Jonathan Corbet, Linus Torvalds,
 Matthew Wilcox, Mel Gorman, Michael Larabel, Michal Hocko, Mike Rapoport,
 Peter Zijlstra, Tejun Heo, Vlastimil Babka, Will Deacon, Linux ARM,
 "open list:DOCUMENTATION", LKML, Linux MM, X86 ML, Kernel Page Reclaim v2,
 Barry Song, Brian Geffon, Jan Alexander Steffens, Oleksandr Natalenko,
 Steven Barrett, Suleiman Souhlal, Daniel Byrne, Donald Carr,
 Holger Hoffstätte, Konstantin Kharlamov, Shuang Zhai, Sofia Trinh,
 Vaibhav Jain
Sender: owner-linux-mm@kvack.org

On Thu, Sep 1, 2022 at 7:17 PM Yu Zhao wrote:
>
> On Thu, Sep 1, 2022 at 3:18 AM Nadav Amit wrote:
> >
> > > On Aug 15, 2022, at 12:13 AM, Yu Zhao wrote:
> > >
> > > Searching the rmap for PTEs mapping each page on an LRU list (to test
> > > and clear the accessed bit) can be expensive because pages from
> > > different VMAs (PA space) are not cache friendly to the rmap (VA
> > > space). For workloads mostly using mapped pages, searching the rmap
> > > can incur the highest CPU cost in the reclaim path.
> >
> > Impressive work. Thanks.
> > Sorry if my feedback is not timely.
> >
> > Just one minor point for thought, that can be left for a later cleanup.
> >
> > >
> > > +    for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
> > > +        unsigned long pfn;
> > > +
> > > +        pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
> > > +        if (pfn == -1)
> > > +            continue;
> > > +
> > > +        if (!pte_young(pte[i]))
> > > +            continue;
> > > +
> > > +        folio = get_pfn_folio(pfn, memcg, pgdat);
> > > +        if (!folio)
> > > +            continue;
> > > +
> > > +        if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
> > > +            continue;
> > > +
> >
> > You have already checked that the PTE is old (not young) so this check
> > seems redundant.
>
> You are right, for x86, which belongs to category 1: hardware and
> OS share the same paging data structure.
>
> > I do not see a way in which the access-bit can be cleared
> > since you hold the ptl.
>
> There is also category 2: the OS paging data structure is a shadow of what
> hardware actually uses, e.g., POWER9 radix.
>
> To make both categories work, the general rule is that the OS paging
> data structure must be more strict, i.e., it can have A/D bits set
> while the hardware paging data structure may not. The opposite is not
> allowed, even for the A bit, because the A bit can also be used to
> determine whether a TLB flush is required. The Linux kernel doesn't do
> this but there are other OSes that do.
>
> For prefaulted PTEs, we generally mark them young unless
> arch_wants_old_prefaulted_pte() returns true (currently only ARMv8.2+
> does). On POWER9, we'd see those PTEs pass the first check but fail the
> second.

This is because the first check (non-atomic) is allowed to fetch from the
OS paging data structure (which is more strict), while the second check
(atomic) must fetch from the hardware paging data structure (which does
not have the A bit set because those PTEs are prefaulted).

> > IOW, there is no need for the "if" and "continue".
> >
> > Makes me also wonder whether having a separate ptep_clear_young() can
> > slightly help, since anyhow the access-bit is more of an estimation,
> > and having a separate ptep_clear_young() can enable optimizations.
> >
> > On x86, for instance, if the PTE is dirty, we may be able to clear the
> > access-bit without an atomic operation, which should be faster.
>
> Agreed.
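
To make the category 2 case concrete, here is a minimal userspace sketch
of why a PTE can pass the first check and still fail the second. It is
not kernel code and does not claim to match how POWER9 radix is actually
implemented: the toy_* names, the two-field struct and the bit layout are
all hypothetical, and the "atomic" test-and-clear is modeled with plain
loads and stores for brevity. The only thing it tries to capture is the
rule that the OS-side entry may have the A bit set while the hardware-side
entry does not, never the other way around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTE_ACCESSED 0x1u	/* hypothetical bit layout */

/*
 * Toy PTE for a "category 2" architecture (assumption for illustration):
 * os is the entry the OS reads cheaply, hw is what the MMU actually uses.
 */
struct toy_pte {
	uint32_t os;
	uint32_t hw;
};

/* First check: a plain read of the OS-side entry, in the role of pte_young(). */
static bool toy_pte_young(const struct toy_pte *pte)
{
	return pte->os & TOY_PTE_ACCESSED;
}

/*
 * Second check: test and clear, in the role of ptep_test_and_clear_young().
 * The result comes from the hardware-side entry. Not atomic in this toy.
 */
static bool toy_test_and_clear_young(struct toy_pte *pte)
{
	bool was_young = pte->hw & TOY_PTE_ACCESSED;

	/* The OS side must be the stricter one: no bit set only in hw. */
	assert((pte->hw & ~pte->os) == 0);

	pte->hw &= ~TOY_PTE_ACCESSED;
	pte->os &= ~TOY_PTE_ACCESSED;
	return was_young;
}

/* Prefault: the OS marks the entry young, the hardware has not touched it. */
static struct toy_pte toy_prefault(void)
{
	struct toy_pte pte = { .os = TOY_PTE_ACCESSED, .hw = 0 };

	return pte;
}

/* A real access through the MMU sets the A bit on both sides. */
static void toy_hw_access(struct toy_pte *pte)
{
	pte->hw |= TOY_PTE_ACCESSED;
	pte->os |= TOY_PTE_ACCESSED;
}

/* Same shape as the quoted loop: count entries that pass both checks. */
static int toy_count_young(struct toy_pte *ptes, int n)
{
	int young = 0;

	for (int i = 0; i < n; i++) {
		if (!toy_pte_young(&ptes[i]))
			continue;
		if (!toy_test_and_clear_young(&ptes[i]))
			continue;
		young++;
	}
	return young;
}
```

With one prefaulted-but-untouched entry, one prefaulted-then-accessed
entry and one old entry, only the accessed one is counted. Dropping the
second "if" and "continue" would count the untouched prefaulted entry as
well, which is the case the second check is there to filter out in this
model.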