From: Nhat Pham
Date: Fri, 20 Jan 2023 09:29:46 -0800
Subject: Re: [PATCH v6 1/3] workingset: refactor LRU refault to expose refault recency check
To: Brian Foster
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, willy@infradead.org, linux-api@vger.kernel.org, kernel-team@meta.com
References: <20230117195959.29768-1-nphamcs@gmail.com> <20230117195959.29768-2-nphamcs@gmail.com>

On Fri, Jan 20, 2023 at 6:33 AM Brian Foster wrote:
>
> On Tue, Jan 17, 2023 at 11:59:57AM -0800, Nhat Pham wrote:
> > In preparation for computing recently evicted pages in cachestat,
> > refactor workingset_refault and lru_gen_refault to expose a helper
> > function that would test if an evicted page is recently evicted.
> >
> > Signed-off-by: Nhat Pham
> > ---
>
> Hi Nhat,
>
> I'm not terribly familiar with the workingset management code, but a few
> thoughts now that I've stared at it a bit...
>
> >  include/linux/swap.h |   1 +
> >  mm/workingset.c      | 129 ++++++++++++++++++++++++++++++-------------
> >  2 files changed, 92 insertions(+), 38 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index a18cf4b7c724..dae6f6f955eb 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -361,6 +361,7 @@ static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry)
> >  }
> >
> >  /* linux/mm/workingset.c */
> > +bool workingset_test_recent(void *shadow, bool file, bool *workingset);
> >  void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
> >  void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg);
> >  void workingset_refault(struct folio *folio, void *shadow);
> > diff --git a/mm/workingset.c b/mm/workingset.c
> > index 79585d55c45d..006482c4e0bd 100644
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -244,6 +244,33 @@ static void *lru_gen_eviction(struct folio *folio)
> >  	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, refs);
> >  }
> >
> > +/*
> > + * Test if the folio is recently evicted.
> > + *
> > + * As a side effect, also populates the references with
> > + * values unpacked from the shadow of the evicted folio.
> > + */
> > +static bool lru_gen_test_recent(void *shadow, bool file, bool *workingset)
> > +{
> > +	struct mem_cgroup *eviction_memcg;
> > +	struct lruvec *lruvec;
> > +	struct lru_gen_struct *lrugen;
> > +	unsigned long min_seq;
> > +
>
> Extra whitespace looks a bit funny here.
>
> > +	int memcgid;
> > +	struct pglist_data *pgdat;
> > +	unsigned long token;
> > +
> > +	unpack_shadow(shadow, &memcgid, &pgdat, &token, workingset);
> > +	eviction_memcg = mem_cgroup_from_id(memcgid);
> > +
> > +	lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
> > +	lrugen = &lruvec->lrugen;
> > +
> > +	min_seq = READ_ONCE(lrugen->min_seq[file]);
> > +	return !((token >> LRU_REFS_WIDTH) != (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH)));
>
> I think this might be more readable without the double negative.

Hmm, indeed. I was just making sure that I did not mess up Yu's
original logic here (by wrapping it in parentheses and negating
the whole thing), but if I understand it correctly it's just an
equality check. I'll fix it in the next version to make it cleaner.

>
> Also it looks like this logic is pulled from lru_gen_refault(). Any
> reason the caller isn't refactored to use this helper, similar to how
> workingset_refault() is modified? It seems like a potential landmine to
> duplicate the logic here for cachestat purposes and somewhere else for
> actual workingset management.

In V2, it was actually refactored analogously as well, but we had
a discussion about it here:

https://lkml.org/lkml/2022/12/5/1321

>
> > +}
> > +
> >  static void lru_gen_refault(struct folio *folio, void *shadow)
> >  {
> >  	int hist, tier, refs;
> > @@ -306,6 +333,11 @@ static void *lru_gen_eviction(struct folio *folio)
> >  	return NULL;
> >  }
> >
> > +static bool lru_gen_test_recent(void *shadow, bool file, bool *workingset)
> > +{
> > +	return true;
> > +}
>
> I guess this is a no-op for !MGLRU but given the context (i.e. special
> treatment for "recent" refaults), perhaps false is a more sane default?

Hmm, fair point. Let me fix that in the next version.
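
On the double negative discussed above: the following is a minimal
userspace sketch, not kernel code, that compares the negated form from
the patch with a plain equality check. The SKETCH_* constants and the
function names are made up for illustration; the kernel's actual
LRU_REFS_WIDTH and EVICTION_MASK depend on its configuration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for LRU_REFS_WIDTH and EVICTION_MASK. */
#define SKETCH_REFS_WIDTH	2
#define SKETCH_EVICTION_MASK	0xffffUL

/* The form in the patch: the original "not recent" test, negated. */
bool recent_negated(unsigned long token, unsigned long min_seq)
{
	return !((token >> SKETCH_REFS_WIDTH) !=
		 (min_seq & (SKETCH_EVICTION_MASK >> SKETCH_REFS_WIDTH)));
}

/* The same test written as a direct equality check. */
bool recent_direct(unsigned long token, unsigned long min_seq)
{
	return (token >> SKETCH_REFS_WIDTH) ==
	       (min_seq & (SKETCH_EVICTION_MASK >> SKETCH_REFS_WIDTH));
}

/* Returns true if both forms agree for every pair in a small input range. */
bool forms_agree(void)
{
	for (unsigned long token = 0; token < 256; token++)
		for (unsigned long min_seq = 0; min_seq < 256; min_seq++)
			if (recent_negated(token, min_seq) !=
			    recent_direct(token, min_seq))
				return false;
	return true;
}
```

Since !(a != b) is a == b for any integer operands, the two forms are
interchangeable; the loop is only a sanity check on the sketch itself.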
>
> > +
> >  static void lru_gen_refault(struct folio *folio, void *shadow)
> >  {
> >  }
> > @@ -373,40 +405,31 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
> >  				folio_test_workingset(folio));
> >  }
> >
> > -/**
> > - * workingset_refault - Evaluate the refault of a previously evicted folio.
> > - * @folio: The freshly allocated replacement folio.
> > - * @shadow: Shadow entry of the evicted folio.
> > +/*
> > + * Test if the folio is recently evicted by checking if
> > + * refault distance of shadow exceeds workingset size.
> >   *
> > - * Calculates and evaluates the refault distance of the previously
> > - * evicted folio in the context of the node and the memcg whose memory
> > - * pressure caused the eviction.
> > + * As a side effect, populate workingset with the value
> > + * unpacked from shadow.
> >   */
> > -void workingset_refault(struct folio *folio, void *shadow)
> > +bool workingset_test_recent(void *shadow, bool file, bool *workingset)
> >  {
> > -	bool file = folio_is_file_lru(folio);
> >  	struct mem_cgroup *eviction_memcg;
> >  	struct lruvec *eviction_lruvec;
> >  	unsigned long refault_distance;
> >  	unsigned long workingset_size;
> > -	struct pglist_data *pgdat;
> > -	struct mem_cgroup *memcg;
> > -	unsigned long eviction;
> > -	struct lruvec *lruvec;
> >  	unsigned long refault;
> > -	bool workingset;
> > +
> >  	int memcgid;
> > -	long nr;
> > +	struct pglist_data *pgdat;
> > +	unsigned long eviction;
> >
> > -	if (lru_gen_enabled()) {
> > -		lru_gen_refault(folio, shadow);
> > -		return;
> > -	}
> > +	if (lru_gen_enabled())
> > +		return lru_gen_test_recent(shadow, file, workingset);
>
> Hmm.. so this function is only called by workingset_refault() when
> lru_gen_enabled() == false, otherwise it calls into lru_gen_refault(),
> which as noted above duplicates some of the recency logic.
>
> I'm assuming this lru_gen_test_recent() call is so filemap_cachestat()
> can just call workingset_test_recent(). That seems reasonable, but makes
> me wonder...

You're right. It's a bit clunky...

>
> >
> > -	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
> > +	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, workingset);
> >  	eviction <<= bucket_order;
> >
> > -	rcu_read_lock();
> >  	/*
> >  	 * Look up the memcg associated with the stored ID. It might
> >  	 * have been deleted since the folio's eviction.
> > @@ -425,7 +448,8 @@ void workingset_refault(struct folio *folio, void *shadow)
> >  	 */
> >  	eviction_memcg = mem_cgroup_from_id(memcgid);
> >  	if (!mem_cgroup_disabled() && !eviction_memcg)
> > -		goto out;
> > +		return false;
> > +
> >  	eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
> >  	refault = atomic_long_read(&eviction_lruvec->nonresident_age);
> >
> > @@ -447,21 +471,6 @@ void workingset_refault(struct folio *folio, void *shadow)
> >  	 */
> >  	refault_distance = (refault - eviction) & EVICTION_MASK;
> >
> > -	/*
> > -	 * The activation decision for this folio is made at the level
> > -	 * where the eviction occurred, as that is where the LRU order
> > -	 * during folio reclaim is being determined.
> > -	 *
> > -	 * However, the cgroup that will own the folio is the one that
> > -	 * is actually experiencing the refault event.
> > -	 */
> > -	nr = folio_nr_pages(folio);
> > -	memcg = folio_memcg(folio);
> > -	pgdat = folio_pgdat(folio);
> > -	lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > -
> > -	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
> > -
> >  	mem_cgroup_flush_stats_delayed();
> >  	/*
> >  	 * Compare the distance to the existing workingset size. We
> > @@ -483,8 +492,51 @@ void workingset_refault(struct folio *folio, void *shadow)
> >  						     NR_INACTIVE_ANON);
> >  		}
> >  	}
> > -	if (refault_distance > workingset_size)
> > +
> > +	return refault_distance <= workingset_size;
> > +}
> > +
> > +/**
> > + * workingset_refault - Evaluate the refault of a previously evicted folio.
> > + * @folio: The freshly allocated replacement folio.
> > + * @shadow: Shadow entry of the evicted folio.
> > + *
> > + * Calculates and evaluates the refault distance of the previously
> > + * evicted folio in the context of the node and the memcg whose memory
> > + * pressure caused the eviction.
> > + */
> > +void workingset_refault(struct folio *folio, void *shadow)
> > +{
> > +	bool file = folio_is_file_lru(folio);
> > +	struct pglist_data *pgdat;
> > +	struct mem_cgroup *memcg;
> > +	struct lruvec *lruvec;
> > +	bool workingset;
> > +	long nr;
> > +
> > +	if (lru_gen_enabled()) {
> > +		lru_gen_refault(folio, shadow);
> > +		return;
> > +	}
>
> ... if perhaps this should call workingset_test_recent() a bit earlier,
> since it also covers the lru_gen_*() case..? That may or may not be
> cleaner. It _seems like_ it might produce a bit more consistent logic,
> but just a thought and I could easily be missing details.

Hmm, you mean before or in place of the lru_gen_refault() call?
workingset_test_recent() only covers lru_gen_test_recent(),
not the rest of the logic in lru_gen_refault(), I believe.

>
> > +
> > +	rcu_read_lock();
> > +
> > +	nr = folio_nr_pages(folio);
> > +	memcg = folio_memcg(folio);
> > +	pgdat = folio_pgdat(folio);
> > +	lruvec = mem_cgroup_lruvec(memcg, pgdat);
> > +
> > +	if (!workingset_test_recent(shadow, file, &workingset)) {
> > +		/*
> > +		 * The activation decision for this folio is made at the level
> > +		 * where the eviction occurred, as that is where the LRU order
> > +		 * during folio reclaim is being determined.
> > +		 *
> > +		 * However, the cgroup that will own the folio is the one that
> > +		 * is actually experiencing the refault event.
> > +		 */
>
> IIUC, this comment is explaining the difference between using the
> eviction lru (based on the shadow entry) to calculate recency vs. the
> lru for the current folio to process the refault. If so, perhaps it
> should go right above the workingset_test_recent() call? (Then the if
> braces could go away as well..).

You're right! I think it should go above the
`nr = folio_nr_pages(folio);` line.

>
> >  		goto out;
> > +	}
> >
> >  	folio_set_active(folio);
> >  	workingset_age_nonresident(lruvec, nr);
> > @@ -498,6 +550,7 @@ void workingset_refault(struct folio *folio, void *shadow)
> >  		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file, nr);
> >  	}
> >  out:
> > +	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
>
> Why not just leave this up earlier in the function (i.e. before the
> recency check) as it was originally?

Let me double check, but I think this is a relic from the old
(and incorrect) version of the workingset code.

Originally, mod_lruvec_state() used the lruvec computed from
a variable (pgdat) that was unpacked from the shadow, so the
mod_lruvec_state() call had to go after the unpack_shadow() call
(which has now been moved inside workingset_test_recent()).

That was wrong: we want the pgdat from the folio. It has been
fixed in a separate patch:

https://lore.kernel.org/all/20230104222944.2380117-1-nphamcs@gmail.com/T/#u

But I didn't update it here. Let me stare at it a bit more
to make sure, and then fix it in the next version. It should
not change the behavior, but it should be cleaner.

>
> Brian
>
> >  	rcu_read_unlock();
> >  }
> >
> > --
> > 2.30.2
> >
>
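
For reference, the non-MGLRU recency test quoted in the patch reduces
to a masked subtraction followed by a comparison. The following is a
minimal userspace model of just that arithmetic, not kernel code. The
8-bit MODEL_EVICTION_MASK and the model_* names are hypothetical and
chosen only to make counter wraparound easy to see; the kernel's
EVICTION_MASK is much wider and the real function also looks up the
memcg and lruvec.

```c
#include <stdbool.h>

/* Hypothetical 8-bit counter width, for illustration only. */
#define MODEL_EVICTION_MASK	0xffUL

/*
 * Distance between the eviction timestamp and the refault timestamp.
 * Masking the unsigned difference keeps the result correct even when
 * the counter has wrapped between eviction and refault.
 */
unsigned long model_refault_distance(unsigned long refault,
				     unsigned long eviction)
{
	return (refault - eviction) & MODEL_EVICTION_MASK;
}

/* Mirrors the patch's final comparison: recent if distance <= size. */
bool model_test_recent(unsigned long refault, unsigned long eviction,
		       unsigned long workingset_size)
{
	return model_refault_distance(refault, eviction) <= workingset_size;
}
```

With this model, an eviction at counter value 250 and a refault at 3
(after the 8-bit counter wraps) gives a distance of 9, so the entry
counts as recent for a workingset size of 16 but not for a size of 8.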