From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 5 Aug 2024 17:13:17 -0700
Subject: Re: [PATCH v3 1/2] zswap: implement a second chance algorithm for dynamic zswap shrinker
To: Nhat Pham
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, shakeel.butt@linux.dev, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, flintglass@gmail.com, chengming.zhou@linux.dev
In-Reply-To: <20240805232243.2896283-2-nphamcs@gmail.com>
References: <20240805232243.2896283-1-nphamcs@gmail.com> <20240805232243.2896283-2-nphamcs@gmail.com>

On Mon, Aug 5, 2024 at 4:22 PM Nhat Pham wrote:
>
> The current zswap shrinker's heuristic for preventing overshrinking is
> brittle and inaccurate, specifically in the way we decay the protection
> size (i.e., making pages in the zswap LRU eligible for reclaim).
>
> We currently decay protection aggressively in zswap_lru_add() calls.
> This leads to the following unfortunate effect: when a new batch of
> pages enters zswap, the protection size rapidly decays to below 25% of
> the zswap LRU size, which is way too low.
>
> We have observed this effect in production, when experimenting with the
> zswap shrinker: the rate of shrinking shoots up massively right after a
> new batch of zswap stores. This is somewhat the opposite of what we
> originally want - when new pages enter zswap, we want to protect both
> these new pages AND the pages that are already protected in the zswap
> LRU.
>
> Replace the existing heuristics with a second chance algorithm:
>
> 1. When a new zswap entry is stored in the zswap pool, its referenced
>    bit is set.
> 2. When the zswap shrinker encounters a zswap entry with the referenced
>    bit set, give it a second chance - only flip the referenced bit and
>    rotate it in the LRU.
> 3. If the shrinker encounters the entry again, this time with its
>    referenced bit unset, then it can reclaim the entry.
>
> In this manner, the aging of the pages in the zswap LRUs is decoupled
> from zswap stores, and picks up the pace with increasing memory
> pressure (which is what we want).
>
> The second chance scheme allows us to modulate the writeback rate based
> on recent pool activity. Entries that recently entered the pool will be
> protected, so if the pool is dominated by such entries the writeback
> rate will drop proportionally, protecting the workload's working set.
> On the other hand, stale entries will be written back quickly, which
> increases the effective writeback rate.
>
> The referenced bit is added in the hole after the `length` field of
> struct zswap_entry, so there is no extra space overhead for this
> algorithm.
>
> We will still maintain the count of swapins, which is consumed and
> subtracted from the LRU size in zswap_shrinker_count(), to further
> penalize past overshrinking that led to disk swapins. The idea is that
> had we considered this many more pages in the LRU active/protected,
> they would not have been written back and we would not have had to swap
> them in.
>
> To test the new heuristic, I built the kernel under a cgroup with
> memory.max set to 2G, on a host with 36 cores:
>
> With the old shrinker:
>
> real: 263.89s
> user: 4318.11s
> sys: 673.29s
> swapins: 227300.5
>
> With the second chance algorithm:
>
> real: 244.85s
> user: 4327.22s
> sys: 664.39s
> swapins: 94663
>
> (average over 5 runs)
>
> We observe a 1.3% reduction in kernel CPU usage and around a 7.2%
> reduction in real time. Note that the number of swapped-in pages
> dropped by 58%.
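Just to restate the scheme for anyone skimming the archive: as described
above, this is a classic second-chance (clock-style) policy applied to the
zswap LRU. A minimal standalone sketch of that decision logic, with
illustrative types and helper names rather than the actual zswap code:

/*
 * Minimal sketch of the second-chance policy described in the changelog
 * above. The type and function names are illustrative stand-ins, not the
 * real zswap implementation.
 */
#include <stdbool.h>

struct sketch_entry {
        bool referenced;
};

/* Step 1: a freshly stored entry starts out referenced. */
static void sketch_store(struct sketch_entry *entry)
{
        entry->referenced = true;
}

/*
 * Steps 2 and 3: when the shrinker reaches an entry at the LRU tail,
 * a referenced entry only loses its bit and is rotated back to the head
 * (return false = do not reclaim yet); an entry seen again with the bit
 * already clear may be written back (return true = reclaim).
 */
static bool sketch_may_writeback(struct sketch_entry *entry)
{
        if (entry->referenced) {
                entry->referenced = false;
                return false;
        }
        return true;
}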
>
> Suggested-by: Johannes Weiner
> Signed-off-by: Nhat Pham
> ---
>  include/linux/zswap.h |  16 +++----
>  mm/zswap.c            | 108 ++++++++++++++++++++++++------------------
>  2 files changed, 70 insertions(+), 54 deletions(-)
>
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index 6cecb4a4f68b..9cd1beef0654 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -13,17 +13,15 @@ extern atomic_t zswap_stored_pages;
>
>  struct zswap_lruvec_state {
>         /*
> -        * Number of pages in zswap that should be protected from the shrinker.
> -        * This number is an estimate of the following counts:
> +        * Number of swapped in pages from disk, i.e not found in the zswap pool.
>          *
> -        * a) Recent page faults.
> -        * b) Recent insertion to the zswap LRU. This includes new zswap stores,
> -        *    as well as recent zswap LRU rotations.
> -        *
> -        * These pages are likely to be warm, and might incur IO if the are written
> -        * to swap.
> +        * This is consumed and subtracted from the lru size in
> +        * zswap_shrinker_count() to penalize past overshrinking that led to disk
> +        * swapins. The idea is that had we considered this many more pages in the
> +        * LRU active/protected and not written them back, we would not have had to
> +        * swapped them in.
>          */
> -       atomic_long_t nr_zswap_protected;
> +       atomic_long_t nr_disk_swapins;
>  };
>
>  unsigned long zswap_total_pages(void);
> diff --git a/mm/zswap.c b/mm/zswap.c
> index adeaf9c97fde..fb3d9cb88785 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -187,6 +187,10 @@ static struct shrinker *zswap_shrinker;
>   * length - the length in bytes of the compressed page data. Needed during
>   *          decompression. For a same value filled page length is 0, and both
>   *          pool and lru are invalid and must be ignored.
> + * referenced - true if the entry recently entered the zswap pool. Unset by the
> + *              dynamic shrinker. The entry is only reclaimed by the dynamic
> + *              shrinker if referenced is unset. See comments in the shrinker
> + *              section for context.

Nit: It is unset and reclaimed by the writeback logic in general, which
isn't necessarily triggered from the dynamic shrinker, right?
>   * pool - the zswap_pool the entry's data is in
>   * handle - zpool allocation handle that stores the compressed page data
>   * value - value of the same-value filled pages which have same content
> @@ -196,6 +200,7 @@ static struct shrinker *zswap_shrinker;
>  struct zswap_entry {
>         swp_entry_t swpentry;
>         unsigned int length;
> +       bool referenced;
>         struct zswap_pool *pool;
>         union {
>                 unsigned long handle;
> @@ -700,11 +705,8 @@ static inline int entry_to_nid(struct zswap_entry *entry)
>
>  static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry)
>  {
> -       atomic_long_t *nr_zswap_protected;
> -       unsigned long lru_size, old, new;
>         int nid = entry_to_nid(entry);
>         struct mem_cgroup *memcg;
> -       struct lruvec *lruvec;
>
>         /*
>          * Note that it is safe to use rcu_read_lock() here, even in the face of
> @@ -722,19 +724,6 @@ static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry)
>         memcg = mem_cgroup_from_entry(entry);
>         /* will always succeed */
>         list_lru_add(list_lru, &entry->lru, nid, memcg);
> -
> -       /* Update the protection area */
> -       lru_size = list_lru_count_one(list_lru, nid, memcg);
> -       lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
> -       nr_zswap_protected = &lruvec->zswap_lruvec_state.nr_zswap_protected;
> -       old = atomic_long_inc_return(nr_zswap_protected);
> -       /*
> -        * Decay to avoid overflow and adapt to changing workloads.
> -        * This is based on LRU reclaim cost decaying heuristics.
> -        */
> -       do {
> -               new = old > lru_size / 4 ? old / 2 : old;
> -       } while (!atomic_long_try_cmpxchg(nr_zswap_protected, &old, new));

Nice, arcane heuristics gone :)

LGTM with the above nit:
Acked-by: Yosry Ahmed <yosryahmed@google.com>
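
Purely as an aside for readers of the archive: a quick standalone
simulation of the decay rule removed above (new = old > lru_size / 4 ?
old / 2 : old) shows why it over-decays after a burst of stores. The
numbers are made up for illustration and this is obviously not kernel
code:

#include <stdio.h>

/* Old decay rule from the deleted hunk above: halve the protected count
 * whenever it exceeds a quarter of the LRU size. */
static long decay(long protected_pages, long lru_size)
{
        return protected_pages > lru_size / 4 ? protected_pages / 2
                                              : protected_pages;
}

int main(void)
{
        long protected_pages = 1000, lru_size = 1000;
        int i;

        /* Simulate a burst of ten new stores into a fully protected
         * 1000-entry LRU: each zswap_lru_add() grew the LRU, bumped the
         * protected counter, then applied the decay. */
        for (i = 0; i < 10; i++) {
                lru_size++;
                protected_pages++;
                protected_pages = decay(protected_pages, lru_size);
                printf("store %2d: protected=%ld lru=%ld\n",
                       i + 1, protected_pages, lru_size);
        }
        /* After only a few stores the protected count has already fallen
         * below 25% of the LRU size, which is the overshrinking the
         * changelog describes. */
        return 0;
}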