Date: Mon, 29 Jul 2024 23:39:29 -0400
From: Johannes Weiner
To: Yosry Ahmed
Cc: Nhat Pham, akpm@linux-foundation.org, shakeelb@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, flintglass@gmail.com
Subject: Re: [PATCH 1/2] zswap: implement a second chance algorithm for dynamic zswap shrinker
Message-ID: <20240730033929.GB2866591@cmpxchg.org>
References: <20240725232813.2260665-1-nphamcs@gmail.com> <20240725232813.2260665-2-nphamcs@gmail.com>

On Fri, Jul 26, 2024 at 02:58:14PM -0700, Yosry Ahmed wrote:
> On Thu, Jul 25, 2024 at 4:28 PM Nhat Pham wrote:
> >
> > The current zswap shrinker's heuristics to prevent overshrinking are brittle
> > and inaccurate, specifically in the way we decay the protection size
> > (i.e. making pages in the zswap
> > LRU eligible for reclaim).
>
> Thanks for working on this and experimenting with different
> heuristics. I was not a huge fan of these, so I am glad we are trying
> to replace them with something more intuitive.
>
> >
> > We currently decay protection aggressively in zswap_lru_add() calls.
> > This leads to the following unfortunate effect: when a new batch of
> > pages enters zswap, the protection size rapidly decays to below 25% of
> > the zswap LRU size, which is way too low.
> >
> > We have observed this effect in production, when experimenting with the
> > zswap shrinker: the rate of shrinking shoots up massively right after a
> > new batch of zswap stores. This is somewhat the opposite of what we
> > originally wanted - when new pages enter zswap, we want to protect both
> > these new pages AND the pages that are already protected in the zswap LRU.
> >
> > Replace the existing heuristics with a second chance algorithm:
> >
> > 1. When a new zswap entry is stored in the zswap pool, its reference bit
> >    is set.
> > 2. When the zswap shrinker encounters a zswap entry with the reference
> >    bit set, give it a second chance - only flip the reference bit and
> >    rotate it in the LRU.
> > 3. If the shrinker encounters the entry again, this time with its
> >    reference bit unset, then it can reclaim the entry.
>
> At first glance, this is similar to the reclaim algorithm. A
> fundamental difference here is that the reference bit is only set
> once, when the entry is created. It is different from the conventional
> second chance page reclaim/replacement algorithm.
>
> What this really does is slow down writeback by enforcing
> that we need to iterate entries exactly twice before we write them
> back. This sounds a little arbitrary and not very intuitive to me.

This isn't different from other second chance algorithms. Those
usually set the reference bit again to buy the entry another round.
In our case, another reference causes a zswapin, which removes the
entry from the list - buying it another round. Entries will get
reclaimed once the scan rate catches up with the longest reuse
distance.

The main goal, which was also the goal of the protection math, is to
slow down writebacks in proportion to new entries showing up. This
gives zswap a chance to solve memory pressure through compression. If
memory pressure persists, writeback should pick up.

If no new entries were to show up, then sure, this would be busy
work. In practice, new entries do show up at a varying rate. This all
happens in parallel to anon reclaim, after all. The key here is that
new entries will be interleaved with rotated entries, and they consume
scan work! This is what results in the proportional slowdown.

> Taking a step back, what we really want is to write back zswap entries
> in order, and avoid writing back more entries than needed. I think the
> key here is "when needed", which is defined by how much memory
> pressure we have. The shrinker framework should already be taking this
> into account.
>
> Looking at do_shrink_slab(), in the case of zswap (seek = 2),
> total_scan should boil down to:
>
> total_scan = (zswap_shrinker_count() * 2 + nr_deferred) >> priority
>
> and this is bounded by zswap_shrinker_count() * 2.
>
> Ignoring nr_deferred, we start by scanning 1/2048th of
> zswap_shrinker_count() at DEF_PRIORITY, then we work our way to 2 *
> zswap_shrinker_count() at zero priority (before OOMs). At
> NODE_RECLAIM_PRIORITY, we start at 1/8th of zswap_shrinker_count().
>
> Keep in mind that zswap_shrinker_count() does not return the number of
> all zswap entries: it subtracts the protected part (or recent swapins)
> and scales by the compression ratio. So this looks reasonable at first
> sight; perhaps we want to tune the seek to slow down writeback if we
> think it's too much, but that doesn't explain the scenario you are
> describing.
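To put numbers on the quoted arithmetic, here is a rough sketch. model_total_scan() is a hypothetical helper, not the kernel's do_shrink_slab(); it takes the simplified formula above at face value (freeable count scaled by 4 / seeks, plus deferred work, shifted right by the priority, capped at twice the freeable count). DEF_PRIORITY and NODE_RECLAIM_PRIORITY are redefined locally as 12 and 4, assumed to match the kernel's values.

```c
#include <assert.h>

/* Local stand-ins, assumed to match the kernel's reclaim priorities. */
#define DEF_PRIORITY 12
#define NODE_RECLAIM_PRIORITY 4

/*
 * Hypothetical model of the quoted simplification, not do_shrink_slab()
 * itself: scale the freeable count by 4 / seeks, add deferred work,
 * shift right by the reclaim priority and cap at twice the freeable count.
 */
static unsigned long model_total_scan(unsigned long freeable,
				      unsigned long nr_deferred,
				      int priority, int seeks)
{
	unsigned long total_scan;

	assert(seeks > 0);
	assert(priority >= 0 && priority <= DEF_PRIORITY);

	total_scan = (freeable * 4 / seeks + nr_deferred) >> priority;
	if (total_scan > 2 * freeable)
		total_scan = 2 * freeable;
	return total_scan;
}
```

With a freeable count of 2,048,000, seeks = 2 and no deferred work, this yields 1,000 at DEF_PRIORITY (1/2048th), 256,000 at NODE_RECLAIM_PRIORITY (1/8th) and 4,096,000 at priority 0, matching the fractions quoted above. A stale nr_deferred of 8,000 on top of a freeable count of 1,000 raises the priority-4 target from 125 to 625, which is the kind of carry-over the quote suspects.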
>
> Now let's factor in nr_deferred, which looks to me like it could be
> the culprit here. I am assuming the intention is that if we counted
> freeable slab objects before but didn't get to free them, we should do
> it the next time around. This feels like it assumes that the objects
> will remain there unless reclaimed by the shrinker. This does not
> apply to zswap, because the objects can be swapped in.

Hm.

_count() returns (objects - protected) * compression_rate, then the
shrinker does the >> priority dance. So to_scan is expected to be a
small portion of unprotected objects.

_scan() bails if to_scan > (objects - protected).

How often does this actually abort in practice?

> Also, in the beginning, before we encounter too many swapins, the
> protection will be very low, so zswap_shrinker_count() will return a
> relatively high value. Even if we don't scan and write back this
> amount, we will keep carrying this value forward in next reclaim
> operations, even if the number of existing zswap entries has
> decreased due to swapins.
>
> Could this be the problem? The number of deferred objects to be
> scanned just keeps going forward as a high value, essentially
> rendering the heuristics in zswap_shrinker_count() useless?
>
> If we just need to slow down writeback by making sure we scan entries
> twice, could something similar be achieved just by tuning the seek
> without needing any heuristics to begin with?

Seek is a fixed coefficient for the scan rate.

We want to slow writeback when recent zswapouts dominate the zswap
pool (expanding or thrashing), and speed it up when recent entries
make up a small share of the pool (stagnating).

This is what the second chance accomplishes.
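A toy model of the three steps in the patch description may make the proportional slowdown concrete. It is a sketch under assumptions, not zswap code: the LRU is a fixed-size ring of ids with a single reference bit each, and model_store(), model_swapin() and model_shrink() are hypothetical names. model_swapin() drops a reused entry, which is how an entry buys another round in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ENTRIES 64

/* Hypothetical toy model, not zswap's data structures. */
struct model_entry {
	int id;
	bool referenced;	/* set on store, cleared on first shrinker visit */
};

/* Ring buffer standing in for the zswap LRU; slots[head] is the oldest. */
struct model_lru {
	struct model_entry slots[MODEL_MAX_ENTRIES];
	size_t head;
	size_t len;
};

/* Step 1: a newly stored entry joins the LRU tail with its bit set. */
static bool model_store(struct model_lru *lru, int id)
{
	size_t tail;

	if (lru->len == MODEL_MAX_ENTRIES)
		return false;
	tail = (lru->head + lru->len) % MODEL_MAX_ENTRIES;
	lru->slots[tail].id = id;
	lru->slots[tail].referenced = true;
	lru->len++;
	return true;
}

/* A reuse (zswapin) takes the entry off the LRU entirely. */
static bool model_swapin(struct model_lru *lru, int id)
{
	for (size_t i = 0; i < lru->len; i++) {
		if (lru->slots[(lru->head + i) % MODEL_MAX_ENTRIES].id != id)
			continue;
		/* Close the gap by moving each later entry one slot forward. */
		for (size_t j = i; j + 1 < lru->len; j++)
			lru->slots[(lru->head + j) % MODEL_MAX_ENTRIES] =
				lru->slots[(lru->head + j + 1) % MODEL_MAX_ENTRIES];
		lru->len--;
		return true;
	}
	return false;
}

/*
 * Steps 2 and 3: scan up to nr_to_scan entries from the head. A referenced
 * entry loses its bit and rotates to the tail; an unreferenced one is
 * counted as written back and dropped. Returns the number written back.
 */
static size_t model_shrink(struct model_lru *lru, size_t nr_to_scan)
{
	size_t written_back = 0;

	assert(lru->len <= MODEL_MAX_ENTRIES);
	while (nr_to_scan > 0 && lru->len > 0) {
		struct model_entry e = lru->slots[lru->head];

		nr_to_scan--;
		lru->head = (lru->head + 1) % MODEL_MAX_ENTRIES;
		lru->len--;
		if (e.referenced) {
			e.referenced = false;
			lru->slots[(lru->head + lru->len) % MODEL_MAX_ENTRIES] = e;
			lru->len++;
		} else {
			written_back++;
		}
	}
	return written_back;
}
```

With four idle entries and a scan budget of 4, the first model_shrink() call writes back nothing and the second writes back all four. If instead a new entry is stored after each single-entry scan, rotated and new entries end up interleaved, and a later budget of 4 writes back only 2: the new entries consume scan work, so writeback slows in proportion to the store rate without changing the scan budget.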