From: Nhat Pham <nphamcs@gmail.com>
Date: Mon, 8 Jul 2024 12:17:12 -0700
Subject: Re: [PATCH v2 6/6] mm: zswap: interrupt shrinker writeback while pagein/out IO
To: Takero Funaki
Cc: Johannes Weiner, Yosry Ahmed, Chengming Zhou, Jonathan Corbet,
	Andrew Morton, Domenico Cerasuolo, linux-mm@kvack.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240706022523.1104080-7-flintglass@gmail.com>
References: <20240706022523.1104080-1-flintglass@gmail.com> <20240706022523.1104080-7-flintglass@gmail.com>
On Fri, Jul 5, 2024 at 7:25 PM Takero Funaki <flintglass@gmail.com> wrote:
>
> To prevent the zswap global shrinker from writing back pages
> simultaneously with IO performed for memory reclaim and faults, delay
> the writeback when zswap_store() rejects pages or zswap_load() cannot
> find the entry in the pool.
>
> When the zswap shrinker is running and zswap rejects an incoming page,
> the simultaneous zswap writeback and the swap-out of the rejected page
> lead to IO contention on the swap device.
> In this case, the writeback of the rejected page must take higher
> priority, as it is necessary for actual memory reclaim progress. The
> zswap global shrinker can run in the background and should not
> interfere with memory reclaim.

Do you see this problem actually manifesting in real life? It does not
sound infeasible to me, but I wonder how likely it is. Do you have any
userspace-visible metrics, or any tracing logs etc., that show it
actually happening?

This might also affect the dynamic shrinker as well, FWIW.

> The same logic applies to zswap_load(). When zswap cannot find the
> requested page in the pool and read IO is performed, the shrinker
> should be interrupted.
>
> To avoid IO contention, save the timestamp jiffies when zswap cannot
> buffer the pagein/out IO, and interrupt the global shrinker. The
> shrinker resumes the writeback 500 msec after the saved timestamp.
>
> Signed-off-by: Takero Funaki
> ---
>  mm/zswap.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index def0f948a4ab..59ba4663c74f 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -35,6 +35,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  #include "swap.h"
>  #include "internal.h"
> @@ -176,6 +178,14 @@ static bool zswap_next_shrink_changed;
>  static struct work_struct zswap_shrink_work;
>  static struct shrinker *zswap_shrinker;
>
> +/*
> + * To avoid IO contention between pagein/out and global shrinker writeback,
> + * track the last jiffies of pagein/out and delay the writeback.
> + * Default to 500msec in alignment with mq-deadline read timeout.

If there is a future version, could you include the reason why you
selected 500msec in the patch's changelog as well?
> + */
> +#define ZSWAP_GLOBAL_SHRINKER_DELAY_MS 500
> +static unsigned long zswap_shrinker_delay_start;
> +
>  /*
>   * struct zswap_entry
>   *
> @@ -244,6 +254,14 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
>  		pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,	\
>  			 zpool_get_type((p)->zpools[0]))
>
> +static inline void zswap_shrinker_delay_update(void)
> +{
> +	unsigned long now = jiffies;
> +
> +	if (now != zswap_shrinker_delay_start)
> +		zswap_shrinker_delay_start = now;
> +}

Hmm, is there a reason why we do not just do:

	zswap_shrinker_delay_start = jiffies;

or, more unnecessarily:

	unsigned long now = jiffies;
	zswap_shrinker_delay_start = now;

IOW, why the branching here? It does not seem necessary to me, but
perhaps there is a correctness/compiler reason I'm not seeing?

In fact, if it's the first version, then we could manually inline it.

Additionally/alternatively, I wonder if it is more convenient to do it
at the mm/page_io.c zswap callsites, i.e. whenever zswap_store() or
zswap_load() returns false, delay the shrinker before proceeding with
the IO steps.

> +
>  /*********************************
>  * pool functions
>  **********************************/
> @@ -1378,6 +1396,8 @@ static void shrink_worker(struct work_struct *w)
>  	struct mem_cgroup *memcg;
>  	int ret, failures = 0, progress;
>  	unsigned long thr;
> +	unsigned long now, sleepuntil;
> +	const unsigned long delay = msecs_to_jiffies(ZSWAP_GLOBAL_SHRINKER_DELAY_MS);
>
>  	/* Reclaim down to the accept threshold */
>  	thr = zswap_accept_thr_pages();
> @@ -1405,6 +1425,21 @@ static void shrink_worker(struct work_struct *w)
>  	 * until the next run of shrink_worker().
>  	 */
>  	do {
> +		/*
> +		 * delay shrinking to allow the last rejected page completes
> +		 * its writeback
> +		 */
> +		sleepuntil = delay + READ_ONCE(zswap_shrinker_delay_start);

I assume we do not care about racy access here, right?
Same goes for updates - I don't see any locks protecting these
operations (but I could be missing something).

> +		now = jiffies;
> +		/*
> +		 * If zswap did not reject pages for long, sleepuntil-now may
> +		 * underflow. We assume the timestamp is valid only if
> +		 * now < sleepuntil < now + delay + 1
> +		 */
> +		if (time_before(now, sleepuntil) &&
> +		    time_before(sleepuntil, now + delay + 1))
> +			fsleep(jiffies_to_usecs(sleepuntil - now));
> +
>  		spin_lock(&zswap_shrink_lock);
>
>  		/*
> @@ -1526,8 +1561,10 @@ bool zswap_store(struct folio *folio)
>  	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
>
>  	/* Large folios aren't supported */
> -	if (folio_test_large(folio))
> +	if (folio_test_large(folio)) {
> +		zswap_shrinker_delay_update();
>  		return false;
> +	}
>
>  	if (!zswap_enabled)
>  		goto check_old;
> @@ -1648,6 +1685,8 @@ bool zswap_store(struct folio *folio)
>  	zswap_entry_cache_free(entry);
>  reject:
>  	obj_cgroup_put(objcg);
> +	zswap_shrinker_delay_update();
> +
>  	if (need_global_shrink)
>  		queue_work(shrink_wq, &zswap_shrink_work);
>  check_old:
> @@ -1691,8 +1730,10 @@ bool zswap_load(struct folio *folio)
>  	else
>  		entry = xa_load(tree, offset);
>
> -	if (!entry)
> +	if (!entry) {
> +		zswap_shrinker_delay_update();
>  		return false;
> +	}
>
>  	if (entry->length)
>  		zswap_decompress(entry, page);
> @@ -1835,6 +1876,8 @@ static int zswap_setup(void)
>  	if (ret)
>  		goto hp_fail;
>
> +	zswap_shrinker_delay_update();
> +
>  	shrink_wq = alloc_workqueue("zswap-shrink",
>  			WQ_UNBOUND, 1);
>  	if (!shrink_wq)
> --
> 2.43.0
>