From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 99B48C3DA61 for ; Mon, 29 Jul 2024 18:24:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 251216B0095; Mon, 29 Jul 2024 14:24:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 200AD6B0096; Mon, 29 Jul 2024 14:24:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0C8F06B0098; Mon, 29 Jul 2024 14:24:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id E3FC36B0095 for ; Mon, 29 Jul 2024 14:24:50 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 6948F1C1940 for ; Mon, 29 Jul 2024 18:24:50 +0000 (UTC) X-FDA: 82393616340.27.C46AA23 Received: from mail-lf1-f51.google.com (mail-lf1-f51.google.com [209.85.167.51]) by imf05.hostedemail.com (Postfix) with ESMTP id 8CDC2100010 for ; Mon, 29 Jul 2024 18:24:48 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=CB33pPtD; spf=pass (imf05.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.51 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1722277446; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ruasPNPK8KVxSRl2ha7YpZazMWY1eezny9YoCe+Cc3U=; b=wQULExwiaPVTmxbLKocwX186lq1LhuAHc16pbrpCu1xIkY03927lejzrbOd1KLGvMbRsvA AqNk0Mdt0Vpw4QVSydEBJVodBjX0AZsn3lp95ilc8X3+Ywe3RUDwb/p4ktLfaRCWIsa//y krHEbDnpGifDqIfHLVVi4/D30odtiuE= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=CB33pPtD; spf=pass (imf05.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.51 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722277446; a=rsa-sha256; cv=none; b=30l/l7Buq5+UYIELdZ/BvfpOsuzg770uTfmAy3oeNyppLc90QN11DGBMt80xglE6rWPZse GfQz4gAT84+99COd53vzdC4ADRbchc4pMpBtA4gCfUAWvF5W3iKxInmvnLgCSUp7N90zKG HohYU1Wopu3ntaBhLDKOqrrTvoDjU0o= Received: by mail-lf1-f51.google.com with SMTP id 2adb3069b0e04-52efbb55d24so5549200e87.1 for ; Mon, 29 Jul 2024 11:24:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1722277486; x=1722882286; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ruasPNPK8KVxSRl2ha7YpZazMWY1eezny9YoCe+Cc3U=; b=CB33pPtDg6whstXFzhLD7PhIZ/0PYnzkrERv/CvkgmzO8/iWvHeKH+jQSwC8eE/ISJ u7bh6lDhwXqEK1u9X4Vc5zsUWBkwFOHblj5rX34KawipWRY2G43yd3A+JhI6BqmVaI5G qagfFF+aDOANgaJHUSMW/u/PjZ9Mb1NXEUSK8AybF/akaO0wCASo8Wh2hs+CDCCxpYLn MLkS625kMamH9VyZdCPtM0BG7cZl1ayfIHlV4Eq05kvhyOX/SDElP2NdRCg6uEOVhngK iv/gaA9lMyzhht8097Oi1UBCzjk1Ojj9vKlnzlkRCvq6dSFRXp1peFffCl/wxoKe26Dq E/yQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722277486; x=1722882286; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ruasPNPK8KVxSRl2ha7YpZazMWY1eezny9YoCe+Cc3U=; b=UnrLYDA5SrM+3pvOKjVJ2ijQTT9dsF5FlFUYqfCm8VfGf6WGEDtBVr5lvDaMxBTsgn P/TarNGsBWS0075hcWoPQTXMzSxZJpJlH5xqpWvrDH5P3teDyDNG2t+joGfuLfjtayG6 uB/dmvHO9G/pVinlpu0LbiFBMJhWabJs5xMtYGFMjueHWXa+qKYQuJ2rkrua1NORQ2NT B7w4T8N6WitPCpi1k07pE/PqMAc2ECoKP5Xekq6bLse+TpQO+xrFl0MOuv9ABt6K9z1n 6HEDfW+ea3nvkB6WSojXAn3RBPndATTXKcV6Y0ZzdWwLZsowbCtToMJgV7KopmhZeG+/ l3QQ== X-Forwarded-Encrypted: i=1; AJvYcCXeCkwA8RWR+tZxxwln8j4J7S2Hbkk/BPrJMDgBlSM5FzUCgFOkgL0LdTIwLOm+NJG6zTMQcakQBr29ik5jbrI4F0A= X-Gm-Message-State: AOJu0YzLNeCXLmlkQtrkwLdOUZeh1nEY4Uug0vb0hYPbliKbgC5JGz5P rBVKtqC1uWJ0ku+xUgdhgyMl1iVPGKu3uE5Rl2skRCWztJnInYucP/qdwzAo/YeEH2iy4URcPvM VgDXCIuw+jOG+dBJ6PbD8nSkUGgzlJ3YaSrs+ X-Google-Smtp-Source: AGHT+IEPwV28ddsiGG5nMpSWP/Lv2Q5KF4eM3YF1a6FKjfIDQvczLFRKGlmwbqc/3mLBIKOssKU+bJdDRX8eS4JyYms= X-Received: by 2002:a05:6512:32cb:b0:52e:9f7a:6e6 with SMTP id 2adb3069b0e04-5309b2caa3bmr6952185e87.41.1722277485916; Mon, 29 Jul 2024 11:24:45 -0700 (PDT) MIME-Version: 1.0 References: <20240727230635.3170-1-flintglass@gmail.com> <20240727230635.3170-2-flintglass@gmail.com> In-Reply-To: <20240727230635.3170-2-flintglass@gmail.com> From: Yosry Ahmed Date: Mon, 29 Jul 2024 11:24:08 -0700 Message-ID: Subject: Re: [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration To: Takero Funaki Cc: Johannes Weiner , Nhat Pham , Chengming Zhou , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: zi6izbxb8w6i13o93oj97urkuikup9r7 X-Rspam-User: X-Rspamd-Queue-Id: 8CDC2100010 X-Rspamd-Server: rspam02 X-HE-Tag: 1722277488-310586 X-HE-Meta: U2FsdGVkX18qjRLfcmNmLFDUr7cvmc9hv5C2yTw6UiuwAFODGHf7SyS4xq9YrSNdIzdNPlQJ4SEYjHqdQKxufjg9m5ZumVYSFEvV0zMygvaUdDFbP0ktPKMRDK4d31H2JYV8IjSac15t3wAYaBvWqWtEEg/N1fI3wdDkhBiVfSUTNEi8NwkW4gDAaKUIGKO93Hqrj0BnSBiEx7IsobzQKdhgeuM/3SrEsp/NQ1WJM3GHWbfBLZ8zYIa3SfZQ+zJvFWzwIHgdn6W8fiILCD9iifFH3nUdiPZoEm2iDifhdvC1AqdIVdWn3u1/QTJtHgLpGav3SIqgvbWa9S0jQsq1VpTDIikQMIKrStRscjdfC3JZDU1RzsfEVpAFWjLB3CsnCXCM3GxAIE1G9OajG8jZjq6KjGdIBgFym7CLrq11s+ro7CkI/5zGnODueY1Z05r2NQig4B4jd99yFUEWKkQwaC2SfeQbB7/pZB4SEytAgjOOlwRwW+cMhVz0X6jopt6KXasUCCqRKbpFn+sQld3BDe8QE6d0dgDTZKvsr4v/EjovJH08LWyT1Vv1BmKTKffHIQIqVlEwGo8qrk4nyrkwTbmxFhL5YHWFnz771cj120VDL2YBASWDAk3CU4Dek9iDld3M/KWl7iGXeujud9jyArmo+KRdSWMjCZCCngjXqMR8i/lgL4FHRfEfS8szAu4NPQUwkmtwQ7HXWjoZvRW+P04z4aoqaqlO0xvMzMGe7HcM/xCIJfkEWTNwxxO45iPJQBFErKMghYe/8OBn8MC4isHmM1tE8laRl3l2930btHf6wm0VVhSEMMIMQaQGkxn3IxmDpBHcbT1fUzpHxVJVuC3dkZMxC1BQUM16zuw4gG1oXBtc6ktTMuW5jIj0ayJhxB5zNY5VstkAb0/XFl9fHSv2tYHcAbLSglb6RBhOxJXxm5WEDr79RInnJJh/pAGLhv9Srp6P9GcFRXs9gt1 D6Eg3e4m +zSwhyv2AHxLv0x9J6AqSI0WiyXQstKvr3MPXCOyJExKE4zIvle3zphNiOebK1Oy4Y7SLxGjB4SZ1AYF6N4iWCOCihJRGZ8s8swAOZKLEJo6+3XjKFuHxkMjSSGJL1BzLSIWYtdji0Jb6P4ilI67CpQNszXRRQSK0Hx6pV/F/vlB12Bio8EyjrGh3vpfmyIR0BCiD X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sat, Jul 27, 2024 at 4:06=E2=80=AFPM Takero Funaki wrote: > > This patch fixes an issue where the zswap global shrinker stopped > iterating through the memcg tree. > > The problem was that shrink_worker() would restart iterating memcg tree > from the tree root, considering an offline memcg as a failure, and abort > shrinking after encountering the same offline memcg 16 times even if > there is only one offline memcg. After this change, an offline memcg in > the tree is no longer considered a failure. This allows the shrinker to > continue shrinking the other online memcgs regardless of whether an > offline memcg exists, gives higher zswap writeback activity. > > To avoid holding refcount of offline memcg encountered during the memcg > tree walking, shrink_worker() must continue iterating to release the > offline memcg to ensure the next memcg stored in the cursor is online. > > The offline memcg cleaner has also been changed to avoid the same issue. > When the next memcg of the offlined memcg is also offline, the refcount > stored in the iteration cursor was held until the next shrink_worker() > run. The cleaner must release the offline memcg recursively. > > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware") > Signed-off-by: Takero Funaki > --- > mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++------------------ > 1 file changed, 49 insertions(+), 24 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index adeaf9c97fde..e9b5343256cd 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -765,12 +765,31 @@ void zswap_folio_swapin(struct folio *folio) > } > } > > +/* > + * This function should be called when a memcg is being offlined. > + * > + * Since the global shrinker shrink_worker() may hold a reference > + * of the memcg, we must check and release the reference in > + * zswap_next_shrink. > + * > + * shrink_worker() must handle the case where this function releases > + * the reference of memcg being shrunk. > + */ > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) > { > /* lock out zswap shrinker walking memcg tree */ > spin_lock(&zswap_shrink_lock); > - if (zswap_next_shrink =3D=3D memcg) > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > + if (zswap_next_shrink =3D=3D memcg) { > + do { > + zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap= _next_shrink, NULL); > + } while (zswap_next_shrink && !mem_cgroup_online(zswap_ne= xt_shrink)); > + /* > + * We verified the next memcg is online. Even if the nex= t > + * memcg is being offlined here, another cleaner must be > + * waiting for our lock. We can leave the online memcg > + * reference. > + */ I thought we agreed to drop this comment :) > + } > spin_unlock(&zswap_shrink_lock); > } > > @@ -1304,43 +1323,49 @@ static void shrink_worker(struct work_struct *w) > /* Reclaim down to the accept threshold */ > thr =3D zswap_accept_thr_pages(); > > - /* global reclaim will select cgroup in a round-robin fashion. */ > + /* global reclaim will select cgroup in a round-robin fashion. nit: s/global/Global > + * > + * We save iteration cursor memcg into zswap_next_shrink, > + * which can be modified by the offline memcg cleaner > + * zswap_memcg_offline_cleanup(). > + * > + * Since the offline cleaner is called only once, we cannot leave= an > + * offline memcg reference in zswap_next_shrink. > + * We can rely on the cleaner only if we get online memcg under l= ock. > + * > + * If we get an offline memcg, we cannot determine if the cleaner= has > + * already been called or will be called later. We must put back = the > + * reference before returning from this function. Otherwise, the > + * offline memcg left in zswap_next_shrink will hold the referenc= e > + * until the next run of shrink_worker(). > + */ > do { > spin_lock(&zswap_shrink_lock); > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > - memcg =3D zswap_next_shrink; > > /* > - * We need to retry if we have gone through a full round = trip, or if we > - * got an offline memcg (or else we risk undoing the effe= ct of the > - * zswap memcg offlining cleanup callback). This is not c= atastrophic > - * per se, but it will keep the now offlined memcg hostag= e for a while. > - * > + * Start shrinking from the next memcg after zswap_next_s= hrink. > + * When the offline cleaner has already advanced the curs= or, > + * advancing the cursor here overlooks one memcg, but thi= s > + * should be negligibly rare. > + */ > + do { > + memcg =3D mem_cgroup_iter(NULL, zswap_next_shrink= , NULL); > + zswap_next_shrink =3D memcg; > + } while (memcg && !mem_cgroup_tryget_online(memcg)); Let's move spin_lock() and spin_unlock() to be right above and before the do-while loop, similar to zswap_memcg_offline_cleanup(). This should make it more obvious what the lock is protecting. Actually, maybe it would be cleaner at this point to move the iteration to find the next online memcg under lock into a helper, and use it here and in zswap_memcg_offline_cleanup(). zswap_shrink_lock and zswap_next_shrink can be made static to this helper and maybe some of the comments could live there instead. Something like zswap_next_shrink_memcg(). This will abstract this whole iteration logic and make shrink_worker() significantly easier to follow. WDYT? I can do that in a followup cleanup patch if you prefer this as well. > + /* > * Note that if we got an online memcg, we will keep the = extra > * reference in case the original reference obtained by m= em_cgroup_iter > * is dropped by the zswap memcg offlining callback, ensu= ring that the > * memcg is not killed when we are reclaiming. > */ > - if (!memcg) { > - spin_unlock(&zswap_shrink_lock); > - if (++failures =3D=3D MAX_RECLAIM_RETRIES) > - break; > - > - goto resched; > - } > - > - if (!mem_cgroup_tryget_online(memcg)) { > - /* drop the reference from mem_cgroup_iter() */ > - mem_cgroup_iter_break(NULL, memcg); > - zswap_next_shrink =3D NULL; > - spin_unlock(&zswap_shrink_lock); > + spin_unlock(&zswap_shrink_lock); > > + if (!memcg) { > if (++failures =3D=3D MAX_RECLAIM_RETRIES) > break; > > goto resched; > } > - spin_unlock(&zswap_shrink_lock); > > ret =3D shrink_memcg(memcg); > /* drop the extra reference */ > -- > 2.43.0 >