From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAFD3C3DA61 for ; Tue, 30 Jul 2024 05:25:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 49FA06B007B; Tue, 30 Jul 2024 01:25:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 428F36B0083; Tue, 30 Jul 2024 01:25:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2A1DC6B0085; Tue, 30 Jul 2024 01:25:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 097126B007B for ; Tue, 30 Jul 2024 01:25:14 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 69463C0250 for ; Tue, 30 Jul 2024 05:25:13 +0000 (UTC) X-FDA: 82395280506.19.F7235A9 Received: from mail-yb1-f182.google.com (mail-yb1-f182.google.com [209.85.219.182]) by imf07.hostedemail.com (Postfix) with ESMTP id A536240016 for ; Tue, 30 Jul 2024 05:25:11 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Z+4boXPI; spf=pass (imf07.hostedemail.com: domain of flintglass@gmail.com designates 209.85.219.182 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1722317107; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cyiDB8MsTuRIBGPC4XFhUVRa3szv8WxrYdF7v0WYTQQ=; b=XsJayQFwN3djRAlM0fZOIixT4hgB3JnQTJTEBjqN4DpYcIIshjgZeYyIIVPoKij1os63wd x1WAKhKgPUvcCjZbUsYSWdPQz0PVOCOH887KIcBs+Y2jHWExW6466byV+36WkleEP7XeWg ACtXennZqyN3cgOQsoDUaW/y41MKdLk= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Z+4boXPI; spf=pass (imf07.hostedemail.com: domain of flintglass@gmail.com designates 209.85.219.182 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722317107; a=rsa-sha256; cv=none; b=srnxrnkEFRRBuQJvP/O8QecXfuOhjjb4Ugtd5TPU0j1tQ9fil7FsMSrSYsDhpOd8XTVhqR wkZ9UdjvSFWS7YfCiS1CRnpVstsMunow0mFgO1MCVedUzrcVzmjiP/tA52PyD6O2j5vLmR v1Kobo77BNMV+YhAY1XQGEHO+uApN7E= Received: by mail-yb1-f182.google.com with SMTP id 3f1490d57ef6-e0885b4f1d5so2731539276.1 for ; Mon, 29 Jul 2024 22:25:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1722317111; x=1722921911; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=cyiDB8MsTuRIBGPC4XFhUVRa3szv8WxrYdF7v0WYTQQ=; b=Z+4boXPIgZAIaB65l5r7qeI2BFwqkwl0cD5fyU1glOcUTyRDd1FZdZ940ExugvObIn Iv9YKtHcHVOrfjTfq+7uOm9uaIpXWaw8sfF/jPwjte6GWKzM6yy8/cLz/4HUkIK9OK2j 5RYZU2XL5zeCfaMHYVx31NHAaplfJPt1RFlMdFQVCG3QD3SSY/Yz3QYfdInt3cQXXEKV xCsqivUp8pbyhConGWo/Z8RdoHMQmkYWy8gG4TQACByOiEmj/tuzBefZMShNRwsH3l1r GwBTul7TXz2LACo7+DicLoAMOAU27vVt27IFEJn0uuR8MrWGzCnM11MMRZRlyEAcBo1Q 5k8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722317111; x=1722921911; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cyiDB8MsTuRIBGPC4XFhUVRa3szv8WxrYdF7v0WYTQQ=; b=CqTnwQiHoPSA4YIaZDsziaylKIH2KZZIPhnJ5LD4G2hu1puLUgv3Q8QXx68HMZr+lt IBq4VTKSVz7ebikbHlJsTMtWX1mPV6f6QxRA6KhDC+G5crEMTFf3rYXtFjeM1ggv2KVG UfKvDYrcozyc3NnQ09YbqJYzFco7jlDUa301/5tTMD2+C0KyOuQdA9/GEqO/mUN438cZ CgI9YLHD5wc+i2zNic17AUywEz7DVid7Dlk1f44kU5K2TLtkuxunL5llxYejSgyiJeMo 8zIBDtOnQc/LnCJV8GFBjn+AsSALmP1utjkkDGg4DXgViegfHRQRMjzlZCp/IHMMPII4 rd0w== X-Forwarded-Encrypted: i=1; AJvYcCWjzisGS+lOcnmdSL9got1/769RO4nAyKOJ7U59m8EBvuuD0RPtwQIaQ22Co2o4nO7YYK4uxF4Zy9u2VZ+ADGbiXoM= X-Gm-Message-State: AOJu0YwVYepc8BwZhLfVwMXzHHMNbiSnt7XIQxAz+RAjUvf2bExbX8B+ gBM3fITjAZhxNLlH6/95i+LtG3xH8YaIzV86fqEn+J9UWlGajNIKmKyrJZ6G3+Rhdrl0h01IN4D abRzKe96pvIgWn/CqjEJvx5w/ML8= X-Google-Smtp-Source: AGHT+IFZffRehTVlzmv9yHNpiisxr9LSLmbs4TmHWA1G83FovhjHw4k8OFBV0ksRQhqaEBdbKmu7ADZ7QIm7EfUv8K0= X-Received: by 2002:a05:6902:120a:b0:e06:d62c:98ec with SMTP id 3f1490d57ef6-e0b545b9fb4mr9543675276.45.1722317110554; Mon, 29 Jul 2024 22:25:10 -0700 (PDT) MIME-Version: 1.0 References: <20240727230635.3170-1-flintglass@gmail.com> <20240727230635.3170-2-flintglass@gmail.com> In-Reply-To: From: Takero Funaki Date: Tue, 30 Jul 2024 14:24:58 +0900 Message-ID: Subject: Re: [PATCH v4 1/2] mm: zswap: fix global shrinker memcg iteration To: Yosry Ahmed Cc: Johannes Weiner , Nhat Pham , Chengming Zhou , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Stat-Signature: a6jqbthg9f8u1tie6win44iek4c55fpa X-Rspamd-Queue-Id: A536240016 X-Rspamd-Server: rspam11 X-HE-Tag: 1722317111-200210 X-HE-Meta: U2FsdGVkX1/1JboqdT3CAtdmOZxmFeoPPaRu8m9QuUrvZNkTWA7hMRJu0JS6WOUL4yT0XLoVCm0T6iq+FLIJxzqXJPnMM1LDPrLr5ps00cruPJqmw0lijAukMsaEiLM+ptlGp7LnbbcKnYSBCg8uuC/pku1shD6+nk6ir/50Ga8GjV1Mm9Udii70Behjqycks1OPofQKNRnZ2I7wwywygqTCmyzkTT8Ndflk3CzLJkqNPwEKLt+GUv1jF0BUwDzNw/MFAp9Suu+pMfu9Tt1OC4KJRRMdEh7Slzu2qd1ez1oNk8CVLfgfBj/ubcVnPCNNF8iHy9Aq10/b2wxHO0L9FaWINrJ7yWSMlutTxR2LzGsbG6oxBPLPD/hH2y2OyWEpsc+KeNJ0kFmYtkUYAaZcc/Wj/us8h1PTT1qZnK3ypjS13BB8YSlsQiMO/V6aeS7v8HQMNEWisctY3F8ownec6x6C7/V+FH4tj3GsIzaaKYEVGbN6OkkH6D3x4UGQI9h4mTAog0XupnYSP41VRMN21stPCk1OvsUDpLHpgbYpvTQsqFImxzIuimMOg4IjGEEIiQtgc+F3xj6l1VELCeR+pTyjgbEi0l1PfMdrfYxQv15SPkVvBqn1p4zj7LZviAeu45xeEeJ+zFJpABY1HOqU0rr5hI9JqBQoQpRIvilSHXesKMgB/nvAYl69CZhDL1ESMsUJPVJqWAM/j3fGHsuDKQCb6Hr2hwwE1BmcQJCaaRHYJ8jp0xDQ12alZE7b74st5UHLhvKez/PQLmphxx8oyBrdQpfpykjZEQudd3gOK8DT9F0fbzWm7z76R3AejDkuGSY8boQkVtWHTh+D7i6n/y5OkppQb+wveDfkGLJEzotK9bW2giwTnpDJYiPxzb2rc6GQkz1z65I4zpke603T4DARepe3Pg52tX/xiVFn0NcO+EuUveKmwqEnU+B9eElBF7YZcb3UVyDsCYZGKUp 3MceGA9C 4V7Ka9yr20YgqMxU9FlpM/2y2KV97aQs5KnTdjLSDMb3re4+QiOF15k4115LLIxcXYl03C9rD/ApAwpuRy9yOnO2WQPL2Q4J5GF0CvMNo1XWEHn/ZpzJrhgNERqUaXQwr8t5AcpJuX9VqiM9Fbnw7fx4Ghty1XdReN2h4J3B85pJBOXVCsdff99gYXjncFHAT7XY2Xic5RZO+neLSE/xiqjuPxhQESGZxi6D3NDfV02Sjg31q5rn6aKYTVtKDFsWyQYjNumid7BKhjSemkPmzecjlbPxTQfl0mNOoBvtKtv+YJ3QASmeEFcSGQRmgtpXOEvXo7PjFncBHvvIgZAXzgKXf+A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: 2024=E5=B9=B47=E6=9C=8830=E6=97=A5(=E7=81=AB) 3:24 Yosry Ahmed : > > On Sat, Jul 27, 2024 at 4:06=E2=80=AFPM Takero Funaki wrote: > > > > This patch fixes an issue where the zswap global shrinker stopped > > iterating through the memcg tree. > > > > The problem was that shrink_worker() would restart iterating memcg tree > > from the tree root, considering an offline memcg as a failure, and abor= t > > shrinking after encountering the same offline memcg 16 times even if > > there is only one offline memcg. After this change, an offline memcg in > > the tree is no longer considered a failure. This allows the shrinker to > > continue shrinking the other online memcgs regardless of whether an > > offline memcg exists, gives higher zswap writeback activity. > > > > To avoid holding refcount of offline memcg encountered during the memcg > > tree walking, shrink_worker() must continue iterating to release the > > offline memcg to ensure the next memcg stored in the cursor is online. > > > > The offline memcg cleaner has also been changed to avoid the same issue= . > > When the next memcg of the offlined memcg is also offline, the refcount > > stored in the iteration cursor was held until the next shrink_worker() > > run. The cleaner must release the offline memcg recursively. > > > > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware") > > Signed-off-by: Takero Funaki > > --- > > mm/zswap.c | 73 ++++++++++++++++++++++++++++++++++++------------------ > > 1 file changed, 49 insertions(+), 24 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index adeaf9c97fde..e9b5343256cd 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -765,12 +765,31 @@ void zswap_folio_swapin(struct folio *folio) > > } > > } > > > > +/* > > + * This function should be called when a memcg is being offlined. > > + * > > + * Since the global shrinker shrink_worker() may hold a reference > > + * of the memcg, we must check and release the reference in > > + * zswap_next_shrink. > > + * > > + * shrink_worker() must handle the case where this function releases > > + * the reference of memcg being shrunk. > > + */ > > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) > > { > > /* lock out zswap shrinker walking memcg tree */ > > spin_lock(&zswap_shrink_lock); > > - if (zswap_next_shrink =3D=3D memcg) > > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_= shrink, NULL); > > + if (zswap_next_shrink =3D=3D memcg) { > > + do { > > + zswap_next_shrink =3D mem_cgroup_iter(NULL, zsw= ap_next_shrink, NULL); > > + } while (zswap_next_shrink && !mem_cgroup_online(zswap_= next_shrink)); > > + /* > > + * We verified the next memcg is online. Even if the n= ext > > + * memcg is being offlined here, another cleaner must b= e > > + * waiting for our lock. We can leave the online memcg > > + * reference. > > + */ > > I thought we agreed to drop this comment :) > > > + } > > spin_unlock(&zswap_shrink_lock); > > } > > > > @@ -1304,43 +1323,49 @@ static void shrink_worker(struct work_struct *w= ) > > /* Reclaim down to the accept threshold */ > > thr =3D zswap_accept_thr_pages(); > > > > - /* global reclaim will select cgroup in a round-robin fashion. = */ > > + /* global reclaim will select cgroup in a round-robin fashion. > > nit: s/global/Global > > > + * > > + * We save iteration cursor memcg into zswap_next_shrink, > > + * which can be modified by the offline memcg cleaner > > + * zswap_memcg_offline_cleanup(). > > + * > > + * Since the offline cleaner is called only once, we cannot lea= ve an > > + * offline memcg reference in zswap_next_shrink. > > + * We can rely on the cleaner only if we get online memcg under= lock. > > + * > > + * If we get an offline memcg, we cannot determine if the clean= er has > > + * already been called or will be called later. We must put bac= k the > > + * reference before returning from this function. Otherwise, th= e > > + * offline memcg left in zswap_next_shrink will hold the refere= nce > > + * until the next run of shrink_worker(). > > + */ > > do { > > spin_lock(&zswap_shrink_lock); > > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_= shrink, NULL); > > - memcg =3D zswap_next_shrink; > > > > /* > > - * We need to retry if we have gone through a full roun= d trip, or if we > > - * got an offline memcg (or else we risk undoing the ef= fect of the > > - * zswap memcg offlining cleanup callback). This is not= catastrophic > > - * per se, but it will keep the now offlined memcg host= age for a while. > > - * > > + * Start shrinking from the next memcg after zswap_next= _shrink. > > + * When the offline cleaner has already advanced the cu= rsor, > > + * advancing the cursor here overlooks one memcg, but t= his > > + * should be negligibly rare. > > + */ > > + do { > > + memcg =3D mem_cgroup_iter(NULL, zswap_next_shri= nk, NULL); > > + zswap_next_shrink =3D memcg; > > + } while (memcg && !mem_cgroup_tryget_online(memcg)); > > Let's move spin_lock() and spin_unlock() to be right above and before > the do-while loop, similar to zswap_memcg_offline_cleanup(). This > should make it more obvious what the lock is protecting. > > Actually, maybe it would be cleaner at this point to move the > iteration to find the next online memcg under lock into a helper, and > use it here and in zswap_memcg_offline_cleanup(). zswap_shrink_lock > and zswap_next_shrink can be made static to this helper and maybe some > of the comments could live there instead. Something like > zswap_next_shrink_memcg(). > > This will abstract this whole iteration logic and make shrink_worker() > significantly easier to follow. WDYT? > > I can do that in a followup cleanup patch if you prefer this as well. > I'd really appreciate it. Sorry to have kept you waiting for a novice coder. Thank you for all your comments and support.