From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19451C27C55 for ; Mon, 10 Jun 2024 19:17:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 745A56B009A; Mon, 10 Jun 2024 15:17:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6CE936B009B; Mon, 10 Jun 2024 15:17:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5496C6B009C; Mon, 10 Jun 2024 15:17:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 2C7096B009A for ; Mon, 10 Jun 2024 15:17:02 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id BB02A1A0EAA for ; Mon, 10 Jun 2024 19:17:01 +0000 (UTC) X-FDA: 82215936642.13.A90EAD7 Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by imf24.hostedemail.com (Postfix) with ESMTP id DD7CA18000A for ; Mon, 10 Jun 2024 19:16:59 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=JTHS4LcC; spf=pass (imf24.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1718047020; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xLI0BFkBBfPQ5TWI+0fPV2cVGs3g9gOOEuOzey2zZBw=; b=gDF2lnEOj+nf9UH5i41y2m9HpllTcFD/4F9B9CiNP40ThR+DfMDbS5DxqP4dl+6AygrfxX KJNMOr4sr+gl0xzrxUr3uA+rEFajNOFPrrDP+rGD1RCjkx8bNlpwaSpa3r6m9VE9gUz1gS lKdxHDk3eUHE3cPcQSwXSJZ9lFx5FVs= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1718047020; a=rsa-sha256; cv=none; b=NIbcoIrkFOTK+3wkJZ8zTTu724xTxLKy1g7LR2/2dqtqMKifJRTJ2H3fjABoZXNRZZ8qgd 15yR2gDOXIJXjAElsvhdlh72haa2FGZlnYb6aImhCjGMnCpbFGR7iEP6C4xE5r31CV3He7 vxA6AXPF9i67DamqQmy7tYIdgzlZwbw= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=JTHS4LcC; spf=pass (imf24.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ej1-f42.google.com with SMTP id a640c23a62f3a-a6e43dad8ecso526919166b.1 for ; Mon, 10 Jun 2024 12:16:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1718047018; x=1718651818; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=xLI0BFkBBfPQ5TWI+0fPV2cVGs3g9gOOEuOzey2zZBw=; b=JTHS4LcChF+ou258aKSCPWrRK3Qp+/v6+rnvr8kzI3rlhlzl/QS2EKGTx5qJC5iHyS 7k0m/xHYxISsYho17r3mvwYOj/iXGRkIbiolVRnG5EVbdCFewJDnKdS1tZBLY5M/v8jr RzLfAbv/z5nr2efqrRI1IoHF+z/A8lG5AV6jDLxDFrm5DvQdZuKpFnb07piFOU9489aT ctKUNPHG+N5oTJeKNdWu2A+vP9Wi7LBH64dR5Ysw+ooFLG98rmI7On+bpKTK9ByePOQc QH3h2ZZflJ7ybGnbx1EvyPLVci/u4gr+a8Qmv+i4N++GAAJPWQtTi3/cxr0zqEZHaB+T l5Gg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718047018; x=1718651818; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xLI0BFkBBfPQ5TWI+0fPV2cVGs3g9gOOEuOzey2zZBw=; b=OLnVYIh8pvj2xSkmLbFq1ZoId8r9l0qRxKh3qbI2zWHzg8m7ecQjE1WzZr0pHScZYf lWz9tlLaDrciZXOrX9pYrndUCv7qv78vk9WUlnzcZDpu46zXgOYy90oFaGfVhbGhghpj c3+bYpSojc4tHD5htGGk4n9nEyEqkI9gljSzt12SDXqMxIMRcVBZrNsfTNZvtYBhDbV8 Qd+mGEnXnmVSsgzLHX7jOldJ5vks4R390UK6h55XybvCPA+v5z/X55WFVAvxtBq1BIiY ALqHd3GjlHRJ5NeitdACJRrDZZVw+2vf0Con420Ue7rOoZK8V2DoJqcZmFLhGCuk9nOn kXKg== X-Forwarded-Encrypted: i=1; AJvYcCWu0JET9JqFdwmJFrg0gc46sYaC6szeHYki8kXEn7W0RTlFTPexD0r3vmgR1DkNzblpvm9pY+LTMQJT+k603Tye1AU= X-Gm-Message-State: AOJu0Yxs/6nzP7mumWFVvrmyFL1+eWGCHaoZysConPZlz+kM8uSU3t5T wt8crqNosBvQWFpFBwjUVbMcA48mVA7R3S3DRK80KOsXC/6ahns02DRpCfdjNbedupKgFdwFntv ZxEtIyQ61sCKoekozkr+cu+g4gmFt5UQSXVRH X-Google-Smtp-Source: AGHT+IHEyhLEgeimAkd9GXuJSNNv7L8w7oaZQ1vPG7m7h93lU03u3PVsTB3mMqMFL13Rf1nhKFODiGTyicrEiTKjGLI= X-Received: by 2002:a17:906:4708:b0:a6f:dc0:904b with SMTP id a640c23a62f3a-a6f34d9590bmr37050366b.23.1718047018045; Mon, 10 Jun 2024 12:16:58 -0700 (PDT) MIME-Version: 1.0 References: <20240608155316.451600-1-flintglass@gmail.com> <20240608155316.451600-2-flintglass@gmail.com> In-Reply-To: <20240608155316.451600-2-flintglass@gmail.com> From: Yosry Ahmed Date: Mon, 10 Jun 2024 12:16:19 -0700 Message-ID: Subject: Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration To: Takero Funaki Cc: Johannes Weiner , Nhat Pham , Chengming Zhou , Jonathan Corbet , Andrew Morton , Domenico Cerasuolo , linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: y9x4y1q6zb6pb65xd5aiquknfko1b6dt X-Rspamd-Queue-Id: DD7CA18000A X-Rspam-User: X-Rspamd-Server: rspam05 X-HE-Tag: 1718047019-444261 X-HE-Meta: U2FsdGVkX1+auG2VQ2uxC/FNian1DTMQpN0LJ5DyRttj82azs+EhViQ85ysvC1d75rSM787OlqamJ6kZ97h9akoMFMNS82uGbWyP2eWVihL3o7FzvtOaay/5laIkPwLUNc2gBMo5QeXlZJnDMPPFVKX01+xfkQkZOnSPZeEAY7tabX7yMbQUOgFVybhWRxkZY/C6iZrr/jnxD1DiEuNoWfbUrBLNrbIcam41UWQuTJod6nU6xnUj5fQAlhfUO5AejxNe5x/GD2DGn74sZGy1SxhZd4MHPQTNMCvSi2t612WD3Ggu0KAY5JF0sXhjZUdOOK7BMPR1dfkd9IPfU5hhv7k7nS7w2BUaeUdTe+hJp8F5MnxYaQFlPwYCYmyPkZHfWt/7soXAzL3DK8TgV0H2mLJ9k6m89Gs54R48OwpBF8y9gw+BtPWUDigYlYvi/1DAXirZhwj+dd9l2jbTCle3tfgfHJITVLALigMJKv1tGkeRgZkA9ndhRZWmMVyeJfSQznQIuq16dB01184A63qeb3eXQU7NTBWOMQXLP3ztj4YRb1ethoMydvG2jSuZ+S+EYnTPnnMZi9HzDE9qTdlPbcAHNbD+uj2LYkzVmlC4VNpZNoUhqKGrPBXZG08Cid49SZf6yDtkE6qq7ztTcXqV/RqlEUk5qxHxv62+kY9Fnd5m/UdrglOJ9upfs9/tEVCeA2w+Qzp7ULY9NUT4fCjagOxxvihYfO9X2BRweWzejjcivLpX30de294tHvWLH9vEiw4s9wcl/fp1XzCge4smy5Rku1rLmSCsgcPTKb5qY6C0crNzFzxAeonR7ig4P6IoIfgmG/J2IoqUYDrYAC6efoFF1C7Yisd4H4qPeooJCKhOtUIDuvvpfSymw8zPeX47WFWBrdwYqIl0H98Gt+K0kPXh6tWac+XO08UTn877jDmN3I1+oZkaaHnr9XglYuE85mFHwPU/zXcaRsEIn+O 6D8PxEx/ I4fzwYUgjheEIN4MJrKzG3afaGH6oBYmcKo/PSecg7QKg3JR8tU0EJdUhTMBnc+6/65WRQB5bVQw6ptHgS9kKWJP9B+66sBS1BTk8SsETbEiRqg7YqcyVuUYeQ+Nds3vOhkFY4PJW5yAJpp4WtmOkOlqgy8z82GzZeeie X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sat, Jun 8, 2024 at 8:53=E2=80=AFAM Takero Funaki = wrote: > > This patch fixes an issue where the zswap global shrinker stopped > iterating through the memcg tree. > > The problem was that `shrink_worker()` would stop iterating when a memcg > was being offlined and restart from the tree root. Now, it properly > handles the offlining memcg and continues shrinking with the next memcg. > > This patch also modified handing of the lock for offlined memcg cleaner > to adapt the change in the iteration, and avoid negligibly rare skipping > of a memcg from shrink iteration. > > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware") > Signed-off-by: Takero Funaki > --- > mm/zswap.c | 87 ++++++++++++++++++++++++++++++++++++++++++------------ > 1 file changed, 68 insertions(+), 19 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index 80c634acb8d5..d720a42069b6 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -827,12 +827,27 @@ void zswap_folio_swapin(struct folio *folio) > } > } > > +/* > + * This function should be called when a memcg is being offlined. > + * > + * Since the global shrinker shrink_worker() may hold a reference > + * of the memcg, we must check and release the reference in > + * zswap_next_shrink. > + * > + * shrink_worker() must handle the case where this function releases > + * the reference of memcg being shrunk. > + */ > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) > { > /* lock out zswap shrinker walking memcg tree */ > spin_lock(&zswap_shrink_lock); > - if (zswap_next_shrink =3D=3D memcg) > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > + > + if (READ_ONCE(zswap_next_shrink) =3D=3D memcg) { > + /* put back reference and advance the cursor */ > + memcg =3D mem_cgroup_iter(NULL, memcg, NULL); > + WRITE_ONCE(zswap_next_shrink, memcg); > + } > + I am really finding it difficult to understand what the diff is trying to do. We are holding a lock that protects zswap_next_shrink. We always access it with the lock held. Why do we need all of this? Adding READ_ONCE() and WRITE_ONCE() where they are not needed is just confusing imo. > spin_unlock(&zswap_shrink_lock); > } > > @@ -1401,25 +1416,44 @@ static int shrink_memcg(struct mem_cgroup *memcg) > > static void shrink_worker(struct work_struct *w) > { > - struct mem_cgroup *memcg; > + struct mem_cgroup *memcg =3D NULL; > + struct mem_cgroup *next_memcg; > int ret, failures =3D 0; > unsigned long thr; > > /* Reclaim down to the accept threshold */ > thr =3D zswap_accept_thr_pages(); > > - /* global reclaim will select cgroup in a round-robin fashion. */ > + /* global reclaim will select cgroup in a round-robin fashion. > + * > + * We save iteration cursor memcg into zswap_next_shrink, > + * which can be modified by the offline memcg cleaner > + * zswap_memcg_offline_cleanup(). > + * > + * Since the offline cleaner is called only once, we cannot aband= one > + * offline memcg reference in zswap_next_shrink. > + * We can rely on the cleaner only if we get online memcg under l= ock. > + * If we get offline memcg, we cannot determine the cleaner will = be > + * called later. We must put it before returning from this functi= on. > + */ > do { > +iternext: > spin_lock(&zswap_shrink_lock); > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > - memcg =3D zswap_next_shrink; > + next_memcg =3D READ_ONCE(zswap_next_shrink); > + > + if (memcg !=3D next_memcg) { > + /* > + * Ours was released by offlining. > + * Use the saved memcg reference. > + */ > + memcg =3D next_memcg; 'memcg' will always be NULL on the first iteration, so we will always start by shrinking 'zswap_next_shrink' for a second time before moving the iterator. > + } else { > + /* advance cursor */ > + memcg =3D mem_cgroup_iter(NULL, memcg, NULL); > + WRITE_ONCE(zswap_next_shrink, memcg); Again, I don't see what this is achieving. The first iteration will always set 'memcg' to 'zswap_next_shrink', and then we will always move the iterator forward. The only difference I see is that we shrink 'zswap_next_shrink' twice in a row now (last 'memcg' in prev call, and first 'memcg' in this call). > + } > > /* > - * We need to retry if we have gone through a full round = trip, or if we > - * got an offline memcg (or else we risk undoing the effe= ct of the > - * zswap memcg offlining cleanup callback). This is not c= atastrophic > - * per se, but it will keep the now offlined memcg hostag= e for a while. > - * > * Note that if we got an online memcg, we will keep the = extra > * reference in case the original reference obtained by m= em_cgroup_iter > * is dropped by the zswap memcg offlining callback, ensu= ring that the > @@ -1434,16 +1468,25 @@ static void shrink_worker(struct work_struct *w) > } > > if (!mem_cgroup_tryget_online(memcg)) { > - /* drop the reference from mem_cgroup_iter() */ > - mem_cgroup_iter_break(NULL, memcg); > - zswap_next_shrink =3D NULL; > + /* > + * It is an offline memcg which we cannot shrink > + * until its pages are reparented. > + * > + * Since we cannot determine if the offline clean= er has > + * been already called or not, the offline memcg = must be > + * put back unconditonally. We cannot abort the l= oop while > + * zswap_next_shrink has a reference of this offl= ine memcg. > + */ You actually deleted the code that actually puts the ref to the offline memcg above. Why don't you just replace mem_cgroup_iter_break(NULL, memcg) with mem_cgroup_iter(NULL, memcg, NULL) here? I don't understand what the patch is trying to do to be honest. This patch is a lot more confusing than it should be. Also, I would like Nhat to weigh in here. Perhaps the decision to reset the iterator instead of advancing it in this case was made for a reason that we should honor. Maybe cgroups are usually offlined together so we will keep running into offline cgroups here if we continue? I am not sure. > spin_unlock(&zswap_shrink_lock); > - > - if (++failures =3D=3D MAX_RECLAIM_RETRIES) > - break; > - > - goto resched; > + goto iternext; > } > + /* > + * We got an extra memcg reference before unlocking. > + * The cleaner cannot free it using zswap_next_shrink. > + * > + * Our memcg can be offlined after we get online memcg he= re. > + * In this case, the cleaner is waiting the lock just beh= ind us. > + */ > spin_unlock(&zswap_shrink_lock); > > ret =3D shrink_memcg(memcg); > @@ -1457,6 +1500,12 @@ static void shrink_worker(struct work_struct *w) > resched: > cond_resched(); > } while (zswap_total_pages() > thr); > + > + /* > + * We can still hold the original memcg reference. > + * The reference is stored in zswap_next_shrink, and then reused > + * by the next shrink_worker(). > + */ > } > > /********************************* > -- > 2.43.0 >