From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A62A0C3DA4A for ; Sat, 3 Aug 2024 04:14:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C79A46B007B; Sat, 3 Aug 2024 00:14:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C2AA96B0083; Sat, 3 Aug 2024 00:14:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AF1B36B0085; Sat, 3 Aug 2024 00:14:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 906D26B007B for ; Sat, 3 Aug 2024 00:14:58 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id EB7AA1C4D8B for ; Sat, 3 Aug 2024 04:14:57 +0000 (UTC) X-FDA: 82409618634.23.FC0E4D9 Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by imf14.hostedemail.com (Postfix) with ESMTP id 1998F100008 for ; Sat, 3 Aug 2024 04:14:55 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=MCksmBim; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf14.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722658448; a=rsa-sha256; cv=none; b=UrsMaNkXY3Ea38MIMMpe2Cm9fGaYA8lx8cbSloU3UASvyd/EGGHJFyB4HiojgIgTq7kVkL 0XP8tVby4mGId5jHD4VAPCg70sOU4Jz0JefNYDhMTQZcnc9GVNzcws3ToR+kvOBB69Z54c 2c6Z0OiQfupVTIea67hNPFmJ86V5vuc= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=MCksmBim; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf14.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1722658448; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=q3VJn+k5RR3eFH5H5UkduuJdBdQKbGUZ1f4tj4+JJqU=; b=xukDaQ7HWndK/1guk4I9olYtZLAC/lbnELfPSU3JTBWvQ72fqZAJWJA3B8vIlo0llSjwxg IwR9qKf1VDEoUHXxxfYcPGv9lhgWgjpQWOCwj4T4oOmtzOyshWszRDi8MPJ6WTs6Y0KZsR 4OVhj94+wpIsYieQC6eHSHSHhAH3gMs= Received: by mail-ej1-f42.google.com with SMTP id a640c23a62f3a-a7b2dbd81e3so1118136666b.1 for ; Fri, 02 Aug 2024 21:14:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1722658494; x=1723263294; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=q3VJn+k5RR3eFH5H5UkduuJdBdQKbGUZ1f4tj4+JJqU=; b=MCksmBimT7ac6l3odhdUL6PDMQ9D12Cqq93xrkP0ml1+Rh7911y7/cVxDhKTDp0TOy b1kRlHQFOw8Vf8gvY+Paq3cM9rcuNl9pJQ9nUkEXnokGiazyAK27e1+GSo/lNvjuhO1K MMRcKC0/86/3R85C0Z6ZEQWY2rD4n60GP7ouDuGQz32ymAqu5MLN4ZMcgxAnxIftxyoM OTmh2QTFQpo3UFhZXzZQixbK4ewk1g9E8n2t+MjVfnweyaECTRr7eHbn+wGvsPGJZ9lt EKrLvix7lRWbKDPe3RjukPzrcDFHv/29H+amyPDuhfSAiTfevMJVm7FHbVSRykSMlvqn LHRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722658494; x=1723263294; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=q3VJn+k5RR3eFH5H5UkduuJdBdQKbGUZ1f4tj4+JJqU=; b=DXy9RUn4X54xPV5GW+WIFgWJLDbEkLLs8slJRvUEW3xYnavXcTn27GUaUtvaUHj6be SvfJ1xk6Px75Lh+S58MVc0NravE2syKwPMH9sUYyquFYlbYCyyVqQuI8w2l1Ydcmyx+j JZx+Fdt5QtVxyMRohmSW4iAidZYIUIJysPU7I8hx25DQ9AifUOz12ihiaVLGF9ns6sm1 vck8pZsktTa4WyfH0S3kDs6+A8EHi5l/RJKQ8J6/woOKdqAgPSRgE6cgitRxbbNgYiNi MLz/upfav6W0l7IC7/R/KUOBhpO8Qg6rLH6fJ8WkTQz1jnlLb2c6oh+fYbzMPy9WXtxI nlLw== X-Forwarded-Encrypted: i=1; AJvYcCVXSHj3VzrFUZ+rf+CW19GX9S0DsKYMO36a/FttuG7V1sHHW1/0jkhV52w5cffpzTzUJncxqpZFQhsGJUfxPrBSQOU= X-Gm-Message-State: AOJu0YxawosxsSJ8CKDzhC10jr5+Xyg6XfyppSSIlWRjY/gAGYz9ebX2 zlG1PyrtR8D8S4pAnw9/O6R2cqlWRCoiQGaxEsJ1a3Vww1otzvvb3qHlIuKHcBi5FyDeed9camI s/pqkpm0I2yROpZG6LMpK7VrfbCdwngIF4E/K X-Google-Smtp-Source: AGHT+IEoMBpJZeESnd2W4YFmxshI61AgRv6W4RcOIJNoZalizIpnQjegykPGDNtfsMZsnz7LWshTQxgWgDf0htFSqGw= X-Received: by 2002:a17:906:da8a:b0:a77:dd1c:6274 with SMTP id a640c23a62f3a-a7dc5124d64mr281729466b.69.1722658493451; Fri, 02 Aug 2024 21:14:53 -0700 (PDT) MIME-Version: 1.0 References: <20240731004918.33182-1-flintglass@gmail.com> <20240731004918.33182-2-flintglass@gmail.com> In-Reply-To: <20240731004918.33182-2-flintglass@gmail.com> From: Yosry Ahmed Date: Fri, 2 Aug 2024 21:14:17 -0700 Message-ID: Subject: Re: [PATCH v5 1/2] mm: zswap: fix global shrinker memcg iteration To: Takero Funaki Cc: Johannes Weiner , Nhat Pham , Chengming Zhou , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 1998F100008 X-Stat-Signature: xexzq7ua4wyhdgamxf1ynth91nzmcbox X-Rspam-User: X-HE-Tag: 1722658495-917252 X-HE-Meta: U2FsdGVkX1+xJK3KteTMZd/LmAgLHglhrbe2HPFOCM3ro5MjIoE7LVX5Kb78eaD9c0SFD/etxdxoU78P93FpLrcYi3PaaIZ3hDLk37ec163CePaLSI/HqGKRKVpxf2j4ZozmwSU8W0GBpi6TBdqWA4vhZC5ZAgz14OLoCf5pbgO7Rm86AWf8E+XxZK6LMwCsE33BYSGLIBwbbYgX5OMZ67kMd1GEzG+OqVEVUy/I3EcpDOwVsze8gYrimzzH6+EYNcd6K/cf43R9GaRpxIEbBmk+rzzok1JztcF+Cst0MKcyIvMB6QUgT5T6r+yRkwOut+lc0KMRrnEbwquYFZ9mIYqf7SZq5VGTdD2Yb645a2jPPjgrduf2MxGRtkbEUnVFw4Qa+aqTQscBjvVQogaHApllCXcc0ZPvFlIZKJ4uiI92/WbI7J4nUOIbWQtMjX07beRLqURko9qhvgS4JUm7I/InOTjEiHD+jeHBe48Jy/JzdSWq5r+ZMyvm1nKuSkfRPj48v6ieyN8ObY7uFPUL4YRO9qhREUIOF8GQ+wZCJ7irE2RZagQOWdvxxhkSbZep2MVIqBa/+/Q2RFKi04svqfSBu4sDGdNhxcPXvBc4a14rmphHNf0a7a5cH6Zxgh95BWsa+jP2b+UFGNCrBg3ZlJGKCTksPbMxiWpIBFt7ZofPuI3nzJJZtcsXk4yP/gFFx1Ec8AnPDlkXwUjD6LKHiX2NdwwqgzP0qVKPNEMpYC8Pm6ErbSNHt2EKPvUMB3yKrr+QzJkKgcxtV+3TtQTPyhSWPKLOI4NDHGPyRIsYIxuOsArd40J4kKL7UwmNbmgZvw/rXitbc0oqLdAlxCFE49Zv78Tvif+C/qOrNa6SReEhOMKOc+M1A8Jo6CkQAw9PHSiTb5IP1zX6VXwgCYJUMmdlX5gVswl/nmy7+iJi8Ct3pnhDFXyIXEIPYuj6bIJLKYYX3ckeMpmGXXxO+RO njhYUHkY OfFKdi4yX/xYExcvhdOeDNKRXhlIjkR84lQYJjq9DBT+2Bfby2zawR50cUl5dEbc+1WrmiAuQ4yKeYR+/XKQsMsMQ1m8KSK23N2x/Bd8Ib/w8C/Nv9RPHZ4OWNk4ZPp4dISyb3GUmjGbIQsPUOxeWgS9bTY4f0HRxqHYYRF2nGZ02SgrLfF2YbkUBX3wZaeFHbZPQU1D+OVWuJZzPKKYkvuJz0w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jul 30, 2024 at 5:49=E2=80=AFPM Takero Funaki wrote: > > This patch fixes an issue where the zswap global shrinker stopped > iterating through the memcg tree. > > The problem was that shrink_worker() would restart iterating memcg tree > from the tree root, considering an offline memcg as a failure, and abort > shrinking after encountering the same offline memcg 16 times even if > there is only one offline memcg. After this change, an offline memcg in > the tree is no longer considered a failure. This allows the shrinker to > continue shrinking the other online memcgs regardless of whether an > offline memcg exists, gives higher zswap writeback activity. > > To avoid holding refcount of offline memcg encountered during the memcg > tree walking, shrink_worker() must continue iterating to release the > offline memcg to ensure the next memcg stored in the cursor is online. > > The offline memcg cleaner has also been changed to avoid the same issue. > When the next memcg of the offlined memcg is also offline, the refcount > stored in the iteration cursor was held until the next shrink_worker() > run. The cleaner must release the offline memcg recursively. > > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware") > Acked-by: Yosry Ahmed > Reviewed-by: Chengming Zhou > Reviewed-by: Nhat Pham > Signed-off-by: Takero Funaki > --- > mm/zswap.c | 68 +++++++++++++++++++++++++++++++++++------------------- > 1 file changed, 44 insertions(+), 24 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index adeaf9c97fde..3c16a1192252 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -765,12 +765,25 @@ void zswap_folio_swapin(struct folio *folio) > } > } > > +/* > + * This function should be called when a memcg is being offlined. > + * > + * Since the global shrinker shrink_worker() may hold a reference > + * of the memcg, we must check and release the reference in > + * zswap_next_shrink. > + * > + * shrink_worker() must handle the case where this function releases > + * the reference of memcg being shrunk. > + */ > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) > { > /* lock out zswap shrinker walking memcg tree */ > spin_lock(&zswap_shrink_lock); > - if (zswap_next_shrink =3D=3D memcg) > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > + if (zswap_next_shrink =3D=3D memcg) { > + do { > + zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap= _next_shrink, NULL); > + } while (zswap_next_shrink && !mem_cgroup_online(zswap_ne= xt_shrink)); > + } > spin_unlock(&zswap_shrink_lock); > } > > @@ -1304,43 +1317,50 @@ static void shrink_worker(struct work_struct *w) > /* Reclaim down to the accept threshold */ > thr =3D zswap_accept_thr_pages(); > > - /* global reclaim will select cgroup in a round-robin fashion. */ > + /* > + * Global reclaim will select cgroup in a round-robin fashion. > + * > + * We save iteration cursor memcg into zswap_next_shrink, > + * which can be modified by the offline memcg cleaner > + * zswap_memcg_offline_cleanup(). > + * > + * Since the offline cleaner is called only once, we cannot leave= an > + * offline memcg reference in zswap_next_shrink. > + * We can rely on the cleaner only if we get online memcg under l= ock. > + * > + * If we get an offline memcg, we cannot determine if the cleaner= has > + * already been called or will be called later. We must put back = the > + * reference before returning from this function. Otherwise, the > + * offline memcg left in zswap_next_shrink will hold the referenc= e > + * until the next run of shrink_worker(). > + */ > do { > spin_lock(&zswap_shrink_lock); > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > - memcg =3D zswap_next_shrink; > > /* > - * We need to retry if we have gone through a full round = trip, or if we > - * got an offline memcg (or else we risk undoing the effe= ct of the > - * zswap memcg offlining cleanup callback). This is not c= atastrophic > - * per se, but it will keep the now offlined memcg hostag= e for a while. > - * > + * Start shrinking from the next memcg after zswap_next_s= hrink. > + * When the offline cleaner has already advanced the curs= or, > + * advancing the cursor here overlooks one memcg, but thi= s > + * should be negligibly rare. > + */ > + do { > + memcg =3D mem_cgroup_iter(NULL, zswap_next_shrink= , NULL); > + zswap_next_shrink =3D memcg; > + } while (memcg && !mem_cgroup_tryget_online(memcg)); I took a look at refactoring the loop to a helper, but it's probably not going to be any clearer because this loop has a tryget, and the loop in zswap_memcg_offline_cleanup() only has an online check. Using a tryget in the offline cleanup version would be wasteful and we'll put the ref right away. Instead, I think we should just move the spin_lock/unlock() closer to the loop to make the critical section more obvious, and unify the comments above and below into a single block. Andrew, could you please fold in the following diff (unless Takero objects)= : diff --git a/mm/zswap.c b/mm/zswap.c index babf0abbcc765..df620eacd1d11 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1364,24 +1364,22 @@ static void shrink_worker(struct work_struct *w) * until the next run of shrink_worker(). */ do { - spin_lock(&zswap_shrink_lock); - /* * Start shrinking from the next memcg after zswap_next_shr= ink. * When the offline cleaner has already advanced the cursor= , * advancing the cursor here overlooks one memcg, but this * should be negligibly rare. + * + * If we get an online memcg, keep the extra reference in c= ase + * the original one obtained by mem_cgroup_iter() is droppe= d by + * zswap_memcg_offline_cleanup() while we are shrinking the + * memcg. */ + spin_lock(&zswap_shrink_lock); do { memcg =3D mem_cgroup_iter(NULL, zswap_next_shrink, = NULL); zswap_next_shrink =3D memcg; } while (memcg && !mem_cgroup_tryget_online(memcg)); - /* - * Note that if we got an online memcg, we will keep the ex= tra - * reference in case the original reference obtained by mem_cgroup_iter - * is dropped by the zswap memcg offlining callback, ensuring that the - * memcg is not killed when we are reclaiming. - */ spin_unlock(&zswap_shrink_lock); if (!memcg) { > + /* > * Note that if we got an online memcg, we will keep the = extra > * reference in case the original reference obtained by m= em_cgroup_iter > * is dropped by the zswap memcg offlining callback, ensu= ring that the > * memcg is not killed when we are reclaiming. > */ > - if (!memcg) { > - spin_unlock(&zswap_shrink_lock); > - if (++failures =3D=3D MAX_RECLAIM_RETRIES) > - break; > - > - goto resched; > - } > - > - if (!mem_cgroup_tryget_online(memcg)) { > - /* drop the reference from mem_cgroup_iter() */ > - mem_cgroup_iter_break(NULL, memcg); > - zswap_next_shrink =3D NULL; > - spin_unlock(&zswap_shrink_lock); > + spin_unlock(&zswap_shrink_lock); > > + if (!memcg) { > if (++failures =3D=3D MAX_RECLAIM_RETRIES) > break; > > goto resched; > } > - spin_unlock(&zswap_shrink_lock); > > ret =3D shrink_memcg(memcg); > /* drop the extra reference */ > -- > 2.43.0 >