From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2102DC3DA63 for ; Tue, 23 Jul 2024 06:30:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9D05F6B0085; Tue, 23 Jul 2024 02:30:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 97F436B0088; Tue, 23 Jul 2024 02:30:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 822AF6B0089; Tue, 23 Jul 2024 02:30:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 61EB86B0085 for ; Tue, 23 Jul 2024 02:30:53 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 1487AA1C3C for ; Tue, 23 Jul 2024 06:30:53 +0000 (UTC) X-FDA: 82370044386.25.E4DAE0A Received: from mail-lj1-f174.google.com (mail-lj1-f174.google.com [209.85.208.174]) by imf29.hostedemail.com (Postfix) with ESMTP id 26FAA120020 for ; Tue, 23 Jul 2024 06:30:50 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=dvCn6Bb6; spf=pass (imf29.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.174 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721716189; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=gixz8dZHBgc5zRylZtlf8sz3hXFhSAuDE08kVvUavqs=; b=HmBrYbQTVJhRtMi0ZHvTmgqRjOIPoGrnrnR7VhMKrGNYTZsPIM8rPhD4ol1xBM/K0MqUyl 3H6QTT7T88rJeefzojxqIZqZ1uE/hvTtOfpNUg9UTPJE6WEBDzCxzRaB15cDJ5tyDYdo20 8p1l7btLWLEHgtjzCSIytavo45dOtcs= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=dvCn6Bb6; spf=pass (imf29.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.174 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721716189; a=rsa-sha256; cv=none; b=XFvaYgfd2aZTgZCqZTJMr5j77EKc6auBn4kkv9HepcWbWpWyT5p8UrAEbZhTVPgrKBGzhG nH0OwCACfvn4v0ODk7m1PJ+cEWuBu8942y4nogr2AOZD3bNZvUailw7BKE/xHD7KRAp9Up bXirfYJi+q7ahngwffyFhB1Y/YQTkYk= Received: by mail-lj1-f174.google.com with SMTP id 38308e7fff4ca-2eecd2c6432so74766571fa.3 for ; Mon, 22 Jul 2024 23:30:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1721716249; x=1722321049; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=gixz8dZHBgc5zRylZtlf8sz3hXFhSAuDE08kVvUavqs=; b=dvCn6Bb6lRa80Rh7jxs1oNfvi3ZIbj0Nz6KPJKRUWXqWIWW/fX8E53asLshUK+URgi lJADGUJNuWyP5I2DxVXaI2EE/NIAiW3oFcn3XCRR5YiLulRHswBcRLNBmHy1HakY479P gYLl/9dGY4DUsmFHV+YnD9NqJnXDu/nH4pPhvj6Vls9HDTMw4LTxkYebA78XzJH1kroT Pk19/iDAV81Ukm+5dTsJw2+84MoEMpaSZIeArjlHDBIXwcf/AwvEtP09bjG7iPyJu11n c1WlVhwY5hPZlRX9vQkXDnB4Rt6lizWD67Jq2+p1jC1a8mUF6tIgoOADused3RExged8 gyEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721716249; x=1722321049; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gixz8dZHBgc5zRylZtlf8sz3hXFhSAuDE08kVvUavqs=; b=LhXbGtgO8M2AKggXtpmblo0RzcUNOkHsRySyLzxNkf8Rbs64ZPqTrcKbLkRaDRDiNp iWc1oEpnCzo1D401LgifNFK0lvyZ3HsAbeb/AGonARrge1ic1ME1juGZTmSA2TwTRKGP 9iQKtUo7MEv/TPzp13kzqeqzZYqXmkjfu2VLNPHrwFY/ml8KLajmrJL27PRwA6i4d27U uVZMynVWXvlN58ouxVHXFoNoav6q8slwN2JNalkLePE/j0ofgtiwFm4XTjkHz3jws8H8 jzLbFRZrvtyi/eE9vF/2HEkjEcbiQQlF0+rbmQbCl4FkiMjcDLRQeWmVL2x9YVS+Nn7H hn5A== X-Forwarded-Encrypted: i=1; AJvYcCWjFmWW4RdLerCkWh3TUGv8Grb/XcaP8rYzvhJS7BV5Ju9Njsys3yu3vtqqGaarMWJ4yeflgW1TPUUyoaw312OltW4= X-Gm-Message-State: AOJu0Yx8lYH0HuuAAh6QqUsTllr7R/sbveHNk7J4To82mwCasyv9itM0 jVZ44CpUVsuMNfG04pbGpx8Vfdx0/pZbk7CObiV8RbfqKAfYOd046XSueHqkZ+pUdYUhms7zLrw vlUOaRHEAfRRO2lxDktR07MlvBDie4J8cZ3jBW/R1zxKL/h3m7mPI X-Google-Smtp-Source: AGHT+IEy9unlR+4QdQeJutICK2sO9G/qd8PbuQ0KDOxxghXJTcuq3XWPg1WGevyTZQnqI6Ecrv1NMQVTTJxxPlnN6lE= X-Received: by 2002:a2e:a595:0:b0:2ef:2c3c:512a with SMTP id 38308e7fff4ca-2ef2c3c5208mr49955271fa.42.1721716248735; Mon, 22 Jul 2024 23:30:48 -0700 (PDT) MIME-Version: 1.0 References: <20240720044127.508042-1-flintglass@gmail.com> <20240720044127.508042-2-flintglass@gmail.com> In-Reply-To: <20240720044127.508042-2-flintglass@gmail.com> From: Yosry Ahmed Date: Mon, 22 Jul 2024 23:30:12 -0700 Message-ID: Subject: Re: [PATCH v3 1/2] mm: zswap: fix global shrinker memcg iteration To: Takero Funaki Cc: Johannes Weiner , Nhat Pham , Chengming Zhou , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 26FAA120020 X-Stat-Signature: tstcgnrqj7bjtuxd3rfcwrwie3nh58jk X-Rspam-User: X-HE-Tag: 1721716250-89632 X-HE-Meta: U2FsdGVkX18WKB+sUghNe+DZX9ug29kj/vkCy1/D820pow60N9QTxJnWdHdca7+ufUZpn1BDnEZ7hoAt8xjYkFMiSDBQp8KaiJxGDmUYlvMgnGo8pORzXd/ZqbAdxUX7gHJr37x1N+SMhQIKNWuDcwkXgUgZ0adKdNwiIOersGMkrJcit3deJjU/kziZC/B4CyurDT2+Mg70M8fdnNilrZFUW6YBbciaZj3GZaDQGczP7dMRLKMFvbB+atzPMOpcSjC/sI38iWkot9TYwhfiiYlcAfxRnTC9vQdmP2TZjQIkkZ6i5PuP2dYWpUruZy70FGSCSlkjyPqKYf2eo2jc/bIbliIZPOePMvUxQ6AnDVA0hFgPd45pf7Wfr7Dny4azBqgTWWjF+RE6GC0kOQ816QWm+EwzKy825DQ2Zm+IXafNHzssSahE29TOz0R/2fCI+/nzyP+FzPsuHMXOrxowdpYWkH+T5+wA1/03SgWg05bgMstNxBOubcqvcCe3ICiB8TCtCimmea4FuK0QYo6+GobskQMPGrXT66brJuErgLWOs+O7LDT2QqRe2ve/lV0AgaAdXX9eMZ5KkEzjC0t3746rPNIHkPk1K1B1S/izlvWkZwGxVwu5zyvE5BNUBpsu/JQmJawPVExB90L3icQ2GloVBO8efoPm4d1cZhb9OWxRZPgwIcVDCqgTQLc20xBOQT2+4a35dcX3qTp5Dg5SDD/Lz7fcqGbXAMQVbyB0ceZseOB4LmstAtoVigZ/MwkJzBZn/U9CHmwxuDMSYSLFf3oNaGBUES1AIuSMr3mhtj1JN3CSZlN+umZsTqZOaLCX42QZzF5acqD6Mhejzur/c1ZpJtVcwSdENMfEpKoj0FySi8NlNiYqnSqrpVqC/ltVE1CUvcOG8kT6oHdzjRY/yhdnzIq4UhF7b8PcJYTNTXKNQYXwc25veUzCNILN5Q6mxAormDOVNQy1O0kFw0L 8WQly087 vQsjYuqcrRooIQ8DSzMcl48YNE/NGki+8XK1acSeRNqLD9SSpWVdRqwcJvquvQEmDFyHAvD5ONL1T5KO0DeY71OAmsw8q8o3hEnBs9/UFRoOHl8fW6A9zpsRuyM+T5bp3RR7ckJk3mIte2LQlHU2dTzbIMC5ZG7z5IBcE0m9lw6nZVB9Z5SxG1HueqFY0qGqtpeMcPi4+dDWcIzp/+WnP9g0Prg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Jul 19, 2024 at 9:41=E2=80=AFPM Takero Funaki wrote: > > This patch fixes an issue where the zswap global shrinker stopped > iterating through the memcg tree. > > The problem was that shrink_worker() would stop iterating when a memcg > was being offlined and restart from the tree root. Now, it properly > handles the offline memcg and continues shrinking with the next memcg. > > To avoid holding refcount of offline memcg encountered during the memcg > tree walking, shrink_worker() must continue iterating to release the > offline memcg to ensure the next memcg stored in the cursor is online. > > The offline memcg cleaner has also been changed to avoid the same issue. > When the next memcg of the offlined memcg is also offline, the refcount > stored in the iteration cursor was held until the next shrink_worker() > run. The cleaner must release the offline memcg recursively. > > Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware") > Signed-off-by: Takero Funaki > --- > mm/zswap.c | 77 +++++++++++++++++++++++++++++++++++++++--------------- > 1 file changed, 56 insertions(+), 21 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index a50e2986cd2f..6528668c9af3 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -775,12 +775,33 @@ void zswap_folio_swapin(struct folio *folio) > } > } > > +/* > + * This function should be called when a memcg is being offlined. > + * > + * Since the global shrinker shrink_worker() may hold a reference > + * of the memcg, we must check and release the reference in > + * zswap_next_shrink. > + * > + * shrink_worker() must handle the case where this function releases > + * the reference of memcg being shrunk. > + */ > void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) > { > /* lock out zswap shrinker walking memcg tree */ > spin_lock(&zswap_shrink_lock); > - if (zswap_next_shrink =3D=3D memcg) > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > + if (zswap_next_shrink =3D=3D memcg) { > + do { > + zswap_next_shrink =3D mem_cgroup_iter(NULL, > + zswap_next_shrink, NULL); > + } while (zswap_next_shrink && > + !mem_cgroup_online(zswap_next_shrink)); > + /* > + * We verified the next memcg is online. Even if the nex= t > + * memcg is being offlined here, another cleaner must be > + * waiting for our lock. We can leave the online memcg > + * reference. > + */ I think this comment and the similar one at the end of the loop in shrink_worker() are very similar and not necessary. The large comment above the loop in shrink_worker() already explains how that loop and the offline memcg cleaner interact, and I think the locking follows naturally from there. You can explicitly mention the locking there as well if you think it helps, but I think these comments are a little repetitive and do not add much value. I don't feel strongly about it tho, if Nhat feels like they add value then I am okay with that. Otherwise, and with Nhat's other comments addressed: Acked-by: Yosry Ahmed > + } > spin_unlock(&zswap_shrink_lock); > } > > @@ -1319,18 +1340,38 @@ static void shrink_worker(struct work_struct *w) > /* Reclaim down to the accept threshold */ > thr =3D zswap_accept_thr_pages(); > > - /* global reclaim will select cgroup in a round-robin fashion. */ > + /* global reclaim will select cgroup in a round-robin fashion. > + * > + * We save iteration cursor memcg into zswap_next_shrink, > + * which can be modified by the offline memcg cleaner > + * zswap_memcg_offline_cleanup(). > + * > + * Since the offline cleaner is called only once, we cannot leave= an > + * offline memcg reference in zswap_next_shrink. > + * We can rely on the cleaner only if we get online memcg under l= ock. > + * > + * If we get an offline memcg, we cannot determine if the cleaner= has > + * already been called or will be called later. We must put back = the > + * reference before returning from this function. Otherwise, the > + * offline memcg left in zswap_next_shrink will hold the referenc= e > + * until the next run of shrink_worker(). > + */ > do { > spin_lock(&zswap_shrink_lock); > - zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > - memcg =3D zswap_next_shrink; > > /* > - * We need to retry if we have gone through a full round = trip, or if we > - * got an offline memcg (or else we risk undoing the effe= ct of the > - * zswap memcg offlining cleanup callback). This is not c= atastrophic > - * per se, but it will keep the now offlined memcg hostag= e for a while. > - * > + * Start shrinking from the next memcg after zswap_next_s= hrink. > + * When the offline cleaner has already advanced the curs= or, > + * advancing the cursor here overlooks one memcg, but thi= s > + * should be negligibly rare. > + */ > + do { > + zswap_next_shrink =3D mem_cgroup_iter(NULL, > + zswap_next_shrink, NULL); > + memcg =3D zswap_next_shrink; > + } while (memcg && !mem_cgroup_tryget_online(memcg)); > + > + /* > * Note that if we got an online memcg, we will keep the = extra > * reference in case the original reference obtained by m= em_cgroup_iter > * is dropped by the zswap memcg offlining callback, ensu= ring that the > @@ -1344,17 +1385,11 @@ static void shrink_worker(struct work_struct *w) > goto resched; > } > > - if (!mem_cgroup_tryget_online(memcg)) { > - /* drop the reference from mem_cgroup_iter() */ > - mem_cgroup_iter_break(NULL, memcg); > - zswap_next_shrink =3D NULL; > - spin_unlock(&zswap_shrink_lock); > - > - if (++failures =3D=3D MAX_RECLAIM_RETRIES) > - break; > - > - goto resched; > - } > + /* > + * We verified the memcg is online and got an extra memcg > + * reference. Our memcg might be offlined concurrently b= ut the > + * respective offline cleaner must be waiting for our loc= k. > + */ > spin_unlock(&zswap_shrink_lock); > > ret =3D shrink_memcg(memcg); > -- > 2.43.0 >