From: Nhat Pham <nphamcs@gmail.com>
Date: Tue, 5 Dec 2023 11:09:30 -0800
Subject: Re: [PATCH v8 3/6] zswap: make shrinking memcg-aware
To: Yosry Ahmed
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, cerasuolodomenico@gmail.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
    muchun.song@linux.dev, chrisl@kernel.org, linux-mm@kvack.org,
    kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, shuah@kernel.org
References: <20231130194023.4102148-1-nphamcs@gmail.com> <20231130194023.4102148-4-nphamcs@gmail.com>

On Tue, Dec 5, 2023 at 11:00 AM Yosry Ahmed wrote:
>
> [..]
> > > >  static void shrink_worker(struct work_struct *w)
> > > >  {
> > > >          struct zswap_pool *pool = container_of(w, typeof(*pool),
> > > >                                                  shrink_work);
> > > > +        struct mem_cgroup *memcg;
> > > >          int ret, failures = 0;
> > > >
> > > > +        /* global reclaim will select cgroup in a round-robin fashion. */
> > > >          do {
> > > > -                ret = zswap_reclaim_entry(pool);
> > > > -                if (ret) {
> > > > -                        zswap_reject_reclaim_fail++;
> > > > -                        if (ret != -EAGAIN)
> > > > +                spin_lock(&zswap_pools_lock);
> > > > +                pool->next_shrink = mem_cgroup_iter(NULL, pool->next_shrink, NULL);
> > > > +                memcg = pool->next_shrink;
> > > > +
> > > > +                /*
> > > > +                 * We need to retry if we have gone through a full round trip, or if we
> > > > +                 * got an offline memcg (or else we risk undoing the effect of the
> > > > +                 * zswap memcg offlining cleanup callback). This is not catastrophic
> > > > +                 * per se, but it will keep the now offlined memcg hostage for a while.
> > > > +                 *
> > > > +                 * Note that if we got an online memcg, we will keep the extra
> > > > +                 * reference in case the original reference obtained by mem_cgroup_iter
> > > > +                 * is dropped by the zswap memcg offlining callback, ensuring that the
> > > > +                 * memcg is not killed when we are reclaiming.
> > > > +                 */
> > > > +                if (!memcg) {
> > > > +                        spin_unlock(&zswap_pools_lock);
> > > > +                        if (++failures == MAX_RECLAIM_RETRIES)
> > > >                                  break;
> > > > +
> > > > +                        goto resched;
> > > > +                }
> > > > +
> > > > +                if (!mem_cgroup_online(memcg)) {
> > > > +                        /* drop the reference from mem_cgroup_iter() */
> > > > +                        mem_cgroup_put(memcg);
> > >
> > > Probably better to use mem_cgroup_iter_break() here?
> >
> > mem_cgroup_iter_break(NULL, memcg) seems to perform the same thing, right?
>
> Yes, but it's better to break the iteration with the documented API
> (e.g. if mem_cgroup_iter_break() changes to do extra work).

Hmm, a mostly aesthetic fix to me, but I don't have a strong opinion
otherwise (see the sketch further down for how this would look combined
with the tryget_online() fix).

> >
> > > Also, I don't see mem_cgroup_tryget_online() being used here (where I
> > > expected it to be used), did I miss it?
> >
> > Oh shoot yeah that was a typo - it should be
> > mem_cgroup_tryget_online(). Let me send a fix to that.
> >
> > > > +                        pool->next_shrink = NULL;
> > > > +                        spin_unlock(&zswap_pools_lock);
> > > > +
> > > >                          if (++failures == MAX_RECLAIM_RETRIES)
> > > >                                  break;
> > > > +
> > > > +                        goto resched;
> > > >                  }
> > > > +                spin_unlock(&zswap_pools_lock);
> > > > +
> > > > +                ret = shrink_memcg(memcg);
> > >
> > > We just checked for online-ness above, and then shrink_memcg() checks
> > > it again. Is this intentional?
> >
> > Hmm these two checks are for two different purposes. The check above
> > is mainly to prevent accidentally undoing the offline cleanup callback
> > during memcg selection step. Inside shrink_memcg(), we check
> > onlineness again to prevent reclaiming from offlined memcgs - which in
> > effect will trigger the reclaim of the parent's memcg.
>
> Right, but two checks in close proximity are not doing a lot.
> Especially that the memcg online-ness can change right after the check
> inside shrink_memcg() anyway, so it's a best effort thing.
>
> Anyway, it shouldn't matter much. We can leave it.
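
Since there are a couple of moving parts here (the tryget_online() typo
and switching to mem_cgroup_iter_break()), here is an untested sketch of
how I picture the fixed selection step sitting on top of the v8 hunk
above - not the exact fixlet I will send:

                /*
                 * Untested sketch: take an extra reference only if the memcg
                 * is still online; otherwise end the walk with the documented
                 * iterator API, which also drops the iterator's reference.
                 */
                if (!mem_cgroup_tryget_online(memcg)) {
                        mem_cgroup_iter_break(NULL, memcg);
                        pool->next_shrink = NULL;
                        spin_unlock(&zswap_pools_lock);

                        if (++failures == MAX_RECLAIM_RETRIES)
                                break;

                        goto resched;
                }
                spin_unlock(&zswap_pools_lock);

                ret = shrink_memcg(memcg);
                /* drop the extra reference from mem_cgroup_tryget_online() */
                mem_cgroup_put(memcg);

The reference taken by mem_cgroup_tryget_online() is the one
shrink_memcg() runs under; the reference from mem_cgroup_iter() stays
with pool->next_shrink and can still be dropped by the memcg offlining
callback.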
> > > > > > > > > > + /* drop the extra reference */ > > > > > > Where does the extra reference come from? > > > > The extra reference is from mem_cgroup_tryget_online(). We get two > > references in the dance above - one from mem_cgroup_iter() (which can > > be dropped) and one extra from mem_cgroup_tryget_online(). I kept the > > second one in case the first one was dropped by the zswap memcg > > offlining callback, but after reclaiming it is safe to just drop it. > > Right. I was confused by the missing mem_cgroup_tryget_online(). > > > > > > > > > > + mem_cgroup_put(memcg); > > > > + > > > > + if (ret =3D=3D -EINVAL) > > > > + break; > > > > + if (ret && ++failures =3D=3D MAX_RECLAIM_RETRIES) > > > > + break; > > > > + > > > > +resched: > > > > cond_resched(); > > > > } while (!zswap_can_accept()); > > > > - zswap_pool_put(pool); > > > > } > > > > > > > > static struct zswap_pool *zswap_pool_create(char *type, char *comp= ressor) > [..] > > > > @@ -1240,15 +1395,15 @@ bool zswap_store(struct folio *folio) > > > > zswap_invalidate_entry(tree, dupentry); > > > > } > > > > spin_unlock(&tree->lock); > > > > - > > > > - /* > > > > - * XXX: zswap reclaim does not work with cgroups yet. Witho= ut a > > > > - * cgroup-aware entry LRU, we will push out entries system-= wide based on > > > > - * local cgroup limits. > > > > - */ > > > > objcg =3D get_obj_cgroup_from_folio(folio); > > > > - if (objcg && !obj_cgroup_may_zswap(objcg)) > > > > - goto reject; > > > > + if (objcg && !obj_cgroup_may_zswap(objcg)) { > > > > + memcg =3D get_mem_cgroup_from_objcg(objcg); > > > > > > Do we need a reference here? IIUC, this is folio_memcg() and the foli= o > > > is locked, so folio_memcg() should remain stable, no? > > > > Hmmm obj_cgroup_may_zswap() also holds a reference to the objcg's > > memcg, so I just followed the patterns to be safe. > > Perhaps it's less clear inside obj_cgroup_may_zswap(). We can actually > pass the folio to obj_cgroup_may_zswap(), add a debug check that the > folio is locked, and avoid getting the ref there as well. That can be > done separately. Perhaps Johannes can shed some light on this, if > there's a different reason why getting a ref there is needed. > > For this change, I think the refcount manipulation is unnecessary. Hmm true. I'm leaning towards playing it safe - worst case scenario, we can send a follow up patch to optimize this (perhaps for both places, if neither place requires pinning the memcg). But I'll wait for Johannes to chime in with his opinions on the matter. > > > > > > > > > > > Same for the call below. > > > > > > > + if (shrink_memcg(memcg)) { > > > > + mem_cgroup_put(memcg); > > > > + goto reject; > > > > + } > > > > + mem_cgroup_put(memcg); > > > > + } > > > > > > > > /* reclaim space if needed */ > > > > if (zswap_is_full()) { > [..]