From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2335C071FD for ; Fri, 19 Apr 2024 19:10:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 42B3C6B007B; Fri, 19 Apr 2024 15:10:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3DB366B0082; Fri, 19 Apr 2024 15:10:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2A3056B0083; Fri, 19 Apr 2024 15:10:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 0ED746B007B for ; Fri, 19 Apr 2024 15:10:42 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 47B1D16155A for ; Fri, 19 Apr 2024 19:10:41 +0000 (UTC) X-FDA: 82027223082.03.257652F Received: from mail-ej1-f48.google.com (mail-ej1-f48.google.com [209.85.218.48]) by imf04.hostedemail.com (Postfix) with ESMTP id 83F6840011 for ; Fri, 19 Apr 2024 19:10:39 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=TayzVFzv; spf=pass (imf04.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.48 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1713553839; a=rsa-sha256; cv=none; b=zygpm92vK3pt1OfyHFPZmegCVRW+Lltwo8+lk7bsml6XQE6ajDO7h8jXPAqe8CxWy6Q34A AKpFCmS/hglVYQksAmpQl1lXi0xQsSjlKbj0CIOLwLgNafRL6HXnLBjtiPHatRO7fJgeY/ hMpEZfGLtz5QC1FJQXtQq+GXqZsPK58= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=TayzVFzv; spf=pass (imf04.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.48 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1713553839; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=uJFtcGdhGKjEBBpWXj+iy28DWazemVM0Sr/i5zcF/Xc=; b=6/YxnizuKFhZA9cxI5A8qF/XGh7LupmRApeCFyuF3x/ar7RgNyGvWCSGeIfH2SUv+R14GK jpekvlkiBakqufCB+Xcpvld0kfRtCOUtLkLxRO5H8tq8B28s9D/AmxcQ9Jp5CiQjOrrwRz gmIlcEsEDmH85i8Y3r2TrfneJpR3vt0= Received: by mail-ej1-f48.google.com with SMTP id a640c23a62f3a-a52aa665747so279520566b.2 for ; Fri, 19 Apr 2024 12:10:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1713553838; x=1714158638; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=uJFtcGdhGKjEBBpWXj+iy28DWazemVM0Sr/i5zcF/Xc=; b=TayzVFzvxR2A8xMayfYwEzWyVOtPkARjh+qAVHPp19BKnwPiOULPB8dC5mQC9Rh9Mg Kmx3/m3yTdhRPwiEkVQEw4Ec92KIAF5OxThsD63JdoYv/S1xKJ4/nxb1UGJ+A44CpgQ0 QD0dp/135jFzhSwgmijkfFFxiF00wUbIMfyKHzk0gFA/oVgrdWj4ad5xKTWSFpSmxVtV aHRJaofwCuzehr5EDBlJqZF0ePN0tSQxF9HeMD1Ec2R/590unLsqX5Dw++TsiZOBfdMA jwTw6GbXpHCGM+SWH9z/WP3jONOX47EhbZmyZzkpw1hqQCOQi258vFScG+QgtQnYAwl1 ZG+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713553838; x=1714158638; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uJFtcGdhGKjEBBpWXj+iy28DWazemVM0Sr/i5zcF/Xc=; b=OwujxRLtvX5npwfkScYbmhCpNYlcU1Uz7oLSv+2uUMAxyWW6F2NhelSzWy1Brf99MG sLMGnP1fjEI1DvRqmn/C/l6K3F6tasurRXwsgg2gAt9ArvGoS/zCVW+MK36xHHHUXgOM Rrxm2XTtHtVxkiPE0j8DcimtFBHC8dMNcoVcjxo/VQe2no0LZWUBQ+UfVv+c4Xsojrpm DaojPz+f/E24Jp+pnCmJ9HfZ3dsEZSQYn6UTVAl2AlB2KBLdoyLk35wjMoYZ1odTzoLI Zu3XUTLTKfX0zouw6qwg5eBcpC8/u4C6XGxBOQOCWQ+q8dvHqWMpSt2VURrd71rDqYZR ZsMw== X-Forwarded-Encrypted: i=1; AJvYcCXRvrvWCj2VeKsfzyHifSPWZ7gJAgb9N2/JK4N4UepiEUxpzsizjkHAKYuI3q4pkqR5SbJ31PS4PUnjVM5TbPhY6q0= X-Gm-Message-State: AOJu0Yw25EztbAn9hJ25vtLGWFO33uB5vZqNsN7EOKV4es8KS6vYCquL 1PsuJnepZfGl7L0gqBvkv6Ioz6m2abLxTyS+j+rEz7tsq5nlWHuRyqeBOtIrNaXCbjbDdSKV8lJ YTV2pKsNrt+RVWi62OBV/EDOzsulpP0DCO3vQ X-Google-Smtp-Source: AGHT+IExSU2QtTRid0+YRFdGQUwOihOm+cSrHE8ZuZNDhhMB/hnR+BAgCzxr8bHrjtAfhnlbAyR5OdrGT+33PVZSQFc= X-Received: by 2002:a17:907:2d25:b0:a55:358f:783c with SMTP id gs37-20020a1709072d2500b00a55358f783cmr2276033ejc.24.1713553837773; Fri, 19 Apr 2024 12:10:37 -0700 (PDT) MIME-Version: 1.0 References: <3iccc6vjl5gminut3lvpl4va2lbnsgku5ei2d7ylftoofy3n2v@gcfdvtsq6dx2> <246c1f4d-af13-40fa-b968-fbaf36b8f91f@linux.dev> <20240417143324.GA1055428@cmpxchg.org> <4c3ppfjxnrqx6g52qvvhqzcc4pated2q5g4mi32l22nwtrkqfq@a4lk6s5zcwvb> <20240418124043.GC1055428@cmpxchg.org> <20240419142231.GD1055428@cmpxchg.org> In-Reply-To: <20240419142231.GD1055428@cmpxchg.org> From: Yosry Ahmed Date: Fri, 19 Apr 2024 12:10:01 -0700 Message-ID: Subject: Re: [REGRESSION] Null pointer dereference while shrinking zswap To: Johannes Weiner Cc: Christian Heusel , Chengming Zhou , Nhat Pham , Seth Jennings , Dan Streetman , Vitaly Wool , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Runge , "Richard W.M. Jones" , Mark W , regressions@lists.linux.dev Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 83F6840011 X-Stat-Signature: d1mwwc8ty39hcup7zigg3gj7moir7axf X-Rspam-User: X-HE-Tag: 1713553839-362406 X-HE-Meta: U2FsdGVkX19Fs4EcR4S+1NHsR8PoppIZMeqYx19go4/6b7ksiSQhY4IVASKswkK5kKgy+Do8mRUKKwfAMdFxzKblvGKai6Pei4JpFUbuUDHM4sT6KxirysEAdhLD5O44Gnj2JMvJUlbi2/Hdvi+rLroAPxAkJPV3+d2Z8Y+Xl4xyTvjhSz7W8JLr2YlFX8LCfGvPCx10RvHpx++lkGLcAoKQ0FTC+rFDnVUV22sYWE2dTaQrNU1ydcnT8u0+RNLzh6nD5XyRQvOuY/dRDu/yFH2Sro9ljDWxSmjgWaO6CrkrlUapBuMBdZr0xwWSQl0Er1MMtvozx4pr+SvK8HYwzWC+VbzEZu8KTA1KY3oAFHedZ3eNIL/5pv0vG+5bWDscROj+6ATO6PFF6pqVrI5gX2BtuokMkloZdlKG6hsX0GmzV9X/vR5Skntp7GJneTfRCT1GUIKtTrlu2q1Va+0jN30DTNPvaGzGK4sBmKMIyMw8O6qBxVu1OYpjR3CYKiLuABuaMlR+JSk/OuSFxo3+PzMYRbajsPf2JkA4j6g6k4T6AF54VamLBd1uQiyyVHtGdTyJz4ib6rPePAX9fj8PMee/+OkcEozDOBmhu4nIa+vBHC1GV/hUz3s0McRznLeB7nfvbCieUmGniQZ8HFAOhtfEgHI63x5bVRA2oIIldmvtswmaIx4Y2NVMnPFgfsiSAQVtbnXk9ymm86efM5gkF7ATWlanpsxSZSTmFxvKOI2SiSxNSqinGu3KLgwzerDbdck0Qgfqeo6+zvpGmIbrZKGRRyIrrB4AS+O3B+T4gzzF0EPOn4zklKU6aDbCebdWqaLLdse4Q2PUEv7NR7Ba3VPPhTwc4iMl4FcwHfuxSqlHb8BhpuDk9D7G/mfJsaO0GaJEw/ftLV0iIS9KW7EBKaRC/rFErnI1yTTGOESn4LBvbdfBpFSAkclKGsO1+Y4Qb0mczcE4GCnqCqKpx2U wXA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Apr 19, 2024 at 7:22=E2=80=AFAM Johannes Weiner wrote: > > On Thu, Apr 18, 2024 at 01:09:22PM -0700, Yosry Ahmed wrote: > > On Thu, Apr 18, 2024 at 5:40=E2=80=AFAM Johannes Weiner wrote: > > > > > > On Wed, Apr 17, 2024 at 07:18:14PM +0200, Christian Heusel wrote: > > > > On 24/04/17 10:33AM, Johannes Weiner wrote: > > > > > > > > > > Christian, can you please test the below patch on top of current > > > > > upstream? > > > > > > > > > > > > > Hey Johannes, > > > > > > > > I have applied your patch on top of 6.9-rc4 and it did solve the cr= ash for > > > > me, thanks for hacking together a fix so quickly! =F0=9F=A4=97 > > > > > > > > Tested-By: Christian Heusel > > > > > > Thanks for confirming it, and sorry about the breakage. > > > > > > Andrew, can you please use the updated changelog below? > > > > > > --- > > > > > > From 52f67f5fab6a743c2aedfc8e04a582a9d1025c28 Mon Sep 17 00:00:00 200= 1 > > > From: Johannes Weiner > > > Date: Thu, 18 Apr 2024 08:26:28 -0400 > > > Subject: [PATCH] mm: zswap: fix shrinker NULL crash with cgroup_disab= le=3Dmemory > > > > > > Christian reports a NULL deref in zswap that he bisected down to the > > > zswap shrinker. The issue also cropped up in the bug trackers of > > > libguestfs [1] and the Red Hat bugzilla [2]. > > > > > > The problem is that when memcg is disabled with the boot time flag, > > > the zswap shrinker might get called with sc->memcg =3D=3D NULL. This = is > > > okay in many places, like the lruvec operations. But it crashes in > > > memcg_page_state() - which is only used due to the non-node accountin= g > > > of cgroup's the zswap memory to begin with. > > > > > > Nhat spotted that the memcg can be NULL in the memcg-disabled case, > > > and I was then able to reproduce the crash locally as well. > > > > > > [1] https://github.com/libguestfs/libguestfs/issues/139 > > > [2] https://bugzilla.redhat.com/show_bug.cgi?id=3D2275252 > > > > > > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressu= re") > > > Cc: stable@vger.kernel.org [v6.8] > > > Link: https://lkml.kernel.org/r/20240417143324.GA1055428@cmpxchg.org > > > Reported-by: Christian Heusel > > > Debugged-by: Nhat Pham > > > Suggested-by: Nhat Pham > > > Tested-By: Christian Heusel > > > Signed-off-by: Johannes Weiner > > > > Thanks for fixing this. A couple of comments/questions below, but anywa= y LGTM: > > > > Acked-by: Yosry Ahmed > > > > > --- > > > mm/zswap.c | 25 ++++++++++++++++--------- > > > 1 file changed, 16 insertions(+), 9 deletions(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index caed028945b0..6f8850c44b61 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(str= uct shrinker *shrinker, > > > if (!gfp_has_io_fs(sc->gfp_mask)) > > > return 0; > > > > > > -#ifdef CONFIG_MEMCG_KMEM > > > - mem_cgroup_flush_stats(memcg); > > > - nr_backing =3D memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE= _SHIFT; > > > - nr_stored =3D memcg_page_state(memcg, MEMCG_ZSWAPPED); > > > -#else > > > - /* use pool stats instead of memcg stats */ > > > - nr_backing =3D zswap_pool_total_size >> PAGE_SHIFT; > > > - nr_stored =3D atomic_read(&zswap_nr_stored); > > > -#endif > > > + /* > > > + * For memcg, use the cgroup-wide ZSWAP stats since we don't > > > + * have them per-node and thus per-lruvec. Careful if memcg i= s > > > + * runtime-disabled: we can get sc->memcg =3D=3D NULL, which = is ok > > > + * for the lruvec, but not for memcg_page_state(). > > > + * > > > + * Without memcg, use the zswap pool-wide metrics. > > > + */ > > > + if (!mem_cgroup_disabled()) { > > > > With the current shrinker code, it seems like we cannot get sc->memcg > > =3D=3D NULL unless mem_cgroup_disabled() is true indeed. However, maybe > > it's better to check if sc->memcg is not NULL directly instead? > > > > This would be more resilient in case the shrinker code changes and > > passing sc->memcg =3D=3D NULL becomes possible in other cases (e.g. dur= ing > > global shrinking). It seems like other shrinkers do this, for example > > see count_shadow_nodes() and deferred_split_count(). > > Eh, I'm not sure it's better or worse, so it's a bit hard to care. We > shouldn't get NULL here when memcg is enabled, and if somebody > introduces that bug it's better to catch it early than run into subtle > priority inversions when the kernel is deployed to millions of hosts. No strong opinion here, I just thought consistency with other shrinkers would be nice. > > > I am also wondering if we should also check !mem_cgroup_is_root() > > here? We can avoid the expensive global flush and use the global stats > > directly in this case. I could also send a follow up patch for this if > > that's preferred. > > I'd rather not proliferate more memcg internals in this code. If this > is a concern, optimizing it in the flush and stat functions would make > more sense. Reclaim already flushes the subtree before getting here, > so odds are good this is a no-op in most cases. I agree that with per-memcg thresholding, it is very unlikely that this flush does anything. FWIW, if it turns out to be a problem, optimizing it in the flush functions would not be as straightforward, because the memcg code is not aware that zswap has up-to-date global versions of the stats. Anyway, let's table this until someone actually complains, just seemed to me like a low hanging fruit.