From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79B6DE77188 for ; Wed, 8 Jan 2025 04:21:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 07EC56B0088; Tue, 7 Jan 2025 23:21:29 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 02F486B0089; Tue, 7 Jan 2025 23:21:28 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E38826B008A; Tue, 7 Jan 2025 23:21:28 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id C53546B0088 for ; Tue, 7 Jan 2025 23:21:28 -0500 (EST) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id D33B2C0F0C for ; Wed, 8 Jan 2025 04:14:42 +0000 (UTC) X-FDA: 82982968404.17.1FF684F Received: from mail-qk1-f177.google.com (mail-qk1-f177.google.com [209.85.222.177]) by imf02.hostedemail.com (Postfix) with ESMTP id 059258000A for ; Wed, 8 Jan 2025 04:14:40 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=hdwVeL23; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf02.hostedemail.com: domain of yosryahmed@google.com designates 209.85.222.177 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736309681; a=rsa-sha256; cv=none; b=fLjnGGBAxTdmNelrEz2rLSk3lH04wBWSantAgqJap6GxUM2VDl5pRjcBeq7q4IArB29B2j ni77MCdrGRLqLmyL4iP51ZxRoYGKMa2udguD7jkjXxR2ch6ypjLX6awmFvJUyLsyt0s6hU V/XZAyV/l8LhZgtQNdRXy0HmijnA76c= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=hdwVeL23; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf02.hostedemail.com: domain of yosryahmed@google.com designates 209.85.222.177 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736309681; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=R31pDW1Q9MlB4MqPshmRbuMgjuC8TTAY7ZPP2XACOUY=; b=k9m4kLUrwiA3gJmP5+6lAG3Bj3nNOd9aq0n0fInhuyw4tTuql9xu0CO4VXrAB3wa+DeGBe VliM/kbQsdSlTcMObxSd8MZG/z/HQnvMRUPXwBza2gT4EvKrXpwo44uym6t1bu8ng5GA3J 2qsi97s4ggw1pRz15A6kqhtHTV+peKs= Received: by mail-qk1-f177.google.com with SMTP id af79cd13be357-7b6f95d2eafso1586197085a.3 for ; Tue, 07 Jan 2025 20:14:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736309680; x=1736914480; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=R31pDW1Q9MlB4MqPshmRbuMgjuC8TTAY7ZPP2XACOUY=; b=hdwVeL23wl3nH/bGDBVnuU2cEGERRM03PHOR4y42iI9Is3LDcITgg1jTW3QIhhsWhr qUE6OWcteSEZkdVtD1j9OeNoqNaURB1gjca5ODAj1B4Vogk5Sm3JgQBpdv/EquNpnM6b w7SknEbVXaBsQkl8fhLbbKDiLCFwpeQfvUdVIOHTctqbg3DCWzbTs8EXLRarS9PSrSc+ qw73Tw7Hb1XIF9fNldnbx8hcQbycB+A176jPDVaLcCmKVQqauVzL9uU7RJ0rsINxIPMu DK3s5WEn7hMbOc4pOs2sSzO6mK2InntbNaOe6uJ66POu1qnnKxIi22KR63x3ACZxB6/B 2rsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736309680; x=1736914480; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=R31pDW1Q9MlB4MqPshmRbuMgjuC8TTAY7ZPP2XACOUY=; b=HrGFaPY7A7VsFQcSz8w+LXwwdvPwCjuVdt/lfc+SVXwVjErVQ8KKUtX0XXLFDafxUa qDOhUD31K0MtdK1pjcFh6xEIxj0ZeJCOdYiKKJUJuBKUo4YYYNFmObmVcyKz9QZ1ZNMJ RKJvGg3V9mbbNZ8PNLqdAIREXBULOPibRWkcUr1RO+U1tVi+4UTV9HhhqlZA3BXNinjg 9ptO0nbEf3s5faDURmNJrW6/GtH+ht34bKLgiU9uctv3e00ag5EkABCCSuNoT+byis++ 2wTgLXUKUWQNiBBznY6e6KagTJLKrp2Z2oawSFp3PWoQezOGBjYK5q6sI8MzxlrEABac HIpg== X-Forwarded-Encrypted: i=1; AJvYcCVlxiMdxEDcjcazmB+cp6MTh8DVHNKePnTJubWfw+xPOgQPmUK1tnux7rD6XqBpzYD+hi+k9DWOow==@kvack.org X-Gm-Message-State: AOJu0YyU4XCobyJcY5hay3A8Fadt+KZwCUikA8N/tcKGE7XDDeyZAa2F qJsyR8foMHQED4JFYIKZjD0H0REk4SJNxz921hJaI6LtNBl+yWkXBv16uZ9Oi/dJwa8ijZAaSVC F30a0lJG1BZS0qLtFAiz+jkiC+Wo3zwZrWgUn X-Gm-Gg: ASbGncsgdgm9rFoQfziLs4RmFCT6PcyRULnibYP6kgh83DqdFfLP6lHpUum2tjSjLh+ 2dU9oNxzH//8gYg4qv9TuOfxIB+UqWIOp/cc= X-Google-Smtp-Source: AGHT+IECFnRdIGy87ZmbOwzV6GoyM9JLzwrOwaRgGda8wpIbwy6GFgWZMIpgxue6wE6cCYG4ero7lP0npeYs1nqaxbQ= X-Received: by 2002:ad4:5f0d:0:b0:6dd:d317:e0aa with SMTP id 6a1803df08f44-6df9b1f6c82mr23929456d6.8.1736309679862; Tue, 07 Jan 2025 20:14:39 -0800 (PST) MIME-Version: 1.0 References: <20250107074724.1756696-1-yosryahmed@google.com> <20250107074724.1756696-2-yosryahmed@google.com> <20250107180345.GD37530@cmpxchg.org> In-Reply-To: From: Yosry Ahmed Date: Tue, 7 Jan 2025 20:14:03 -0800 X-Gm-Features: AbW1kvYo53HhZssyW2ItGi-bQfGp9y6AHV5XnIZy1rVs6GqiqRe1qKuCcv9a9HI Message-ID: Subject: Re: [PATCH RESEND 2/2] mm: zswap: use SRCU to synchronize with CPU hotunplug To: Nhat Pham , Andrew Morton Cc: Johannes Weiner , Chengming Zhou , Vitaly Wool , Barry Song , Sam Sun , linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 059258000A X-Stat-Signature: m3aes5ergswpungfxexy9bkeia61ju1w X-HE-Tag: 1736309680-313568 X-HE-Meta: U2FsdGVkX18R9iZZOfKtpSNXV1BYhQv4WYvrcHIZeFmuh1P+sKjQ+k2JURAKzwkKZ5daJJ1rawZSE71ono9M70+oNbGJJ3qQAYHFjoLfVLnHh0ybSMP2SuVTy8EOEc2xu+2oC+X6pxWxMdI3iCuQ9g8feyLUmmEV+xHRweV5VA0nqwjl4buiuIktWa9LiapDNrA4K3w6uPIzurbOskFpPE9IVOFpQQ0dofjnwdRKNhUs3dsrfVPaLcrzVRnDzM+U8LX6ZD2DS9Y4au5bypFCNen2YBKtad6PmfESGFxTTRwQ4icMBE/PKgmvLKzbqlenwj3Z1kCXPVAUysYqqh8vtUMS9zkW8IXcOR+Xd5Hr9qly4UVBdUyixxeIFapbKnP6oTknCMYSO7K7sfdHyWOhGofeM4alVE6VHlFF/Y8FDtV5MDkyNeVakDcRtafyioR6Rn9XllW36OPuNhzpGbmLG6Y46YoVN/P+EiHGlzu9HK1zyjStuGBvXMwGW+20b0PCtImpcXx7jXJw+fxyZtXr9jDZChmF+wQuRUyGWJD6isvUE3a9XHV3/xrQko+SnhzehubhRKUCqs1LvtMgXQbEYx4wgRycnu2CIsgaf0qkr2jsBaqaexOeTbJn3VM/Ux10QjRiqKfF6ghpL/+WxIFqtxOmEyeIRJRQCYeSr0pnV36USPHd/yNccWRMIZKJOFZWfKqzryjQiBOKIO1Qv3eirFfnCRVPHRJwF4i3ypgXS34eFm4cBp5mn8uXve7/qziB2oFSTmKpMGws1RY0MIA9tDSZohVeB5BGCLAguLPcug+25PFd9BTIhIr53c0WeikBtzTOL+m5Cb6PdGAfWlSk95cWInuRb8LGNiFaZMYLmMaYve5vi4aD4sBKnGiGSn0oBdN+0FcV4bFZnypbME5h1ARbIygk3z66gHi4RaY7W5OpDXsNEDlXouf55B2x2GpphaD3sqLiEb4PzeHxgIU 9vQyFmIc OnJ/0baitAsGiBXEXHxHEqUu166JwEB1GNx5j0VVguK/2suGFPWsa2+lovByEasZMVvzJROol2TwO+4v7gUf6sLYiiwm5wE24am2cSJDYjBlFDLIAw0e9F3DRGfCbQI1E8GZNufYcoLMZg6eokKo9mohHgTAuP64xIVSzcrK+ZA8e2OyDtWfkoiQva6MyBfwH0vMRcD+Es1tXPnWWuiCOy7wd8bQNJ3nKBST//uf24koR3kNJ8JXek+bqfxfO/vmmOeK5aCWmv8SXjwIVjs9c0etQvXiJ0Fi1ef6N+UtFWD9qaYin2uHo/dYUhD1C9ivortv12EHUuo9UL04= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jan 7, 2025 at 7:56=E2=80=AFPM Nhat Pham wrote: > > On Wed, Jan 8, 2025 at 3:17=E2=80=AFAM Yosry Ahmed wrote: > > > > On Tue, Jan 7, 2025 at 10:13=E2=80=AFAM Yosry Ahmed wrote: > > > > > > On Tue, Jan 7, 2025 at 10:03=E2=80=AFAM Johannes Weiner wrote: > > > > > > > > On Tue, Jan 07, 2025 at 07:47:24AM +0000, Yosry Ahmed wrote: > > > > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx= of the > > > > > current CPU at the beginning of the operation is retrieved and us= ed > > > > > throughout. However, since neither preemption nor migration are = disabled, > > > > > it is possible that the operation continues on a different CPU. > > > > > > > > > > If the original CPU is hotunplugged while the acomp_ctx is still = in use, > > > > > we run into a UAF bug as the resources attached to the acomp_ctx = are freed > > > > > during hotunplug in zswap_cpu_comp_dead(). > > > > > > > > > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: mov= e to use > > > > > crypto_acomp API for hardware acceleration") when the switch to t= he > > > > > crypto_acomp API was made. Prior to that, the per-CPU crypto_com= p was > > > > > retrieved using get_cpu_ptr() which disables preemption and makes= sure the > > > > > CPU cannot go away from under us. Preemption cannot be disabled = with the > > > > > crypto_acomp API as a sleepable context is needed. > > > > > > > > > > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer t= o > > > > > per-acomp_ctx") increased the UAF surface area by making the per-= CPU > > > > > buffers dynamic, adding yet another resource that can be freed fr= om under > > > > > zswap compression/decompression by CPU hotunplug. > > > > > > > > > > There are a few ways to fix this: > > > > > (a) Add a refcount for acomp_ctx. > > > > > (b) Disable migration while using the per-CPU acomp_ctx. > > > > > (c) Use SRCU to wait for other CPUs using the acomp_ctx of the CP= U being > > > > > hotunplugged. Normal RCU cannot be used as a sleepable context is > > > > > required. > > > > > > > > > > Implement (c) since it's simpler than (a), and (b) involves using > > > > > migrate_disable() which is apparently undesired (see huge comment= in > > > > > include/linux/preempt.h). > > > > > > > > > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for = hardware acceleration") > > > > > Cc: > > > > > Signed-off-by: Yosry Ahmed > > > > > Reported-by: Johannes Weiner > > > > > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmp= xchg.org/ > > > > > Reported-by: Sam Sun > > > > > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD= 5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/ > > > > > --- > > > > > mm/zswap.c | 31 ++++++++++++++++++++++++++++--- > > > > > 1 file changed, 28 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > > index f6316b66fb236..add1406d693b8 100644 > > > > > --- a/mm/zswap.c > > > > > +++ b/mm/zswap.c > > > > > @@ -864,12 +864,22 @@ static int zswap_cpu_comp_prepare(unsigned = int cpu, struct hlist_node *node) > > > > > return ret; > > > > > } > > > > > > > > > > +DEFINE_STATIC_SRCU(acomp_srcu); > > > > > + > > > > > static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_no= de *node) > > > > > { > > > > > struct zswap_pool *pool =3D hlist_entry(node, struct zswap_= pool, node); > > > > > struct crypto_acomp_ctx *acomp_ctx =3D per_cpu_ptr(pool->ac= omp_ctx, cpu); > > > > > > > > > > if (!IS_ERR_OR_NULL(acomp_ctx)) { > > > > > + /* > > > > > + * Even though the acomp_ctx should not be currentl= y in use on > > > > > + * @cpu, it may still be used by compress/decompres= s operations > > > > > + * that started on @cpu and migrated to a different= CPU. Wait > > > > > + * for such usages to complete, any news usages wou= ld be a bug. > > > > > + */ > > > > > + synchronize_srcu(&acomp_srcu); > > > > > > > > The docs suggest you can't solve it like that :( > > > > > > > > Documentation/RCU/Design/Requirements/Requirements.rst: > > > > > > > > Also unlike other RCU flavors, synchronize_srcu() may **not** be > > > > invoked from CPU-hotplug notifiers, due to the fact that SRCU gra= ce > > > > periods make use of timers and the possibility of timers being > > > > temporarily =E2=80=9Cstranded=E2=80=9D on the outgoing CPU. This = stranding of timers > > > > means that timers posted to the outgoing CPU will not fire until > > > > late in the CPU-hotplug process. The problem is that if a notifie= r > > > > is waiting on an SRCU grace period, that grace period is waiting = on > > > > a timer, and that timer is stranded on the outgoing CPU, then the > > > > notifier will never be awakened, in other words, deadlock has > > > > occurred. This same situation of course also prohibits > > > > srcu_barrier() from being invoked from CPU-hotplug notifiers. > > > > > > Thanks for checking, I completely missed this. I guess it only works > > > with SRCU if we use call_srcu(), but then we need to copy the pointer= s > > > to a new struct to avoid racing with the CPU getting onlined again. > > > Otherwise we can just bite the bullet and add a refcount, or use > > > migrate_disable() despite that being undesirable. > > > > > > Do you have a favorite? :) > > > > I briefly looked into refcounting. The annoying thing is that we need > > to handle the race between putting the last refcount in > > zswap_compress()/zswap_decompress(), and the CPU getting onlined again > > and re-initializing the refcount. One way to do it would be to put all > > dynamically allocated resources in a struct with the same struct with > > the new refcount, and use RCU + refcounts to allocate and free the > > struct as a whole. > > > > I am leaning toward just disabling migration for now tbh unless there > > are objections to that, especially this close to the v6.13 release. > > I much prefer the refcounting solution. IMO it's the "proper" fix - > disabling migration is such a heavy-handed resolution. A massive > hammer for a tiny nail, so to speak. I may have found a simpler "proper" fix than disabling migration, please see my suggestion in: https://lore.kernel.org/lkml/CAJD7tkYpNNsbTZZqFoRh-FkXDgxONZEUPKk1YQv7-TFMW= WQRzQ@mail.gmail.com/ > > Is this a frequently occured problem in the wild? If so, we can > disable migration to firefight, and then do the proper thing down the > line. I don't believe so. Actually, I think the deadlock introduced by the previous fix is more problematic than the UAF it fixes. Andrew, could you please pick up patch 1 (the revert) while we figure out the alternative fix? It's important that it lands in v6.13 to avoid the possibility of deadlock. Figuring out an alternative fix is less important.