From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E37E2E77197 for ; Tue, 7 Jan 2025 21:59:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5F9356B008A; Tue, 7 Jan 2025 16:59:26 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5A9306B0095; Tue, 7 Jan 2025 16:59:26 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 449846B0096; Tue, 7 Jan 2025 16:59:26 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 254776B008A for ; Tue, 7 Jan 2025 16:59:26 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id CCE3644E01 for ; Tue, 7 Jan 2025 21:59:25 +0000 (UTC) X-FDA: 82982022690.29.4F37A35 Received: from mail-qv1-f50.google.com (mail-qv1-f50.google.com [209.85.219.50]) by imf30.hostedemail.com (Postfix) with ESMTP id 0353680017 for ; Tue, 7 Jan 2025 21:59:22 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=fSluMw8r; spf=pass (imf30.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.50 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736287163; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=7DAh/oAJ8aLvhN0jleLakh87KCtpa5pUhpozrVuVGeM=; b=ph+GRY4+Ivl+dxDUK8s0vojIMK5WobQ/0bXH+MO7XX8mD8g1o4geuQDphfW5E/Z4czanTG USRP1dJNzcRD7TxUBtYHqH7Ww7EJ9xaEyP0RHFuP+1LxdF8SNqK3ufGnumOYJ3c1lrgMfK gtJ6R9uxy35n2GaensNWnX2KJDRCmp4= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736287163; a=rsa-sha256; cv=none; b=eCxxrdi/7AhCJG2q2OjRxyVtz7m6Fha34j+1MmrK2tzdgSbMah170p28cGUSBSOkB0SAXg UYzSzdkW+fdW/0BqliDLRDa3X4o4wXP0rIgM1Qj9rLFJL2uPgT0jo6yiV6uCZgmK+H0BNh OFJBZ8qGnGdm7l00QEwDm+MJQx99Qvs= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=fSluMw8r; spf=pass (imf30.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.50 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-qv1-f50.google.com with SMTP id 6a1803df08f44-6d92e457230so154990536d6.1 for ; Tue, 07 Jan 2025 13:59:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736287162; x=1736891962; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=7DAh/oAJ8aLvhN0jleLakh87KCtpa5pUhpozrVuVGeM=; b=fSluMw8rLKLv3k7LbqWCmy3BP7BUvGdtGFL54uvDKAh7iU9KVjeLcnSP+lvcTTMQAQ Bktn7bhcxI3ACDlRf8b06ArX/PCEuexD6Wxxou2JQNr6CwljRXmyJ+P3HrPolILdtFi7 OGKEnGIhtkSfxVMBNcl8cguNW7av0CPT4msG/ZH8qJbZoAgTd1rge98Y2vSjj7qf26st hDM+fjP+5daGqZInD1fhItvWD4yYb4+XJmV5wpGzriiDPQDPGzQmJia98QH7wfCH7JE8 x9NhNwIXAfJwMziHZpGejXJfoiAqi2vMA3fMVdP3bZvVErGvkJwBSmptPmY1QTRgOKKp WKbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736287162; x=1736891962; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7DAh/oAJ8aLvhN0jleLakh87KCtpa5pUhpozrVuVGeM=; b=fUa06mTc594sI0yV3QjOu2C57pdHs3s9p7g1kvY8uAgNcAogoNJmgPFbIuwtOLc1DL bOyfVNM7QW1DUuSJOh97x5saPO8d9KLb7w/1Gq5/1iFmWZYYZ3pysA0dDMZMfnOK+D+M DV8BGvJspC0Ughx0pHVBiApxs3Fo2TK8/iZ2k5xGtCtZC5VFpuTH9yYJnmyTsUQlZe6A rVq7OypgGWB7WEhUKHRI6XAo2Jsr3BZmG3VYtpxcaaoM9x0kXe9EVZot7h3/IS1sJgEV m8yrlWlt51tpAtX5DauNQtuAcaeQ9VWLIaopNd3t1TScfuTpO0A1qneEvXbl9MG5g0Bp DX4Q== X-Forwarded-Encrypted: i=1; AJvYcCWBrftZuPv5CY37PztqdfiM/ypAfpks1CHHbrMs5B7EvjEMuEi8RU/JwjdDIagt9LNkWd9xNkqZ/Q==@kvack.org X-Gm-Message-State: AOJu0Yy3a7QRKZxFo9vCy4DxFCvChwtAILJGHFMqHUC49G8R82Izzr9P Bqb6AP10ZQs6K1/sWlS00INBR+pquQnPXok9Ru7iflgLMSXO+2QE9cUBD5NaliVBwMM963n5Jt+ xklZbwj5d41HYhGrSwnuJtz5bX4AtC8mqGn7V X-Gm-Gg: ASbGnctXIEKOay/dM32py5d8iLVPsP9ypL/B4nt56hWjC6ai1hp76IkI+w1/OrwsGHJ WPFQQn85sn8krM99ZV81o4vR4FlioE6izAdS9v5WoCCxB/JgDIetGexiNXRVPXTXW0hoO X-Google-Smtp-Source: AGHT+IEido2qo2/HV1Z8wP3eQewiqY8nQhqAxWXvdzYKFLUPPEP30NxW/bqGPCWQlTrk6iNaoRCw7iXOeUoba1MVgV4= X-Received: by 2002:ad4:5f0d:0:b0:6dd:d317:e0aa with SMTP id 6a1803df08f44-6df9b1f6c82mr10348176d6.8.1736287161958; Tue, 07 Jan 2025 13:59:21 -0800 (PST) MIME-Version: 1.0 References: <20250107074724.1756696-1-yosryahmed@google.com> <20250107074724.1756696-2-yosryahmed@google.com> <20250107180345.GD37530@cmpxchg.org> In-Reply-To: From: Yosry Ahmed Date: Tue, 7 Jan 2025 13:58:45 -0800 X-Gm-Features: AbW1kvbVWZ-2eLzm0L7j3z9hgsVFgf1vgJ2zHgq6aQ27r1p46DWoZo3tGSd-uLM Message-ID: Subject: Re: [PATCH RESEND 2/2] mm: zswap: use SRCU to synchronize with CPU hotunplug To: Johannes Weiner Cc: Andrew Morton , Nhat Pham , Chengming Zhou , Vitaly Wool , Barry Song , Sam Sun , linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Kanchana P Sridhar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 0353680017 X-Stat-Signature: zibyhjpgdg3zgd7wskuo356bx38pwrwm X-Rspam-User: X-HE-Tag: 1736287162-315236 X-HE-Meta: U2FsdGVkX1/k3hpgaX/yvIvBoTa9TgDfSF5bCRtwngiqnEB9vf6AImHDvWHz3yw9lalZir6qDlupn2Lu7oDXaRURtdI9NACRnb1PzfvQXGP453s+gLErohE9v50dZnW4dV+FtyGbzGp8+RpG8qEpArE0z18hrLYyp7+8AGswKkMJjWC81cLI6i0hs/lvdej3krKoQtw68vQzAN41mKVXffzdJdTF/96nscR46T6wKfSPaqxRRWPvWDv3Bqd6uwwWRvAjxl9WmpBBEvCFfsjpeD0RvAnJ7znf5vkbdUV/rEbOPpI+lLcUycidaGwhdEsoo+gEDlZ2NhA7/Xapo9gbIS32+8xfKHQBFYBa87svhNIFS7YbnD1Lq8vNwpCSC0DEqR5BuxsvEHnqZ601Oo3N230LPJ1ZjtHs8tYUJ7QlKeBYE4db/d5P7qMvwdXTFSyJK4qsCrpbchgsxqpyc1oZUCJdZx7LKPhJOqXgfrCUExM+Xa/lCMRGNq750Z1bHaZ6n2bHch/rWJNTYArSQgXHuT/KzPCqBpcuPwnVKPTCdtdwa8+APxI5oH3xFBHVlsyktbfOlNszmioVW+eTnRZkv937oVchFZ4HEGov+I0x58HSebUSVg8c3W54phuMN7BJxC6r7XU3n0D6jkwwOWGQxfY1UWbUx4oBHzf03SZ9KPYb5A7hAosaT2i5ke2nIwI5xPCsMZEG8dEuUJOL3mBW5cBefxpjOvu3+hNXQiINErrpDATLK72uJkw/uiuP1XgG1oiepLaRtLAUoTX7CR9WhgQTjedZJIAeYpAQ/xJnA1aG/9eGxkPf9/tB1MmgfwynBatv1q/waOAEvfweAbwB75kz7vwm8g+bVSJpxzHBs6rQ7phTLxOlrMR6c7dB1IUSr2yelTVxhUOHXu8k6eGpElQlH8eRwK9boMdsSEX2GgmNgdSwANSuIr15t6AoYEd4dc4nlDqFSnGWvuxrjRp ObjKpxQ0 Peqb/LPmRr5T1otvxJL+LMxxikgXs1tuBJ4AMm/AfebfvrfvduJc1odhE8KZFVBOZMYjewsQXiOOhW3zFZE/r8LRqQpdYLUPLBGw5Ip80nllD4fTVg5YA3GV1B28j+5URkkmbj6j+HbZweKgVPXLV/S5gBiSOjpsXp6gyBIzo799T/0bQ9+8Ris5y1xP33XGWDHSFipx4EDByA6orqpXXldxNHNgVlwSnCtVA/EppgajvfF+CdHKPFd4OHWQdPHvHqHKyL79uAfSgvcQd6fplpd5IX4wM7zo4umv7H3FuTN9p8XtGTQ+1bIre9WZ/OJZ+vNGZQ9ldGaocZe4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jan 7, 2025 at 1:42=E2=80=AFPM Yosry Ahmed = wrote: > > On Tue, Jan 7, 2025 at 12:16=E2=80=AFPM Yosry Ahmed wrote: > > > > On Tue, Jan 7, 2025 at 10:13=E2=80=AFAM Yosry Ahmed wrote: > > > > > > On Tue, Jan 7, 2025 at 10:03=E2=80=AFAM Johannes Weiner wrote: > > > > > > > > On Tue, Jan 07, 2025 at 07:47:24AM +0000, Yosry Ahmed wrote: > > > > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx= of the > > > > > current CPU at the beginning of the operation is retrieved and us= ed > > > > > throughout. However, since neither preemption nor migration are = disabled, > > > > > it is possible that the operation continues on a different CPU. > > > > > > > > > > If the original CPU is hotunplugged while the acomp_ctx is still = in use, > > > > > we run into a UAF bug as the resources attached to the acomp_ctx = are freed > > > > > during hotunplug in zswap_cpu_comp_dead(). > > > > > > > > > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: mov= e to use > > > > > crypto_acomp API for hardware acceleration") when the switch to t= he > > > > > crypto_acomp API was made. Prior to that, the per-CPU crypto_com= p was > > > > > retrieved using get_cpu_ptr() which disables preemption and makes= sure the > > > > > CPU cannot go away from under us. Preemption cannot be disabled = with the > > > > > crypto_acomp API as a sleepable context is needed. > > > > > > > > > > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer t= o > > > > > per-acomp_ctx") increased the UAF surface area by making the per-= CPU > > > > > buffers dynamic, adding yet another resource that can be freed fr= om under > > > > > zswap compression/decompression by CPU hotunplug. > > > > > > > > > > There are a few ways to fix this: > > > > > (a) Add a refcount for acomp_ctx. > > > > > (b) Disable migration while using the per-CPU acomp_ctx. > > > > > (c) Use SRCU to wait for other CPUs using the acomp_ctx of the CP= U being > > > > > hotunplugged. Normal RCU cannot be used as a sleepable context is > > > > > required. > > > > > > > > > > Implement (c) since it's simpler than (a), and (b) involves using > > > > > migrate_disable() which is apparently undesired (see huge comment= in > > > > > include/linux/preempt.h). > > > > > > > > > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for = hardware acceleration") > > > > > Cc: > > > > > Signed-off-by: Yosry Ahmed > > > > > Reported-by: Johannes Weiner > > > > > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmp= xchg.org/ > > > > > Reported-by: Sam Sun > > > > > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD= 5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/ > > > > > --- > > > > > mm/zswap.c | 31 ++++++++++++++++++++++++++++--- > > > > > 1 file changed, 28 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > > index f6316b66fb236..add1406d693b8 100644 > > > > > --- a/mm/zswap.c > > > > > +++ b/mm/zswap.c > > > > > @@ -864,12 +864,22 @@ static int zswap_cpu_comp_prepare(unsigned = int cpu, struct hlist_node *node) > > > > > return ret; > > > > > } > > > > > > > > > > +DEFINE_STATIC_SRCU(acomp_srcu); > > > > > + > > > > > static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_no= de *node) > > > > > { > > > > > struct zswap_pool *pool =3D hlist_entry(node, struct zswap_= pool, node); > > > > > struct crypto_acomp_ctx *acomp_ctx =3D per_cpu_ptr(pool->ac= omp_ctx, cpu); > > > > > > > > > > if (!IS_ERR_OR_NULL(acomp_ctx)) { > > > > > + /* > > > > > + * Even though the acomp_ctx should not be currentl= y in use on > > > > > + * @cpu, it may still be used by compress/decompres= s operations > > > > > + * that started on @cpu and migrated to a different= CPU. Wait > > > > > + * for such usages to complete, any news usages wou= ld be a bug. > > > > > + */ > > > > > + synchronize_srcu(&acomp_srcu); > > > > > > > > The docs suggest you can't solve it like that :( > > > > > > > > Documentation/RCU/Design/Requirements/Requirements.rst: > > > > > > > > Also unlike other RCU flavors, synchronize_srcu() may **not** be > > > > invoked from CPU-hotplug notifiers, due to the fact that SRCU gra= ce > > > > periods make use of timers and the possibility of timers being > > > > temporarily =E2=80=9Cstranded=E2=80=9D on the outgoing CPU. This = stranding of timers > > > > means that timers posted to the outgoing CPU will not fire until > > > > late in the CPU-hotplug process. The problem is that if a notifie= r > > > > is waiting on an SRCU grace period, that grace period is waiting = on > > > > a timer, and that timer is stranded on the outgoing CPU, then the > > > > notifier will never be awakened, in other words, deadlock has > > > > occurred. This same situation of course also prohibits > > > > srcu_barrier() from being invoked from CPU-hotplug notifiers. > > > > > > Thanks for checking, I completely missed this. I guess it only works > > > with SRCU if we use call_srcu(), but then we need to copy the pointer= s > > > to a new struct to avoid racing with the CPU getting onlined again. > > > Otherwise we can just bite the bullet and add a refcount, or use > > > migrate_disable() despite that being undesirable. > > > > > > Do you have a favorite? :) > > > > I briefly looked into refcounting. The annoying thing is that we need > > to handle the race between putting the last refcount in > > zswap_compress()/zswap_decompress(), and the CPU getting onlined again > > and re-initializing the refcount. One way to do it would be to put all > > dynamically allocated resources in a struct with the same struct with > > the new refcount, and use RCU + refcounts to allocate and free the > > struct as a whole. > > > > I am leaning toward just disabling migration for now tbh unless there > > are objections to that, especially this close to the v6.13 release. > > (Sorry for going back and forth on this, I am essentially thinking out lo= ud) > > Actually, as Kanchana mentioned before, we should be able to just hold > the mutex in zswap_cpu_comp_dead() before freeing the dynamic > resources. The mutex is allocated when the pool is created and will > not go away during CPU hotunplug AFAICT. It confused me before because > we call mutex_init() in zswap_cpu_comp_prepare(), but it really should > be in zswap_pool_create() after we allocate the pool->acomp_ctx. Nope. It's possible for zswap_cpu_comp_dead() to hold the mutex and free the resources after zswap_[de]compress() calls raw_cpu_ptr() but before it calls mutex_lock().