From: Yosry Ahmed
Date: Tue, 7 Jan 2025 12:16:48 -0800
Subject: Re: [PATCH RESEND 2/2] mm: zswap: use SRCU to synchronize with CPU hotunplug
To: Johannes Weiner
Cc: Andrew Morton, Nhat Pham, Chengming Zhou, Vitaly Wool, Barry Song,
    Sam Sun, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    stable@vger.kernel.org
References: <20250107074724.1756696-1-yosryahmed@google.com>
    <20250107074724.1756696-2-yosryahmed@google.com>
    <20250107180345.GD37530@cmpxchg.org>
On Tue, Jan 7, 2025 at 10:13 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Tue, Jan 7, 2025 at 10:03 AM Johannes Weiner wrote:
> >
> > On Tue, Jan 07, 2025 at 07:47:24AM +0000, Yosry Ahmed wrote:
> > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
> > > current CPU at the beginning of the operation is retrieved and used
> > > throughout. However, since neither preemption nor migration are disabled,
> > > it is possible that the operation continues on a different CPU.
> > >
> > > If the original CPU is hotunplugged while the acomp_ctx is still in use,
> > > we run into a UAF bug as the resources attached to the acomp_ctx are freed
> > > during hotunplug in zswap_cpu_comp_dead().
> > >
> > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
> > > crypto_acomp API for hardware acceleration") when the switch to the
> > > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> > > retrieved using get_cpu_ptr() which disables preemption and makes sure the
> > > CPU cannot go away from under us. Preemption cannot be disabled with the
> > > crypto_acomp API as a sleepable context is needed.
> > >
> > > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to
> > > per-acomp_ctx") increased the UAF surface area by making the per-CPU
> > > buffers dynamic, adding yet another resource that can be freed from under
> > > zswap compression/decompression by CPU hotunplug.
> > >
> > > There are a few ways to fix this:
> > > (a) Add a refcount for acomp_ctx.
> > > (b) Disable migration while using the per-CPU acomp_ctx.
> > > (c) Use SRCU to wait for other CPUs using the acomp_ctx of the CPU being
> > >     hotunplugged. Normal RCU cannot be used as a sleepable context is
> > >     required.
> > >
> > > Implement (c) since it's simpler than (a), and (b) involves using
> > > migrate_disable() which is apparently undesired (see huge comment in
> > > include/linux/preempt.h).
> > >
> > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > Reported-by: Johannes Weiner
> > > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
> > > Reported-by: Sam Sun
> > > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/
> > > ---
> > >  mm/zswap.c | 31 ++++++++++++++++++++++++++++---
> > >  1 file changed, 28 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index f6316b66fb236..add1406d693b8 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -864,12 +864,22 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
> > >  	return ret;
> > >  }
> > >
> > > +DEFINE_STATIC_SRCU(acomp_srcu);
> > > +
> > >  static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
> > >  {
> > >  	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
> > >  	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
> > >
> > >  	if (!IS_ERR_OR_NULL(acomp_ctx)) {
> > > +		/*
> > > +		 * Even though the acomp_ctx should not be currently in use on
> > > +		 * @cpu, it may still be used by compress/decompress operations
> > > +		 * that started on @cpu and migrated to a different CPU. Wait
> > > +		 * for such usages to complete, any new usages would be a bug.
> > > +		 */
> > > +		synchronize_srcu(&acomp_srcu);
> >
> > The docs suggest you can't solve it like that :(
> >
> > Documentation/RCU/Design/Requirements/Requirements.rst:
> >
> >   Also unlike other RCU flavors, synchronize_srcu() may **not** be
> >   invoked from CPU-hotplug notifiers, due to the fact that SRCU grace
> >   periods make use of timers and the possibility of timers being
> >   temporarily "stranded" on the outgoing CPU. This stranding of timers
> >   means that timers posted to the outgoing CPU will not fire until
> >   late in the CPU-hotplug process. The problem is that if a notifier
> >   is waiting on an SRCU grace period, that grace period is waiting on
> >   a timer, and that timer is stranded on the outgoing CPU, then the
> >   notifier will never be awakened, in other words, deadlock has
> >   occurred. This same situation of course also prohibits
> >   srcu_barrier() from being invoked from CPU-hotplug notifiers.
>
> Thanks for checking, I completely missed this. I guess it only works
> with SRCU if we use call_srcu(), but then we need to copy the pointers
> to a new struct to avoid racing with the CPU getting onlined again.
> Otherwise we can just bite the bullet and add a refcount, or use
> migrate_disable() despite that being undesirable.
>
> Do you have a favorite? :)

I briefly looked into refcounting. The annoying thing is that we need
to handle the race between putting the last refcount in
zswap_compress()/zswap_decompress(), and the CPU getting onlined again
and re-initializing the refcount.

One way to do it would be to put all dynamically allocated resources
in the same struct as the new refcount, and use RCU + refcounts to
allocate and free the struct as a whole (a rough sketch of that idea
is below).

I am leaning toward just disabling migration for now tbh unless there
are objections to that, especially this close to the v6.13 release.
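For illustration only, here is a minimal sketch of the "same struct +
refcount" idea described above. It is not the actual patch: the struct
and helper names (acomp_ctx_res, acomp_ctx_res_get(), the per-CPU
"slot") are made up here. All dynamically allocated resources live in
one object together with the refcount, the per-CPU slot merely
publishes a pointer to it, and RCU plus a get-unless-zero acquisition
handles the race with the CPU going away (and later coming back, which
simply publishes a fresh object rather than re-initializing the old
refcount).

/* Hypothetical sketch; needs <linux/kref.h>, <linux/rcupdate.h>, <linux/slab.h>. */
struct acomp_ctx_res {
	struct kref ref;
	struct rcu_head rcu;
	struct crypto_acomp *acomp;
	struct acomp_req *req;
	u8 *buffer;
	struct mutex mutex;
};

static void acomp_ctx_res_rcu_free(struct rcu_head *rcu)
{
	struct acomp_ctx_res *res = container_of(rcu, struct acomp_ctx_res, rcu);

	/* If any of these can sleep, this would need a workqueue instead. */
	acomp_request_free(res->req);
	crypto_free_acomp(res->acomp);
	kfree(res->buffer);
	kfree(res);
}

static void acomp_ctx_res_release(struct kref *ref)
{
	struct acomp_ctx_res *res = container_of(ref, struct acomp_ctx_res, ref);

	/* Defer freeing by a grace period so readers can still probe ->ref. */
	call_rcu(&res->rcu, acomp_ctx_res_rcu_free);
}

/* zswap_compress()/zswap_decompress(): pin this CPU's resources. */
static struct acomp_ctx_res *acomp_ctx_res_get(struct acomp_ctx_res __rcu **slot)
{
	struct acomp_ctx_res *res;

	rcu_read_lock();
	res = rcu_dereference(*slot);
	/* Fails if zswap_cpu_comp_dead() already dropped the initial ref. */
	if (res && !kref_get_unless_zero(&res->ref))
		res = NULL;
	rcu_read_unlock();
	return res;
}

static void acomp_ctx_res_put(struct acomp_ctx_res *res)
{
	kref_put(&res->ref, acomp_ctx_res_release);
}

/* zswap_cpu_comp_dead(): unpublish the object and drop the initial ref. */
static void acomp_ctx_res_unpublish(struct acomp_ctx_res __rcu **slot)
{
	struct acomp_ctx_res *res = rcu_replace_pointer(*slot, NULL, true);

	if (res)
		acomp_ctx_res_put(res);
}

With this shape, the CPU-online callback just allocates and publishes a
new object, so there is no refcount left behind for it to race with.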
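And a sketch of the migration-disabling option being leaned toward,
again with hypothetical helper names (acomp_ctx_get_cpu() /
acomp_ctx_put_cpu()); the real call sites would be zswap_compress() and
zswap_decompress(), wrapped around the existing acomp_ctx->mutex
locking. Whether this is acceptable given the concerns about
migrate_disable() in include/linux/preempt.h is exactly the trade-off
discussed above.

/* Sketch only: replace the plain raw_cpu_ptr() lookup at the start of the op. */
static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *ctx)
{
	/*
	 * Pin the task to the current CPU for the duration of the
	 * operation. Unlike preempt_disable(), sleeping (e.g. in
	 * crypto_wait_req()) is still allowed, but the task can no longer
	 * migrate away and leave the per-CPU context it looked up exposed
	 * to being freed by zswap_cpu_comp_dead().
	 */
	migrate_disable();
	return raw_cpu_ptr(ctx);
}

static void acomp_ctx_put_cpu(void)
{
	migrate_enable();
}

Usage would be symmetric: acomp_ctx_get_cpu() before taking
acomp_ctx->mutex, acomp_ctx_put_cpu() after releasing it once the
compression or decompression completes.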