From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71B0AE77188 for ; Thu, 9 Jan 2025 00:26:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CE4DE6B0089; Wed, 8 Jan 2025 19:26:36 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C95586B008C; Wed, 8 Jan 2025 19:26:36 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B356D6B0092; Wed, 8 Jan 2025 19:26:36 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 9520B6B0089 for ; Wed, 8 Jan 2025 19:26:36 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 0C0E716151F for ; Thu, 9 Jan 2025 00:26:36 +0000 (UTC) X-FDA: 82986022392.25.E978887 Received: from mail-qk1-f178.google.com (mail-qk1-f178.google.com [209.85.222.178]) by imf25.hostedemail.com (Postfix) with ESMTP id 25713A0005 for ; Thu, 9 Jan 2025 00:26:34 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=EJeo6G5u; spf=pass (imf25.hostedemail.com: domain of yosryahmed@google.com designates 209.85.222.178 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736382394; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=dLgE0Q0cIPTZK8RuOUBCdOv21yWnN9ykzNp2HSvPXk8=; b=AFeCQ092dvRyHv25iuAvo/uTcW4cTTsPIgo1L/AJfZu2CyzTdFRCyAEUs07YzEGIwu6AWl K+O3y/IFscFLKvD/A2wGXjZfcw4LO3VzHoOCc3jRSaP88PWh7yYBu9KXgzQMcfMJnROQEM 1t9WSHhQbYfR4vCxfHDH4UeVSGAnj04= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=EJeo6G5u; spf=pass (imf25.hostedemail.com: domain of yosryahmed@google.com designates 209.85.222.178 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736382394; a=rsa-sha256; cv=none; b=dfxfQj0oQoHY8m8Epg62BWrLgdMRHvx7lPVvHUQU2WmZvKwbEtuL2LK4WPNIevIr10f+Eb D2CTBYOrbuPeaRYkIh0/8TxT+GCDhQfNcMB75nYtCclAKuahIBkdQdG7tQGGJsQQjOtBfY ukSv+abMpwEN/Zld3oalV6aEjqOzzX0= Received: by mail-qk1-f178.google.com with SMTP id af79cd13be357-7b98a2e3b3eso15379885a.2 for ; Wed, 08 Jan 2025 16:26:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736382393; x=1736987193; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=dLgE0Q0cIPTZK8RuOUBCdOv21yWnN9ykzNp2HSvPXk8=; b=EJeo6G5uLHLw0fI3PX2f9UQ/Vs40+glWxm6K9khj2ULkvQtF3QdYBRDOi9leFinwK7 0wWxkrozBijDaqhEGKRqjP07WtWCKW2e5WR53cHFZyK91q+4GJI1Mdiy92Cli01tJvCd XG3CYvYj1hG7nX7Vgf9qpZq60jMsRee2uz7eXurfcN47F0xubvL8wOKs74CBDzMuaEiv SEA38RDJmIYy9Qr25nuL3Pj0hJuI0RwDqQthPM+wEkupxhVNipBV0X6yBdS9Gp1ioMSx JqgNDpbFTbyEQDAKEDpL7dv0EUCimUf73UleZIlXTpE6va2+UcdWE/KLUxOy/ratmkSc KhtA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736382393; x=1736987193; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dLgE0Q0cIPTZK8RuOUBCdOv21yWnN9ykzNp2HSvPXk8=; b=tpRqb5d9e8L0qyiqLg3NER6/5bHQT+iZxBN6E30PfiIShHUU+UbJ52JEkxco6hYY6N 2rlGBwYhJ+meofhq1oZd4N/g8TKlY2XL/z6yaj+0hIZIr3OVG6WAKyUF0Jb5QttnEmud MiaD1QXeCixGikwqVX5XMwI6PZ1XnwvAY0DcFHpUZpEQmvCEVva4JMEpycXIDjUzrvzF xij0RDIN6TKTgfnTlZKMwK4rtpUXygs6O2kaZo5fK93LQg8reh0k9PCHGR/hqV+w2Ywu vsgV/t2/5Ap7Q+g/ldudPel54zq7SrOb0T0UcaCCz9+tiIhrzDrAcf4RQR3bY/u7oDeY 3BZA== X-Forwarded-Encrypted: i=1; AJvYcCU9nWq1zCJDjYBnCKaAkg7w6Cwz6VHACi1g7J2eA6XrcJ1bBMQ5FpeYt/kJUxL7UpytfSCkvTtf7w==@kvack.org X-Gm-Message-State: AOJu0YzO4o1bi3tqV/lm+fnwH7E52hX9qJeUlSK6WjynSf3qlKSdTMkn /pm/v03UO7s0gh8q1hi1ou5J3ayvpUGAUF68qtLkEMCE+K2E0CLor3wDrSsMZuTVlXDFKn2V5JP RYwoW7eesj/4j+yVTeAvXLGLGhob1eswDlT75 X-Gm-Gg: ASbGnctuITq8Je/IYGv3KFIgmV1xmU8HKQh0UdnMg/3SviopbD45j2gAL3d1dd6J/k3 ivTeJdhquqQBabhDYI8W9T/eQ9A85XLgGwt0= X-Google-Smtp-Source: AGHT+IENYv9CnknFzOJbCPZ2UbTQBoseHnypwVSub8bpZrdmMEr6ykgHuNKpG3EaOV6/1f5d27arO2reMg67ASRSi90= X-Received: by 2002:a05:6214:2a4e:b0:6d8:9a85:5b34 with SMTP id 6a1803df08f44-6df9b1d22a4mr78269516d6.10.1736382392701; Wed, 08 Jan 2025 16:26:32 -0800 (PST) MIME-Version: 1.0 References: <20250108222441.3622031-1-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Wed, 8 Jan 2025 16:25:56 -0800 X-Gm-Features: AbW1kvahwVALjXZE_lkhWbY4X6wXHB4twdvoLsLVkOmzbIi19-49HpmhuP3WZF0 Message-ID: Subject: Re: [PATCH v2] mm: zswap: properly synchronize freeing resources during CPU hotunplug To: "Sridhar, Kanchana P" Cc: Andrew Morton , Johannes Weiner , Nhat Pham , Chengming Zhou , Vitaly Wool , Barry Song , Sam Sun , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , "stable@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 25713A0005 X-Rspam-User: X-Stat-Signature: dduonuwfdaos8u3ghbhcuqmi8akf33rx X-HE-Tag: 1736382394-752233 X-HE-Meta: U2FsdGVkX19puiFs9CSWpJeeCa8dnxFIhqFJ7KJckfXWOz+wSYQ16VzVbm6ktRb8M8JvCoWe18WBLnAxKcIn+JQk7pidsaaWeeiMTMIvmd6Sjyvlbd3H2pw6fpTxZpg5hzO1CMadJgSJU7n4CwqfYzAkmkQWBmUwcN9YagogyKUZojPnhgBzb1sGIy/AR06rwejATANJKTMQod1cwGOcRygz7Q1PpPCq9mkFuXfL4vzRGaw6C2w5AhXkT5Icd1+n5DrJOlrNSnskR6QCvlTWP1WOFOUCd9OUIDq9TEQd4hGYAJH0newR9j8qaZ6LjMY0Zkt+G+k0q1IaboGzpYU2lTuVYf1WfOEc6gk3QI9gVxb6Yx0GnmRK2Tsgp92XEHtuMQRJTJeRVY+Oz/7GZkfeLe4hkjwuCKAoYlbhF78AjgYEwxX7Z2SAVNVc2PlPvo51hBZXF6NjOSfxTy5uNqVaj1Wqc/7HOsJY1rT0BKqREdilyzalzV9FSrqJgcYIIDRDRYRGwM0SXyLGQmG1ta/C3Pg3vyu4isyjf2LB/QPHIGmdykotXR9WyZJvEZIs0RumnMCUp1XoIpYBGnIq0oPTn2rhO5wytdQjwAXTGaQjK117yae7DHy0WHx/yhvAvEy7AwNOsU/8dckW7P20yRP10GQ8g3gfSs5sPchMU21qUsbglavhlNJhhejc/aiRLumUPiqDF3MJ6Nz79anes/wEZeBcFXR4TjpvzCqGn5B6j5uKBFyfegAms3haOgaEwEnBssradEt5kQ/4uIjDqRiyuJyfvNH9DpZU54qqha6mpYJhzZ+jUDDVAMG5tYBGq8jcS43Cm5DGFElTM7089BAqxdv+DMGLBEWsepU6IL70sIyt5T5SUXKylaw1aMjOEsJrC1VlLJXbPncMGib58EtRrHMUteg8/nQru6e0SLJMfxPNJjGgHPGzqMBM9FDRc4mlAnegVpCj30LTr/V9Hv0 oncIbxXk kivjPk5lAqiX3GUU/7nuKbbY9wE4/OVf5JpX3SKpUi+/pi13gxNBH4bFEeIXu1CABpgd8aBaHDHeBuSVQ8g4MWw7G9wNgZ5zKZPMV+dbPj82tLJF/bP1ACutpyJAtjr0yq2H4ZTnSieKaPCNeaXlGD7DTH7nSmSDDBI7Z+id9wrWDBtFwn3VM0FFqMb0I0kxFLJCZohsONFzvdCv1yu2nTXWpUP9kobFVsngFrrRXGkmu79L0J0V8AMdfNKU5WS0hUlYss5iZ4Maxk/ka3Ana6BYG7Hj8BjszUKycMVB1Q7sy608qQUHlcDyvO+gmuKiurkTGhLSEB5Zn6emMcGLHKgtHyIl6ezTGIoPNIcnhu5p1xas3aK3z8omtvS8KQgoY+RZZn1rdtuO51Q32Yy7F3B7n7CYm7v+JrNmrnrz1fa1Pk4UKgoTwEWnFyinVIJX/nGKVCtvLhoTBBUADbyv2E/5NMFmArglIqkUD9nBxuTrscDz5domyR/w6BlwG+ZmJpABy X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jan 8, 2025 at 4:12=E2=80=AFPM Sridhar, Kanchana P wrote: > > > > -----Original Message----- > > From: Yosry Ahmed > > Sent: Wednesday, January 8, 2025 2:25 PM > > To: Andrew Morton > > Cc: Johannes Weiner ; Nhat Pham > > ; Chengming Zhou ; > > Vitaly Wool ; Barry Song ; Sam > > Sun ; Sridhar, Kanchana P > > ; linux-mm@kvack.org; linux- > > kernel@vger.kernel.org; Yosry Ahmed ; > > stable@vger.kernel.org > > Subject: [PATCH v2] mm: zswap: properly synchronize freeing resources > > during CPU hotunplug > > > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of > > the > > current CPU at the beginning of the operation is retrieved and used > > throughout. However, since neither preemption nor migration are > > disabled, it is possible that the operation continues on a different > > CPU. > > > > If the original CPU is hotunplugged while the acomp_ctx is still in use= , > > we run into a UAF bug as some of the resources attached to the acomp_ct= x > > are freed during hotunplug in zswap_cpu_comp_dead() (i.e. > > acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp). > > > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to > > use crypto_acomp API for hardware acceleration") when the switch to the > > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was > > retrieved using get_cpu_ptr() which disables preemption and makes sure > > the CPU cannot go away from under us. Preemption cannot be disabled > > with the crypto_acomp API as a sleepable context is needed. > > > > Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating > > and freeing resources with compression/decompression paths. Make sure > > that acomp_ctx.req is NULL when the resources are freed. In the > > compression/decompression paths, check if acomp_ctx.req is NULL after > > acquiring the mutex (meaning the CPU was offlined) and retry on the new > > CPU. > > > > The initialization of acomp_ctx.mutex is moved from the CPU hotplug > > callback to the pool initialization where it belongs (where the mutex i= s > > allocated). In addition to adding clarity, this makes sure that CPU > > hotplug cannot reinitialize a mutex that is already locked by > > compression/decompression. > > > > Previously a fix was attempted by holding cpus_read_lock() [1]. This > > would have caused a potential deadlock as it is possible for code > > already holding the lock to fall into reclaim and enter zswap (causing = a > > deadlock). A fix was also attempted using SRCU for synchronization, but > > Johannes pointed out that synchronize_srcu() cannot be used in CPU > > hotplug notifiers [2]. > > > > Alternative fixes that were considered/attempted and could have worked: > > - Refcounting the per-CPU acomp_ctx. This involves complexity in > > handling the race between the refcount dropping to zero in > > zswap_[de]compress() and the refcount being re-initialized when the > > CPU is onlined. > > - Disabling migration before getting the per-CPU acomp_ctx [3], but > > that's discouraged and is a much bigger hammer than needed, and could > > result in subtle performance issues. > > > > [1]https://lkml.kernel.org/20241219212437.2714151-1- > > yosryahmed@google.com/ > > [2]https://lkml.kernel.org/20250107074724.1756696-2- > > yosryahmed@google.com/ > > [3]https://lkml.kernel.org/20250107222236.2715883-2- > > yosryahmed@google.com/ > > > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for > > hardware acceleration") > > Cc: > > Signed-off-by: Yosry Ahmed > > Reported-by: Johannes Weiner > > Closes: > > https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/ > > Reported-by: Sam Sun > > Closes: > > https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4O > > curuL4tPg6OaQ@mail.gmail.com/ > > --- > > > > This applies on top of the latest mm-hotfixes-unstable on top of 'Rever= t > > "mm: zswap: fix race between [de]compression and CPU hotunplug"' and > > after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was > > dropped. > > > > v1 -> v2: > > - Move the initialization of the mutex to pool initialization. > > - Use the mutex to also synchronize with the CPU hotplug callback (i.e. > > zswap_cpu_comp_prep()). > > - Naming cleanups. > > > > --- > > mm/zswap.c | 60 +++++++++++++++++++++++++++++++++++++++++--------- > > ---- > > 1 file changed, 46 insertions(+), 14 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index f6316b66fb236..4d7e564732267 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -251,7 +251,7 @@ static struct zswap_pool *zswap_pool_create(char > > *type, char *compressor) > > struct zswap_pool *pool; > > char name[38]; /* 'zswap' + 32 char (max) num + \0 */ > > gfp_t gfp =3D __GFP_NORETRY | __GFP_NOWARN | > > __GFP_KSWAPD_RECLAIM; > > - int ret; > > + int ret, cpu; > > > > if (!zswap_has_pool) { > > /* if either are unset, pool initialization failed, and w= e > > @@ -285,6 +285,9 @@ static struct zswap_pool *zswap_pool_create(char > > *type, char *compressor) > > goto error; > > } > > > > + for_each_possible_cpu(cpu) > > + mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex); > > + > > ret =3D > > cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE, > > &pool->node); > > if (ret) > > @@ -821,11 +824,12 @@ static int zswap_cpu_comp_prepare(unsigned int > > cpu, struct hlist_node *node) > > struct acomp_req *req; > > int ret; > > > > - mutex_init(&acomp_ctx->mutex); > > - > > + mutex_lock(&acomp_ctx->mutex); > > acomp_ctx->buffer =3D kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, > > cpu_to_node(cpu)); > > - if (!acomp_ctx->buffer) > > - return -ENOMEM; > > + if (!acomp_ctx->buffer) { > > + ret =3D -ENOMEM; > > + goto buffer_fail; > > + } > > > > acomp =3D crypto_alloc_acomp_node(pool->tfm_name, 0, 0, > > cpu_to_node(cpu)); > > if (IS_ERR(acomp)) { > > @@ -844,6 +848,8 @@ static int zswap_cpu_comp_prepare(unsigned int > > cpu, struct hlist_node *node) > > ret =3D -ENOMEM; > > goto req_fail; > > } > > + > > + /* acomp_ctx->req must be NULL if the acomp_ctx is not fully > > initialized */ > > acomp_ctx->req =3D req; > > For this to happen, shouldn't we directly assign: > acomp_ctx->req =3D acomp_request_alloc(acomp_ctx->acomp); > if (!acomp_ctx->req) { ...} For the initial zswap_cpu_comp_prepare() call on a CPU, it doesn't matter because a failure will cause a failure of initialization or CPU onlining. For calls after a CPU is offlined, zswap_cpu_comp_dead() will have set acomp_ctx->req to NULL, so leaving it unmodified keeps it as NULL. Although I suppose the comment is not really clear because zswap_cpu_comp_prepare() could fail before acomp_request_alloc() is ever called and this would not be a problem (as the onlining will fail). Maybe it's best to just drop the comment. Andrew, could you please fold this in? diff --git a/mm/zswap.c b/mm/zswap.c index 4d7e564732267..30f5a27a68620 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -848,8 +848,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node) ret =3D -ENOMEM; goto req_fail; } - - /* acomp_ctx->req must be NULL if the acomp_ctx is not fully initialized */ acomp_ctx->req =3D req; crypto_init_wait(&acomp_ctx->wait); > > I was wondering how error conditions encountered in zswap_cpu_comp_prepar= e() > will impact zswap_[de]compress(). This is probably unrelated to this patc= h itself, > but is my understanding correct that an error in this procedure will caus= e > zswap_enabled to be set to false, which will cause any zswap_stores() to = fail early? Yeah that is my understanding, for the initial call. For later calls when a CPU gets onlined I think the CPU onlining fails and the acomp_ctx is not used.