From mboxrd@z Thu Jan  1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Thu, 9 Jan 2025 10:38:40 +1300
Subject: Re: [PATCH] mm: zswap: properly synchronize freeing resources during CPU hotunplug
To: "Sridhar, Kanchana P"
Cc: Yosry Ahmed, Andrew Morton, Johannes Weiner, Nhat Pham, Chengming Zhou,
 Vitaly Wool, Sam Sun, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
References: <20250108161529.3193825-1-yosryahmed@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
On Thu, Jan 9, 2025 at 9:23 AM Sridhar, Kanchana P wrote:
>
> > -----Original Message-----
> > From: Yosry Ahmed
> > Sent: Wednesday, January 8, 2025 8:15 AM
> > To: Andrew Morton
> > Cc:
> > Johannes Weiner; Nhat Pham; Chengming Zhou; Vitaly Wool; Barry Song;
> > Sam Sun; Sridhar, Kanchana P; linux-mm@kvack.org;
> > linux-kernel@vger.kernel.org; Yosry Ahmed; stable@vger.kernel.org
> > Subject: [PATCH] mm: zswap: properly synchronize freeing resources during
> > CPU hotunplug
> >
> > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
> > current CPU at the beginning of the operation is retrieved and used
> > throughout. However, since neither preemption nor migration are
> > disabled, it is possible that the operation continues on a different
> > CPU.
> >
> > If the original CPU is hotunplugged while the acomp_ctx is still in use,
> > we run into a UAF bug as some of the resources attached to the acomp_ctx
> > are freed during hotunplug in zswap_cpu_comp_dead().
> >
> > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to
> > use crypto_acomp API for hardware acceleration") when the switch to the
> > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> > retrieved using get_cpu_ptr() which disables preemption and makes sure
> > the CPU cannot go away from under us. Preemption cannot be disabled
> > with the crypto_acomp API as a sleepable context is needed.
> >
> > During CPU hotunplug, hold the acomp_ctx.mutex before freeing any
> > resources, and set acomp_ctx.req to NULL when it is freed. In the
> > compress/decompress paths, after acquiring the acomp_ctx.mutex make sure
> > that acomp_ctx.req is not NULL (i.e. acomp_ctx resources were not freed
> > by CPU hotunplug). Otherwise, retry with the acomp_ctx from the new CPU.
> >
> > This adds proper synchronization to ensure that the acomp_ctx resources
> > are not freed from under compress/decompress paths.
> >
> > Note that the per-CPU acomp_ctx itself (including the mutex) is not
> > freed during CPU hotunplug, only acomp_ctx.req, acomp_ctx.buffer, and
> > acomp_ctx.acomp.
> > So it is safe to acquire the acomp_ctx.mutex of a CPU
> > after it is hotunplugged.
>
> Only other fail-proofing I can think of is to initialize the mutex right after
> the per-cpu acomp_ctx is allocated in zswap_pool_create() and de-couple
> it from the cpu onlining. This further clarifies the intent for this mutex
> to be used at the same lifetime scope as the acomp_ctx itself, independent
> of cpu hotplug/hotunplug.

Good catch! That initialization should indeed happen right after the
alloc_percpu() call. Originally the mutex was dynamically allocated and
initialized in zswap_cpu_comp_prepare(); later it was moved into the
acomp_ctx and allocated statically, so relocating the mutex_init()
accordingly would be better.

>
> Thanks,
> Kanchana
>
> >
> > Previously a fix was attempted by holding cpus_read_lock() [1]. This
> > would have caused a potential deadlock as it is possible for code
> > already holding the lock to fall into reclaim and enter zswap (causing a
> > deadlock). A fix was also attempted using SRCU for synchronization, but
> > Johannes pointed out that synchronize_srcu() cannot be used in CPU
> > hotplug notifiers [2].
> >
> > Alternative fixes that were considered/attempted and could have worked:
> > - Refcounting the per-CPU acomp_ctx. This involves complexity in
> >   handling the race between the refcount dropping to zero in
> >   zswap_[de]compress() and the refcount being re-initialized when the
> >   CPU is onlined.
> > - Disabling migration before getting the per-CPU acomp_ctx [3], but
> >   that's discouraged and is a much bigger hammer than needed, and could
> >   result in subtle performance issues.
> >
> > [1] https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@google.com/
> > [2] https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@google.com/
> > [3] https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@google.com/
> >
> > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for
> > hardware acceleration")
> > Cc:
> > Signed-off-by: Yosry Ahmed
> > Reported-by: Johannes Weiner
> > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
> > Reported-by: Sam Sun
> > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/
> > ---
> >
> > This applies on top of the latest mm-hotfixes-unstable on top of 'Revert
> > "mm: zswap: fix race between [de]compression and CPU hotunplug"' and
> > after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was
> > dropped.
> >
> > ---
> >  mm/zswap.c | 42 +++++++++++++++++++++++++++++++++---------
> >  1 file changed, 33 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb236..4e3148050e093 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -869,17 +869,46 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
> >  	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
> >  	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
> >
> > +	mutex_lock(&acomp_ctx->mutex);
> >  	if (!IS_ERR_OR_NULL(acomp_ctx)) {
> >  		if (!IS_ERR_OR_NULL(acomp_ctx->req))
> >  			acomp_request_free(acomp_ctx->req);
> > +		acomp_ctx->req = NULL;
> >  		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
> >  			crypto_free_acomp(acomp_ctx->acomp);
> >  		kfree(acomp_ctx->buffer);
> >  	}
> > +	mutex_unlock(&acomp_ctx->mutex);
> >
> >  	return 0;
> >  }
> >
> > +static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(
> > +		struct crypto_acomp_ctx __percpu *acomp_ctx)
> > +{
> > +	struct crypto_acomp_ctx *ctx;
> > +
> > +	for (;;) {
> > +		ctx = raw_cpu_ptr(acomp_ctx);
> > +		mutex_lock(&ctx->mutex);
> > +		if (likely(ctx->req))
> > +			return ctx;
> > +		/*
> > +		 * It is possible that we were migrated to a different CPU after
> > +		 * getting the per-CPU ctx but before the mutex was acquired. If
> > +		 * the old CPU got offlined, zswap_cpu_comp_dead() could have
> > +		 * already freed ctx->req (among other things) and set it to
> > +		 * NULL. Just try again on the new CPU that we ended up on.
> > +		 */
> > +		mutex_unlock(&ctx->mutex);
> > +	}
> > +}
> > +
> > +static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *ctx)
> > +{
> > +	mutex_unlock(&ctx->mutex);
> > +}
> > +
> >  static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >  			   struct zswap_pool *pool)
> >  {
> > @@ -893,10 +922,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >  	gfp_t gfp;
> >  	u8 *dst;
> >
> > -	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> > -
> > -	mutex_lock(&acomp_ctx->mutex);
> > -
> > +	acomp_ctx = acomp_ctx_get_cpu_lock(pool->acomp_ctx);
> >  	dst = acomp_ctx->buffer;
> >  	sg_init_table(&input, 1);
> >  	sg_set_page(&input, page, PAGE_SIZE, 0);
> > @@ -949,7 +975,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >  	else if (alloc_ret)
> >  		zswap_reject_alloc_fail++;
> >
> > -	mutex_unlock(&acomp_ctx->mutex);
> > +	acomp_ctx_put_unlock(acomp_ctx);
> >  	return comp_ret == 0 && alloc_ret == 0;
> >  }
> >
> > @@ -960,9 +986,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> >  	struct crypto_acomp_ctx *acomp_ctx;
> >  	u8 *src;
> >
> > -	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > -	mutex_lock(&acomp_ctx->mutex);
> > -
> > +	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool->acomp_ctx);
> >  	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> >  	/*
> >  	 * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
> > @@ -986,10 +1010,10 @@ static void
> > zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> >  	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
> >  	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
> >  	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > -	mutex_unlock(&acomp_ctx->mutex);
> >
> >  	if (src != acomp_ctx->buffer)
> >  		zpool_unmap_handle(zpool, entry->handle);
> > +	acomp_ctx_put_unlock(acomp_ctx);
> >  }
> >
> > /*********************************
> > --
> > 2.47.1.613.gc27f4b7a9f-goog
>

Thanks
Barry