From: Barry Song <baohua@kernel.org>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Cc: Yosry Ahmed <yosryahmed@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Nhat Pham <nphamcs@gmail.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Vitaly Wool <vitalywool@gmail.com>,
Sam Sun <samsun1006219@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] mm: zswap: properly synchronize freeing resources during CPU hotunplug
Date: Thu, 9 Jan 2025 10:38:40 +1300 [thread overview]
Message-ID: <CAGsJ_4x=ZNJwj4R6Z3c5-rPs-D5xzTnv039=8VQxo7OdyxavMg@mail.gmail.com> (raw)
In-Reply-To: <SJ0PR11MB56781DA3F7B94E44753FAB51C9122@SJ0PR11MB5678.namprd11.prod.outlook.com>
On Thu, Jan 9, 2025 at 9:23 AM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Sent: Wednesday, January 8, 2025 8:15 AM
> > To: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>; Nhat Pham
> > <nphamcs@gmail.com>; Chengming Zhou <chengming.zhou@linux.dev>;
> > Vitaly Wool <vitalywool@gmail.com>; Barry Song <baohua@kernel.org>; Sam
> > Sun <samsun1006219@gmail.com>; Sridhar, Kanchana P
> > <kanchana.p.sridhar@intel.com>; linux-mm@kvack.org; linux-
> > kernel@vger.kernel.org; Yosry Ahmed <yosryahmed@google.com>;
> > stable@vger.kernel.org
> > Subject: [PATCH] mm: zswap: properly synchronize freeing resources during
> > CPU hotunplug
> >
> > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of
> > the
> > current CPU at the beginning of the operation is retrieved and used
> > throughout. However, since neither preemption nor migration are
> > disabled, it is possible that the operation continues on a different
> > CPU.
> >
> > If the original CPU is hotunplugged while the acomp_ctx is still in use,
> > we run into a UAF bug as some of the resources attached to the acomp_ctx
> > are freed during hotunplug in zswap_cpu_comp_dead().
> >
> > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to
> > use crypto_acomp API for hardware acceleration") when the switch to the
> > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> > retrieved using get_cpu_ptr() which disables preemption and makes sure
> > the CPU cannot go away from under us. Preemption cannot be disabled
> > with the crypto_acomp API as a sleepable context is needed.
> >
> > During CPU hotunplug, hold the acomp_ctx.mutex before freeing any
> > resources, and set acomp_ctx.req to NULL when it is freed. In the
> > compress/decompress paths, after acquiring the acomp_ctx.mutex make sure
> > that acomp_ctx.req is not NULL (i.e. acomp_ctx resources were not freed
> > by CPU hotunplug). Otherwise, retry with the acomp_ctx from the new CPU.
> >
> > This adds proper synchronization to ensure that the acomp_ctx resources
> > are not freed from under compress/decompress paths.
> >
> > Note that the per-CPU acomp_ctx itself (including the mutex) is not
> > freed during CPU hotunplug, only acomp_ctx.req, acomp_ctx.buffer, and
> > acomp_ctx.acomp. So it is safe to acquire the acomp_ctx.mutex of a CPU
> > after it is hotunplugged.
>
> Only other fail-proofing I can think of is to initialize the mutex right after
> the per-cpu acomp_ctx is allocated in zswap_pool_create() and de-couple
> it from the cpu onlining. This further clarifies the intent for this mutex
> to be used at the same lifetime scope as the acomp_ctx itself, independent
> of cpu hotplug/hotunplug.
Good catch! That step should have been executed immediately after
calling alloc_percpu(). Initially, the mutex was dynamically
allocated and initialized in zswap_cpu_comp_prepare(). Later, it
was moved to the context and allocated statically. It would be
better to relocate the mutex_init() accordingly.
>
> Thanks,
> Kanchana
>
> >
> > Previously a fix was attempted by holding cpus_read_lock() [1]. This
> > would have caused a potential deadlock as it is possible for code
> > already holding the lock to fall into reclaim and enter zswap (causing a
> > deadlock). A fix was also attempted using SRCU for synchronization, but
> > Johannes pointed out that synchronize_srcu() cannot be used in CPU
> > hotplug notifiers [2].
> >
> > Alternative fixes that were considered/attempted and could have worked:
> > - Refcounting the per-CPU acomp_ctx. This involves complexity in
> > handling the race between the refcount dropping to zero in
> > zswap_[de]compress() and the refcount being re-initialized when the
> > CPU is onlined.
> > - Disabling migration before getting the per-CPU acomp_ctx [3], but
> > that's discouraged and is a much bigger hammer than needed, and could
> > result in subtle performance issues.
> >
> > [1]https://lkml.kernel.org/20241219212437.2714151-1-
> > yosryahmed@google.com/
> > [2]https://lkml.kernel.org/20250107074724.1756696-2-
> > yosryahmed@google.com/
> > [3]https://lkml.kernel.org/20250107222236.2715883-2-
> > yosryahmed@google.com/
> >
> > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for
> > hardware acceleration")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> > Closes:
> > https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
> > Reported-by: Sam Sun <samsun1006219@gmail.com>
> > Closes:
> > https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4O
> > curuL4tPg6OaQ@mail.gmail.com/
> > ---
> >
> > This applies on top of the latest mm-hotfixes-unstable on top of 'Revert
> > "mm: zswap: fix race between [de]compression and CPU hotunplug"' and
> > after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was
> > dropped.
> >
> > ---
> > mm/zswap.c | 42 +++++++++++++++++++++++++++++++++---------
> > 1 file changed, 33 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb236..4e3148050e093 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -869,17 +869,46 @@ static int zswap_cpu_comp_dead(unsigned int cpu,
> > struct hlist_node *node)
> > struct zswap_pool *pool = hlist_entry(node, struct zswap_pool,
> > node);
> > struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool-
> > >acomp_ctx, cpu);
> >
> > + mutex_lock(&acomp_ctx->mutex);
> > if (!IS_ERR_OR_NULL(acomp_ctx)) {
> > if (!IS_ERR_OR_NULL(acomp_ctx->req))
> > acomp_request_free(acomp_ctx->req);
> > + acomp_ctx->req = NULL;
> > if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
> > crypto_free_acomp(acomp_ctx->acomp);
> > kfree(acomp_ctx->buffer);
> > }
> > + mutex_unlock(&acomp_ctx->mutex);
> >
> > return 0;
> > }
> >
> > +static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(
> > + struct crypto_acomp_ctx __percpu *acomp_ctx)
> > +{
> > + struct crypto_acomp_ctx *ctx;
> > +
> > + for (;;) {
> > + ctx = raw_cpu_ptr(acomp_ctx);
> > + mutex_lock(&ctx->mutex);
> > + if (likely(ctx->req))
> > + return ctx;
> > + /*
> > + * It is possible that we were migrated to a different CPU
> > after
> > + * getting the per-CPU ctx but before the mutex was
> > acquired. If
> > + * the old CPU got offlined, zswap_cpu_comp_dead() could
> > have
> > + * already freed ctx->req (among other things) and set it to
> > + * NULL. Just try again on the new CPU that we ended up on.
> > + */
> > + mutex_unlock(&ctx->mutex);
> > + }
> > +}
> > +
> > +static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *ctx)
> > +{
> > + mutex_unlock(&ctx->mutex);
> > +}
> > +
> > static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> > struct zswap_pool *pool)
> > {
> > @@ -893,10 +922,7 @@ static bool zswap_compress(struct page *page,
> > struct zswap_entry *entry,
> > gfp_t gfp;
> > u8 *dst;
> >
> > - acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
> > -
> > - mutex_lock(&acomp_ctx->mutex);
> > -
> > + acomp_ctx = acomp_ctx_get_cpu_lock(pool->acomp_ctx);
> > dst = acomp_ctx->buffer;
> > sg_init_table(&input, 1);
> > sg_set_page(&input, page, PAGE_SIZE, 0);
> > @@ -949,7 +975,7 @@ static bool zswap_compress(struct page *page, struct
> > zswap_entry *entry,
> > else if (alloc_ret)
> > zswap_reject_alloc_fail++;
> >
> > - mutex_unlock(&acomp_ctx->mutex);
> > + acomp_ctx_put_unlock(acomp_ctx);
> > return comp_ret == 0 && alloc_ret == 0;
> > }
> >
> > @@ -960,9 +986,7 @@ static void zswap_decompress(struct zswap_entry
> > *entry, struct folio *folio)
> > struct crypto_acomp_ctx *acomp_ctx;
> > u8 *src;
> >
> > - acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > - mutex_lock(&acomp_ctx->mutex);
> > -
> > + acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool->acomp_ctx);
> > src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> > /*
> > * If zpool_map_handle is atomic, we cannot reliably utilize its
> > mapped buffer
> > @@ -986,10 +1010,10 @@ static void zswap_decompress(struct
> > zswap_entry *entry, struct folio *folio)
> > acomp_request_set_params(acomp_ctx->req, &input, &output,
> > entry->length, PAGE_SIZE);
> > BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx-
> > >req), &acomp_ctx->wait));
> > BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > - mutex_unlock(&acomp_ctx->mutex);
> >
> > if (src != acomp_ctx->buffer)
> > zpool_unmap_handle(zpool, entry->handle);
> > + acomp_ctx_put_unlock(acomp_ctx);
> > }
> >
> > /*********************************
> > --
> > 2.47.1.613.gc27f4b7a9f-goog
>
Thanks
Barry
prev parent reply other threads:[~2025-01-08 21:38 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-08 16:15 Yosry Ahmed
2025-01-08 20:23 ` Sridhar, Kanchana P
2025-01-08 20:50 ` Yosry Ahmed
2025-01-08 21:38 ` Barry Song [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGsJ_4x=ZNJwj4R6Z3c5-rPs-D5xzTnv039=8VQxo7OdyxavMg@mail.gmail.com' \
--to=baohua@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=chengming.zhou@linux.dev \
--cc=hannes@cmpxchg.org \
--cc=kanchana.p.sridhar@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nphamcs@gmail.com \
--cc=samsun1006219@gmail.com \
--cc=stable@vger.kernel.org \
--cc=vitalywool@gmail.com \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox