From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B6B1E77197 for ; Wed, 8 Jan 2025 00:12:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 765526B0082; Tue, 7 Jan 2025 19:12:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 713236B0083; Tue, 7 Jan 2025 19:12:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 601E16B0088; Tue, 7 Jan 2025 19:12:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 438B86B0082 for ; Tue, 7 Jan 2025 19:12:49 -0500 (EST) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id C3F2DB055B for ; Wed, 8 Jan 2025 00:12:48 +0000 (UTC) X-FDA: 82982358816.17.B54C31B Received: from mail-qv1-f46.google.com (mail-qv1-f46.google.com [209.85.219.46]) by imf10.hostedemail.com (Postfix) with ESMTP id E906BC000A for ; Wed, 8 Jan 2025 00:12:46 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=M5UP6wyV; spf=pass (imf10.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.46 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736295166; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zyFOZ8N3L8gopi8oYWFnwAPpujFC0e4PTIWnAX+e2tY=; b=b9cZ6d6mwk6XbdFQDh0tqODEKfOzbKw6gcyTPcVrhURwQYrLyuuL34lbf4I1GJBz3rRU1K d1HcSwystMHNrV0guU2WKGd5QkC4kajov5GqkWMYfcbhYGEixxZvbIjsxiVv3hN0JIbqNX WRLvzLQemuCIiHOngVT6BKVGr9f5NZc= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736295167; a=rsa-sha256; cv=none; b=Fo1h6OJXKZgxp2eezlsHmAodthqz+vR2XNfyrGqfWQdD8PsOBXvp1rBwqycpvuHJ51Ra/k cTdJLm810INjqaTWQGmrmRtu0cvTQdpobc+oLx+cHBK5v5bvSLJe8Oc3FDlG+BA32yF7qL F8ofAr1WnqSZlykhQA3WOZcrgUhbldk= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=M5UP6wyV; spf=pass (imf10.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.46 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-qv1-f46.google.com with SMTP id 6a1803df08f44-6d8fa32d3d6so103118206d6.2 for ; Tue, 07 Jan 2025 16:12:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736295166; x=1736899966; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=zyFOZ8N3L8gopi8oYWFnwAPpujFC0e4PTIWnAX+e2tY=; b=M5UP6wyVvAZJj3mKgmi+gmY5OrdKw48zb2c4CR7ONb1SstRsKA4n8ZLy+hN1OT8sy6 TfUoZHUsgWeHmZ/zqEZFeAZKnCvRprUrmEtweLpd/cK+zPaFZpsNsEe1ZH6i/bdimeLZ dUt1ny+OfUMnYnEspD8KdpGYIhR19epkCFpe6yzvZzg3emFbVcH5zc8+S5BhP5p3uv3y w/qG3tLQeA6S/sf8aYNpkU+4dOqVBIFiyc0iMCUKJ3sGVcAcs47YMrwoHJYkJkHQW/Ml VKaG7gX5I2/mewHAXjw40N/G9dGub+31usvzTKCsdMUaxqVtQLL3MuheI/OnjHdxXBNb EUPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736295166; x=1736899966; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zyFOZ8N3L8gopi8oYWFnwAPpujFC0e4PTIWnAX+e2tY=; b=QTg/EH6DlIVMUDBn3o6sYPNxarA8Q1O6/I4TvsRxpnQuyGnKEYWi2NZAIHM29Ii37K ZMReOAu9t3YdbnfTf5ALQZ1/s/VYqsdo3FgAuM2EN+QjGpIL4JD2LIA/TNkfAGoL/zrA DvqVkKzOGxoXh1CRluN2HKst20FGVHifFEMZTAf5ZwFsPblRr0wazYEZ5i8dqX4tNJG/ CHK3UmFIYcWLyVFa0f7FPSZVMe3xpnCBXK5xOrPtqsdToVsSdIMm1Ouhvj7jart8D9eb mX2uNQQtGnUHL3nierDxOeVBrbPYP1AoEBGkVChMa081ojPlBJVkvLcx+al6+BW5GHA6 r+Vg== X-Forwarded-Encrypted: i=1; AJvYcCUtNi7ZvEsE8Fi3fP/GXeYOHUGUhlLBFn2UkZeGqe5AaGJxBVDJ001+U+n3eDGINoKR5NO3/iLtZA==@kvack.org X-Gm-Message-State: AOJu0Yxion7DLwit8DNKUkgm0gkBnOzVzpCnpMZdpn32rY3SvjGe9ZQE MJu0aV6l/ydZy9eIWfzwB10LC+Ac3Sm8IyqUKtYfx2HICCOKu4SdjnDEk09uPPMf4UVjqrTaAVg xX0bQWpyksPxuVxF/abH92akcPEsg5NrTRuGi X-Gm-Gg: ASbGnctAn3XuAl8Ti0sjYLhOk1ooadbmWPieWjRppq46sVLjN1YWzww6vE0ERYoINq8 l2DZSWS8NPYV7bGLY49Kh1pYUIBG7/xolBmIhFSKv71kQGoRgVqjb50sDQsltTz8nPXIF X-Google-Smtp-Source: AGHT+IGG1H1HkngClefMwX+rP3psq1Y99MvS2W7HfkMNMccBO5UF8GeRK9GanfB9Z03ow6XNJJ+6jgAUXI6fK0/c2q8= X-Received: by 2002:a05:6214:300f:b0:6d8:a730:110c with SMTP id 6a1803df08f44-6df9b301163mr16423506d6.38.1736295165819; Tue, 07 Jan 2025 16:12:45 -0800 (PST) MIME-Version: 1.0 References: <20250107222236.2715883-1-yosryahmed@google.com> <20250107222236.2715883-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Tue, 7 Jan 2025 16:12:09 -0800 X-Gm-Features: AbW1kvYzpWpq2pT1tCl4QyyuO236RYxeZfy1X4uJ4OsxzPPVJfcp_Hvu7Tl6yXs Message-ID: Subject: Re: [PATCH v2 2/2] mm: zswap: disable migration while using per-CPU acomp_ctx To: "Sridhar, Kanchana P" Cc: Andrew Morton , Johannes Weiner , Nhat Pham , Chengming Zhou , Vitaly Wool , Barry Song , Sam Sun , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , "stable@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: E906BC000A X-Stat-Signature: 8z9npfps4gn7bayb785m5zws9k8tjnfq X-Rspam-User: X-HE-Tag: 1736295166-65724 X-HE-Meta: U2FsdGVkX1+mip8Pyq+++We9xUOE+g0J2pgk6zENLFLBRixWzySX6OWk0VpsSfVJOgcIXYI+TeezzUfb21XlaRZcDmN5gYRTAKPwuSLdP1cJQ0G19b8xuRdeHj8jJbNW4/wcfTZaMoFk8E2TJpnrgNDfMQ6Xpkt7auUn/0nk++k4By9AHT39PDpwP9XFKt7l00eqw1gUF95RmqfRZGG7EYk/QxPZp5aCOsZefjSBJJj56MRPjEIqENHmSSKKTjcBXFwDzxKX0OLvZ70s+PPp3UGVh1XNciHlXvcbJ4trVGw8pwiGxhyUME3JFq6w4BcOLrRIEv+p84cnx1GMNAQh8zzcvP3sb9QB3WePcNrNNBcSgozD0FRzLh1yCxXGkSIsVol3uv3kwBzoP0gbflEXB8neRJiPt+Dj3lmfRkqvx9qlgstvHMOqIh/fMAFKoTSkS+JBW5mVHIZ0WKlGB/nuRR7PghnwV/dF/fK1mjmhk3WapOAvpxGhWbvIFk5DnRBGUgPZF5kAE7YRslR5037+IHJI+bLQ59PJpchWR7n4TJ+q9SmBnN/6zHnjJ1RMnzkCIx0thq7aUZhGXvstL7LUXIFQ79PFEBAXP85UDpAWsRynOyFCcUe6IdAwyuKVV57AmDTYXnQOAiL1mtjPSWHqhAu6D2TWO4XWt/m8F+LXvLAVpyBM0IPU6OwD1SG7dre2E9OUEjeQkovnMXZrHBjp3pSjD+yxYIExjuc+BhnKtJgKHoyvEZoN+krK+jC+uVZyejCLENU+HoxFB0OeybSoRKmzWiSdSuXywS6eITvg73CKcKU+zQSXKUZfoKHUETLzwxvCTagzlg8pA/KDNitRslTWZreKq4h5I0NC6fvcYXjv/YwSoToNH8d8Sto+AfbExHvM6/g5Q9TlUoFntP7yA2vZo800fLFCYlry7tlM8ia530XVrUjzTOSOy3wqkcg66Xje6VFdnTJxraLGgWV zD2N4xR7 Xxz4CyiY3FVfxEqerH4vmQyBXzVx+ySsWIrcFiBra6oBv+AW7D1F3w1RwmP+6DPOE6lgQuew5d/ftDcXIzAgfNiVTCacDKtwAi54bE1c6RIj0BkuvXJ+oqu1+v0OMdcPBXjnC9fteUbhbGQcuwonVyPqRetbEGNVq7ExD5MTvinBVmdo2/nSytnnCOtJ9IssLb3Zkrm0hYzf1UOMNu/NQPM9c6OmGXgfk8/+oUJwVZtDOWZYCONCzF2ibuKFOWCObT6KaVFTsGhjz5NMHtRl6ZbJk/CvbRrTCFrt9+Fvo5qm89tp95bE7ekFngLJSXta1X/qY8F5a8mq13N8jFhtFcpwxlSzXBZj8sxAsRn9LXS1kCF2GDP8Ndt4fetoTrAwKDTe89XWxl8v4uIYRaYSjEhMG7sFrkRaqRejKbLUfuhE6VNp97mD17+fcH6PRnug+YpRlt9dKAhKh200Dkwbqv8KUPMMNSrxSTuFSAnsXyz7dNN0TPt7hl6vrcAgcfY3x5ALpU8qvdkfi9ocVLAgLYCJievHmYE4xx62jawTTPPPuSeM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jan 7, 2025 at 4:02=E2=80=AFPM Sridhar, Kanchana P wrote: > > Hi Yosry, > > > -----Original Message----- > > From: Yosry Ahmed > > Sent: Tuesday, January 7, 2025 2:23 PM > > To: Andrew Morton > > Cc: Johannes Weiner ; Nhat Pham > > ; Chengming Zhou ; > > Vitaly Wool ; Barry Song ; Sam > > Sun ; Sridhar, Kanchana P > > ; linux-mm@kvack.org; linux- > > kernel@vger.kernel.org; Yosry Ahmed ; > > stable@vger.kernel.org > > Subject: [PATCH v2 2/2] mm: zswap: disable migration while using per-CP= U > > acomp_ctx > > > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of > > the > > current CPU at the beginning of the operation is retrieved and used > > throughout. However, since neither preemption nor migration are disabl= ed, > > it is possible that the operation continues on a different CPU. > > > > If the original CPU is hotunplugged while the acomp_ctx is still in use= , > > we run into a UAF bug as the resources attached to the acomp_ctx are fr= eed > > during hotunplug in zswap_cpu_comp_dead(). > > > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to > > use > > crypto_acomp API for hardware acceleration") when the switch to the > > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was > > retrieved using get_cpu_ptr() which disables preemption and makes sure = the > > CPU cannot go away from under us. Preemption cannot be disabled with t= he > > crypto_acomp API as a sleepable context is needed. > > > > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to > > per-acomp_ctx") increased the UAF surface area by making the per-CPU > > buffers dynamic, adding yet another resource that can be freed from und= er > > zswap compression/decompression by CPU hotunplug. > > > > This cannot be fixed by holding cpus_read_lock(), as it is possible for > > code already holding the lock to fall into reclaim and enter zswap > > (causing a deadlock). It also cannot be fixed by wrapping the usage of > > acomp_ctx in an SRCU critical section and using synchronize_srcu() in > > zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in > > CPU-hotplug notifiers (see > > Documentation/RCU/Design/Requirements/Requirements.rst). > > > > This can be fixed by refcounting the acomp_ctx, but it involves > > complexity in handling the race between the refcount dropping to zero i= n > > zswap_[de]compress() and the refcount being re-initialized when the CPU > > is onlined. > > > > Keep things simple for now and just disable migration while using the > > per-CPU acomp_ctx to block CPU hotunplug until the usage is over. > > > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for > > hardware acceleration") > > Cc: > > Signed-off-by: Yosry Ahmed > > Reported-by: Johannes Weiner > > Closes: > > https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/ > > Reported-by: Sam Sun > > Closes: > > https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4O > > curuL4tPg6OaQ@mail.gmail.com/ > > --- > > mm/zswap.c | 19 ++++++++++++++++--- > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index f6316b66fb236..ecd86153e8a32 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned int cpu, > > struct hlist_node *node) > > return 0; > > } > > > > +/* Remain on the CPU while using its acomp_ctx to stop it from going o= ffline > > */ > > +static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct > > crypto_acomp_ctx __percpu *acomp_ctx) > > +{ > > + migrate_disable(); > > + return raw_cpu_ptr(acomp_ctx); > > +} > > + > > +static void acomp_ctx_put_cpu(void) > > +{ > > + migrate_enable(); > > +} > > + > > static bool zswap_compress(struct page *page, struct zswap_entry *entr= y, > > struct zswap_pool *pool) > > { > > @@ -893,8 +905,7 @@ static bool zswap_compress(struct page *page, struc= t > > zswap_entry *entry, > > gfp_t gfp; > > u8 *dst; > > > > - acomp_ctx =3D raw_cpu_ptr(pool->acomp_ctx); > > - > > + acomp_ctx =3D acomp_ctx_get_cpu(pool->acomp_ctx); > > mutex_lock(&acomp_ctx->mutex); > > > > dst =3D acomp_ctx->buffer; > > @@ -950,6 +961,7 @@ static bool zswap_compress(struct page *page, struc= t > > zswap_entry *entry, > > zswap_reject_alloc_fail++; > > > > mutex_unlock(&acomp_ctx->mutex); > > + acomp_ctx_put_cpu(); > > I have observed that disabling/enabling preemption in this portion of the= code > can result in scheduling while atomic errors. Is the same possible while > disabling/enabling migration? IIUC no, migration disabled is not an atomic context. > > Couple of possibly related thoughts: > > 1) I have been thinking some more about the purpose of this per-cpu acomp= _ctx > mutex. It appears that the main benefit is it causes task blocked er= rors (which are > useful to detect problems) if any computes in the code section it co= vers take a > long duration. Other than that, it does not protect a resource, nor = prevent > cpu offlining from deleting its containing structure. It does protect resources. Consider this case: - Process A runs on CPU #1, gets the acomp_ctx, and locks it, then is migrated to CPU #2. - Process B runs on CPU #1, gets the same acomp_ctx, and tries to lock it then waits for process A. Without the lock they would be using the same lock. It is also possible that process A simply gets preempted while running on CPU #1 by another task that also tries to use the acomp_ctx. The mutex also protects against this case. > > 2) Seems like the overall problem appears to be applicable to any per-cpu= data > that is being used by a process, vis-a-vis cpu hotunplug. Could it b= e that a > solution in cpu hotunplug can safe-guard more generally? Really not = sure > about the specifics of any solution, but it occurred to me that this= may not > be a problem unique to zswap. Not really. Static per-CPU data and data allocated with alloc_percpu() should be available for all possible CPUs, regardless of whether they are online or not, so CPU hotunplug is not relevant. It is relevant here because we allocate the memory dynamically for online CPUs only to save memory. I am not sure how important this is as I am not aware what the difference between the number of online and possible CPUs can be in real life deployments. > > Thanks, > Kanchana > > > return comp_ret =3D=3D 0 && alloc_ret =3D=3D 0; > > } > > > > @@ -960,7 +972,7 @@ static void zswap_decompress(struct zswap_entry > > *entry, struct folio *folio) > > struct crypto_acomp_ctx *acomp_ctx; > > u8 *src; > > > > - acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); > > + acomp_ctx =3D acomp_ctx_get_cpu(entry->pool->acomp_ctx); > > mutex_lock(&acomp_ctx->mutex); > > > > src =3D zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO); > > @@ -990,6 +1002,7 @@ static void zswap_decompress(struct zswap_entry > > *entry, struct folio *folio) > > > > if (src !=3D acomp_ctx->buffer) > > zpool_unmap_handle(zpool, entry->handle); > > + acomp_ctx_put_cpu(); > > } > > > > /********************************* > > -- > > 2.47.1.613.gc27f4b7a9f-goog >