From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4ECEAE77198 for ; Tue, 7 Jan 2025 23:26:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DB12A6B008C; Tue, 7 Jan 2025 18:26:34 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D60826B0092; Tue, 7 Jan 2025 18:26:34 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BDB1C6B0095; Tue, 7 Jan 2025 18:26:34 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 9812C6B008C for ; Tue, 7 Jan 2025 18:26:34 -0500 (EST) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 3F32DA046F for ; Tue, 7 Jan 2025 23:26:34 +0000 (UTC) X-FDA: 82982242308.26.FA71D86 Received: from mail-qv1-f51.google.com (mail-qv1-f51.google.com [209.85.219.51]) by imf15.hostedemail.com (Postfix) with ESMTP id 5DFA0A0009 for ; Tue, 7 Jan 2025 23:26:32 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=qUJXhl1K; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf15.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.51 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736292392; a=rsa-sha256; cv=none; b=63NeE0Xqr3pE6/F8BEIyU+dNIpbYZeF/ipSeJyoWRGfbUIEd7Z6N49c65e8hegRvZaHKCm EGVM1vw5evljEhRhKDgHatNMFJbNzd32KLuwl5caFZxxKZ4asEn//mtDtGpmNQEBdnUP4N 3xc2wfKW7CYMfeDgpdZD0mfGcA51IL4= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=qUJXhl1K; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf15.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.51 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736292392; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8hGrSqrmdpX4tyBMP+VqZLOR834ah3UzKPlD5+TusBg=; b=aeyTiL/tYCbRStqzvkR10drIkxtArmYRCmo1jyDW5QaSOniOuo/RnjPMFMyl+R7Gvn/abb KGAa2B9idr6GnmGWXPu5bQw/CaH46EGWLe+TbiNuq/rCjvrMkBr8+vQRxGDNq4V8Y3YV8N ppRJFGPdgyNNv1xv3N3/dIR1rKshkp0= Received: by mail-qv1-f51.google.com with SMTP id 6a1803df08f44-6dd15d03eacso3056066d6.0 for ; Tue, 07 Jan 2025 15:26:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736292391; x=1736897191; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=8hGrSqrmdpX4tyBMP+VqZLOR834ah3UzKPlD5+TusBg=; b=qUJXhl1KaIPxLiS6O+O80NVMHZM4nzNY1v1sSuK07G+RuLcxcWNZ24H4viiAinsOYH /hyyFDN8HRAttxVlD0iBinA9SYcXf7BvPzceAjG1bRWiIPjYAjaeQ/aOBDb0Zi2kNUBV GTvh2jLexAEyd7U925ZEg1p3yjnNVBFoSPnNUX53ou6yxWNgOSg8QFlaLa7v52OjqkZq 7VWIFOWBxGxDC5dRxc9cJ/7XbKAQmyU9q7B90c4HQs7ind9ZwtLM6/iVJSnJ7JA/oim6 VRIcu4qzMRkiMwW+y3eU9kRS2B3iDr8jUkas38rC+O4RIfM/q50ubJEy7bUHcvTNhZmW zVXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736292391; x=1736897191; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8hGrSqrmdpX4tyBMP+VqZLOR834ah3UzKPlD5+TusBg=; b=FT0JimsBjQidHZScpTjM0fTxgD1TKbTw77w6C9MuWsl43HZqATHcXDGj3vJUNwrHkl EFtT9u8au+p0pm+P7opozreMZrPoeoPyzkdvREmrX5mEJfeaAamLsaDOsMvdqQKtCqpG Wq/eS8FhThAX9nUe3zUTXSyEHCLA0gXrAp/0CsGInuF77L8un42V9Fq9/9gz8r2KVxr/ JtJ2RKPkWClo8QhsoKfWOrtEKzhpMIfaoxy52OBzEGjWETij0vptRFpqzGx0iBhPZhXi g++ZUFsY5ZzFyoQId/YDGcgBpglO26upqxGBT7meTlhnvUNcHCZRn6EGPCtiAyFZs1vK PFfg== X-Forwarded-Encrypted: i=1; AJvYcCVYfCs3kXQAV7/H8xAtEyivxYrvn0eNzoj3UuPCq8cWlh3LSVrKRH6zzwnuqVzlmAb27fhgjIEs7w==@kvack.org X-Gm-Message-State: AOJu0YzYEg6vqR4qvYVVMlPswl0AaLpGjWTRE5y00vAuUnK8IqZ3MdQ7 hTFK+eMoUNkzx+9fRwBFPsdwQzrw79CoDzj35MMwlyQ/KCT8T0Ai522wP67lAGFYEFHKh0yWXWK 99RVjtD/0tfBzsUkUTBGqLnZp22MH4fXYQEZE X-Gm-Gg: ASbGncvlvCnX1bKBR2FxtnL3HSYc00p637MMr8OZnOWNlMJ3mgPBp/Y3RlDZoUc7FZb zkMDa68QjyAtxKHyLPoyC2O3rSnepSEM1V2kIgQU5tGJoT1D+He3LGDj4XtzIVY1MmePN X-Google-Smtp-Source: AGHT+IGeo4buhf3gUhtYgSoQ5NrDJaoAHdwhJtgc4THHGbI2Xo7Th1vgKNT5hAoKBC0XC89QB4ds+0P/GLNv1cPW9NM= X-Received: by 2002:ad4:5969:0:b0:6d9:162a:27a2 with SMTP id 6a1803df08f44-6df8e89a962mr73887636d6.21.1736292391235; Tue, 07 Jan 2025 15:26:31 -0800 (PST) MIME-Version: 1.0 References: <20250107222236.2715883-1-yosryahmed@google.com> <20250107222236.2715883-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Tue, 7 Jan 2025 15:25:54 -0800 X-Gm-Features: AbW1kvZgg6_6Cmw-lUETFjN7suc5PDdh2Ki8tuiK3wxZF9aqlELvcIfv6Q9XBqA Message-ID: Subject: Re: [PATCH v2 2/2] mm: zswap: disable migration while using per-CPU acomp_ctx To: Barry Song Cc: Andrew Morton , Johannes Weiner , Nhat Pham , Chengming Zhou , Vitaly Wool , Sam Sun , Kanchana P Sridhar , linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 5DFA0A0009 X-Stat-Signature: j1eebd5y8o8fthryimj4shm45ep6ewu6 X-Rspam-User: X-HE-Tag: 1736292392-216324 X-HE-Meta: U2FsdGVkX1/8nO4bNA/hlNJtHtOvWNL2I9ABCYk86tGyhvc6TNqCWZASv0a6D2s0PaqlUcrIU1uOEC9oq3+5qw2jzAKD0zzZ+aFC4DYWEfI+fwV8+pb8whvirP4QjTNhhOVy2B7VfeHFUvreFnVktCgAzNgrZ7zCaujuOb+HBp9cmubmcmHDPN27NHJNzYoD4+myWi47xo0GtaMKblWcXBQNt4Rf+OOSHu8yxTqfeednql6apu5VI2FxJIyfqpsB9tDzl3qoVpTAxtlkmiEfqM8wc0PyzzNDnXQFCZ4VuR0gbWkm80m1CaljDmIzprKSXyHumy97bma8svTxFjJ1l0RVANXlPgptUI9qhejN9/XsN6rvlX5JteHlo5TMkNe7lc//nzKKMcZ3M60rStRnAR3fTB0kultyFP8VdSqgrooovZnamxghXvYKyABWW/LQE3JCBiSZYsYswmR7UgEHMUjDvDUj2yRg42PUNo37vxwNh3Qv+SOHQgwwUZ8PeZteFBbLFvSP7HDzkK4v4TIphhPTK1w5qMcoBJZWwRiGcKfukfoDAR+J9WYB022bExyQI/mWeBp2lxyBKySMxZCwPagy1kXGxDIlofa1CGz72BA8CBXoQ+JM74yLmjmeBy3QDJtBo8mjfc40Vjg7G6AZhAF9Mej/uBAPnteOQCzFjqervWnU/8RVMqEVZiPt5SrW1ai8XxCpSKwJtjsQFQTRyfQAjCS3dEQTDzYhbrquUf4JxjSXldcW46o5j2DmvhYl1sFuvnwRWOMdDsv6Zeljz2SZH09oqQI3Q2r+WRPLN5FOH3SsDEdm0ydqM++UHeYJ9oJjnsDd+58kukN08N+vx6zeucpSkfGUXiOw9DJFjl5u/5z0NEczCGATR5ZKhpUjYLJHCUKq32ORQhjU+v1ZIpNZmJofLmmLAH9WkF7q+Ti/8CiHg6ZcQv7PCshX+wZJFmdZvV2ryv07LeK32Vj 1rNzRTnF i5TrkumAHumfXnmUgxXeM0MgUFNjPfv0UyKGdIh1vYsCaTUghunnP0T3jfZUnfQbLZWhXvVNSs5XEoj4hUgKXQkFOV11xDUzO5cAo6KyqZnsrZahCvPWfO/KA4SOM8ZRGKtNdWDcOpz9LNz6ZjnFWcprTpFGfvJ5bBWGH3v+vnqVqBFgM/GVBqsM0B88BmdrZvAmauBmtzz8KLTgvJx+r2a8puV9VjQAOrHs6Z1toI/pNOAefiSbNyuimQI163cFQg6IM9nlsp3TJiSPTA1ugpx6w/hgdqHUg2ijaiP4qB6GO4pmrYxPKGZxFpYcF5k0rPb0sIT+t7fJHONmJHkhgv2/w6lFxthdi7h1S4BQTVMeWqqHwipi2yWlUJg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jan 7, 2025 at 2:47=E2=80=AFPM Barry Song wrote= : > > On Wed, Jan 8, 2025 at 11:22=E2=80=AFAM Yosry Ahmed wrote: > > > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of th= e > > current CPU at the beginning of the operation is retrieved and used > > throughout. However, since neither preemption nor migration are disabl= ed, > > it is possible that the operation continues on a different CPU. > > > > If the original CPU is hotunplugged while the acomp_ctx is still in use= , > > we run into a UAF bug as the resources attached to the acomp_ctx are fr= eed > > during hotunplug in zswap_cpu_comp_dead(). > > > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to u= se > > crypto_acomp API for hardware acceleration") when the switch to the > > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was > > retrieved using get_cpu_ptr() which disables preemption and makes sure = the > > CPU cannot go away from under us. Preemption cannot be disabled with t= he > > crypto_acomp API as a sleepable context is needed. > > > > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to > > per-acomp_ctx") increased the UAF surface area by making the per-CPU > > buffers dynamic, adding yet another resource that can be freed from und= er > > zswap compression/decompression by CPU hotunplug. > > > > This cannot be fixed by holding cpus_read_lock(), as it is possible for > > code already holding the lock to fall into reclaim and enter zswap > > (causing a deadlock). It also cannot be fixed by wrapping the usage of > > acomp_ctx in an SRCU critical section and using synchronize_srcu() in > > zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in > > CPU-hotplug notifiers (see > > Documentation/RCU/Design/Requirements/Requirements.rst). > > > > This can be fixed by refcounting the acomp_ctx, but it involves > > complexity in handling the race between the refcount dropping to zero i= n > > zswap_[de]compress() and the refcount being re-initialized when the CPU > > is onlined. > > > > Keep things simple for now and just disable migration while using the > > per-CPU acomp_ctx to block CPU hotunplug until the usage is over. > > > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardwa= re acceleration") > > Cc: > > Signed-off-by: Yosry Ahmed > > Reported-by: Johannes Weiner > > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.o= rg/ > > Reported-by: Sam Sun > > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7= Qv4OcuruL4tPg6OaQ@mail.gmail.com/ > > --- > > mm/zswap.c | 19 ++++++++++++++++--- > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index f6316b66fb236..ecd86153e8a32 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned int cpu, s= truct hlist_node *node) > > return 0; > > } > > > > +/* Remain on the CPU while using its acomp_ctx to stop it from going o= ffline */ > > +static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_= ctx __percpu *acomp_ctx) > > +{ > > + migrate_disable(); > > I'm not entirely sure, but I feel it is quite unsafe. Allowing sleep > during migrate_disable() and > migrate_enable() would require the entire scheduler, runqueue, > waitqueue, and CPU > hotplug mechanisms to be aware that a task is pinned to a specific CPU. My understanding is that sleeping is already allowed when migration is disabled (unlike preemption). See delete_all_elements() in kernel/bpf/hashtab.c for example, or __bpf_prog_enter_sleepable() in kernel/bpf/trampoline.c. I am not sure exactly what you mean. > > If there is no sleep during this period, it seems to be only a > runqueue issue=E2=80=94CPU hotplug can > wait for the task to be unpinned while it is always in runqueue. > However, if sleep is involved, > the situation becomes significantly more complex. > > If static data doesn't consume much memory, it could be the simplest solu= tion. Do you mean allocating the buffers and requests for all possible CPUs instead of allocating them dynamically in CPU hotplug notifiers? I am not sure how much more memory this would be. Seems like it depends on CONFIG options and the firmware.