From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A16DE77197 for ; Tue, 7 Jan 2025 23:38:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 860CB6B0082; Tue, 7 Jan 2025 18:38:33 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 810B46B0083; Tue, 7 Jan 2025 18:38:33 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6B19A6B0088; Tue, 7 Jan 2025 18:38:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 49AB06B0082 for ; Tue, 7 Jan 2025 18:38:33 -0500 (EST) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id E7DFF44F1F for ; Tue, 7 Jan 2025 23:38:32 +0000 (UTC) X-FDA: 82982272464.24.2500F17 Received: from mail-vk1-f179.google.com (mail-vk1-f179.google.com [209.85.221.179]) by imf09.hostedemail.com (Postfix) with ESMTP id 075A8140006 for ; Tue, 7 Jan 2025 23:38:30 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=quarantine); spf=pass (imf09.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.179 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736293111; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dVG5MMYsXJNeiUqHwxOiADsVxvbz1uR1TXbbkCWmeBs=; b=fsgpLY9rQnuk4cKK/mf+nSPm/+8Ux76J9Y6RXXruusH03D4/L08uyxnxGXFBpw3wXzZNMu qJGE9BBZh6+HzqJDr8Nf6+hoDvy+pL9UvWOVs0baYAZSsrg0Jbl7g1o35I2IZk4AOPPxYs I7mJyPO+7zsh3r9DStFr8vHXqX3N5GE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736293111; a=rsa-sha256; cv=none; b=71qi/peU3bcJ5gFkx7TvuP7RrRgjyRimMmsdYnykDjisp8yaOz9g5sWC8MCKOY6HBD6req hnio7XMOBwHR7KYVM7H9I/RLuXbzsX0PyQCqvHYXtnLLomKHcnImUaMMhwKsD4TZ1ueCZA 5m01CQJFB6dxcnJmaJW3aydUFC2xTU4= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=quarantine); spf=pass (imf09.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.179 as permitted sender) smtp.mailfrom=21cnbao@gmail.com Received: by mail-vk1-f179.google.com with SMTP id 71dfb90a1353d-51623968932so8632326e0c.1 for ; Tue, 07 Jan 2025 15:38:30 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736293110; x=1736897910; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dVG5MMYsXJNeiUqHwxOiADsVxvbz1uR1TXbbkCWmeBs=; b=jSjsSIOsfTX0awijOuzhaPwA3jJPXT+PHwnhzbJIMUL/Mo4LNxAkG8SYEBC7hVIDS1 aryIcHoUdg+B0cGgzVJ75p7IoMHdYm6SIgF915YdqkKdSZXZiL6pJTwPtrY4k6sMfjaW 1DHz9oBTbXKMhce2hyFFI0xwJwizSJ7qgIu3yQWNKCa9D1Grlu8HzCdvg48WNBdDjf1t 5s2sv8aBYMWJAdovrOb2FVVdg4EaCsh3UUSRpoXIN8uUGn9IzCQk3taPWgl9Xqzp7tlX dNmNYAKKySRXXaHXtPwScjH4MkpRNq8ECeZIJrzvk9kWQdwAYDJm2u4CK1dcSdsNsd1x K4UQ== X-Forwarded-Encrypted: i=1; AJvYcCUGyr9F1jMm/Z5f4ABhOiMQY/YAdfbLMBoKNVvDb5yQOfoF7W2MBkBaF+ibd0COytnVWsZIi5npTw==@kvack.org X-Gm-Message-State: AOJu0Yzw1k5Gi/gvRXIP/BBiwx97Q7Ve6nG1Mh3D7HmC+ZKTcmsO+Ocu DwL769xOZaHcB1LID3UnDzq/EKhSmWnfJeTLO01bmyH0Rsy8j0L2caz+m29a23xIezda4Q2aPdN 4U/S5Q/Vhq4bQwd4KuORxAddMRwU= X-Gm-Gg: ASbGncv1lAkBrvdzQtIJMPxr1ZIBRSVGM4l6xfE/H85/EASViiurOVmMbIXccZaNj9c wd9n2GFRFMWjCAeO3+ytaPuQJOwR4o462uPjbb1s8dhPx3CATWsVadxkDTZL6650RQgrTaLI= X-Google-Smtp-Source: AGHT+IGjvk9ah/tNJ4ANNx3UV1HlbtvQXIQoAJvoGcqH5EQYuQAJ+6Mq1pS/+peJcybL84RMbgRSZbcMFUBf4W5rsKM= X-Received: by 2002:a05:6122:7cf:b0:51c:27a0:25b8 with SMTP id 71dfb90a1353d-51c6c1fdc97mr869648e0c.2.1736293109936; Tue, 07 Jan 2025 15:38:29 -0800 (PST) MIME-Version: 1.0 References: <20250107222236.2715883-1-yosryahmed@google.com> <20250107222236.2715883-2-yosryahmed@google.com> In-Reply-To: From: Barry Song Date: Wed, 8 Jan 2025 12:38:18 +1300 X-Gm-Features: AbW1kvY4B5vetMcMduqj-X_Iak1T60IpVXH1Tr_iwkt3YeZYdfteT9CM37USdy0 Message-ID: Subject: Re: [PATCH v2 2/2] mm: zswap: disable migration while using per-CPU acomp_ctx To: Yosry Ahmed Cc: Andrew Morton , Johannes Weiner , Nhat Pham , Chengming Zhou , Vitaly Wool , Sam Sun , Kanchana P Sridhar , linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 075A8140006 X-Rspam-User: X-Rspamd-Server: rspam07 X-Stat-Signature: kydknwcfkjp3ucdqy9kra6946ccequu1 X-Rspamd-Pre-Result: action=add header; module=dmarc; Action set by DMARC X-Rspam: Yes X-HE-Tag: 1736293110-445283 X-HE-Meta: U2FsdGVkX1905/slFpdOQsWG6IkDoXvlhAVlldvBt6wqpUBUy3MKfY4lDchuexcM+wX8QNqCpq3JCI+BEkyI9FAp08vwLiNudCzmcmFAXORNy5CFOyTdF7EtKTQFTqUy+awngDZ9RfIq9Rd3C85cKJVuOcwk0jmBIG2Xp8uV+nImGBzZ1ZzY6RfbP0jBhbPM85qflpNKuQ9XCYjKZQPsiY8feMiKuCJ7/7vp2MZkSSy1znusN9lMB4WGleu05IVAXvIq9RP+SYe1ITWYfy/DinDchzqzifr+p7fjNQz9hbgKSuzRMdq7mU/IZgFkituFKZM+x+S26V1OAWMT2lYV0qrVWoOu7FHLhhInKFCAP8ktENNDkRoqSk1dzdWootBZA4HqkIwLtZDeM+OfuwPFHcAXlKZnQ5veYYxCjiy9liNC43DHWn8WxYzFoLyCpoVZhqJN5Jtkax3qQ9CCewZHpCwvZVemParOQsVX4lGxjZhPOq4xEAEYf/7srsr+VyQoz+dWmc5Z20vbr2R+HEZOqx4CdUVnMFXYBOkBBCx+0siH69kzkKrRhqz+7AoS/Nkf90Lrp5Fzp8IYNzppaX/bOzIqq2sPrKaZ4AIvrCiaOs92vh9a0iPlv4qEkzSIlnfwiZplyjyJ2Ie92rluIS7s+/raOVYdT7XKx/HNfSy2NgmLKqZRMX1IwqFqRmnkmswkt1BMZ8NL6rh5qxhCepKnhnpeBQ/4HkxFnOy0QfIts4KYGji5yEvByAaJa3OMGTPeYUuj8Kz01RwQQn5cLxgWQo0UcMkwiUxE0KLtLCL0ZGbUS9xLpfhjp9qsrtIkYW8HV0agq7wUlFxg5KfEXn+ZLsBTdlQMY+CyS8XCjJozng67U6L1e1y5Lw/rnO2A8IFmpcrRAxRH4hULH1T6p29s0lqGGtzxO+vErtO9ce+55v4sPBHInl6tvjEFfDKyczLebLpdVhxEi3tfvhWMA1b 35T4qH6y 5Uy9OiZucuFWxjBOdgRmBMAHUEAy2kuBfQbDyBjKMW6Bo5rQKm9TdrQYLtTgZXigkz/IeWHFPwLmHLQuNGHRVyruB4yors8T7TxHre2GXO3DL8XeqKpoRP/jOd/FpOXdEP28rCtsO1srqM1byJxgy31fTTizIHPt0xcKBrmPBczUL8Rc9Wc64F+jDmO9A8zuBph8+SnCrR8VsnU5mYOOEE3HlGZV1AvINpx1Oo1dzZfi/t8LgoXIW4egOUyUlWTcVQZsYqEyRZ00mKinbbP+/OE/0AK8AqchwYjNK0NpTadMxeCxqV3lFt1bxh3JyjMkLlJ4YT5vPhA4yvt+ofrUr7FJdbCLXwRhrz2Pl3aHipKIuwUueT3nKBaody7vigh6zC/OTB07pgFxwXyWKoKOYtxjxPH1LRQk5p5PidcX3kafqZxbix2usAHHN8pbVGmhv6T9MAUCNOtynzCs= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jan 8, 2025 at 12:26=E2=80=AFPM Yosry Ahmed = wrote: > > On Tue, Jan 7, 2025 at 2:47=E2=80=AFPM Barry Song wro= te: > > > > On Wed, Jan 8, 2025 at 11:22=E2=80=AFAM Yosry Ahmed wrote: > > > > > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of = the > > > current CPU at the beginning of the operation is retrieved and used > > > throughout. However, since neither preemption nor migration are disa= bled, > > > it is possible that the operation continues on a different CPU. > > > > > > If the original CPU is hotunplugged while the acomp_ctx is still in u= se, > > > we run into a UAF bug as the resources attached to the acomp_ctx are = freed > > > during hotunplug in zswap_cpu_comp_dead(). > > > > > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to= use > > > crypto_acomp API for hardware acceleration") when the switch to the > > > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp wa= s > > > retrieved using get_cpu_ptr() which disables preemption and makes sur= e the > > > CPU cannot go away from under us. Preemption cannot be disabled with= the > > > crypto_acomp API as a sleepable context is needed. > > > > > > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to > > > per-acomp_ctx") increased the UAF surface area by making the per-CPU > > > buffers dynamic, adding yet another resource that can be freed from u= nder > > > zswap compression/decompression by CPU hotunplug. > > > > > > This cannot be fixed by holding cpus_read_lock(), as it is possible f= or > > > code already holding the lock to fall into reclaim and enter zswap > > > (causing a deadlock). It also cannot be fixed by wrapping the usage o= f > > > acomp_ctx in an SRCU critical section and using synchronize_srcu() in > > > zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in > > > CPU-hotplug notifiers (see > > > Documentation/RCU/Design/Requirements/Requirements.rst). > > > > > > This can be fixed by refcounting the acomp_ctx, but it involves > > > complexity in handling the race between the refcount dropping to zero= in > > > zswap_[de]compress() and the refcount being re-initialized when the C= PU > > > is onlined. > > > > > > Keep things simple for now and just disable migration while using the > > > per-CPU acomp_ctx to block CPU hotunplug until the usage is over. > > > > > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hard= ware acceleration") > > > Cc: > > > Signed-off-by: Yosry Ahmed > > > Reported-by: Johannes Weiner > > > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg= .org/ > > > Reported-by: Sam Sun > > > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2= e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/ > > > --- > > > mm/zswap.c | 19 ++++++++++++++++--- > > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index f6316b66fb236..ecd86153e8a32 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned int cpu,= struct hlist_node *node) > > > return 0; > > > } > > > > > > +/* Remain on the CPU while using its acomp_ctx to stop it from going= offline */ > > > +static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acom= p_ctx __percpu *acomp_ctx) > > > +{ > > > + migrate_disable(); > > > > I'm not entirely sure, but I feel it is quite unsafe. Allowing sleep > > during migrate_disable() and > > migrate_enable() would require the entire scheduler, runqueue, > > waitqueue, and CPU > > hotplug mechanisms to be aware that a task is pinned to a specific CPU. > > My understanding is that sleeping is already allowed when migration is > disabled (unlike preemption). See delete_all_elements() in > kernel/bpf/hashtab.c for example, or __bpf_prog_enter_sleepable() in > kernel/bpf/trampoline.c. I am not sure exactly what you mean. Initially, I had some doubts about whether the scheduler handled this corre= ctly, but after reviewing the scheduler code again, it seems fine. While a task i= s dequeued, its cpus_ptr is updated. When the task is woken up, it uses its cpus_ptr instead of selecting a suitable CPU (e.g., idle or wake-affine CPU= s). Only after migrate_enable() restores this_rq()->nr_pinned to zero can CPU hotplug take down the CPU. Therefore, I sent another email to correct myself: https://lore.kernel.org/linux-mm/CAGsJ_4yb03yo6So-8wZwcy2fa-tURRrgJ+P-XhDL-= RHgg1DvVA@mail.gmail.com/ > > > > > If there is no sleep during this period, it seems to be only a > > runqueue issue=E2=80=94CPU hotplug can > > wait for the task to be unpinned while it is always in runqueue. > > However, if sleep is involved, > > the situation becomes significantly more complex. > > > > If static data doesn't consume much memory, it could be the simplest so= lution. > > Do you mean allocating the buffers and requests for all possible CPUs > instead of allocating them dynamically in CPU hotplug notifiers? I am > not sure how much more memory this would be. Seems like it depends on > CONFIG options and the firmware. Correct, the firmware/devicetree will help identify all possible CPUs. Even if CONFIG is large, only those possible CPUs will be allocated memory. However, if migrate_disable() and migrate_enable() work as expected, we don't need to consider this option. Thanks Barry