From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A95EE77197 for ; Tue, 7 Jan 2025 23:56:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B9E686B0082; Tue, 7 Jan 2025 18:56:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B4E086B0083; Tue, 7 Jan 2025 18:56:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A152D6B0088; Tue, 7 Jan 2025 18:56:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 84E9C6B0082 for ; Tue, 7 Jan 2025 18:56:58 -0500 (EST) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 0C172A043C for ; Tue, 7 Jan 2025 23:56:58 +0000 (UTC) X-FDA: 82982318916.01.93CAC13 Received: from mail-vs1-f54.google.com (mail-vs1-f54.google.com [209.85.217.54]) by imf11.hostedemail.com (Postfix) with ESMTP id 2792B40005 for ; Tue, 7 Jan 2025 23:56:55 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=quarantine); spf=pass (imf11.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.217.54 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736294216; a=rsa-sha256; cv=none; b=V7NCc/M+kvkj9iRXT5RhhrAiJNBabBTtOwsLB2RIS47/ZsgVqa1h5uiW4yHg9Ce19HonQj LtV5Lhat0N+lgnSZD6kKr89Oa9bJ/lZQbH4rrtsKkixaRTR4WCN0XHqFqg9FJIbiWpSZB+ eQL4WLO35NWLCZg7pnRg9RRmssTvJ9g= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=quarantine); spf=pass (imf11.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.217.54 as permitted sender) smtp.mailfrom=21cnbao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736294216; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Q1cb1DUV4xv7Tq0nP/l08DLYi+sRYWSG7hM3AytBxfw=; b=xOgNhxLz6TXB8nX7SbYe4Jfejx/PLQMvBoVQDBj0LDTNz+BcA+Fc5vUA7WW0p0m/+tl42A 4tti1wQ/Sb626vrHTFNDd/+zVRpntbRIT5qY/3lCMsxKRMs6N8mdoCrGXEpNrAUfV1qvEE iX7KcVRQwhcce8tauffMKOney97Rf7o= Received: by mail-vs1-f54.google.com with SMTP id ada2fe7eead31-4afefc876c6so5021063137.2 for ; Tue, 07 Jan 2025 15:56:55 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736294215; x=1736899015; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Q1cb1DUV4xv7Tq0nP/l08DLYi+sRYWSG7hM3AytBxfw=; b=M4+Xrz8FIHk1HiN9Jxf3XG6ZWj8GsM5gedJfFELtmvCT3OL2mVOYoa3niorRhp7xOa mj824qPsbFLnlLPAUsz7H/i3+uDRb1bDfeKAaLh3ysuA9qJIEtcIARnbsjGuRW5ir6HE 7tCIcTVmVU2OSV99Mbr5CQzO9uRWVEjwYin3m1YYLrwj/dd6yUOfP87j/qrDfjD23jZ2 rJmXg7xf0tSgLDYbPl0EA+YbmsnGy2ngcRGS5pJmb8oDzXF8grnVk5kUw+MwKtV6SeTI 8/zocuOgm32CUnrXtVrUT1EK7dyx+8A0sl9/9QOCEs4IzWFC4wsU1ZpZar0GKS3cVt86 +ayg== X-Forwarded-Encrypted: i=1; AJvYcCW1CnscE7RRMXhdkqjuu21dh7c8G57AYxTOE8EU046R3gCidBiqGSefiUGTGfOqOr5HyKsX0WV7UQ==@kvack.org X-Gm-Message-State: AOJu0YxXLtxT7UDIZ/3M/dicyF/P3whTj2anDu7Qk7QI+oyTymMwQOPB 4kN6roKb0v6FV0Hzprx/tmr3UuL3K4nG+WE5m14SJq+rwevYUng8FBJqtnus+MVFu79uFMdvfpI ChUDSSLQgw3x63G9eZv2GZ8wC7UQ= X-Gm-Gg: ASbGncvy4OlM+mWvyUNb9/WJcoBYwRq1rtg8MCUst2c3Dn/c+MHjHTPl/iRxRB0VyUm +yjAyE/yvw5kmXRPa7gnuHkgWq5A5p4pqUdt71L/v4cF4yh/plV91/5cIvpR/H0sRbO/fbrs= X-Google-Smtp-Source: AGHT+IFQF4rbOj57MKazphxT7zM4SmJ1w77ypufJOhWONq7MIAsLAJ+YyYuwY9/mzsLS9j2UZBSZZrr2gtNsmvm+M+8= X-Received: by 2002:a05:6102:4b12:b0:4af:f044:9381 with SMTP id ada2fe7eead31-4b3d0eb9426mr925956137.27.1736294215244; Tue, 07 Jan 2025 15:56:55 -0800 (PST) MIME-Version: 1.0 References: <20250107222236.2715883-1-yosryahmed@google.com> <20250107222236.2715883-2-yosryahmed@google.com> In-Reply-To: From: Barry Song Date: Wed, 8 Jan 2025 12:56:44 +1300 X-Gm-Features: AbW1kvZuTDLWSyxXcLXp8bF7iKv37kirT_Z5D8NSVJ1__fCGZXi3JDZjpui5amo Message-ID: Subject: Re: [PATCH v2 2/2] mm: zswap: disable migration while using per-CPU acomp_ctx To: Yosry Ahmed Cc: Andrew Morton , Johannes Weiner , Nhat Pham , Chengming Zhou , Vitaly Wool , Sam Sun , Kanchana P Sridhar , linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 2792B40005 X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: 3u6uxbgqu799dmu98am8ng3cjgp6zjxp X-Rspamd-Pre-Result: action=add header; module=dmarc; Action set by DMARC X-Rspam: Yes X-HE-Tag: 1736294215-373601 X-HE-Meta: U2FsdGVkX1+OvCEoWU0q6ZYMBfydsXCV7U8f9dYXZvTEapgHEB5p3KXZQyvsy6PMA1640cpddBv04Kp9tbvYcwOKRXuJwfd0jkwc8IlJydfpvkyjkFLWjMOwcqbj9+SqpppTKKUl8+kcpPn5u3mzjkIarfS2vWllEZytfivgk/ycuE5nnyIjFTZyrb5y0UbTxbYWSCOb/qWXvu6F0tP1jq81Iw90COrBWMMgksp06snZTzoZwMlyDpsmtaOYesQziX9UUy4ymF0oqMbF7Dcdh8Rrd8iONX8o7OQ0jUoD5AcHbugmE/2VR+OSoonEzVpQRa9N2Ph/rb2laEDh0cahxUK660e339VfFgCxY6TmVR7Iyp4we9WR387fS1YpAfWSpZI+2bKrmafNl3svs+z8pTQV4PgQzMa2lsxT96jeMGnqPY1lX/DSGD82d6dVDTCEi/WKdUoEeAL4Rd4TzeK/s4i1RcAHKJ2FsUwgnnPO2GFE/BTTN8KufsfM4Z0KTjsS0yzx+PLSHU6SJSnUZtdIy9TS704IPepaVICRe6SEJf5D5Sw6O7ExaTGwaoqyUGUJ7w0mxsO0fEntOpLUyUGzj7LdhiU3YK/uVhlix1klynmQ+yCFeGi4hO67oy1rHLX5uXZcSWf51kIYy8nWpRtzRiyatBz4ysRFP4Kx21eimrjfI5iNkfbiLsDVxfndO2Iwago6FVGkftB7QT/S36TbqhThbzDfoAbqH1OmoYi1u6+8k4MxHULwrB3Zr1dsSokmu7Bum9B/8K7ludv1QGrmpc/WWJIj/GgvPJgJJ5WVO/zTsuhO7NzgEmLQG73RJ+R3p7rU7KdErZCSQFKLvhbCAFElBS+BZTGK+lkog+eEr6TrUDXlPTQzrOJjXdsKId994vZZFmBboOgIYYLzC6mTnclCfzU1TeHLObbU+m0onA29letpzwl1CiNwWeRfbQSk2zKWcoP7z499ICgIyev ivYyQ5BL K0/zoJ0UvDmiruLDeUMQisOPkkrTdhaMUE7p/tQ/muiyM2qStJcd8Skxyppc8fY0Rul6j19rPmZpN/gM2Vzbw6CdQfo7zKF5XbLfJcVsckPks8riTDJ/CfIc6z1R8Zs72ECGoQSEeuBTvSWqGxe0zuIz5FMMTNKVbiSdzhKkKhBg+YRf5DSq3t4T/Zeeo2z3kFyWXASFb4Zh5Cezpz2Hk+jg2T3ScWtkneNRFDPVBbCB9ijBSXy1D6U+A/QaWx9JP8IZjMGkd9x3tkgGoOXuvG2xreU3eOMqIdF44SyavFjnYlHT192M1iryv/FdzvhnjYmzM92qaAQM1dLM2oIqRobbcr0f3Ag0pJ+gA1U49HjBVbd9hYmsWKv1pCX4Jm/VVZyfSzlL/On9djMwFb78WCTc6mFclkL7QjLDIacK4lsKMmnai25vcYFtJk950LbHjBVKt8/5ToOrS7KWnPoYeFW7rxg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jan 8, 2025 at 12:38=E2=80=AFPM Barry Song wrot= e: > > On Wed, Jan 8, 2025 at 12:26=E2=80=AFPM Yosry Ahmed wrote: > > > > On Tue, Jan 7, 2025 at 2:47=E2=80=AFPM Barry Song w= rote: > > > > > > On Wed, Jan 8, 2025 at 11:22=E2=80=AFAM Yosry Ahmed wrote: > > > > > > > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx o= f the > > > > current CPU at the beginning of the operation is retrieved and used > > > > throughout. However, since neither preemption nor migration are di= sabled, > > > > it is possible that the operation continues on a different CPU. > > > > > > > > If the original CPU is hotunplugged while the acomp_ctx is still in= use, > > > > we run into a UAF bug as the resources attached to the acomp_ctx ar= e freed > > > > during hotunplug in zswap_cpu_comp_dead(). > > > > > > > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move = to use > > > > crypto_acomp API for hardware acceleration") when the switch to the > > > > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp = was > > > > retrieved using get_cpu_ptr() which disables preemption and makes s= ure the > > > > CPU cannot go away from under us. Preemption cannot be disabled wi= th the > > > > crypto_acomp API as a sleepable context is needed. > > > > > > > > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to > > > > per-acomp_ctx") increased the UAF surface area by making the per-CP= U > > > > buffers dynamic, adding yet another resource that can be freed from= under > > > > zswap compression/decompression by CPU hotunplug. > > > > > > > > This cannot be fixed by holding cpus_read_lock(), as it is possible= for > > > > code already holding the lock to fall into reclaim and enter zswap > > > > (causing a deadlock). It also cannot be fixed by wrapping the usage= of > > > > acomp_ctx in an SRCU critical section and using synchronize_srcu() = in > > > > zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in > > > > CPU-hotplug notifiers (see > > > > Documentation/RCU/Design/Requirements/Requirements.rst). > > > > > > > > This can be fixed by refcounting the acomp_ctx, but it involves > > > > complexity in handling the race between the refcount dropping to ze= ro in > > > > zswap_[de]compress() and the refcount being re-initialized when the= CPU > > > > is onlined. > > > > > > > > Keep things simple for now and just disable migration while using t= he > > > > per-CPU acomp_ctx to block CPU hotunplug until the usage is over. > > > > > > > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for ha= rdware acceleration") > > > > Cc: > > > > Signed-off-by: Yosry Ahmed > > > > Reported-by: Johannes Weiner > > > > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxc= hg.org/ > > > > Reported-by: Sam Sun > > > > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+= o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/ > > > > --- > > > > mm/zswap.c | 19 ++++++++++++++++--- > > > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > index f6316b66fb236..ecd86153e8a32 100644 > > > > --- a/mm/zswap.c > > > > +++ b/mm/zswap.c > > > > @@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned int cp= u, struct hlist_node *node) > > > > return 0; > > > > } > > > > > > > > +/* Remain on the CPU while using its acomp_ctx to stop it from goi= ng offline */ > > > > +static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_ac= omp_ctx __percpu *acomp_ctx) > > > > +{ > > > > + migrate_disable(); > > > > > > I'm not entirely sure, but I feel it is quite unsafe. Allowing sleep > > > during migrate_disable() and > > > migrate_enable() would require the entire scheduler, runqueue, > > > waitqueue, and CPU > > > hotplug mechanisms to be aware that a task is pinned to a specific CP= U. > > > > My understanding is that sleeping is already allowed when migration is > > disabled (unlike preemption). See delete_all_elements() in > > kernel/bpf/hashtab.c for example, or __bpf_prog_enter_sleepable() in > > kernel/bpf/trampoline.c. I am not sure exactly what you mean. > > Initially, I had some doubts about whether the scheduler handled this cor= rectly, > but after reviewing the scheduler code again, it seems fine. While a task= is > dequeued, its cpus_ptr is updated. When the task is woken up, it uses its > cpus_ptr instead of selecting a suitable CPU (e.g., idle or wake-affine C= PUs). > Only after migrate_enable() restores this_rq()->nr_pinned to zero can CPU > hotplug take down the CPU. > > Therefore, I sent another email to correct myself: > https://lore.kernel.org/linux-mm/CAGsJ_4yb03yo6So-8wZwcy2fa-tURRrgJ+P-XhD= L-RHgg1DvVA@mail.gmail.com/ > > > > > > > > > If there is no sleep during this period, it seems to be only a > > > runqueue issue=E2=80=94CPU hotplug can > > > wait for the task to be unpinned while it is always in runqueue. > > > However, if sleep is involved, > > > the situation becomes significantly more complex. > > > > > > If static data doesn't consume much memory, it could be the simplest = solution. > > > > Do you mean allocating the buffers and requests for all possible CPUs > > instead of allocating them dynamically in CPU hotplug notifiers? I am > > not sure how much more memory this would be. Seems like it depends on > > CONFIG options and the firmware. > > Correct, the firmware/devicetree will help identify all possible CPUs. Ev= en > if CONFIG is large, only those possible CPUs will be allocated memory. > However, if migrate_disable() and migrate_enable() work as expected, > we don't need to consider this option. By the way, there could be a slight performance regression if the previous CPU is occupied with other tasks, as we would have to wait for preemption. However, when migrate_disable() is not in effect, the woken-up task can quickly select a neighboring idle CPU and continue execution. When the entire system is not busy, there might be no difference. However, = when the system is packed with tasks, it could make a difference. Hopefully, the regression is small enough to be negligible :-) Thanks Barry