From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86F8CE77197 for ; Wed, 8 Jan 2025 00:02:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0802F6B0088; Tue, 7 Jan 2025 19:02:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0071F6B0089; Tue, 7 Jan 2025 19:02:24 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D9C3C6B008A; Tue, 7 Jan 2025 19:02:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id B721A6B0088 for ; Tue, 7 Jan 2025 19:02:24 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 4B764B0506 for ; Wed, 8 Jan 2025 00:02:24 +0000 (UTC) X-FDA: 82982332608.06.A7600A5 Received: from mail-qv1-f54.google.com (mail-qv1-f54.google.com [209.85.219.54]) by imf27.hostedemail.com (Postfix) with ESMTP id 651294000F for ; Wed, 8 Jan 2025 00:02:21 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=PGPWMu4Y; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf27.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.54 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736294541; a=rsa-sha256; cv=none; b=MD9zLkQITGEjx44dqgq+PpBGnjUDbLqGeNqQUOkfluJSDUAAIPYv1cIMuPfZWThWGKfcPV 3QkZOQxbcxmi8/z7MtNQHDayD4OGwZwpUHEsxPWT6NuxV3+Jmk1exZA58VmuvHY+ncfza1 4K7JGcYdCrD/D4yxxEllI/WptXJrMgA= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=PGPWMu4Y; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf27.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.54 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736294541; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=q5VcB/mTFuaRwJcsLEoa6X0skB2mYenl5uSlFRwJ3HY=; b=xG2gi276S6bG9cbI+UXCZBjqjGWh6AtueBEt5dkRKE837X94KDWU0NWEsDZGJJ9MZO4TDf +TorAKl9AYLiUE7Y7t7t08Ag201yXIikxryQqs4tN439CxmOSyA6bAR/hg6yILZmppXVgP mzvBj2/n/frVbHbBkfqZGFPJbI7tY7c= Received: by mail-qv1-f54.google.com with SMTP id 6a1803df08f44-6dd16e1cfa1so139259496d6.1 for ; Tue, 07 Jan 2025 16:02:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736294540; x=1736899340; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=q5VcB/mTFuaRwJcsLEoa6X0skB2mYenl5uSlFRwJ3HY=; b=PGPWMu4YjHCgeVHRq5EPUrYE9VwQj0+1IPvLVHwVv3MMULXgEZHrYjNJRkdjC3zYEP sPc9INIwIJHE/4xOAsdrf2oQF9TXRHRMaigqCgfLiK/T4+ysQxVUKvzkElAs52UnISEk zPqxUc34uILZSTd+ke4SRJCqM+TMqMKzSqccYo7VS3QVbgK8anS73FFd6h+iwUdMXC8s Z56Q+8i0vVK96dR+iUDGhwwSOn4kmuy+mGYhdT526MT/L5q2lMLZkoR1f/HWMZiVublN kocwD4Cbk7sySX1fyUX6NhU7etv/KVV4DXQz2fnfqKG/+hQFEoF3emNHhhh+sEDcAIqs EPSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736294540; x=1736899340; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=q5VcB/mTFuaRwJcsLEoa6X0skB2mYenl5uSlFRwJ3HY=; b=G7Iu9QuiNzzKB+h5j+y2L2btnc+TREezH3MLZA3n6P/N2oNwJ9QtRAVQq2DIlGXtYy rraR5NEdgfFMttCg0a64oHFQ1DKoCKNN5oR+/fXacAPeNixHcjL1dYcG9AMjZZ5rRyO1 5AE0nrTSPRwgpI5XVTIFS1JyQKxvnbJ437AuEWIij0T0yjL8SuTk6FsC0CR0oUypJ2cl mgxoo3YYCWF40xMqZaUebt76cJabuMqgs7PzGs45JDBuNgc6Y+MJNXHlqE+2fmEgc+xS fEGETavRciVI+2FHiEj1YzKnbjxxYO7H8pNGVG0cZt09TH3Pm3e9sMNCCmh/gXu4aoZu /YZA== X-Forwarded-Encrypted: i=1; AJvYcCUkk4mRvH3QVS59O2kOppldFAC5Qtnc3CODprcWinv3bq8YlDaSHTpYHu8tSfnfP1nYG+QW5prumQ==@kvack.org X-Gm-Message-State: AOJu0YyoqaWj6Ok3LmKIMNhwHUCBXLh8tCoJM/8/1YDzaBjRc15d0Vff RMmJCiFoD1kZQfcHx9Ug8RZfvqSUIRKvDQ1V43L+T3COmBVxxQyhR7L3Z49clR6wluGD2RFHIgI ifHSgjVyH8HzmSF2qWFwER49S6PsBTbRlP6p3 X-Gm-Gg: ASbGncv8UnCsZbdVk0buLC51eM4RvrBKlyieNVnYzzUTtLO4+hlwCuiMWloFW7oZnyS HqgiKMRUppywf6VybYW2Y0sBIY1h3z4L6UPv99rSyFFzdtCXg1Smwvg0m8Hu7pVa/X0rL X-Google-Smtp-Source: AGHT+IHGWRaUqixp2cjxy4Eeujl4Tz+gAM8WomLT0kK8VB7P1yVycx0mNaR8UDUFGER6R/cVKFoAXXIrYASxCeGPlqw= X-Received: by 2002:a05:6214:29ca:b0:6d8:876e:ef41 with SMTP id 6a1803df08f44-6df9b1f6ae4mr15297536d6.21.1736294540149; Tue, 07 Jan 2025 16:02:20 -0800 (PST) MIME-Version: 1.0 References: <20250107222236.2715883-1-yosryahmed@google.com> <20250107222236.2715883-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Tue, 7 Jan 2025 16:01:43 -0800 X-Gm-Features: AbW1kvZzYTdQbxIAphcN788kl4e475ndXnkGPAgm2e1fT4WkEBR5Cem40-ed3pk Message-ID: Subject: Re: [PATCH v2 2/2] mm: zswap: disable migration while using per-CPU acomp_ctx To: Barry Song Cc: Andrew Morton , Johannes Weiner , Nhat Pham , Chengming Zhou , Vitaly Wool , Sam Sun , Kanchana P Sridhar , linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 651294000F X-Stat-Signature: q667nerqh1iskzayfhi1abaxb41qon5z X-HE-Tag: 1736294541-645073 X-HE-Meta: U2FsdGVkX1+Li/PsJCR5vjIdhC0xDdcYgHLQ0tKtN+TjkJCVMa3bsVnfTDVxy4sgaM7iZ7Kvvk/NEkY3t1O68vJ0TBBRkO2pFaXKRkGBbaaieckIN6m4y3lcoXQ/CNVEbkftR6RP5ARlTHmz9hJ2Ma3w57MLrMeYRKeo67a2pIB5qmHxRsDC3LAVuKz4sb6N3k8K1rW2ll/a6mzF/FyN60qa9i8oKVdUBaaEEdP2bcmIqb5pRsE6FKYXDSwGy90hv8CbEb7uOkHj9W5WVJ2+j4Bx0Lxm+a0Gmu+GCZ+eq+JCqU7ABukr9bMvRYnR4fkwbOdZR1ZbRQXM40IudISMuhKXa0jXybmZnkQuW5zFO5NAf3rChlgWrT0a1AO/lf6PgyNNj/a6qXgCcnrV4GgOfwgJNeMUDtdPyvGnzgBmo1HmH81wM0JJG5gurBAuztMTPBrITOe07Af1UAwjQT9Yxfp8koS/zRfeTLXgMXnKrSFQUBIIeiliajxTR2wSsm7X2G94oz7Q/8GzpJXGoJtk0Vo7UQj61t2JSh9it0EKyDbhEJlBaPWjpaO9L33m6W1fZYRZ7wl85nph71sTfa4bMMSzcFN4PjwfF64HVlN9xBNGcHWxTA4Nwhbeu/9IzediLukDxzoBoZVoGBNmmPJ80t6BDtznVrBXB5j9SRBEeVNgAQXPFjbtzTyxLkr1VPcx8F+XGdlu7p3kah4UH0JVMBMkvFqiY+2Ju3CwWIG7BUvDbdKY8JNW2YmFtzaFnRRSZa/SzWeqlCs1JW8nByTP9qUsgzFDvQInqQ3nDPWuCo94NtuzfneFUXbD6Wpv+ylfi0Ar8HdyYPFjD2jPTWKqJD1j2PEJCuzwpfi0ILbLq5Y750I0igL8Ezo6xPfIl88V6aZAICfEwWplKrZS2hgTLjGQMBMr5tqfySRCNfOqJPaeK7MSihdO8hPKM4umW5K+Zx2nPVsI9HS9uw2Lm0o /b6ivG2a ogI+LAAJTnw58/Ufxr5R5CZY5Zofs35zHTtAbGoWcOkBWuNxmkUrf6dKE5k4d/c788/A9ZzYYe2uAUChJt8tAaCyhnqEBpOyviZnQ5fmDgO9b8E535eM86rGIAJgiX419vGhAvNn+rLFVBbXMATgbKabhHBppJDZI9MPMuOZAeKn4WLbcjbiCiIOpm1/h8nrwqBTebYcPKvxqlYJLrkX1lNoN90+MRCyfVMVxzwa7mZ5fIuFVE96aKXPRvYPkt9sPEOSdbwDbUG6Fx6PSWzSCJ/wyqsIAC3NRgpmM5koBOZCpbHbXEfLr8MrEyGPinh4OOKJ+mSNg4JOI2iP28aeDmaUXlvrtRwsqnq+MbDd/O983TbasRD+/L9yjXQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jan 7, 2025 at 3:56=E2=80=AFPM Barry Song wrote= : > > On Wed, Jan 8, 2025 at 12:38=E2=80=AFPM Barry Song wr= ote: > > > > On Wed, Jan 8, 2025 at 12:26=E2=80=AFPM Yosry Ahmed wrote: > > > > > > On Tue, Jan 7, 2025 at 2:47=E2=80=AFPM Barry Song = wrote: > > > > > > > > On Wed, Jan 8, 2025 at 11:22=E2=80=AFAM Yosry Ahmed wrote: > > > > > > > > > > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx= of the > > > > > current CPU at the beginning of the operation is retrieved and us= ed > > > > > throughout. However, since neither preemption nor migration are = disabled, > > > > > it is possible that the operation continues on a different CPU. > > > > > > > > > > If the original CPU is hotunplugged while the acomp_ctx is still = in use, > > > > > we run into a UAF bug as the resources attached to the acomp_ctx = are freed > > > > > during hotunplug in zswap_cpu_comp_dead(). > > > > > > > > > > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: mov= e to use > > > > > crypto_acomp API for hardware acceleration") when the switch to t= he > > > > > crypto_acomp API was made. Prior to that, the per-CPU crypto_com= p was > > > > > retrieved using get_cpu_ptr() which disables preemption and makes= sure the > > > > > CPU cannot go away from under us. Preemption cannot be disabled = with the > > > > > crypto_acomp API as a sleepable context is needed. > > > > > > > > > > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer t= o > > > > > per-acomp_ctx") increased the UAF surface area by making the per-= CPU > > > > > buffers dynamic, adding yet another resource that can be freed fr= om under > > > > > zswap compression/decompression by CPU hotunplug. > > > > > > > > > > This cannot be fixed by holding cpus_read_lock(), as it is possib= le for > > > > > code already holding the lock to fall into reclaim and enter zswa= p > > > > > (causing a deadlock). It also cannot be fixed by wrapping the usa= ge of > > > > > acomp_ctx in an SRCU critical section and using synchronize_srcu(= ) in > > > > > zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed = in > > > > > CPU-hotplug notifiers (see > > > > > Documentation/RCU/Design/Requirements/Requirements.rst). > > > > > > > > > > This can be fixed by refcounting the acomp_ctx, but it involves > > > > > complexity in handling the race between the refcount dropping to = zero in > > > > > zswap_[de]compress() and the refcount being re-initialized when t= he CPU > > > > > is onlined. > > > > > > > > > > Keep things simple for now and just disable migration while using= the > > > > > per-CPU acomp_ctx to block CPU hotunplug until the usage is over. > > > > > > > > > > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for = hardware acceleration") > > > > > Cc: > > > > > Signed-off-by: Yosry Ahmed > > > > > Reported-by: Johannes Weiner > > > > > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmp= xchg.org/ > > > > > Reported-by: Sam Sun > > > > > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD= 5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/ > > > > > --- > > > > > mm/zswap.c | 19 ++++++++++++++++--- > > > > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > > > index f6316b66fb236..ecd86153e8a32 100644 > > > > > --- a/mm/zswap.c > > > > > +++ b/mm/zswap.c > > > > > @@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned int = cpu, struct hlist_node *node) > > > > > return 0; > > > > > } > > > > > > > > > > +/* Remain on the CPU while using its acomp_ctx to stop it from g= oing offline */ > > > > > +static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_= acomp_ctx __percpu *acomp_ctx) > > > > > +{ > > > > > + migrate_disable(); > > > > > > > > I'm not entirely sure, but I feel it is quite unsafe. Allowing slee= p > > > > during migrate_disable() and > > > > migrate_enable() would require the entire scheduler, runqueue, > > > > waitqueue, and CPU > > > > hotplug mechanisms to be aware that a task is pinned to a specific = CPU. > > > > > > My understanding is that sleeping is already allowed when migration i= s > > > disabled (unlike preemption). See delete_all_elements() in > > > kernel/bpf/hashtab.c for example, or __bpf_prog_enter_sleepable() in > > > kernel/bpf/trampoline.c. I am not sure exactly what you mean. > > > > Initially, I had some doubts about whether the scheduler handled this c= orrectly, > > but after reviewing the scheduler code again, it seems fine. While a ta= sk is > > dequeued, its cpus_ptr is updated. When the task is woken up, it uses i= ts > > cpus_ptr instead of selecting a suitable CPU (e.g., idle or wake-affine= CPUs). > > Only after migrate_enable() restores this_rq()->nr_pinned to zero can C= PU > > hotplug take down the CPU. > > > > Therefore, I sent another email to correct myself: > > https://lore.kernel.org/linux-mm/CAGsJ_4yb03yo6So-8wZwcy2fa-tURRrgJ+P-X= hDL-RHgg1DvVA@mail.gmail.com/ > > > > > > > > > > > > > If there is no sleep during this period, it seems to be only a > > > > runqueue issue=E2=80=94CPU hotplug can > > > > wait for the task to be unpinned while it is always in runqueue. > > > > However, if sleep is involved, > > > > the situation becomes significantly more complex. > > > > > > > > If static data doesn't consume much memory, it could be the simples= t solution. > > > > > > Do you mean allocating the buffers and requests for all possible CPUs > > > instead of allocating them dynamically in CPU hotplug notifiers? I am > > > not sure how much more memory this would be. Seems like it depends on > > > CONFIG options and the firmware. > > > > Correct, the firmware/devicetree will help identify all possible CPUs. = Even > > if CONFIG is large, only those possible CPUs will be allocated memory. > > However, if migrate_disable() and migrate_enable() work as expected, > > we don't need to consider this option. > > By the way, there could be a slight performance regression if the previou= s > CPU is occupied with other tasks, as we would have to wait for preemption= . > However, when migrate_disable() is not in effect, the woken-up task can > quickly select a neighboring idle CPU and continue execution. > > When the entire system is not busy, there might be no difference. However= , when > the system is packed with tasks, it could make a difference. Hopefully, t= he > regression is small enough to be negligible :-) Hopefully the robot will let us know if it isn't :)