From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23B8BC47258 for ; Tue, 23 Jan 2024 21:04:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AD82A6B0080; Tue, 23 Jan 2024 16:04:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A88946B0083; Tue, 23 Jan 2024 16:04:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 951646B0085; Tue, 23 Jan 2024 16:04:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 839476B0080 for ; Tue, 23 Jan 2024 16:04:55 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 552B91A0566 for ; Tue, 23 Jan 2024 21:04:55 +0000 (UTC) X-FDA: 81711805350.29.BAE5B95 Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) by imf30.hostedemail.com (Postfix) with ESMTP id 780F080018 for ; Tue, 23 Jan 2024 21:04:53 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=kTLQxjVS; spf=pass (imf30.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.44 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706043893; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=mH6Q8mXoj5QFsHrSFXa0Fq0YxBOnIiyggYtErbKhYfQ=; b=VsbksRg+qCtjWju6cqTvPGc4mOx19OUm5ra98zdf5E7+lRx0bUzoRq6epgai810q6gastS +ULUptRk9zb1bHxgvCdP4iaeHXiluYqq+Sm+ZKITealkJNaUMd4vdc4KEwVNYeH2uffpPi 4JH6JbcG0Kf09z6GAmt03+MPu2svc9g= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706043893; a=rsa-sha256; cv=none; b=0wNw3cLJmzOxBX94fXHtB5Trump0KMSQRr56U1YBFXVWnsdsqgnmw9kFXjCwInQjpEwjY4 tzIJBhRQJZU9TabF6roITC6bltF8PcUR3fDa63OnJdJnJjxR9jDxLzlCcTeJ3g3yZh6Tpw VcHhN6ioOYLZKaYDe9t34yd+sbvV3Rs= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=kTLQxjVS; spf=pass (imf30.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.44 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ej1-f44.google.com with SMTP id a640c23a62f3a-a30b2f032b1so226537266b.0 for ; Tue, 23 Jan 2024 13:04:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706043892; x=1706648692; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=mH6Q8mXoj5QFsHrSFXa0Fq0YxBOnIiyggYtErbKhYfQ=; b=kTLQxjVSy+wrdRaOTgsczMaHzyGFQun8g4tLe/9HMPc71hG0Y/pN43W0UoLHTEw8HH E85jIbQ7quYa0mshToQ5Gu8dy2q7fYTwVGiYdt0y2SfzeLQ9Plh9Ff2D6dBbCTrveavw /EfS5uPcZMyXCHI/8QB3S1V6ztbJV6emV25ywqS8J/ND6mawn9B8bwZPMvIKnL07o3rH 6LzwK6v5NfRpffcNgYfevg/68wnM2pjCUHvZmNDXu4YUTiGcASX55bfa7hOcYgQXF3R6 Bjk6egYVti9Y7XaZkxPN+3ezxZcEAjYBSdg5J7r4qPsKFmpMUNpzagBabFic4oe6Th4v ociQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706043892; x=1706648692; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mH6Q8mXoj5QFsHrSFXa0Fq0YxBOnIiyggYtErbKhYfQ=; b=HdvOHov2Gtdox8SYnVQW806ipusbYIdXskt/utZnYQqrVpqoRhFTTryo04beiS0Sn9 nz78TSDc3O4xFjFenU7gflhf0HGOF3WFrp+5/BCuouiv7jp6XXOQ7LM/vtcNEDqDfOsW mAm9XMjUkpfZglD/ggcP3P6T/+NzYVerBEcVT8cYs2Tn8/1NObYlXdG6PSoOlogGo+Wt 6yvNAEo7+lM7l6xs+29Vtp1hIAnHNmCwiyi2r8eNUgLeLJXY0fFKDJtQUl0vBG9ZDCxS MysU4dYQCHaq/ifVoCQ6c8Xx23JCuIGshMI9nNKLs4d0X7YADwFzuNXyhxENiEI7dTeb Zk0Q== X-Gm-Message-State: AOJu0YyOCItqo7IFCe4Gr32S6BIGLN5sNEb49okpsbLSVA+TbqJ8Kqve 9qt4l3wDYLc2x4gJcpz+a+W3scPyCJ07R55ttkYDaeowgeNkj/hmTQyKtFmgay4PNLt7GEYab7q WQKwwff8nKBBmpkM6Do3SJURH39ifOjmVKXJm X-Google-Smtp-Source: AGHT+IEO8kzTGXcdOGqKIS/vbQryVHEh7kUyOxjZhCV3nLSsBO7SzKGk+BAwskEyqe16MD7245agZ71nqKBKsUlPjR4= X-Received: by 2002:a17:906:2a02:b0:a30:428c:b1b8 with SMTP id j2-20020a1709062a0200b00a30428cb1b8mr167644eje.42.1706043891672; Tue, 23 Jan 2024 13:04:51 -0800 (PST) MIME-Version: 1.0 References: <20240120024007.2850671-1-yosryahmed@google.com> <20240120024007.2850671-3-yosryahmed@google.com> <20240122201906.GA1567330@cmpxchg.org> <20240123153851.GA1745986@cmpxchg.org> In-Reply-To: From: Yosry Ahmed Date: Tue, 23 Jan 2024 13:04:15 -0800 Message-ID: Subject: Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff() To: Nhat Pham Cc: Johannes Weiner , Andrew Morton , Chris Li , Chengming Zhou , Huang Ying , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: x66kob4hdaft84qwyu3r78apwj57d4yw X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 780F080018 X-Rspam-User: X-HE-Tag: 1706043893-23748 X-HE-Meta: U2FsdGVkX1/l2rFOaGrTHFhjwfT80o9ylX4eaEPT1RbUwNeXhy9jzn1R8mCCgnGe14d5z7c2NIWYV+DZMz00nTADXYcRGTKfwuZZOL2BeN5hUonHO0X84lEAdMcZ0QndYrwaNqGNiziwIdgzh7HLkEXTVQFsO4acEM4q93+mx0GUn4ZJxBkwcibjqbkwMiJtaNE2mDsKOOfywKZvP+3PXeFaae+h0HB7EpTky5z61xEIoQpKXGalIrZ0U96zfOrWSNprHhd2/u6n7gfox/Bmd5nPg1+MtYqobQLcH3dLKpkf5/Z+TaCxatPLgqDvrn3z7M49hPUmGm5Oi3ThIZWUMwnx4eRD/rfk9El0xLi6voNyR6tkr+grnfRJDBz7ZKdh4GHzw9SIl6ZpJAA4EEAekuXe22UjevbLX4PyVCYman+V04QjLuIv/y87/349nLYt0WCLbCYZZE2PXcZpMDV23xwZ7n3M2rdaEZPOxTWnPqon81OvT3SoqUdqa5FZ6PDLNZeTApXNDIoFvKS76TZYsL9S3Q2qYaCGOfmPnocfgDlYtGFw2XJhSjxwGta2QRDV7ZuwjvpubHhGzsv/rZIUh1KeLtNpsgxDEtT1o0krJaAII+rASFbOQdvJGadMSv071TzaMH6eR6jG9UlQucIm+3QPeF36GkP9sG2Cx2U34W/F7Xgb8CjIKrc+UMMtvaIpu023ODD36z37tYCHo83RTnTjjU9PblvXvfFiSgJOJGUc0THWif5D3pwBWrnDID+PoJ4Znj4sI6jy39WChWczF+q9qFUXc0JPmUVARgFJvCrn81qST+DOdNTwse66jfqspRWevpb8bS3/H+Hn3iHagCfgiuH232fwhliP28i080EzJ/wBKGZR5aPk+CvcLct0uen0xtf6B+4IaTCXkT+qhhn2TxhfUNefn6ghBUlmJQ/Y8TjImTgLXf4GCN2UMJ3ld0wTi9rgT8a0VNjtK92 KpnSF02U nvAevtY2yacvVEIx4Ny/3/6zsO/RNXK5skvvw19EhoEb92xzIMDBkY10z2+L47QlyqnsOMNqvh72lgJ9HgOrlbQz59ZuyjQEa50diX7WV7/PmuBgO3couCLYYZe5qURtv0ZnqucRLpaiq3jT2UogzQ74cub1AH0b68q3Ljn7Pyg78gi3GTdxZLnwg3TPDuf4JRhkxS0A9suEGoRJL7OrK+Tj22iWSmNC61KezAupu3Gy6KTHDwC26H+1kpDJdXc+TBoqJVq5LGejBGkqrNS8LugFh5VLJ6ZbuUXodGbvADqtdV8Ci1gu2xM5rFn9jbtrRKWUP88GGC48Zr9Y0k2V59b2UCBbyBgnFQtZehsSijbj2qjDZO0Zx8rr44cJbmWlXoDme X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jan 23, 2024 at 12:30=E2=80=AFPM Nhat Pham wrot= e: > > On Tue, Jan 23, 2024 at 7:55=E2=80=AFAM Yosry Ahmed wrote: > > > > On Tue, Jan 23, 2024 at 7:38=E2=80=AFAM Johannes Weiner wrote: > > > > > > On Mon, Jan 22, 2024 at 12:39:16PM -0800, Yosry Ahmed wrote: > > > > On Mon, Jan 22, 2024 at 12:19=E2=80=AFPM Johannes Weiner wrote: > > > > > > > > > > On Sat, Jan 20, 2024 at 02:40:07AM +0000, Yosry Ahmed wrote: > > > > > > During swapoff, try_to_unuse() makes sure that zswap_invalidate= () is > > > > > > called for all swap entries before zswap_swapoff() is called. T= his means > > > > > > that all zswap entries should already be removed from the tree.= Simplify > > > > > > zswap_swapoff() by removing the tree cleanup loop, and leaving = an > > > > > > assertion in its place. > > > > > > > > > > > > Signed-off-by: Yosry Ahmed > > > > > > > > > > Acked-by: Johannes Weiner > > > > > > > > > > That's a great simplification. > > > > > > > > > > Removing the tree->lock made me double take, but at this point th= e > > > > > swapfile and its cache should be fully dead and I don't see how a= ny of > > > > > the zswap operations that take tree->lock could race at this poin= t. > > > > > > > > It took me a while staring at the code to realize this loop is poin= tless. > > > > > > > > However, while I have your attention on the swapoff path, there's a > > > > slightly irrelevant problem that I think might be there, but I am n= ot > > > > sure. > > > > > > > > It looks to me like swapoff can race with writeback, and there may = be > > > > a chance of UAF for the zswap tree. For example, if zswap_swapoff() > > > > races with shrink_memcg_cb(), I feel like we may free the tree as i= t > > > > is being used. For example if zswap_swapoff()->kfree(tree) happen > > > > right before shrink_memcg_cb()->list_lru_isolate(l, item). > > > > > > > > Please tell me that I am being paranoid and that there is some > > > > protection against zswap writeback racing with swapoff. It feels li= ke > > > > we are very careful with zswap entries refcounting, but not with th= e > > > > zswap tree itself. > > > > > > Hm, I don't see how. > > > > > > Writeback operates on entries from the LRU. By the time > > > zswap_swapoff() is called, try_to_unuse() -> zswap_invalidate() shoul= d > > > will have emptied out the LRU and tree. > > > > > > Writeback could have gotten a refcount to the entry and dropped the > > > tree->lock. But then it does __read_swap_cache_async(), and while > > > holding the page lock checks the tree under lock once more; if that > > > finds the entry valid, it means try_to_unuse() hasn't started on this > > > page yet, and would be held up by the page lock/writeback state. > > > > Consider the following race: > > > > CPU 1 CPU 2 > > # In shrink_memcg_cb() # In swap_off > > list_lru_isolate() > > zswap_invalidate() > > .. > > zswap_swapoff() -> kfree(tr= ee) > > spin_lock(&tree->lock); > > > > Isn't this a UAF or am I missing something here? > > I need to read this code closer. But this smells like a race to me as wel= l. > > Long term speaking, I think decoupling swap and zswap will fix this, > no? We won't need to kfree(tree) inside swapoff. IOW, if we have a > single zswap tree that is not tied down to any swapfile, then we can't > have this race. There might be other races introduced by the > decoupling that I might have not foreseen tho :) Ideally yes, it should make the zswap <-> swapfile interaction more straightforward. Ideally in that world, writeback is handled in the swap core and is just moving an entry from one backend to another, so synchronization with things like swapoff should be easier. Also, I think in that world we don't need a zswap tree to begin with, there's a global tree(s) that maps a swap index directly to a swapfile swap entry or a zswap entry. > > Short term, no clue hmm. Let me think a bit more about this. See my response to Johannes, I think we may be able to fix this with refcounting + RCU.