Date: Tue, 23 Jan 2024 15:12:34 -0500
From: Johannes Weiner <hannes@cmpxchg.org>
To: Yosry Ahmed
Cc: Andrew Morton, Nhat Pham, Chris Li, Chengming Zhou, Huang Ying, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff()
Message-ID: <20240123201234.GC1745986@cmpxchg.org>
References: <20240120024007.2850671-1-yosryahmed@google.com> <20240120024007.2850671-3-yosryahmed@google.com> <20240122201906.GA1567330@cmpxchg.org> <20240123153851.GA1745986@cmpxchg.org>
On Tue, Jan 23, 2024 at 07:54:49AM -0800, Yosry Ahmed wrote:
> On Tue, Jan 23, 2024 at 7:38 AM Johannes Weiner wrote:
> >
> > On Mon, Jan 22, 2024 at 12:39:16PM -0800, Yosry Ahmed wrote:
> > > On Mon, Jan 22, 2024 at 12:19 PM Johannes Weiner wrote:
> > > >
> > > > On Sat, Jan 20, 2024 at 02:40:07AM +0000, Yosry Ahmed wrote:
> > > > > During swapoff, try_to_unuse() makes sure that zswap_invalidate() is
> > > > > called for all swap entries before zswap_swapoff() is called. This means
> > > > > that all zswap entries should already be removed from the tree. Simplify
> > > > > zswap_swapoff() by removing the tree cleanup loop, and leaving an
> > > > > assertion in its place.
> > > > >
> > > > > Signed-off-by: Yosry Ahmed
> > > >
> > > > Acked-by: Johannes Weiner
> > > >
> > > > That's a great simplification.
> > > >
> > > > Removing the tree->lock made me double take, but at this point the
> > > > swapfile and its cache should be fully dead, and I don't see how any of
> > > > the zswap operations that take tree->lock could race at this point.
> > >
> > > It took me a while staring at the code to realize this loop is pointless.
> > >
> > > However, while I have your attention on the swapoff path, there's a
> > > slightly unrelated problem that I think might be there, but I am not
> > > sure.
> > >
> > > It looks to me like swapoff can race with writeback, and there may be
> > > a chance of UAF for the zswap tree. For example, if zswap_swapoff()
> > > races with shrink_memcg_cb(), I feel like we may free the tree as it
> > > is being used: say zswap_swapoff()->kfree(tree) happens right before
> > > shrink_memcg_cb()->list_lru_isolate(l, item).
> > >
> > > Please tell me that I am being paranoid and that there is some
> > > protection against zswap writeback racing with swapoff. It feels like
> > > we are very careful with zswap entry refcounting, but not with the
> > > zswap tree itself.
> >
> > Hm, I don't see how.
> >
> > Writeback operates on entries from the LRU. By the time
> > zswap_swapoff() is called, try_to_unuse() -> zswap_invalidate() will
> > have emptied out the LRU and tree.
> >
> > Writeback could have gotten a refcount to the entry and dropped the
> > tree->lock. But then it does __read_swap_cache_async(), and while
> > holding the page lock checks the tree under lock once more; if that
> > finds the entry valid, it means try_to_unuse() hasn't started on this
> > page yet, and it would be held up by the page lock/writeback state.
>
> Consider the following race:
>
> CPU 1                           CPU 2
> # In shrink_memcg_cb()          # In swapoff
> list_lru_isolate()
>                                 zswap_invalidate()
>                                 ..
>                                 zswap_swapoff() -> kfree(tree)
> spin_lock(&tree->lock);
>
> Isn't this a UAF, or am I missing something here?

Oof. You're right, it looks like there is a bug.

Digging through the history, I think this is actually quite old: the
original backend shrinkers would pluck something off their LRU, drop
all locks, then try to acquire tree->lock. There is no protection
against swapoff.

The lock that is supposed to protect this is the LRU lock. That's
where reclaim and invalidation should synchronize. But it's not right:

1. We drop the LRU lock before acquiring the tree lock. We should
   instead trylock the tree while still holding the LRU lock to make
   sure the tree is safe against swapoff.

2. zswap_invalidate() acquires the LRU lock when the refcount hits 0.
   But it must always cycle the LRU lock before freeing the tree for
   that synchronization to work.

Once we're holding a refcount to the entry, it's safe to drop all
locks for the next step, because we'll then work against the swapcache
and the entry: __read_swap_cache_async() will try to pin and lock
whatever swap entry is at that type+offset. This also pins the type's
current tree. HOWEVER, if swapoff + swapon raced, this could be a
different tree than what we had in @tree, so

3. we shouldn't pass @tree to zswap_writeback_entry(). It needs to
   look up zswap_trees[] again after __read_swap_cache_async()
   succeeded, in order to validate the entry.

Once the lookup succeeded, we can validate the entry: the entry is
valid due to our refcount.
The zswap_trees[type] is valid due to the cache pin.

However, if validation failed and we have a non-zero writeback_result,
there is one last bug:

4. the original entry's tree is no longer valid for the entry put.

The current scheme handles invalidation fine (which is good, because
that's quite common). But it's fundamentally unsynchronized against
swapoff (which has probably gone undetected because that's rare).

I can't think of an immediate solution to this, but I wanted to put my
analysis out for comments.