From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B73D8C47258 for ; Tue, 23 Jan 2024 15:55:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3D1A06B0082; Tue, 23 Jan 2024 10:55:33 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 382646B0083; Tue, 23 Jan 2024 10:55:33 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 24AA26B0085; Tue, 23 Jan 2024 10:55:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 15E006B0082 for ; Tue, 23 Jan 2024 10:55:33 -0500 (EST) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 9F6E116074A for ; Tue, 23 Jan 2024 15:55:32 +0000 (UTC) X-FDA: 81711025704.23.27419EF Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by imf12.hostedemail.com (Postfix) with ESMTP id B1BEE40005 for ; Tue, 23 Jan 2024 15:55:29 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=JQXBelEt; spf=pass (imf12.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706025329; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tpTrcf8Pjdlxs3p3QQII3oZRaQKfL8NQPq9NL8TsssI=; b=llQXNCztY6vEgDVP14b3j+50jObyr0v+2t2h7BImnhOA314bExbzsLCHvOPmFljAs21Qar fSgCYSo6xlvmsP23S97jq2GU3jP9B6vk4IZXrGLgj3/FRLDsMc2w/eow2bd1wyPl7msvat 3FCv6G3+/So5yk7lizBqQaQHTSvo0X8= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706025329; a=rsa-sha256; cv=none; b=u3SwdXXEtp4dip0SwWMEMt+DndNgJIpiNYp5eBxj0lqzYGQJ5kZS/5kiTD8g7Ax9WDI85K n5PSpLoC0NHcW9hsN7h+HSMqdOjsuD5gUK4bEMCZEDAKRvJNk7Djr3SOMVu5phBOYRAnAD gTAxBcSXNJMu2Dp5JRLOdMS+c2JlabQ= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=JQXBelEt; spf=pass (imf12.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ej1-f42.google.com with SMTP id a640c23a62f3a-a30799d6aa0so200105566b.3 for ; Tue, 23 Jan 2024 07:55:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706025328; x=1706630128; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=tpTrcf8Pjdlxs3p3QQII3oZRaQKfL8NQPq9NL8TsssI=; b=JQXBelEtvHSf97s1PjsWVG7bdkQgDEwVrfJCFtTcZeZz17KnzT0Ag5eXjuf6/r5UpJ HqF/4PbffVM8+pkuCmvdjpk2q3TS2iZSaYEN41l9q8nttMi7v3ANODHcNEnmIsl7Ngow 4B4gJSHYEoETSDf03Qy9bbYxz06tpJE/svo8kVJd5Hm1nNTrDspBcBSjKXcK9AyPVU/+ QpoRmLQTHVCow0BpwB0RIZ+3WQYyrMw4f0kZqPWWn/qiIUp/gEa5R9NhVFuq7Il+a5n6 eXBZ9aEMMFrWB9+1Ftmui7YPDLnLESmeaFJ2N00jLZ5BgKC5Glhbf271j+TzcSCudNFl 1uEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706025328; x=1706630128; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tpTrcf8Pjdlxs3p3QQII3oZRaQKfL8NQPq9NL8TsssI=; b=iHrv6mV7mnnueku9urov8nnUyNU76XmLr6JYGe/lYHqisrEutqi1rXWgqJpGlbnKv5 JUe9JyHv/UTu8BTKUwPU/vrI6UH45ltIWP+T0zHL1BZfFjlyvW4w5ZAq+UK0jE1BoECs Lp4Lc3B6Kqa/m+ncvc7JGjJTZ8C552IaQGCzCUuba6E+gTrkjZGhLMqhk0gLet6ttkWy HTXhKHqz6jwoVMBcPBd4RNgwxkTJOHPVKYXbhiVdIQxU36oMjuyY4kwV4GTU6hwXUcQ6 LNVmaddZlYdqPaXSUokiBk32RXnXpjcE+IA6zkew9x0NrROeI3tMgMXi+hz3neFli0ut QB3Q== X-Gm-Message-State: AOJu0YzaF5fjq5xHkj2YnK90/FY32XTTyAMrVwJquLPU8GLoCnG0E3Up 3B0jZVp13pGx6ioAPZXmi8gViFHhBt8QQ8Hwy3DlEzQWBNwIpuaQXOoTK3vq9ECbJP4KoUMUaap raa2B4Z32UsOnPe+8RI7x/fWPQaqOVZP1Ia72 X-Google-Smtp-Source: AGHT+IGk8v5B/HHFrp+7GSfwtJ8LCHem3Bf/VDYildiIzU95CdyFbAZ5Ll7ejvV4C4yLz+NGKv0w+myj7Ph4E6QOLCA= X-Received: by 2002:a17:906:46d7:b0:a2e:cac9:97ca with SMTP id k23-20020a17090646d700b00a2ecac997camr37174ejs.85.1706025327705; Tue, 23 Jan 2024 07:55:27 -0800 (PST) MIME-Version: 1.0 References: <20240120024007.2850671-1-yosryahmed@google.com> <20240120024007.2850671-3-yosryahmed@google.com> <20240122201906.GA1567330@cmpxchg.org> <20240123153851.GA1745986@cmpxchg.org> In-Reply-To: <20240123153851.GA1745986@cmpxchg.org> From: Yosry Ahmed Date: Tue, 23 Jan 2024 07:54:49 -0800 Message-ID: Subject: Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff() To: Johannes Weiner Cc: Andrew Morton , Nhat Pham , Chris Li , Chengming Zhou , Huang Ying , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: a1h36m45e6t5jxoadz4hiu33qsegnoh7 X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: B1BEE40005 X-Rspam-User: X-HE-Tag: 1706025329-770210 X-HE-Meta: U2FsdGVkX19x5q9UlucvczE9QI1KeM41qtyB0232gpCoOKJz/oNG5wSVYIQdQBChl2nHBENwmW1O4fPGarjct0Z8Lv2dwyC85bbM6SJDZYu6204ttMYW0T7UdIw7w3RoERi6EXwyun/UhvwmOBaH8ejqfS3Lb0OLy3ubL2rq19NkQwwlrhX3+qFywVL4Xfqa+VodT3Hp9jYA87WcZf/xhRvILR/J8JAx+3W1Iysh7TmVGi0/tBbV/JinDKHBa6TuT0gz5g3ccW3SKW+UZ2MSx4/wgWQumaIyWEauRCFKofVRho+/uqGyPi1jyG3c7tnXtDaBLG2fxY/+KV0FcPd02cpDK/GMJdKUhjuNOqz+VWZYWbaXpXT/Wunj9C/RjCUfAQoBF9I6x7LdxLmgXL7ZfwGS6eO7HtBRNf/SYUf2oeM9zlyT4VqbtihCWQO9NjPLvJlsWvQ+r7xd5DazYSt77FhUmEfVgVTF2ZYg/jsfzXcSF+WD7S7N/5qOhun3MEl5T7Reg99MJmHjzGrYNe5dMP9JFuZyXpU3pKQNTK67LuM+f4ZGUotZonN16+oaBRbaWvylTATBbDcdh0dX+xMvTTIlLXQj1REeVuSklXbV4lw8H1vOPyvsIFDXS+shIXW/gSdjYDdfIvqFLt3gjJIe3WG6R3UC2w0I0SHxGZ3xm2V5fOtyQ2pcl6LtME8ibjzL54ay7IUZCfIUGLRhYlEAIoWM+pYFJzcFav1rHVN3mACOKgTXfcl5tRVB8BQW23WE6BIIAeuHh3iSIQVcKeLYNpR+nJ00RTpww9tay11GNMhq1/ua6TJxZw5Yn7WqJ9MHo62LKpvKnFUNBq7x5AA2DgdI8T6J59KCbtPLNtwn1bX+oiyyc5QLBS0glREKzUUmtxXONacokYedaB/6gPWTvmsAaR/F5paTbR4xDWQsQf3xLefDzk4KvPBZ9dxZrsipCcRbwqBb336kjgxl5z9 YPNhXlBy sJnP/lAZ4juk+KPFgrzF4BbWbwH3Aiu5ERuimQa38APrzFOUf08oiaquCsThkpcKctA3cOraxr/tIhD5TjYinGCYbXEGVLZa+E3f8Wl/MfHBRHdfbXS6u9pw3GFbE2+Qe6oPBkZ57eRivx8CNt1mscDjrFyscul5Bia5MqPrWiQPt6vTpf6sOd7jaJPYRvCQp/NJaAKLVrBi5X1ZAuvZHKO6xKvuzXX78lA6FdGjRbVsIAbWMUUYpqIfjkyiZBhyC5UePKQAdFg35HZrBpvFYE30fXV96FOtNrGgQaM34eunuzS5jUJCEn41XqdeG1JaJLO+MFFNOSAXqSrJx5WDj8qoyYMFID1JNm8ub X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jan 23, 2024 at 7:38=E2=80=AFAM Johannes Weiner wrote: > > On Mon, Jan 22, 2024 at 12:39:16PM -0800, Yosry Ahmed wrote: > > On Mon, Jan 22, 2024 at 12:19=E2=80=AFPM Johannes Weiner wrote: > > > > > > On Sat, Jan 20, 2024 at 02:40:07AM +0000, Yosry Ahmed wrote: > > > > During swapoff, try_to_unuse() makes sure that zswap_invalidate() i= s > > > > called for all swap entries before zswap_swapoff() is called. This = means > > > > that all zswap entries should already be removed from the tree. Sim= plify > > > > zswap_swapoff() by removing the tree cleanup loop, and leaving an > > > > assertion in its place. > > > > > > > > Signed-off-by: Yosry Ahmed > > > > > > Acked-by: Johannes Weiner > > > > > > That's a great simplification. > > > > > > Removing the tree->lock made me double take, but at this point the > > > swapfile and its cache should be fully dead and I don't see how any o= f > > > the zswap operations that take tree->lock could race at this point. > > > > It took me a while staring at the code to realize this loop is pointles= s. > > > > However, while I have your attention on the swapoff path, there's a > > slightly irrelevant problem that I think might be there, but I am not > > sure. > > > > It looks to me like swapoff can race with writeback, and there may be > > a chance of UAF for the zswap tree. For example, if zswap_swapoff() > > races with shrink_memcg_cb(), I feel like we may free the tree as it > > is being used. For example if zswap_swapoff()->kfree(tree) happen > > right before shrink_memcg_cb()->list_lru_isolate(l, item). > > > > Please tell me that I am being paranoid and that there is some > > protection against zswap writeback racing with swapoff. It feels like > > we are very careful with zswap entries refcounting, but not with the > > zswap tree itself. > > Hm, I don't see how. > > Writeback operates on entries from the LRU. By the time > zswap_swapoff() is called, try_to_unuse() -> zswap_invalidate() should > will have emptied out the LRU and tree. > > Writeback could have gotten a refcount to the entry and dropped the > tree->lock. But then it does __read_swap_cache_async(), and while > holding the page lock checks the tree under lock once more; if that > finds the entry valid, it means try_to_unuse() hasn't started on this > page yet, and would be held up by the page lock/writeback state. Consider the following race: CPU 1 CPU 2 # In shrink_memcg_cb() # In swap_off list_lru_isolate() zswap_invalidate() .. zswap_swapoff() -> kfree(tree) spin_lock(&tree->lock); Isn't this a UAF or am I missing something here? > > > > > Chengming, Chris, I think this should make the tree split and the x= array > > > > conversion patches simpler (especially the former). If others agree= , > > > > both changes can be rebased on top of this. > > > > > > The resulting code is definitely simpler, but this patch is not a > > > completely trivial cleanup, either. If you put it before Chengming's > > > patch and it breaks something, it would be difficult to pull out > > > without affecting the tree split. > > > > Are you suggesting I rebase this on top of Chengming's patches? I can > > definitely do this, but the patch will be slightly less > > straightforward, and if the tree split patches break something it > > would be difficult to pull out as well. If you feel like this patch is > > more likely to break things, I can rebase. > > Yeah I think it's more subtle. I'd only ask somebody to rebase an > already tested patch on a newer one if the latter were an obvious, > low-risk, prep-style patch. Your patch is good, but it doesn't quite > fit into this particular category, so I'd say no jumping the queue ;) My intention was to reduce the diff in both this patch and the tree split patches, but I do understand this is more subtle. I can rebase on top of Chengming's patches instead.