From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 843FDC27C53 for ; Fri, 7 Jun 2024 21:14:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B602D6B00A1; Fri, 7 Jun 2024 17:14:15 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B10726B00A2; Fri, 7 Jun 2024 17:14:15 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9FE0A6B00A3; Fri, 7 Jun 2024 17:14:15 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 82D966B00A1 for ; Fri, 7 Jun 2024 17:14:15 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 2A79916197E for ; Fri, 7 Jun 2024 21:14:15 +0000 (UTC) X-FDA: 82205345670.12.28A1D4A Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by imf11.hostedemail.com (Postfix) with ESMTP id 979C14000E for ; Fri, 7 Jun 2024 21:14:12 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=rFbKTBnQ; spf=pass (imf11.hostedemail.com: domain of chrisl@kernel.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=chrisl@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717794853; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=aKF6wYya6Q22raEpor6hnc5YhkNmlQnz7zwC+f3FYs0=; b=g7AumxHE+0L5vZJsxfZpJideuJEgNOdm2KKH77XkUHJNWmFmT3bZr11WCgAV+DZWzMOZ6r 3p6FbfQlWElpFr9iaOMjK4wIuEpDxi01UNp7UQZU5EhEeLwAsEiisf9ParEgofInzVN7zR ffClczZtyfVtWZRYO1uEeG+LtkIBFzY= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=rFbKTBnQ; spf=pass (imf11.hostedemail.com: domain of chrisl@kernel.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=chrisl@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717794853; a=rsa-sha256; cv=none; b=8RXpjQmMUFH8LIaWgxnfH9gyD0xmntG9SXPFhOLJ4fZF2JMcf1aSWl3OYGNaLyIzASrFdT EfZu6nqeccRvuSFkJBZHXMQW5GM/QpRzKqTBIUehye0rTmTHDPoyjpm9cUuZ5C/2WW6XrA Sh/U1ldaJbQxzO6+UrtOuL9oSutTF7s= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sin.source.kernel.org (Postfix) with ESMTP id 0CAEFCE1D25 for ; Fri, 7 Jun 2024 21:14:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4F6B7C4AF07 for ; Fri, 7 Jun 2024 21:14:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1717794847; bh=+F1gR/dF9qEEeQEG7p+hJ6wJAsGgF2g2/pyS9AE0bOA=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=rFbKTBnQOqZ1IL0pKtYSkXZfZRC7/3LMGtJtC2ljhd8mslHEuwYpCmMtIkRD03pEc op5rZlVeAGRQnAtvwTM0GEZFCXEKfiHEGuAd9UFZAYsJ/CwGCnu3xUY8PdrbIg7t95 h4GQu30T4bpPKaBFzi5Ne5QHw/5cmkj4pVamGiEeJo3OuyDVJJ7mudoX2i4XhAz9Oe K2oC5JhgXSipPglBGlvTYGDQE9sCxmcLyleIwvv3ehP4VMgyhYgUH4L7LcbtQicutk XSlBXMV5YFUHXP97nj1auiVSOJ72e8Byl821lgJymc+Z3XAEcmeVn/eiVeKNKONSj4 puFQE4R2yc7Bg== Received: by mail-lj1-f170.google.com with SMTP id 38308e7fff4ca-2eaae2a6dc1so50101401fa.0 for ; Fri, 07 Jun 2024 14:14:07 -0700 (PDT) X-Forwarded-Encrypted: i=1; AJvYcCVZIV8P07JQcY67ol3a7OvA/TVMZfrjASRyatcdPFY8XlmHxRyM5/kHKFBD2mNRc9RSjGb0b/hBqAxOBkJvKbFBYXc= X-Gm-Message-State: AOJu0YykvSkVX8nhq9RzfhzrLSlahA70jX3uVsteC0p6iaWI0AqruAJ2 qWodElACo6IQEmYu9Q4JyvVfwTBPaak/pGuHPVRfLieoIiEra/yLG7gjPkYoRzI5J+taVS5wOMA PuLLiBWBpg+cgs2MDNfhoVOEVKg== X-Google-Smtp-Source: AGHT+IHHbwv2ef+TwD1olLISbgN32rX4SJw+PxqN0i4K6h3fv6l1Rz9SwNIbpusaKtmI6pg1LWA/DtkMtXIdngK0iZw= X-Received: by 2002:a2e:3612:0:b0:2eb:17fe:a144 with SMTP id 38308e7fff4ca-2eb17fea1c5mr7866711fa.34.1717794845911; Fri, 07 Jun 2024 14:14:05 -0700 (PDT) MIME-Version: 1.0 References: <20240606184818.1566920-1-yosryahmed@google.com> <84d78362-e75c-40c8-b6c2-56d5d5292aa7@redhat.com> <7507d075-9f4d-4a9b-836c-1fbb2fbd2257@redhat.com> <9374758d-9f81-4e4f-8405-1f972234173e@redhat.com> <424c6430-e40d-4a60-8297-438fc33056c9@redhat.com> In-Reply-To: From: Chris Li Date: Fri, 7 Jun 2024 14:13:54 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted To: Yosry Ahmed Cc: David Hildenbrand , Barry Song <21cnbao@gmail.com>, Andrew Morton , Johannes Weiner , Nhat Pham , Chengming Zhou , Baolin Wang , Ryan Roberts , Matthew Wilcox , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 979C14000E X-Stat-Signature: 79wc1nohx6kujhd8jipe3bzohqcz9fqc X-Rspam-User: X-HE-Tag: 1717794852-655303 X-HE-Meta: U2FsdGVkX18kzYwFb1C8ZPUDt7QePvJyMmMQuOoE4y1IrIQBMix+O0uUyB25hKZPDluZ3dpey8t3xu6GOX51zc8qbMuGMxaqEvt/1uytAhydXgODiDm/uVBtnzG/bkGS1QKvhFMhDT6qLYGjZBiaQWiWTCyVpQ9wuehUN606anjHvz3WJ992QWGB4UA3f3Xt09u3MIzPvUqf9atfDT1+UTiNz1VGFDqrIAsDM2wTn1pGUV68QUUQnM65lT8SkTp5O1WI4Xzewj9srPwtMwEQonQQHSrKAJWX7222J+XvF1U3w3oeiAjBNhThEFu/s2f1svn6RHwn1kFfjkHgebqw6mFJ4iBwq+ViqIcpGdwNoVULdesCy4melpn5Uod6lxPNvitDC0ddJYW840n1HQMRAgjMnAlK10KTF7jx/EoWdCySw1O7kShbpP3CDyqq2MO6iJXyo4cneSb3T3J+x6WaDoPDi87S9cy7vFfR7E/Vsgh6CuDmTPGt6jZGrlEnYSMNwNNZU+SbC6PlveJqL5ynx2vFoOc8EMLt5Mwm3Ey6StzclNnQKfmwHHDi0q/KIFVs/8XKFB1Exm0HDe3I18BRuKfDO57ZFeD291y7YHvO8xcKJ0PblVOTsmr6irKDCKB0ZQ9YqJDnO1sIoK7fZSlJvy2HYanS9RMYPVoBkLy7BXdU2QiFI9jc6Z1hM5PntqbioTIRHPssDFwJSHbWHh35hjJvU2jc2hCQGmONRO+in7j1FY1Beej4kj0YQEBfQNWDnpTVJPFSNQeesVF3petWzfmOqDKIf7zhHwBi/ef1sbxubbemY+fuOMdCCvO2FZpNVgYG+PlAgII5YAhuM2WuqG7zjCRXxauiF7PNL8xvdpWw7PpsgqAjBFSsYhjYaH4yojzPcpe1w836y33/UhFWY0end2280EL6ACwyHGCR6TRKfZqAEUqnbAUhvz9M+UV1PCx5NPLcgjzdPZ5M6dK fK0waQwl nG/mMLxI8m0w6J8eDOzn0NiDqfsvgjgbGAAhcsnNzfMO52cmQmBe4FJsHvThLuEbWImMsiPEXtLsHYccVECMlPJBAdu41VTLieR0mTBjkZZcuXQYN8HhWSUQ1H5bYLVpYX9ixOWIziRTDiVJon6BLowL/WYvKMl13oFMr0MATVpZ81cuVeBK6CEO0utODxktsGW4YXF6RyUvang3N5107cSeJXWtKLTISlPPbg3MojTPRyJwZ7grtRjdVa3lFjU7U0OaWGW+nxDY0Gf0vlsAVLbFiHf2saI6yVlFD6ta9gJsRavvkGteGMd/9ew== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Jun 7, 2024 at 11:58=E2=80=AFAM Yosry Ahmed = wrote: > > On Fri, Jun 7, 2024 at 11:52=E2=80=AFAM David Hildenbrand wrote: > > > > >> I have no strong opinion on this one, but likely a VM_WARN_ON would = also > > >> be sufficient to find such issues early during testing. No need to c= rash > > >> the machine. > > > > > > I thought VM_BUG_ON() was less frowned-upon than BUG_ON(), but after > > > some digging I found your patches to checkpatch and Linus clearly > > > stating that it isn't. > > > > :) yes. > > > > VM_BUG_ON is not particularly helpful IMHO. If you want something to be > > found early during testing, VM_WARN_ON is good enough. > > > > Ever since Fedora stopped enabling CONFIG_DEBUG_VM, VM_* friends are > > primarily reported during early/development testing only. But maybe som= e > > distro out there still sets it. > > > > > > > > How about something like the following (untested), it is the minimal > > > recovery we can do but should work for a lot of cases, and does > > > nothing beyond a warning if we can swapin the large folio from disk: > > > > > > diff --git a/mm/page_io.c b/mm/page_io.c > > > index f1a9cfab6e748..8f441dd8e109f 100644 > > > --- a/mm/page_io.c > > > +++ b/mm/page_io.c > > > @@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct > > > swap_iocb **plug) > > > delayacct_swapin_start(); > > > > > > if (zswap_load(folio)) { > > > - folio_mark_uptodate(folio); > > > folio_unlock(folio); > > > } else if (data_race(sis->flags & SWP_FS_OPS)) { > > > swap_read_folio_fs(folio, plug); > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index 6007252429bb2..cc04db6bb217e 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -1557,6 +1557,22 @@ bool zswap_load(struct folio *folio) > > > > > > VM_WARN_ON_ONCE(!folio_test_locked(folio)); > > > > > > + /* > > > + * Large folios should not be swapped in while zswap is being= used, as > > > + * they are not properly handled. > > > + * > > > + * If any of the subpages are in zswap, reading from disk wou= ld result > > > + * in data corruption, so return true without marking the fol= io uptodate > > > + * so that an IO error is emitted (e.g. do_swap_page() will s= igfault). > > > + * > > > + * Otherwise, return false and read the folio from disk. > > > + */ > > > + if (WARN_ON_ONCE(folio_test_large(folio))) { > > > + if (xa_find(tree, &offset, offset + > > > folio_nr_pages(folio) - 1, 0)) > > > + return true; > > > + return false; > > > + } > > > + > > > /* > > > * When reading into the swapcache, invalidate our entry. Th= e > > > * swapcache can be the authoritative owner of the page and > > > @@ -1593,7 +1609,7 @@ bool zswap_load(struct folio *folio) > > > zswap_entry_free(entry); > > > folio_mark_dirty(folio); > > > } > > > - > > > + folio_mark_uptodate(folio); > > > return true; > > > } > > > > > > One problem is that even if zswap was never enabled, the warning will > > > be emitted just if CONFIG_ZSWAP is on. Perhaps we need a variable or > > > static key if zswap was "ever" enabled. > > > > We should use WARN_ON_ONCE() only for things that cannot happen. So if > > there are cases where this could be triggered today, it would be > > problematic -- especially if it can be triggered from unprivileged user > > space. But if we're concerned of other code messing up our invariant in > > the future (e.g., enabling large folios without taking proper care abou= t > > zswap etc), we're good to add it. > > Right now I can't see any paths allocating large folios for swapin, so > I think it cannot happen. Once someone tries adding it, the warning > will fire if CONFIG_ZSWAP is used, even if zswap is disabled. > At this point we will have several options: Here is my take on this: > - Make large folios swapin depend on !CONFIG_ZSWAP for now. I think a WARON or BUG_ON is better. I would need to revert this change when I am working on 3). It is a make up rule, not a real dependency any way. > - Keep track if zswap was ever enabled and make the warning > conditional on it. We should also always fallback to order-0 if zswap > was ever enabled. IMHO, falling back to order-0 inside zswap is not desired because it complicates the zswap code. We should not pass large folio to zswap if zswap is not ready to handle large folio. The core swap already has the fall back to order-0. If we get to 3), then this fall back in zswap needs to be removed. It is a transitional thing then maybe not introduce it in the first place. > - Properly handle large folio swapin with zswap. That obviously is ideal. Chris