From: Yosry Ahmed <yosryahmed@google.com>
Date: Fri, 7 Jun 2024 16:14:13 -0700
Subject: Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted
To: Chris Li
Cc: David Hildenbrand, Barry Song <21cnbao@gmail.com>, Andrew Morton,
 Johannes Weiner, Nhat Pham, Chengming Zhou, Baolin Wang, Ryan Roberts,
 Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org
[..]

> > > > How about something like the following (untested)? It is the minimal
> > > > recovery we can do, but it should work for a lot of cases, and it does
> > > > nothing beyond a warning if we can swap in the large folio from disk:
> > > >
> > > > diff --git a/mm/page_io.c b/mm/page_io.c
> > > > index f1a9cfab6e748..8f441dd8e109f 100644
> > > > --- a/mm/page_io.c
> > > > +++ b/mm/page_io.c
> > > > @@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
> > > >  	delayacct_swapin_start();
> > > >
> > > >  	if (zswap_load(folio)) {
> > > > -		folio_mark_uptodate(folio);
> > > >  		folio_unlock(folio);
> > > >  	} else if (data_race(sis->flags & SWP_FS_OPS)) {
> > > >  		swap_read_folio_fs(folio, plug);
> > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > index 6007252429bb2..cc04db6bb217e 100644
> > > > --- a/mm/zswap.c
> > > > +++ b/mm/zswap.c
> > > > @@ -1557,6 +1557,22 @@ bool zswap_load(struct folio *folio)
> > > >
> > > >  	VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > > >
> > > > +	/*
> > > > +	 * Large folios should not be swapped in while zswap is being used, as
> > > > +	 * they are not properly handled.
> > > > +	 *
> > > > +	 * If any of the subpages are in zswap, reading from disk would result
> > > > +	 * in data corruption, so return true without marking the folio uptodate
> > > > +	 * so that an IO error is emitted (e.g. do_swap_page() will sigfault).
> > > > +	 *
> > > > +	 * Otherwise, return false and read the folio from disk.
> > > > +	 */
> > > > +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> > > > +		if (xa_find(tree, &offset, offset + folio_nr_pages(folio) - 1, 0))
> > > > +			return true;
> > > > +		return false;
> > > > +	}
> > > > +
> > > >  	/*
> > > >  	 * When reading into the swapcache, invalidate our entry. The
> > > >  	 * swapcache can be the authoritative owner of the page and
> > > > @@ -1593,7 +1609,7 @@ bool zswap_load(struct folio *folio)
> > > >  		zswap_entry_free(entry);
> > > >  		folio_mark_dirty(folio);
> > > >  	}
> > > > -
> > > > +	folio_mark_uptodate(folio);
> > > >  	return true;
> > > >  }
> > > >
> > > > One problem is that even if zswap was never enabled, the warning will
> > > > be emitted just because CONFIG_ZSWAP is on. Perhaps we need a variable
> > > > or static key recording whether zswap was "ever" enabled.
> > >
> > > We should use WARN_ON_ONCE() only for things that cannot happen. So if
> > > there are cases where this could be triggered today, it would be
> > > problematic -- especially if it can be triggered from unprivileged user
> > > space. But if we're concerned about other code messing up our invariant
> > > in the future (e.g., enabling large folios without taking proper care of
> > > zswap etc.), we're good to add it.
> >
> > Right now I can't see any paths allocating large folios for swapin, so
> > I think it cannot happen. Once someone tries adding it, the warning
> > will fire if CONFIG_ZSWAP is on, even if zswap is disabled.
> > At this point we will have several options:
>
> Here is my take on this:
>
> > - Make large folio swapin depend on !CONFIG_ZSWAP for now.
>
> I think a WARN_ON() or BUG_ON() is better. I would need to revert this
> change when I am working on 3). It is a made-up rule, not a real
> dependency anyway.

I am intending to send a new version with WARN_ON_ONCE() and some
attempt to recover. It is not a rule; it is just that we don't have
the support for it today.
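Concretely, the "ever enabled" tracking could be as simple as the
following (untested sketch; the key and helper names are placeholders,
not final):

/* mm/zswap.c: flipped the first time zswap is enabled, never cleared. */
static DEFINE_STATIC_KEY_FALSE(zswap_ever_enabled);

/* Call wherever the zswap "enabled" knob transitions to true. */
static void zswap_note_enabled(void)
{
	static_branch_enable(&zswap_ever_enabled);
}

/* Gates the large folio warning (and any order-0 fallback). */
bool zswap_never_enabled(void)
{
	return !static_branch_unlikely(&zswap_ever_enabled);
}

zswap_load() would then skip the large folio WARN_ON_ONCE() while
zswap_never_enabled() is true, so kernels that merely build with
CONFIG_ZSWAP=y but never turn zswap on are unaffected.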
>
> > - Keep track if zswap was ever enabled and make the warning
> > conditional on it. We should also always fall back to order-0 if
> > zswap was ever enabled.
>
> IMHO, falling back to order-0 inside zswap is not desired because it
> complicates the zswap code. We should not pass large folios to zswap
> if zswap is not ready to handle large folios. The core swap code
> already has the fallback to order-0. If we get to 3), then this
> fallback in zswap needs to be removed. It is a transitional thing, so
> maybe not introduce it in the first place.

We cannot split the folio inside zswap. What I meant is that the
swapin code should fall back to order-0 if zswap is being used, to
avoid passing large folios to zswap (rough sketch at the end of this
email).

>
> > - Properly handle large folio swapin with zswap.
> That obviously is ideal.
>
> Chris
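To be concrete about that fallback: the check belongs on the swapin
side, before a large folio is ever allocated. Roughly (untested sketch;
swapin_folio_order() and zswap_is_enabled() are placeholder names used
only for illustration):

/*
 * Clamp the swapin allocation order to 0 whenever zswap is in use, so
 * that zswap_load() never sees a large folio.
 */
static inline unsigned int swapin_folio_order(unsigned int order)
{
	if (IS_ENABLED(CONFIG_ZSWAP) && zswap_is_enabled())
		return 0;
	return order;
}

The swapin path would call this when choosing the folio order, so large
folio swapin keeps working as before on systems that never use zswap.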