From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 248E7C27C53 for ; Fri, 7 Jun 2024 18:58:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id ACBBD6B00BD; Fri, 7 Jun 2024 14:58:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A7AF06B00BE; Fri, 7 Jun 2024 14:58:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 942E76B00BF; Fri, 7 Jun 2024 14:58:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 73BA06B00BD for ; Fri, 7 Jun 2024 14:58:51 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 23A6E1219B8 for ; Fri, 7 Jun 2024 18:58:51 +0000 (UTC) X-FDA: 82205004462.04.77B5372 Received: from mail-ej1-f43.google.com (mail-ej1-f43.google.com [209.85.218.43]) by imf29.hostedemail.com (Postfix) with ESMTP id 48F7B120015 for ; Fri, 7 Jun 2024 18:58:49 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=m7QBbOSX; spf=pass (imf29.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.43 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717786729; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=v2r17fv1NH+dhrx4S8f8edNGHFWcnHzqCYgrcf7L5As=; b=S/x8BHVjuodlrVNwLkWxWTx8PbZRbbenUTmvQE38Ou4VbeHt40UW1/ls8chXibRazuUVHY 9zeLpVH6zUB+yJqlc6zHBPVkS2kp0tTLXcQL1LndW9ruXdBcn+ijSxfGAfV8vX46G6+XI/ vHEGEoHYlmXbnui9h5NPt5TDEtgIOCY= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=m7QBbOSX; spf=pass (imf29.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.43 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717786729; a=rsa-sha256; cv=none; b=OrzYvcUEahul0lhtoILrpW8tOu7GpHQhJ2FUQ/ExtYYjb+5VWI5XZNgupFVRtoLrMjCi4I FOwbzkFE247k56kWXcOmcAUJexTGRZc1qADfYYy1v671xu/QoGOU0QXuykE4RT/5cTig1r 3XZpr8w16wPe+Kx0EBqFE0+ChD1uJLg= Received: by mail-ej1-f43.google.com with SMTP id a640c23a62f3a-a68b54577aaso321775566b.3 for ; Fri, 07 Jun 2024 11:58:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1717786728; x=1718391528; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=v2r17fv1NH+dhrx4S8f8edNGHFWcnHzqCYgrcf7L5As=; b=m7QBbOSX0F3f/27EmqBF7HMMU1sqtS2KpOxAYrKcO5eA42V8bXIhCGRZUTJ+LqHeFM M6LmsD6kaj9Rm+q5YW4TvmqJRh2BP0cxtbJXMQ8zCuFGxOmTxZC5U21pB1UfgQ7G9TWk dnwA358b3xvLzRzU+PNjlJcmQ8u8LK1tTrxZNU+6Aqyc33TGeulhg4umGiwi32bD/Bir JSy/uShVuQ8gwlqvKkrNVHA92lyZijcIbgAVO6R/W9+bS4mEJzLNREeuLO2wEZQDPmu2 HKa8GyM8WVma1MS2ds5AIWeuPPZVF0h5IP9QtEJt3AO1tELrTYfEjGtQU7HKKNigj/6s T0QQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717786728; x=1718391528; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=v2r17fv1NH+dhrx4S8f8edNGHFWcnHzqCYgrcf7L5As=; b=bJhEvOnGM0a3lGw+QtQcJmv+nkQfOQ72hdbWXC4W5ZgGfqtIuWkCxYO/fNVfcK/Ibc cg0SSrh5bRe0jVufqPAwJbgZLq4K4LhsvzYY+l5xj0lgBxn8RH79dT/fjB8oT1IUCgxq 2Kr1lgW77x+wWjK5qJAipeq8IfEevptU+RP/tWVCWaG/LmwOrZBICnTCuJPWRmWD+BXi gsTAeqG5w6PAqrbEIjNDtsF0irwNmlBlgdDKfjrMgr7ZWJ6xlH206Wm3aOPzsFlouweI 6ZRRv31m3zStyYlifanXhvJbK8sohRmcrzKXPWiEhp4mgGPqeLK9WCnzMMmLzyXlCKve Ye8w== X-Forwarded-Encrypted: i=1; AJvYcCU+ItPEtx2zQp+pK4I4rRm7jzy4W/OHDm0Z5HKdNd9nQUxegYeuWDShXq2nw/NhkHjUP0yDLYPx/4sAi26uFYtM22k= X-Gm-Message-State: AOJu0YzHN0eRypQxx++aM5nGpH4ot00SG9Cc11BJBDXkGstFRiTC6p+p sVL7NnxQITKsQt9ImdGTo4bkzTsfcFz2S0Q5YS7uvlUObVsJD7iiLH6e1fxn5lOdPDIGXzPzyw1 F86zB/zMLo1Tqnrp6qjF2LrSg1NxaylBeEMlE X-Google-Smtp-Source: AGHT+IE3dUM3t2Ff2JFikj4TrNJdz/uC/CYKzj23GqNW4FtILLLjxMVHow3ICl2yIJu8TAutvqVQAV5kWGVgkDmgoBM= X-Received: by 2002:a17:906:368d:b0:a68:a52b:7554 with SMTP id a640c23a62f3a-a6cdbef76a3mr221167966b.66.1717786727342; Fri, 07 Jun 2024 11:58:47 -0700 (PDT) MIME-Version: 1.0 References: <20240606184818.1566920-1-yosryahmed@google.com> <84d78362-e75c-40c8-b6c2-56d5d5292aa7@redhat.com> <7507d075-9f4d-4a9b-836c-1fbb2fbd2257@redhat.com> <9374758d-9f81-4e4f-8405-1f972234173e@redhat.com> <424c6430-e40d-4a60-8297-438fc33056c9@redhat.com> In-Reply-To: <424c6430-e40d-4a60-8297-438fc33056c9@redhat.com> From: Yosry Ahmed Date: Fri, 7 Jun 2024 11:58:08 -0700 Message-ID: Subject: Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted To: David Hildenbrand Cc: Barry Song <21cnbao@gmail.com>, Andrew Morton , Johannes Weiner , Nhat Pham , Chengming Zhou , Baolin Wang , Chris Li , Ryan Roberts , Matthew Wilcox , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 48F7B120015 X-Stat-Signature: yzqcc8h4rb3gnoxqqggy9rb4sczpi4jx X-Rspam-User: X-HE-Tag: 1717786729-943829 X-HE-Meta: U2FsdGVkX1/P3dgnKmM2vk5V3FgrZunH/uQF45Ac2MDnmifY6lzaj1XUXfYmCceL4UUgwqX7uc9zNy6NW1NmfNaTJRIeeSaeRvMLSCt0xpJWm9xnSMfHQsOcBTEchghG7HZ3Zv+6tZ9wQ2uKQtxUBIHoHMBevZLzJ5FaoG81JZETzbYBHp2qB+obMQgUWxdVrDOXUC61H+Scp0C61/P7Tn2krwyoUv/esQLfECOPg5RB3Rd82AzXONHJlN2Qj6NO+s1L31NJkzqIcOUG9TetEn8wo3xR8s+RVxH3m/EYYmqeDZbP/sEgs2SfpAH2KeedwkvL9KaxTXonMduzzJ/+Y2yhch8/a9xX0YPHT8TYA9fX2loY0nOlCJ1NnYxTyJMlQSpEtCbc7jg6hF4sfvhHNLyKiZhELwqbsUMRTn8c00gxBqVsLE6MGh6tpYJ6zwKfV0+VS7dShainST0j7Pf6w00DK2AXSZI5C70T/vAp3ag4IvhrkRUaeKCXBQA3uvfZ3PGn6QSeSS4Dvna+AHmI7+4Hw14OAQ/VZRIrqC2VWkGpN+GKQsOQFnIz6UfBcrWemYcPQmqFCIxONXLxxPjIdrKptjT3EFLa/kk6DUQ9YMj2E0fUI5SqyFGbkMxwIv1byqSZGBKyrJ+E6riyOtYKjYp8KYA/T44jnQio4PBPgHDeMXcYclkl2UEizClhtrqpZ9kMdsWJtiGjgN1Jmq1xtfrxEK55T4D3T2GtK1+Op74R/LdcdAD5RN0sr5WZp9Q3kgBfNO56SAoOj/S8x3vp821StYFNBl4LYQbjGXj/2lK+CjEA1064K/Ouc84PRxT9t9iugJjpVYSeRJlcLpr75oyYY89Xo2Nvtq8g9ZgaFzc4ZAphIBeitjxGxWKNnb9zRF9/hRju9tCDbIFWVaDyMhndsliY4Yo05vx+WG/CC2Cv9ElJ2j6gmFKHEz1DGNlwGCqJ+yKlSuh3sOreMuV dPYsZt8E nl85h9w+9tF/zTTbHGJWLHusPHqc8fOZB0YGztmcdQU62kFkRzcF+H18uizPNF74bO4piqIvWWiSMqPf1pvAWEt+M2DrS3lbWK5BeZzogPfa3vUDDporMn4evFoiHmQMzxqI9CyernIzoCRWaWyvfVAvRUctht+Xb5gKPFqidHPTOuUVO85LCdN65VfBJZrHHRExaPTjkHMr+iO1CnTDcqC/bIDcIGocIKw0rH/7HnOZ8AGM/eM5dBr/lE0Qp346Va9xJ73HU0cAa5DGiikHe4Bi8SvOHBLSbB31br5iCGbOzSFY= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Jun 7, 2024 at 11:52=E2=80=AFAM David Hildenbrand wrote: > > >> I have no strong opinion on this one, but likely a VM_WARN_ON would al= so > >> be sufficient to find such issues early during testing. No need to cra= sh > >> the machine. > > > > I thought VM_BUG_ON() was less frowned-upon than BUG_ON(), but after > > some digging I found your patches to checkpatch and Linus clearly > > stating that it isn't. > > :) yes. > > VM_BUG_ON is not particularly helpful IMHO. If you want something to be > found early during testing, VM_WARN_ON is good enough. > > Ever since Fedora stopped enabling CONFIG_DEBUG_VM, VM_* friends are > primarily reported during early/development testing only. But maybe some > distro out there still sets it. > > > > > How about something like the following (untested), it is the minimal > > recovery we can do but should work for a lot of cases, and does > > nothing beyond a warning if we can swapin the large folio from disk: > > > > diff --git a/mm/page_io.c b/mm/page_io.c > > index f1a9cfab6e748..8f441dd8e109f 100644 > > --- a/mm/page_io.c > > +++ b/mm/page_io.c > > @@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct > > swap_iocb **plug) > > delayacct_swapin_start(); > > > > if (zswap_load(folio)) { > > - folio_mark_uptodate(folio); > > folio_unlock(folio); > > } else if (data_race(sis->flags & SWP_FS_OPS)) { > > swap_read_folio_fs(folio, plug); > > diff --git a/mm/zswap.c b/mm/zswap.c > > index 6007252429bb2..cc04db6bb217e 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -1557,6 +1557,22 @@ bool zswap_load(struct folio *folio) > > > > VM_WARN_ON_ONCE(!folio_test_locked(folio)); > > > > + /* > > + * Large folios should not be swapped in while zswap is being u= sed, as > > + * they are not properly handled. > > + * > > + * If any of the subpages are in zswap, reading from disk would= result > > + * in data corruption, so return true without marking the folio= uptodate > > + * so that an IO error is emitted (e.g. do_swap_page() will sig= fault). > > + * > > + * Otherwise, return false and read the folio from disk. > > + */ > > + if (WARN_ON_ONCE(folio_test_large(folio))) { > > + if (xa_find(tree, &offset, offset + > > folio_nr_pages(folio) - 1, 0)) > > + return true; > > + return false; > > + } > > + > > /* > > * When reading into the swapcache, invalidate our entry. The > > * swapcache can be the authoritative owner of the page and > > @@ -1593,7 +1609,7 @@ bool zswap_load(struct folio *folio) > > zswap_entry_free(entry); > > folio_mark_dirty(folio); > > } > > - > > + folio_mark_uptodate(folio); > > return true; > > } > > > > One problem is that even if zswap was never enabled, the warning will > > be emitted just if CONFIG_ZSWAP is on. Perhaps we need a variable or > > static key if zswap was "ever" enabled. > > We should use WARN_ON_ONCE() only for things that cannot happen. So if > there are cases where this could be triggered today, it would be > problematic -- especially if it can be triggered from unprivileged user > space. But if we're concerned of other code messing up our invariant in > the future (e.g., enabling large folios without taking proper care about > zswap etc), we're good to add it. Right now I can't see any paths allocating large folios for swapin, so I think it cannot happen. Once someone tries adding it, the warning will fire if CONFIG_ZSWAP is used, even if zswap is disabled. At this point we will have several options: - Make large folios swapin depend on !CONFIG_ZSWAP for now. - Keep track if zswap was ever enabled and make the warning conditional on it. We should also always fallback to order-0 if zswap was ever enabled. - Properly handle large folio swapin with zswap. Does this sound reasonable to you?