From: Yosry Ahmed
Date: Fri, 7 Jun 2024 17:28:30 -0700
Subject: Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted
To: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand, Andrew Morton, Johannes Weiner, Nhat Pham,
    Chengming Zhou, Baolin Wang, Chris Li, Ryan Roberts, Matthew Wilcox,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Fri, Jun 7, 2024 at 3:09 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Jun 8, 2024 at 6:58 AM Yosry Ahmed wrote:
> >
> > On Fri, Jun 7, 2024 at 11:52 AM David Hildenbrand wrote:
> > >
> > > >> I have no strong opinion on this one, but likely a VM_WARN_ON would also
> > > >> be sufficient to find such issues early during testing. No need to crash
> > > >> the machine.
> > > >
> > > > I thought VM_BUG_ON() was less frowned-upon than BUG_ON(), but after
> > > > some digging I found your patches to checkpatch and Linus clearly
> > > > stating that it isn't.
> > >
> > > :) yes.
> > >
> > > VM_BUG_ON is not particularly helpful IMHO. If you want something to be
> > > found early during testing, VM_WARN_ON is good enough.
> > >
> > > Ever since Fedora stopped enabling CONFIG_DEBUG_VM, VM_* friends are
> > > primarily reported during early/development testing only. But maybe some
> > > distro out there still sets it.
> > >
> > > >
> > > > How about something like the following (untested), it is the minimal
> > > > recovery we can do but should work for a lot of cases, and does
> > > > nothing beyond a warning if we can swapin the large folio from disk:
> > > >
> > > > diff --git a/mm/page_io.c b/mm/page_io.c
> > > > index f1a9cfab6e748..8f441dd8e109f 100644
> > > > --- a/mm/page_io.c
> > > > +++ b/mm/page_io.c
> > > > @@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
> > > >          delayacct_swapin_start();
> > > >
> > > >          if (zswap_load(folio)) {
> > > > -                folio_mark_uptodate(folio);
> > > >                  folio_unlock(folio);
> > > >          } else if (data_race(sis->flags & SWP_FS_OPS)) {
> > > >                  swap_read_folio_fs(folio, plug);
> > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > index 6007252429bb2..cc04db6bb217e 100644
> > > > --- a/mm/zswap.c
> > > > +++ b/mm/zswap.c
> > > > @@ -1557,6 +1557,22 @@ bool zswap_load(struct folio *folio)
> > > >
> > > >          VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > > >
> > > > +        /*
> > > > +         * Large folios should not be swapped in while zswap is being used, as
> > > > +         * they are not properly handled.
> > > > +         *
> > > > +         * If any of the subpages are in zswap, reading from disk would result
> > > > +         * in data corruption, so return true without marking the folio uptodate
> > > > +         * so that an IO error is emitted (e.g. do_swap_page() will sigfault).
> > > > +         *
> > > > +         * Otherwise, return false and read the folio from disk.
> > > > +         */
> > > > +        if (WARN_ON_ONCE(folio_test_large(folio))) {
> > > > +                if (xa_find(tree, &offset, offset + folio_nr_pages(folio) - 1, 0))
> > > > +                        return true;
> > > > +                return false;
> > > > +        }
> > > > +
> > > >          /*
> > > >           * When reading into the swapcache, invalidate our entry. The
> > > >           * swapcache can be the authoritative owner of the page and
> > > > @@ -1593,7 +1609,7 @@ bool zswap_load(struct folio *folio)
> > > >                  zswap_entry_free(entry);
> > > >                  folio_mark_dirty(folio);
> > > >          }
> > > > -
> > > > +        folio_mark_uptodate(folio);
> > > >          return true;
> > > >  }
> > > >
> > > > One problem is that even if zswap was never enabled, the warning will
> > > > be emitted just if CONFIG_ZSWAP is on. Perhaps we need a variable or
> > > > static key if zswap was "ever" enabled.
> > >
> > > We should use WARN_ON_ONCE() only for things that cannot happen. So if
> > > there are cases where this could be triggered today, it would be
> > > problematic -- especially if it can be triggered from unprivileged user
> > > space. But if we're concerned of other code messing up our invariant in
> > > the future (e.g., enabling large folios without taking proper care about
> > > zswap etc), we're good to add it.
> >
> > Right now I can't see any paths allocating large folios for swapin, so
> > I think it cannot happen. Once someone tries adding it, the warning
> > will fire if CONFIG_ZSWAP is used, even if zswap is disabled.
> >
> > At this point we will have several options:
> > - Make large folios swapin depend on !CONFIG_ZSWAP for now.
>
> It appears quite problematic. We lack control over whether the kernel build
> will enable CONFIG_ZSWAP, particularly when aiming for a common
> defconfig across all platforms to streamline configurations. For instance,
> in the case of ARM, this was once a significant goal.
>
> Simply trigger a single WARN or BUG if an attempt is made to load
> large folios in zswap_load, while ensuring that zswap_is_enabled()
> remains unaffected. In the mainline code, large folio swap-in support
> is absent, so this warning is intended for debugging purposes and
> targets a very small audience, perhaps fewer than five individuals
> worldwide. Real users won’t encounter this warning, as it remains
> hidden from their view.

I can make the warning fire only if some part of the folio is in zswap,
so that zswap_load() does not warn when zswap is never actually used.
That's reasonable. I only wanted to warn on any large folio reaching
zswap_load() to get higher coverage.

I will send something out in the next week or so.

>
> > - Keep track if zswap was ever enabled and make the warning
> > conditional on it. We should also always fallback to order-0 if zswap
> > was ever enabled.
> > - Properly handle large folio swapin with zswap.
> >
> > Does this sound reasonable to you?
>
> Thanks
> Barry
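
For illustration, the behaviour proposed in the reply above (warn only
when some part of a large folio is actually in zswap, otherwise fall back
to reading from disk) can be modelled in plain userspace C. This is only a
rough sketch under stated assumptions, not kernel code and not the patch
that was eventually sent: struct toy_zswap, toy_range_in_zswap(),
toy_zswap_load() and the result enum are all hypothetical names, and a
bool array stands in for the zswap tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the zswap tree: present[i] is true when swap
 * offset i currently has a compressed copy in zswap. */
struct toy_zswap {
	const bool *present;
	size_t nr_slots;
};

/* Hypothetical outcomes, mirroring the three cases in the discussion. */
enum toy_load_result {
	TOY_READ_FROM_DISK,	/* zswap_load() would return false */
	TOY_LOADED,		/* return true, folio marked uptodate */
	TOY_IO_ERROR,		/* return true, folio NOT marked uptodate */
};

/* True if any offset in [first, first + nr_pages) is present in zswap. */
bool toy_range_in_zswap(const struct toy_zswap *z, size_t first,
			size_t nr_pages)
{
	for (size_t i = first; i < first + nr_pages && i < z->nr_slots; i++)
		if (z->present[i])
			return true;
	return false;
}

/* Model of the proposed policy. *warned plays the role of the one-time
 * warning: it is set only when a large folio overlaps zswap. */
enum toy_load_result toy_zswap_load(const struct toy_zswap *z, size_t offset,
				    size_t nr_pages, bool *warned)
{
	assert(nr_pages >= 1);

	if (nr_pages > 1) {
		if (toy_range_in_zswap(z, offset, nr_pages)) {
			/* Disk contents may be stale: report an IO error. */
			*warned = true;
			return TOY_IO_ERROR;
		}
		/* Nothing in zswap: reading from disk is safe, no warning. */
		return TOY_READ_FROM_DISK;
	}

	/* Order-0 folio: the normal zswap path. */
	return toy_range_in_zswap(z, offset, 1) ? TOY_LOADED
						: TOY_READ_FROM_DISK;
}
```

With this shape, a system that never stores anything in zswap never sets
the warning flag, even when CONFIG_ZSWAP is built in, which is the concern
raised about a common defconfig.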