From: Barry Song <21cnbao@gmail.com>
Date: Fri, 7 Jun 2024 09:30:31 +1200
Subject: Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted
To: David Hildenbrand
Cc: Yosry Ahmed, Andrew Morton, Johannes Weiner, Nhat Pham, Chengming Zhou, Baolin Wang, Chris Li, Ryan Roberts, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Fri, Jun 7, 2024 at 9:17 AM David Hildenbrand wrote:
>
> On 06.06.24 22:31, Yosry Ahmed wrote:
> > On Thu, Jun 6, 2024 at 1:22 PM David Hildenbrand wrote:
> >>
> >> On 06.06.24 20:48, Yosry Ahmed wrote:
> >>> With ongoing work to support large folio swapin, it is important to make
> >>> sure we do not pass large folios to zswap_load() without implementing
> >>> proper support.
> >>>
> >>> For example, if a swapin fault observes that contiguous PTEs are
> >>> pointing to contiguous swap entries and tries to swap them in as a large
> >>> folio, swap_read_folio() will pass in a large folio to zswap_load(), but
> >>> zswap_load() will only effectively load the first page in the folio. If
> >>> the first page is not in zswap, the folio will be read from disk, even
> >>> though other pages may be in zswap.
> >>>
> >>> In both cases, this will lead to silent data corruption.
> >>>
> >>> Proper large folio swapin support needs to go into zswap before zswap
> >>> can be enabled in a system that supports large folio swapin.
> >>>
> >>> Looking at callers of swap_read_folio(), it seems like they are either
> >>> allocated from __read_swap_cache_async() or do_swap_page() in the
> >>> SWP_SYNCHRONOUS_IO path. Both of which allocate order-0 folios, so we
> >>> are fine for now.
> >>>
> >>> Add a VM_BUG_ON() in zswap_load() to make sure that we detect changes in
> >>> the order of those allocations without proper handling of zswap.
> >>>
> >>> Alternatively, swap_read_folio() (or its callers) can be updated to have
> >>> a fallback mechanism that splits large folios or reads subpages
> >>> separately. Similar logic may be needed anyway in case part of a large
> >>> folio is already in the swapcache and the rest of it is swapped out.
> >>>
> >>> Signed-off-by: Yosry Ahmed
> >>> ---
> >>>
> >>> Sorry for the long CC list, I just found myself repeatedly looking at
> >>> new series that add swap support for mTHPs / large folios, making sure
> >>> they do not break with zswap or make incorrect assumptions. This debug
> >>> check should give us some peace of mind. Hopefully this patch will also
> >>> raise awareness among people who are working on this.
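To make the two corruption cases above concrete, here is a toy user-space model in C. Everything in it (struct toy_swap, NR_SLOTS, first_page_only_read(), per_page_read()) is hypothetical: it only mimics the behaviour described in the commit message and is not kernel code or the real zswap data structures. It assumes a slot's current data lives either in zswap or on the swap device, with the other copy stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: each swap slot holds one page, represented
 * by a single int. A slot's current data lives either in zswap or on the
 * swap device; the other copy (if any) is stale.
 */
#define NR_SLOTS 8

struct toy_swap {
	bool in_zswap[NR_SLOTS];  /* slot currently stored by zswap */
	int zswap_data[NR_SLOTS]; /* zswap copy (compression not modelled) */
	int disk_data[NR_SLOTS];  /* copy on the backing swap device */
};

/*
 * Mimics the behaviour described in the patch: only the folio's first
 * slot is looked up in zswap. On a hit only page 0 is filled; on a miss
 * every page is read from disk, including pages whose data is in zswap.
 */
static void first_page_only_read(const struct toy_swap *s, size_t start,
				 int *folio, size_t nr_pages)
{
	assert(nr_pages > 0 && start + nr_pages <= NR_SLOTS);
	if (s->in_zswap[start]) {
		folio[0] = s->zswap_data[start];
		return; /* pages 1..nr_pages-1 are left untouched */
	}
	for (size_t i = 0; i < nr_pages; i++)
		folio[i] = s->disk_data[start + i];
}

/* What correct large folio support would need: a per-subpage decision. */
static void per_page_read(const struct toy_swap *s, size_t start,
			  int *folio, size_t nr_pages)
{
	assert(nr_pages > 0 && start + nr_pages <= NR_SLOTS);
	for (size_t i = 0; i < nr_pages; i++)
		folio[i] = s->in_zswap[start + i] ? s->zswap_data[start + i]
						  : s->disk_data[start + i];
}
```

With slot 0 on disk and slot 1 in zswap, the first-page-only path copies the stale disk contents of slot 1 into the folio while the per-page path picks the zswap copy; with slot 0 in zswap instead, the first-page-only path never fills page 1 at all.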
> >>>
> >>> ---
> >>>  mm/zswap.c | 3 +++
> >>>  1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/mm/zswap.c b/mm/zswap.c
> >>> index b9b35ef86d9be..6007252429bb2 100644
> >>> --- a/mm/zswap.c
> >>> +++ b/mm/zswap.c
> >>> @@ -1577,6 +1577,9 @@ bool zswap_load(struct folio *folio)
> >>>  	if (!entry)
> >>>  		return false;
> >>>
> >>> +	/* Zswap loads do not handle large folio swapins correctly yet */
> >>> +	VM_BUG_ON(folio_test_large(folio));
> >>> +
> >>
> >> There is no way we could have a WARN_ON_ONCE() and recover, right?
> >
> > Not without making more fundamental changes to the surrounding swap
> > code. Currently zswap_load() returns either true (folio was loaded
> > from zswap) or false (folio is not in zswap).
> >
> > To handle this correctly zswap_load() would need to tell
> > swap_read_folio() which subpages are in zswap and have been loaded,
> > and then swap_read_folio() would need to read the remaining subpages
> > from disk. This of course assumes that the caller of swap_read_folio()
> > made sure that the entire folio is swapped out and protected against
> > races with other swapins.
> >
> > Also, because swap_read_folio() cannot split the folio itself, other
> > swap_read_folio_*() functions that are called from it should be
> > updated to handle swapping in tail subpages, which may be questionable
> > in its own right.
> >
> > An alternative would be that zswap_load() (or a separate interface)
> > could tell swap_read_folio() that the folio is partially in zswap,
> > then we can just bail and tell the caller that it cannot read the
> > large folio and that it should be split.
> >
> > There may be other options as well, but the bottom line is that it is
> > possible, but probably not something that we want to do right now.
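The "partially in zswap, so bail and split" alternative can be sketched in the same toy style. This assumes the caller can see which swap slots of the folio are held by zswap; the names (toy_zswap_classify(), toy_safe_read_pages(), TOY_ZSWAP_*) are hypothetical and not an existing kernel interface. The TOY_ZSWAP_ALL case still assumes a zswap_load() able to fill every subpage, which is exactly the support the patch says is missing today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of checking a large folio's swap slots against zswap. */
enum toy_zswap_result {
	TOY_ZSWAP_NONE,    /* no subpage in zswap: whole folio comes from disk */
	TOY_ZSWAP_ALL,     /* every subpage in zswap */
	TOY_ZSWAP_PARTIAL, /* mixed: the caller has to split the folio */
};

/* Classify slots [start, start + nr_pages) of a toy "is in zswap" map. */
static enum toy_zswap_result toy_zswap_classify(const bool *in_zswap,
						 size_t start, size_t nr_pages)
{
	size_t hits = 0;

	assert(nr_pages > 0);
	for (size_t i = 0; i < nr_pages; i++)
		if (in_zswap[start + i])
			hits++;

	if (hits == 0)
		return TOY_ZSWAP_NONE;
	if (hits == nr_pages)
		return TOY_ZSWAP_ALL;
	return TOY_ZSWAP_PARTIAL;
}

/*
 * Caller policy for the "bail and split" option: return how many pages
 * may be read as one folio. A partial hit forces order-0 reads; the
 * TOY_ZSWAP_ALL case assumes a zswap_load() that can fill every subpage.
 */
static size_t toy_safe_read_pages(const bool *in_zswap, size_t start,
				  size_t nr_pages)
{
	if (toy_zswap_classify(in_zswap, start, nr_pages) == TOY_ZSWAP_PARTIAL)
		return 1;
	return nr_pages;
}
```

Compared with returning a per-subpage bitmap, this only needs a three-way answer, at the price of giving up the large folio whenever zswap and disk are mixed.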
> >
> > A stronger protection method would be to introduce a config option or
> > boot parameter for large folio swapin, and then make CONFIG_ZSWAP
> > depend on it being disabled, or have zswap check it at boot and refuse
> > to be enabled if it is on.
>
> Right, sounds like the VM_BUG_ON() really is not that easily avoidable.
>
> I was wondering, if we could WARN_ON_ONCE and make the swap code detect
> this like a read-error from disk.
>
> I think do_swap_page() detects that by checking if the folio is not
> uptodate:
>
> 	if (unlikely(!folio_test_uptodate(folio))) {
> 		ret = VM_FAULT_SIGBUS;
> 		goto out_nomap;
> 	}
>
> So maybe WARN_ON_ONCE() + triggering that might be a bit nicer to the
> system (but the app would crash either way, there is no way around it).
>

I'd rather fall back to small folio swapin instead of crashing apps until we
fix large folio swapin in zswap :-)

+static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+{
+	...
+
+	if (is_zswap_enabled())
+		goto fallback;

> --
> Cheers,
>
> David / dhildenb

Thanks
Barry
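The two recovery strategies in this thread differ mainly in where the decision is made. A rough model with hypothetical names throughout (toy_swapin_order(), toy_swapin_fault(), TOY_FAULT_*): the read-time variant warns once and leaves the folio !uptodate so the fault ends in SIGBUS, while the allocation-time variant picks an order-0 folio before any I/O happens when zswap is enabled. It only illustrates the control flow; it is not the do_swap_page() or alloc_swap_folio() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical fault results for this model (not kernel vm_fault_t values). */
enum toy_fault { TOY_FAULT_OK, TOY_FAULT_SIGBUS };

/*
 * Fallback at allocation time: when zswap is enabled, never allocate a
 * large folio for swapin; use order 0 instead.
 */
static int toy_swapin_order(bool zswap_enabled, int wanted_order)
{
	assert(wanted_order >= 0);
	return zswap_enabled ? 0 : wanted_order;
}

/*
 * Failure at read time: the read path refuses a large folio while zswap
 * is enabled, warns once, and leaves the folio !uptodate; the fault
 * handler then treats it like a disk read error.
 */
static enum toy_fault toy_swapin_fault(bool zswap_enabled, int order)
{
	static bool warned; /* stands in for WARN_ON_ONCE() */
	bool uptodate = true;

	if (zswap_enabled && order > 0) {
		if (!warned) {
			fprintf(stderr, "toy: large folio swapin with zswap\n");
			warned = true;
		}
		uptodate = false;
	}
	return uptodate ? TOY_FAULT_OK : TOY_FAULT_SIGBUS;
}
```

In this model, choosing the order up front means the read path never sees a large folio while zswap is enabled, so the SIGBUS path is never taken; the cost is losing large folio swapin on such systems until zswap gains support.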