From: Barry Song <21cnbao@gmail.com>
Date: Wed, 12 Jun 2024 09:56:48 +1200
Subject: Re: [PATCH v3 3/3] mm: zswap: handle incorrect attempts to load large folios
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Nhat Pham, Chengming Zhou, Chris Li, David Hildenbrand, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240611024516.1375191-3-yosryahmed@google.com>
References: <20240611024516.1375191-1-yosryahmed@google.com> <20240611024516.1375191-3-yosryahmed@google.com>
On Tue, Jun 11, 2024 at 2:45 PM Yosry Ahmed wrote:
>
> Zswap does not support storing or loading large folios. Until proper
> support is added, attempts to load large folios from zswap are a bug.
>
> For example, if a swapin fault observes that contiguous PTEs are
> pointing to contiguous swap entries and tries to swap them in as a large
> folio, swap_read_folio() will pass in a large folio to zswap_load(), but
> zswap_load() will only effectively load the first page in the folio. If
> the first page is not in zswap, the folio will be read from disk, even
> though other pages may be in zswap.
>
> In both cases, this will lead to silent data corruption. Proper support
> needs to be added before large folio swapins and zswap can work
> together.
>
> Looking at callers of swap_read_folio(), it seems like the folios they
> pass in are either allocated by __read_swap_cache_async() or by
> do_swap_page() in the SWP_SYNCHRONOUS_IO path, both of which allocate
> order-0 folios, so everything is fine for now.
>
> However, there is ongoing work to add support for large folio swapins
> [1]. To make sure new development does not break zswap (or get broken by
> zswap), add minimal handling of incorrect loads of large folios to
> zswap.
>
> First, move the call to folio_mark_uptodate() into zswap_load().
>
> If a large folio load is attempted, and zswap was ever enabled on the
> system, return 'true' without calling folio_mark_uptodate(). This will
> prevent the folio from being read from disk, and will emit an IO error
> because the folio is not uptodate (e.g. do_swap_page() will return
> VM_FAULT_SIGBUS). It may not be a reliable recovery in all cases, but it
> is better than nothing.
>
> This was tested by hacking the allocation in __read_swap_cache_async()
> to use order 2 and __GFP_COMP.
>
> In the future, to handle this correctly, the swapin code should:
> (a) Fall back to order-0 swapins if zswap was ever used on the machine,
> because compressed pages remain in zswap after it is disabled.
> (b) Add proper support to swap in large folios from zswap (fully or
> partially).
>
> Probably start with (a), then follow up with (b).
>
> [1]https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/
>
> Signed-off-by: Yosry Ahmed

Acked-by: Barry Song

> ---
>  mm/page_io.c |  1 -
>  mm/zswap.c   | 12 ++++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index f1a9cfab6e748..8f441dd8e109f 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
>  	delayacct_swapin_start();
>
>  	if (zswap_load(folio)) {
> -		folio_mark_uptodate(folio);
>  		folio_unlock(folio);
>  	} else if (data_race(sis->flags & SWP_FS_OPS)) {
>  		swap_read_folio_fs(folio, plug);
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 7fcd751e847d6..505f4b9812891 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1566,6 +1566,17 @@ bool zswap_load(struct folio *folio)
>  	if (zswap_never_enabled())
>  		return false;
>
> +	/*
> +	 * Large folios should not be swapped in while zswap is being used, as
> +	 * they are not properly handled. Zswap does not properly load large
> +	 * folios, and a large folio may only be partially in zswap.
> +	 *
> +	 * Return true without marking the folio uptodate so that an IO error is
> +	 * emitted (e.g. do_swap_page() will sigbus).
> +	 */
> +	if (WARN_ON_ONCE(folio_test_large(folio)))
> +		return true;
> +
>  	/*
>  	 * When reading into the swapcache, invalidate our entry. The
>  	 * swapcache can be the authoritative owner of the page and
> @@ -1600,6 +1611,7 @@ bool zswap_load(struct folio *folio)
>  		folio_mark_dirty(folio);
>  	}
>
> +	folio_mark_uptodate(folio);
>  	return true;
>  }
>
> --
> 2.45.2.505.gda0bf45e8d-goog
>
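The error path this patch relies on can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not kernel code: struct model_folio, model_zswap_load(), model_swapin(), model_swapin_order() and model_selftest() are hypothetical names that mirror the control flow described in the commit message. A large folio makes zswap_load() return true without marking the folio uptodate, so the fault path turns the !uptodate folio into a SIGBUS instead of a silent disk read. Option (a), falling back to order-0 when zswap was ever enabled, is included as a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified stand-in for struct folio: only the
 * fields needed to model the control flow in this patch.
 */
struct model_folio {
	unsigned int order;	/* 0 = single page, > 0 = large folio */
	bool uptodate;		/* models folio_test_uptodate() */
	bool in_zswap;		/* hypothetical: zswap holds this entry */
};

/* Hypothetical stand-in for !zswap_never_enabled(). */
static bool model_zswap_ever_enabled = true;

/* Models zswap_load() after the patch. */
static bool model_zswap_load(struct model_folio *folio)
{
	if (!model_zswap_ever_enabled)
		return false;		/* caller falls back to the disk read */

	/*
	 * Large folio: report the load as handled but leave the folio
	 * !uptodate, so the caller raises an IO error instead of reading
	 * the folio from disk. The real code wraps this check in
	 * WARN_ON_ONCE().
	 */
	if (folio->order > 0)
		return true;

	if (!folio->in_zswap)
		return false;

	/* Decompression would happen here. */
	folio->uptodate = true;		/* moved here from swap_read_folio() */
	return true;
}

enum model_result { MODEL_OK, MODEL_SIGBUS, MODEL_READ_DISK };

/*
 * Models swap_read_folio() plus the fault handler's later uptodate
 * check (do_swap_page() returns VM_FAULT_SIGBUS for a !uptodate folio).
 */
static enum model_result model_swapin(struct model_folio *folio)
{
	if (!model_zswap_load(folio))
		return MODEL_READ_DISK;	/* disk I/O path, not modelled */
	return folio->uptodate ? MODEL_OK : MODEL_SIGBUS;
}

/*
 * Option (a) from the commit message, as a hypothetical helper: only
 * allow a large swapin if zswap was never enabled.
 */
static unsigned int model_swapin_order(unsigned int wanted_order)
{
	return model_zswap_ever_enabled ? 0 : wanted_order;
}

/* Small self-check exercising the three outcomes. */
static void model_selftest(void)
{
	struct model_folio small = { .order = 0, .in_zswap = true };
	struct model_folio large = { .order = 2, .in_zswap = true };
	struct model_folio miss = { .order = 0, .in_zswap = false };

	assert(model_swapin(&small) == MODEL_OK);
	assert(model_swapin(&large) == MODEL_SIGBUS);
	assert(model_swapin(&miss) == MODEL_READ_DISK);
	assert(model_swapin_order(2) == 0);
}
```

Calling model_selftest() from a main() exercises all three outcomes. The design point it illustrates is that returning true, rather than false, for a large folio is what keeps the caller off the disk path, while the missing uptodate bit is what surfaces the problem as an error.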