From: Yosry Ahmed <yosryahmed@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Nhat Pham <nphamcs@gmail.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Barry Song <21cnbao@gmail.com>, Chris Li <chrisl@kernel.org>,
David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Yosry Ahmed <yosryahmed@google.com>
Subject: [PATCH v3 3/3] mm: zswap: handle incorrect attempts to load large folios
Date: Tue, 11 Jun 2024 02:45:16 +0000
Message-ID: <20240611024516.1375191-3-yosryahmed@google.com>
In-Reply-To: <20240611024516.1375191-1-yosryahmed@google.com>
Zswap does not support storing or loading large folios. Until proper
support is added, attempts to load large folios from zswap are a bug.
For example, if a swapin fault observes that contiguous PTEs are
pointing to contiguous swap entries and tries to swap them in as a large
folio, swap_read_folio() will pass in a large folio to zswap_load(), but
zswap_load() will only effectively load the first page in the folio. If
the first page is not in zswap, the folio will be read from disk, even
though other pages may be in zswap.
In both cases, this will lead to silent data corruption. Proper support
needs to be added before large folio swapins and zswap can work
together.
Looking at the callers of swap_read_folio(), it seems the folios they pass
in are allocated either by __read_swap_cache_async() or by do_swap_page()
in the SWP_SYNCHRONOUS_IO path, both of which allocate order-0 folios, so
everything is fine for now.
However, there is ongoing work to add support for large folio swapins
[1]. To make sure new development does not break zswap (or get broken by
zswap), add minimal handling of incorrect loads of large folios to
zswap.
First, move the call to folio_mark_uptodate() inside zswap_load().
If a large folio load is attempted, and zswap was ever enabled on the
system, return 'true' without calling folio_mark_uptodate(). This will
prevent the folio from being read from disk, and will emit an IO error
because the folio is not uptodate (e.g. do_swap_page() will return
VM_FAULT_SIGBUS). It may not be a reliable recovery in all cases, but it
is better than nothing.
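To illustrate the control flow described above, here is a minimal userspace
sketch in C. It is not kernel code: struct model_folio, model_zswap_load(),
model_swapin() and the MODEL_FAULT_* values are hypothetical stand-ins, and
the disk read is assumed to always succeed. It only models why returning
'true' without marking the folio uptodate turns into an error rather than a
disk read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct model_folio {
	unsigned int order;	/* 0 for a single page, >0 for a large folio */
	bool uptodate;		/* models the folio uptodate flag */
	bool in_zswap;		/* whether the page has a zswap entry */
};

enum model_fault { MODEL_FAULT_OK, MODEL_FAULT_SIGBUS };

static bool model_zswap_ever_enabled = true;
static unsigned int model_disk_reads;

/* Mirrors the shape of zswap_load() after this patch. */
static bool model_zswap_load(struct model_folio *folio)
{
	if (!model_zswap_ever_enabled)
		return false;	/* let the caller read from the swap device */

	/* Large folio: claim the load but leave the folio not uptodate. */
	if (folio->order > 0)
		return true;

	if (!folio->in_zswap)
		return false;

	folio->uptodate = true;	/* now done inside the load itself */
	return true;
}

/* Mirrors the caller: read from disk only when zswap declines the folio. */
static enum model_fault model_swapin(struct model_folio *folio)
{
	assert(folio);

	if (!model_zswap_load(folio)) {
		model_disk_reads++;
		folio->uptodate = true;	/* assumption: the disk read succeeds */
	}

	return folio->uptodate ? MODEL_FAULT_OK : MODEL_FAULT_SIGBUS;
}
```

In this model, an order-2 folio on a system where zswap was ever enabled
yields MODEL_FAULT_SIGBUS and no disk read, while an order-0 folio behaves
as before.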
This was tested by hacking the allocation in __read_swap_cache_async()
to use order 2 and __GFP_COMP.
In the future, to handle this correctly, the swapin code should:
(a) Fall back to order-0 swapins if zswap was ever used on the machine,
because compressed pages remain in zswap after it is disabled.
(b) Add proper support for swapping in large folios from zswap (fully or
partially).
Probably start with (a), then follow up with (b).
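As a rough sketch of what (a) could look like, the swapin path might pick
the folio order as below. This is plain userspace C, not a proposed
implementation: model_swapin_order() and both of its parameters are
hypothetical, and the boolean stands in for whatever the real check ends up
being.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not an existing kernel function: pick the folio
 * order for a swapin, given the order the fault path would like to use.
 */
static unsigned int model_swapin_order(unsigned int wanted_order,
				       bool zswap_never_enabled)
{
	/*
	 * Compressed pages can remain in zswap after it is disabled, so
	 * the check is "ever enabled", not "currently enabled".
	 */
	if (!zswap_never_enabled)
		return 0;

	return wanted_order;
}
```

With this shape, large folio swapins are only attempted on systems where
zswap has never been enabled, which is the only case where no page of the
folio can be sitting in zswap.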
[1] https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/page_io.c |  1 -
 mm/zswap.c   | 12 ++++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/mm/page_io.c b/mm/page_io.c
index f1a9cfab6e748..8f441dd8e109f 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 	delayacct_swapin_start();
 
 	if (zswap_load(folio)) {
-		folio_mark_uptodate(folio);
 		folio_unlock(folio);
 	} else if (data_race(sis->flags & SWP_FS_OPS)) {
 		swap_read_folio_fs(folio, plug);
diff --git a/mm/zswap.c b/mm/zswap.c
index 7fcd751e847d6..505f4b9812891 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1566,6 +1566,17 @@ bool zswap_load(struct folio *folio)
 	if (zswap_never_enabled())
 		return false;
 
+	/*
+	 * Large folios should not be swapped in while zswap is being used, as
+	 * they are not properly handled. Zswap does not properly load large
+	 * folios, and a large folio may only be partially in zswap.
+	 *
+	 * Return true without marking the folio uptodate so that an IO error is
+	 * emitted (e.g. do_swap_page() will sigbus).
+	 */
+	if (WARN_ON_ONCE(folio_test_large(folio)))
+		return true;
+
 	/*
 	 * When reading into the swapcache, invalidate our entry. The
 	 * swapcache can be the authoritative owner of the page and
@@ -1600,6 +1611,7 @@ bool zswap_load(struct folio *folio)
 		folio_mark_dirty(folio);
 	}
 
+	folio_mark_uptodate(folio);
 	return true;
 }
 
--
2.45.2.505.gda0bf45e8d-goog