From: Yosry Ahmed <yosryahmed@google.com>
To: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Nhat Pham <nphamcs@gmail.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Chris Li <chrisl@kernel.org>,
Ryan Roberts <ryan.roberts@arm.com>,
Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted
Date: Thu, 6 Jun 2024 14:00:47 -0700 [thread overview]
Message-ID: <CAJD7tkZru-hD4300-KHPrBFM3km3osXtaQf6AauUKhwALzXn3g@mail.gmail.com> (raw)
In-Reply-To: <CAGsJ_4zTJgOgkXE=4a=w=ZsuC4WuwoBNHf0v0BwP7tatkkJZqA@mail.gmail.com>
On Thu, Jun 6, 2024 at 1:55 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Fri, Jun 7, 2024 at 8:32 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Thu, Jun 6, 2024 at 1:22 PM David Hildenbrand <david@redhat.com> wrote:
> > >
> > > On 06.06.24 20:48, Yosry Ahmed wrote:
> > > > With ongoing work to support large folio swapin, it is important to make
> > > > sure we do not pass large folios to zswap_load() without implementing
> > > > proper support.
> > > >
> > > > For example, if a swapin fault observes that contiguous PTEs are
> > > > pointing to contiguous swap entries and tries to swap them in as a large
> > > > folio, swap_read_folio() will pass in a large folio to zswap_load(), but
> > > > zswap_load() will only effectively load the first page in the folio. If
> > > > the first page is not in zswap, the folio will be read from disk, even
> > > > though other pages may be in zswap.
> > > >
> > > > In both cases, this will lead to silent data corruption.
> > > >
> > > > Proper large folio swapin support needs to go into zswap before zswap
> > > > can be enabled in a system that supports large folio swapin.
> > > >
> > > > Looking at callers of swap_read_folio(), it seems like they are either
> > > > allocated from __read_swap_cache_async() or do_swap_page() in the
> > > > SWP_SYNCHRONOUS_IO path. Both of which allocate order-0 folios, so we
> > > > are fine for now.
> > > >
> > > > Add a VM_BUG_ON() in zswap_load() to make sure that we detect changes in
> > > > the order of those allocations without proper handling of zswap.
> > > >
> > > > Alternatively, swap_read_folio() (or its callers) can be updated to have
> > > > a fallback mechanism that splits large folios or reads subpages
> > > > separately. Similar logic may be needed anyway in case part of a large
> > > > folio is already in the swapcache and the rest of it is swapped out.
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> Acked-by: Barry Song <baohua@kernel.org>
Thanks!
>
> this has been observed by me[1], that's why you can find the below
> code in my swapin patch
Thanks for the pointer. I suppose if we add more generic swapin
support we'll have to add a similar check in
__read_swap_cache_async(), unless we are planning to reuse
alloc_swap_folio() there.
It would be nice if we can have a global check for this rather than
add it to all different swapin paths (although I suspect there are
only two allocations paths right now). We can always disable zswap if
mTHP swapin is enabled, or always disable mTHP swapin if zswap is
enabled. At least until proper support is added.
>
> +static struct folio *alloc_swap_folio(struct vm_fault *vmf)
> +{
> + ...
> + /*
> + * a large folio being swapped-in could be partially in
> + * zswap and partially in swap devices, zswap doesn't
> + * support large folios yet, we might get corrupted
> + * zero-filled data by reading all subpages from swap
> + * devices while some of them are actually in zswap
> + */
> + if (is_zswap_enabled())
> + goto fallback;
>
> [1] https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/
next prev parent reply other threads:[~2024-06-06 21:01 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-06 18:48 Yosry Ahmed
[not found] ` <84d78362-e75c-40c8-b6c2-56d5d5292aa7@redhat.com>
2024-06-06 20:31 ` Yosry Ahmed
2024-06-06 20:54 ` Barry Song
2024-06-06 21:00 ` Yosry Ahmed [this message]
[not found] ` <7507d075-9f4d-4a9b-836c-1fbb2fbd2257@redhat.com>
2024-06-06 21:30 ` Barry Song
2024-06-06 21:37 ` Yosry Ahmed
2024-06-06 21:53 ` Barry Song
2024-06-07 7:11 ` David Hildenbrand
2024-06-07 17:43 ` Yosry Ahmed
2024-06-07 18:52 ` David Hildenbrand
2024-06-07 18:58 ` Yosry Ahmed
2024-06-07 19:03 ` David Hildenbrand
2024-06-07 21:13 ` Chris Li
2024-06-07 23:14 ` Yosry Ahmed
2024-06-07 22:09 ` Barry Song
2024-06-08 0:28 ` Yosry Ahmed
2024-06-07 21:20 ` Barry Song
2024-06-07 23:17 ` Yosry Ahmed
2024-06-07 22:28 ` Barry Song
2024-06-08 0:32 ` Yosry Ahmed
2024-06-06 21:36 ` Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJD7tkZru-hD4300-KHPrBFM3km3osXtaQf6AauUKhwALzXn3g@mail.gmail.com \
--to=yosryahmed@google.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=chengming.zhou@linux.dev \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nphamcs@gmail.com \
--cc=ryan.roberts@arm.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox