linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Barry Song <21cnbao@gmail.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: usamaarif642@gmail.com, akpm@linux-foundation.org,
	 chengming.zhou@linux.dev, david@redhat.com, hannes@cmpxchg.org,
	 hughd@google.com, kernel-team@meta.com,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	nphamcs@gmail.com, shakeel.butt@linux.dev,  willy@infradead.org,
	ying.huang@intel.com, hanchuanhua@oppo.com
Subject: Re: [PATCH v4 1/2] mm: store zero pages to be swapped out in a bitmap
Date: Wed, 4 Sep 2024 19:54:24 +1200	[thread overview]
Message-ID: <CAGsJ_4w4woc6st+nPqH7PnhczhQZ7j90wupgX28UrajobqHLnw@mail.gmail.com> (raw)
In-Reply-To: <CAJD7tkYqk_raVy07cw9cz=RWo=6BpJc0Ax84MkXLRqCjYvYpeA@mail.gmail.com>

On Wed, Sep 4, 2024 at 7:22 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Sep 4, 2024 at 12:17 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Wed, Sep 4, 2024 at 7:12 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > [..]
> > > > > @@ -426,6 +515,26 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
> > > > >         mempool_free(sio, sio_pool);
> > > > >  }
> > > > >
> > > > > +static bool swap_read_folio_zeromap(struct folio *folio)
> > > > > +{
> > > > > +       unsigned int idx = swap_zeromap_folio_test(folio);
> > > > > +
> > > > > +       if (idx == 0)
> > > > > +               return false;
> > > > > +
> > > > > +       /*
> > > > > +        * Swapping in a large folio that is partially in the zeromap is not
> > > > > +        * currently handled. Return true without marking the folio uptodate so
> > > > > +        * that an IO error is emitted (e.g. do_swap_page() will sigbus).
> > > > > +        */
> > > > > +       if (WARN_ON_ONCE(idx < folio_nr_pages(folio)))
> > > > > +               return true;
> > > >
> > > > Hi Usama, Yosry,
> > > >
> > > > I feel the warning is wrong as we could have the case where idx==0
> > > > is not zeromap but idx=1 is zeromap. idx == 0 doesn't necessarily
> > > > mean we should return false.
> > >
> > > Good catch. Yeah if idx == 0 is not in the zeromap but other indices
> > > are we will mistakenly read the entire folio from swap.
> > >
> > > >
> > > > What about the below change which both fixes the warning and unblocks
> > > > large folios swap-in?
> > >
> > > But I don't see how that unblocks the large folios swap-in work? We
> > > still need to actually handle the case where a large folio being
> > > swapped in is partially in the zeromap. Right now we warn and unlock
> > > the folio without calling folio_mark_uptodate(), which emits an IO
> > > error.
> >
> > I placed this in mm/swap.h so that during swap-in, it can filter out any large
> > folios where swap_zeromap_entries_count() is greater than 0 and less than
> > nr.
> >
> > I believe this case would be quite rare, as it can only occur when some small
> > folios that are swapped out happen to have contiguous and aligned swap
> > slots.
>
> I am assuming this would be near where the zswap_never_enabled() check
> is today, right?

The code is close to the area, but it doesn't rely on zeromap being
disabled.

>
> I understand the point of doing this to unblock the synchronous large
> folio swapin support work, but at some point we're gonna have to
> actually handle the cases where a large folio being swapped in is
> partially in the swap cache, zswap, the zeromap, etc.
>
> All these cases will need similar-ish handling, and I suspect we won't
> just skip swapping in large folios in all these cases.

I agree that this is definitely the goal. `swap_read_folio()` should be a
dependable API that always returns reliable data, regardless of whether
`zeromap` or `zswap` is involved. Despite these issues, mTHP swap-in shouldn't
be held back. Significant efforts are underway to support large folios in
`zswap`, and progress is being made. Not to mention we've already allowed
`zeromap` to proceed, even though it doesn't support large folios.

It's genuinely unfair to let the lack of mTHP support in `zeromap` and
`zswap` hold swap-in hostage.

Nonetheless, `zeromap` and `zswap` are distinct cases. With `zeromap`, we
permit almost all mTHP swap-ins, except for those rare situations where
small folios that were swapped out happen to have contiguous and aligned
swap slots.

swapcache is another quite different story, since our user scenarios begin from
the simplest sync io on mobile phones, we don't quite care about swapcache.

Thanks
Barry


  reply	other threads:[~2024-09-04  7:54 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-12 12:43 [PATCH v4 0/2] " Usama Arif
2024-06-12 12:43 ` [PATCH v4 1/2] " Usama Arif
2024-06-12 20:13   ` Yosry Ahmed
2024-06-13 11:37     ` Usama Arif
2024-06-13 16:38       ` Yosry Ahmed
2024-06-13 19:21         ` Usama Arif
2024-06-13 19:26           ` Yosry Ahmed
2024-06-13 19:38             ` Usama Arif
2024-09-04  5:55   ` Barry Song
2024-09-04  7:12     ` Yosry Ahmed
2024-09-04  7:17       ` Barry Song
2024-09-04  7:22         ` Yosry Ahmed
2024-09-04  7:54           ` Barry Song [this message]
2024-09-04 17:40             ` Yosry Ahmed
2024-09-05  7:03               ` Barry Song
2024-09-05  7:55                 ` Yosry Ahmed
2024-09-05  8:49                   ` Barry Song
2024-09-05 10:10                     ` Barry Song
2024-09-05 10:33                       ` Barry Song
2024-09-05 10:53                         ` Usama Arif
2024-09-05 11:00                           ` Barry Song
2024-09-05 19:19                             ` Usama Arif
2024-09-05 17:36                         ` Yosry Ahmed
2024-09-05 19:28                         ` Yosry Ahmed
2024-09-06 10:22                           ` Barry Song
2024-09-05 10:37                       ` Usama Arif
2024-09-05 10:42                         ` Barry Song
2024-09-05 10:50                           ` Usama Arif
2024-09-04 11:14     ` Usama Arif
2024-09-04 23:44       ` Barry Song
2024-09-04 23:47         ` Barry Song
2024-09-04 23:57         ` Yosry Ahmed
2024-09-05  0:29           ` Barry Song
2024-09-05  7:38             ` Yosry Ahmed
2024-06-12 12:43 ` [PATCH v4 2/2] mm: remove code to handle same filled pages Usama Arif
2024-06-12 15:09   ` Nhat Pham
2024-06-12 16:34     ` Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGsJ_4w4woc6st+nPqH7PnhczhQZ7j90wupgX28UrajobqHLnw@mail.gmail.com \
    --to=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=david@redhat.com \
    --cc=hanchuanhua@oppo.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox