linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nhat Pham <nphamcs@gmail.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
	cerasuolodomenico@gmail.com,  sjenning@redhat.com,
	ddstreet@ieee.org, vitaly.wool@konsulko.com,  hughd@google.com,
	corbet@lwn.net, konrad.wilk@oracle.com,
	 senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org,
	 kernel-team@meta.com, linux-kernel@vger.kernel.org,
	david@ixit.cz
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
Date: Tue, 17 Oct 2023 12:04:37 -0700	[thread overview]
Message-ID: <CAKEwX=OpVojN27yxQRXmOJcHYnHSoQqTW3dzxOveTzvbEu7vaA@mail.gmail.com> (raw)
In-Reply-To: <CAKEwX=Npoa5pxeqOHqNTreqVEKpnFQ2rg=yN6eOgFQngBp6X+w@mail.gmail.com>

On Tue, Oct 17, 2023 at 12:03 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Mon, Oct 16, 2023 at 5:58 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham <nphamcs@gmail.com> wrote:
> > >
> > > Currently, when a zswap store attempt fails, the page is immediately
> > > swapped out. This could happen for a variety of reasons. For instance,
> > > the compression algorithm could fail (such as when the data is not
> > > compressible), or the backend allocator might not be able to find a
> > > suitable slot for the compressed page. If these pages are needed
> > > later on, users will incur IOs from swapins.
> > >
> > > This issue prevents the adoption of zswap for potential users who
> > > cannot tolerate the latency associated with swapping. In many cases,
> > > these IOs are avoidable if we just keep in memory the pages that zswap
> > > fail to store.
> > >
> > > This patch series add two new features for zswap that will alleviate
> > > the risk of swapping:
> > >
> > > a) When a store attempt fail, keep the page untouched in memory
> > > instead of swapping it out.
> >
> > What about writeback when the zswap limit is hit? I understand the
> > problem, but I am wondering if this is the correct way of fixing it.
> > We really need to make zswap work without a backing swapfile, which I
> > think is the correct way to fix all these problems. I was working on
> > that, but unfortunately I had to pivot to something else before I had
> > something that was working.
> >
> > At Google, we have "ghost" swapfiles that we use just to use zswap
> > without a swapfile. They are sparse files, and we have internal kernel
> > patches to flag them and never try to actually write to them.
> >
> > I am not sure how many bandaids we can afford before doing the right
> > thing. I understand it's a much larger surgery, perhaps there is a way
> > to get a short-term fix that is also a step towards the final state we
> > want to reach instead?
>
> Regarding the writeback - I'll make sure to also short-circuit writeback
> when the bypass_swap option is enabled in v2 :) I'll probably send out
> v2 after

... I gather all these feedbacks.

> I absolutely agree that we must decouple zswap and swap (and would
> be happy to help out in any capacity I could - we have heard similar
> concerns/complaints about swap wastage from internal parties as well).
>
> However, as Johannes has pointed out, this feature still has its place,
> given our already existing swapfile deployments. I do agree that a
> global knob is insufficient tho. I'll add a per-cgroup knob in v2 so that
> we can enable/disable this feature on a per-workload basis.
>
> >
> > >
> > > b) If the store attempt fails at the compression step, allow the page
> > > to be stored in its uncompressed form in the zswap pool. This maintains
> > > the LRU ordering of pages, which will be helpful for accurate
> > > memory reclaim (zswap writeback in particular).
> >
> > This is dangerous. Johannes and I discussed this before. This means
> > that reclaim can end up allocating more memory instead of freeing.
> > Allocations made in the reclaim path are made under the assumption
> > that we will eventually free memory. In this case, we won't. In the
> > worst case scenario, reclaim can leave the system/memcg in a worse
> > state than before it started.
> >
> > Perhaps there is a way we can do this without allocating a zswap entry?
> >
> > I thought before about having a special list_head that allows us to
> > use the lower bits of the pointers as markers, similar to the xarray.
> > The markers can be used to place different objects on the same list.
> > We can have a list that is a mixture of struct page and struct
> > zswap_entry. I never pursued this idea, and I am sure someone will
> > scream at me for suggesting it. Maybe there is a less convoluted way
> > to keep the LRU ordering intact without allocating memory on the
> > reclaim path.
>
> Hmm yeah you're right about these concerns. That seems like a lot more
> involved than what I envisioned initially.
>
> Let's put this aside for now. I'll just send the first patch in v2, and we can
> work on + discuss more about uncompressed pages storing later on.
>
> >
> > >
> > > These features could be enabled independently via two new zswap module
> > > parameters.
> > >
> > > Nhat Pham (2):
> > >   swap: allows swap bypassing on zswap store failure
> > >   zswap: store uncompressed pages when compression algorithm fails
> > >
> > >  Documentation/admin-guide/mm/zswap.rst | 16 +++++++
> > >  include/linux/zswap.h                  |  9 ++++
> > >  mm/page_io.c                           |  6 +++
> > >  mm/shmem.c                             |  8 +++-
> > >  mm/zswap.c                             | 64 +++++++++++++++++++++++---
> > >  5 files changed, 95 insertions(+), 8 deletions(-)
> > >
> > > --
> > > 2.34.1


  reply	other threads:[~2023-10-17 19:04 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-17  0:35 Nhat Pham
2023-10-17  0:35 ` [PATCH 1/2] swap: allows swap bypassing " Nhat Pham
2023-10-17  0:35 ` [PATCH 2/2] zswap: store uncompressed pages when compression algorithm fails Nhat Pham
2023-10-17  0:57 ` [PATCH 0/2] minimize swapping on zswap store failure Yosry Ahmed
2023-10-17  4:47   ` Johannes Weiner
2023-10-17  5:33     ` Yosry Ahmed
2023-10-17 14:51       ` Johannes Weiner
2023-10-17 15:51         ` Yosry Ahmed
2023-10-17 19:24     ` Nhat Pham
2023-10-17 19:03   ` Nhat Pham
2023-10-17 19:04     ` Nhat Pham [this message]
2025-04-02 20:06   ` Joshua Hahn
2025-04-03 20:38     ` Nhat Pham
2025-04-04  1:46       ` Sergey Senozhatsky
2025-04-04 14:06         ` Joshua Hahn
2025-04-04 15:29           ` Nhat Pham
2025-04-08  3:33           ` Sergey Senozhatsky
2025-04-04 15:39     ` Nhat Pham
2025-04-22 11:27     ` Yosry Ahmed
2025-04-22 15:00       ` Joshua Hahn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=OpVojN27yxQRXmOJcHYnHSoQqTW3dzxOveTzvbEu7vaA@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cerasuolodomenico@gmail.com \
    --cc=corbet@lwn.net \
    --cc=david@ixit.cz \
    --cc=ddstreet@ieee.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kernel-team@meta.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rppt@kernel.org \
    --cc=senozhatsky@chromium.org \
    --cc=sjenning@redhat.com \
    --cc=vitaly.wool@konsulko.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox