From: Yosry Ahmed <yosryahmed@google.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
cerasuolodomenico@gmail.com, sjenning@redhat.com,
ddstreet@ieee.org, vitaly.wool@konsulko.com, hughd@google.com,
corbet@lwn.net, konrad.wilk@oracle.com,
senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org,
kernel-team@meta.com, linux-kernel@vger.kernel.org,
david@ixit.cz
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
Date: Mon, 16 Oct 2023 17:57:31 -0700
Message-ID: <CAJD7tka6XRyzYndRNEFZmi0Zj4DD2KnVzt=vMGhfF4iN2B4VKw@mail.gmail.com>
In-Reply-To: <20231017003519.1426574-1-nphamcs@gmail.com>

On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> Currently, when a zswap store attempt fails, the page is immediately
> swapped out. This could happen for a variety of reasons. For instance,
> the compression algorithm could fail (such as when the data is not
> compressible), or the backend allocator might not be able to find a
> suitable slot for the compressed page. If these pages are needed
> later on, users will incur IOs from swapins.
>
> This issue prevents the adoption of zswap for potential users who
> cannot tolerate the latency associated with swapping. In many cases,
> these IOs are avoidable if we just keep in memory the pages that zswap
> fail to store.
>
> This patch series add two new features for zswap that will alleviate
> the risk of swapping:
>
> a) When a store attempt fail, keep the page untouched in memory
> instead of swapping it out.

What about writeback when the zswap limit is hit? I understand the
problem, but I am wondering if this is the correct way of fixing it.
We really need to make zswap work without a backing swapfile, which I
think is the correct way to fix all these problems. I was working on
that, but unfortunately I had to pivot to something else before I had
something that was working.

At Google, we have "ghost" swapfiles whose only purpose is to let us
use zswap without a real swapfile. They are sparse files, and we have
internal kernel patches to flag them and never try to actually write
to them.

I am not sure how many bandaids we can afford before doing the right
thing. I understand it's a much larger surgery, but perhaps there is a
way to get a short-term fix that is also a step towards the final
state we want to reach instead?
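
For readers unfamiliar with sparse files, here is a minimal userspace
sketch of what the "ghost" swapfiles above look like on disk: a large
logical size with almost no blocks allocated. The helper name and the
path are hypothetical, and the sketch does not cover the internal
kernel patches that flag such files or prevent writes to them.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a sparse file of logical size `size`.
 * Returns the number of bytes actually allocated for it, or -1 on error.
 * Assumes Linux, where st_blocks counts 512-byte units.
 */
static long long make_sparse_file(const char *path, off_t size)
{
    struct stat st;
    int fd;

    assert(path != NULL);

    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    /* Extending with ftruncate() sets the size without writing data. */
    if (ftruncate(fd, size) != 0 || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return (long long)st.st_blocks * 512;
}
```

Calling make_sparse_file() with a 1 GiB size typically reports an
allocation far below 1 GiB, since no data blocks have been written.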
>
> b) If the store attempt fails at the compression step, allow the page
> to be stored in its uncompressed form in the zswap pool. This maintains
> the LRU ordering of pages, which will be helpful for accurate
> memory reclaim (zswap writeback in particular).

This is dangerous. Johannes and I discussed this before. This means
that reclaim can end up allocating more memory instead of freeing it.
Allocations made in the reclaim path are made under the assumption
that we will eventually free memory. In this case, we won't. In the
worst-case scenario, reclaim can leave the system/memcg in a worse
state than before it started.
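
The accounting behind this concern can be sketched roughly as below.
The numbers are assumptions for illustration only: 4 KiB pages and a
hypothetical 64-byte per-entry metadata cost (the real size of struct
zswap_entry depends on the kernel version and config), with allocator
overhead ignored.

```c
#include <assert.h>

#define PAGE_SIZE_BYTES 4096L  /* assumption: 4 KiB pages */
#define ENTRY_OVERHEAD  64L    /* hypothetical per-entry metadata size */

/*
 * Net bytes freed by moving one page into the pool.
 * stored_len is what the pool keeps for the page: the compressed
 * length, or PAGE_SIZE_BYTES when the page is stored uncompressed.
 * A negative result means reclaim consumed memory instead of freeing it.
 */
static long net_bytes_freed(long stored_len)
{
    assert(stored_len > 0 && stored_len <= PAGE_SIZE_BYTES);
    return PAGE_SIZE_BYTES - (stored_len + ENTRY_OVERHEAD);
}
```

Under these assumptions, a page that compresses to 1024 bytes frees
3008 bytes, while a page stored uncompressed yields -64: the page is
traded for a same-sized copy plus the metadata.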

Perhaps there is a way we can do this without allocating a zswap entry?

I previously thought about having a special list_head that allows us
to use the lower bits of the pointers as markers, similar to the
xarray. The markers can be used to place different objects on the same
list. We can have a list that is a mixture of struct page and struct
zswap_entry. I never pursued this idea, and I am sure someone will
scream at me for suggesting it. Maybe there is a less convoluted way
to keep the LRU ordering intact without allocating memory on the
reclaim path.
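
To make the marker idea concrete, here is a minimal userspace sketch
of low-bit pointer tagging. It is not kernel code and not a proposed
implementation: struct fake_page, struct fake_entry, and all helper
names are hypothetical stand-ins, and a plain array stands in for the
list. It only shows how one word can carry both a pointer and a type
marker when objects are at least 2-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct page and struct zswap_entry. */
struct fake_page  { int id; };
struct fake_entry { int id; };

enum node_kind { KIND_PAGE = 0, KIND_ENTRY = 1 };
#define KIND_MASK ((uintptr_t)1)

/* Pack a pointer and a 1-bit kind marker into one word. */
static uintptr_t tag_ptr(void *obj, enum node_kind kind)
{
    uintptr_t raw = (uintptr_t)obj;

    /* The low bit must be free, i.e. the object is >= 2-byte aligned. */
    assert((raw & KIND_MASK) == 0);
    return raw | (uintptr_t)kind;
}

/* Read the marker without touching the object. */
static enum node_kind ptr_kind(uintptr_t tagged)
{
    return (enum node_kind)(tagged & KIND_MASK);
}

/* Recover the original pointer by clearing the marker bit. */
static void *untag_ptr(uintptr_t tagged)
{
    return (void *)(tagged & ~KIND_MASK);
}

/* Count how many slots of a mixed sequence hold objects of `kind`. */
static size_t count_kind(const uintptr_t *lru, size_t n, enum node_kind kind)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++)
        if (ptr_kind(lru[i]) == kind)
            hits++;
    return hits;
}
```

A walker checks ptr_kind() first and then casts the result of
untag_ptr() to the matching type, so both object types can share one
ordered sequence without a separate wrapper allocation per element.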
>
> These features could be enabled independently via two new zswap module
> parameters.
>
> Nhat Pham (2):
> swap: allows swap bypassing on zswap store failure
> zswap: store uncompressed pages when compression algorithm fails
>
> Documentation/admin-guide/mm/zswap.rst | 16 +++++++
> include/linux/zswap.h | 9 ++++
> mm/page_io.c | 6 +++
> mm/shmem.c | 8 +++-
> mm/zswap.c | 64 +++++++++++++++++++++++---
> 5 files changed, 95 insertions(+), 8 deletions(-)
>
> --
> 2.34.1