linux-mm.kvack.org archive mirror
From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>,
	akpm@linux-foundation.org, tj@kernel.org,
	 lizefan.x@bytedance.com, cerasuolodomenico@gmail.com,
	sjenning@redhat.com,  ddstreet@ieee.org,
	vitaly.wool@konsulko.com, mhocko@kernel.org,
	 roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev,  hughd@google.com, corbet@lwn.net,
	konrad.wilk@oracle.com,  senozhatsky@chromium.org,
	rppt@kernel.org, linux-mm@kvack.org,  kernel-team@meta.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	 david@ixit.cz, chrisl@kernel.org, Wei Xu <weixugc@google.com>,
	 Yu Zhao <yuzhao@google.com>
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
Date: Wed, 20 Dec 2023 16:24:22 -0800
Message-ID: <CAJD7tkbmWcEvsfF8i+HrRetTVu6v4fKFn2WL0RLsHNheu=5wVw@mail.gmail.com>
In-Reply-To: <20231220145025.GC23822@cmpxchg.org>

On Wed, Dec 20, 2023 at 6:50 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Dec 20, 2023 at 12:59:15AM -0800, Yosry Ahmed wrote:
> > On Tue, Dec 19, 2023 at 9:15 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Mon, Dec 18, 2023 at 01:52:23PM -0800, Yosry Ahmed wrote:
> > > > > > Taking a step back from all the memory.swap.tiers vs.
> > > > > > memory.zswap.writeback discussions, I think there may be a more
> > > > > > fundamental problem here. If the zswap store failure is recurrent,
> > > > > > pages can keep going back to the LRUs and then sent back to zswap
> > > > > > eventually, only to be rejected again. For example, this can happen
> > > > > > if zswap is above the acceptance threshold, but could be even worse
> > > > > > if it's the allocator rejecting the page due to not compressing well
> > > > > > enough. In the latter case, the page can keep going back and forth
> > > > > > between zswap and LRUs indefinitely.
> > > > > >
> > > > > > You probably did not run into this as you're using zsmalloc, but it
> > > > > > can happen with zbud AFAICT. Even with zsmalloc, a less problematic
> > > > > > version can happen if zswap is above its acceptance threshold.
> > > > > >
> > > > > > This can cause thrashing and ineffective reclaim. We have an internal
> > > > > > implementation where we mark incompressible pages and put them on the
> > > > > > unevictable LRU when we don't have a backing swapfile (i.e. ghost
> > > > > > swapfiles), and something similar may work if writeback is disabled.
> > > > > > We need to scan such incompressible pages periodically though to
> > > > > > remove them from the unevictable LRU if they have been dirtied.
> > > > >
> > > > > I'm not sure this is an actual problem.
> > > > >
> > > > > When pages get rejected, they rotate to the furthest point from the
> > > > > reclaimer - the head of the active list. We only get to them again
> > > > > after we scanned everything else.
> > > > >
> > > > > If all that's left on the LRU is unzswappable, then you'd assume that
> > > > > remainder isn't very large, and thus not a significant part of overall
> > > > > scan work. Because if it is, then there is a serious problem with the
> > > > > zswap configuration.
> > > > >
> > > > > There might be possible optimizations to determine how permanent a
> > > > > rejection is, but I'm not sure the effort is called for just
> > > > > yet. Rejections are already failure cases that screw up the LRU
> > > > > ordering, and healthy setups shouldn't have a lot of those. I don't
> > > > > think this patch adds any sort of new complications to this picture.
> > > >
> > > > We have workloads where a significant amount (maybe 20%? 30% not sure
> > > > tbh) of the memory is incompressible. Zswap is still a very viable
> > > > option for those workloads once those pages are taken out of the
> > > > picture. If those pages remain on the LRUs, they will introduce a
> > > > regression in reclaim efficiency.
> > > >
> > > > With the upstream code today, those pages go directly to the backing
> > > > store, which isn't ideal in terms of LRU ordering, but this patch
> > > > makes them stay on the LRUs, which can be harmful. I don't think we
> > > > can just assume it is okay. Whether we make those pages unevictable or
> > > > store them uncompressed in zswap, I think taking them out of the LRUs
> > > > (until they are redirtied), is the right thing to do.
> > >
> > > This is how it works with zram as well, though, and it has plenty of
> > > happy users.
> >
> > I am not sure I understand. Zram does not reject pages that do not
> > compress well, right? IIUC it acts as a block device so it cannot
> > reject pages. I feel like I am missing something.
>
> zram_write_page() can fail for various reasons - compression failure,
> zsmalloc failure, the memory limit. This results in !!bio->bi_status,
> __end_swap_bio_write redirtying the page, and vmscan rotating it.
>
> The effect is actually more pronounced with zram, because the pages
> don't get activated and thus cycle faster.
>
> What you're raising doesn't seem to be a dealbreaker in practice.

For the workloads using zram, yes, they are exclusively using zsmalloc
which can store incompressible pages anyway.

>
> > If we already want to support taking pages away from the LRUs when
> > rejected by zswap (e.g. Nhat's proposal earlier), doesn't it make
> > sense to do that first so that this patch can be useful for all
> > workloads?
>
> No.
>
> Why should users who can benefit now wait for a hypothetical future
> optimization that isn't relevant to them? And by the looks of it, is
> only relevant to a small set of specialized cases?
>
> And the optimization - should anybody actually care to write it - can
> be transparently done on top later, so that's no reason to change
> merge order, either.

We can agree to disagree here, I am not trying to block this anyway.
But let's at least document this in the commit message/docs/code
(wherever it makes sense) -- that pages which recurrently fail to be
stored (e.g. incompressible memory) may keep going back to zswap only
to get rejected, so workloads prone to this may observe some reclaim
inefficiency.
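
To make the concern concrete, here is a toy model of the behavior (plain
Python, not kernel code). It assumes a single LRU list, writeback
disabled, and a fixed per-page "compressible" flag. The names reclaim()
and make_lru() are made up for illustration, and the model ignores the
active/inactive split, refaults, and pages being redirtied. It only
shows that once the compressible pages are gone, every further scan
lands on a page that gets rejected and rotated again.

```python
from collections import deque


def make_lru(nr_pages, incompressible_every):
    """Build a toy LRU: left end is the head, right end is the tail.

    Every `incompressible_every`-th page is marked incompressible.
    """
    return deque(
        {"id": i, "compressible": i % incompressible_every != 0}
        for i in range(nr_pages)
    )


def reclaim(lru, nr_to_reclaim, max_scan):
    """Scan from the tail until enough pages are reclaimed or the scan
    budget runs out. Returns (reclaimed, scanned, rejected)."""
    reclaimed = scanned = rejected = 0
    while lru and reclaimed < nr_to_reclaim and scanned < max_scan:
        page = lru.pop()              # coldest page, at the tail
        scanned += 1
        if page["compressible"]:
            reclaimed += 1            # accepted by the store, leaves the LRU
        else:
            rejected += 1
            lru.appendleft(page)      # rejected: rotated back to the head
    return reclaimed, scanned, rejected


lru = make_lru(100, 4)                # 25 of the 100 pages are incompressible
first = reclaim(lru, nr_to_reclaim=75, max_scan=1000)
second = reclaim(lru, nr_to_reclaim=10, max_scan=100)
print("first pass :", first)
print("second pass:", second)
```

In this model the first pass makes progress, while the second pass uses
up its whole scan budget on the same 25 incompressible pages and
reclaims nothing. How much this matters on a real system depends on how
large that remainder is relative to the rest of the LRU, which is
exactly the point we disagree on above.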


