From: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yosry Ahmed <yosryahmed@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Chengming Zhou <chengming.zhou@linux.dev>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 6/9] mm: zswap: drop support for non-zero same-filled pages handling
Date: Thu, 28 Mar 2024 16:19:14 -0700
Message-ID: <CAKEwX=OPDLxH-0-3F+xOc2SL5Ouj-R-HEC5QQrW+Q9Fn8pyeRg@mail.gmail.com>
In-Reply-To: <20240328210709.GH7597@cmpxchg.org>

On Thu, Mar 28, 2024 at 2:07 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Mar 28, 2024 at 01:23:42PM -0700, Yosry Ahmed wrote:
> > On Thu, Mar 28, 2024 at 12:31 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Mon, Mar 25, 2024 at 11:50:14PM +0000, Yosry Ahmed wrote:
> > > > The current same-filled pages handling supports pages filled with any
> > > > repeated word-sized pattern. However, in practice, most of these should
> > > > be zero pages anyway. Other patterns should not be nearly as common.
> > > >
> > > > Drop the support for non-zero same-filled pages, but keep the names of
> > > > knobs exposed to userspace as "same_filled", which isn't entirely
> > > > inaccurate.
> > > >
> > > > This yields some nice code simplification and enables a following patch
> > > > that eliminates the need to allocate struct zswap_entry for those pages
> > > > completely.
> > > >
> > > > There is also a very small performance improvement observed over 50 runs
> > > > of kernel build test (kernbench) comparing the mean build time on a
> > > > skylake machine when building the kernel in a cgroup v1 container with a
> > > > 3G limit:
> > > >
> > > >               base            patched         % diff
> > > > real          70.167          69.915          -0.359%
> > > > user          2953.068        2956.147        +0.104%
> > > > sys           2612.811        2594.718        -0.692%
> > > >
> > > > This probably comes from more optimized operations like memchr_inv() and
> > > > clear_highpage(). Note that the percentage of zero-filled pages during
> > > > this test was only around 1.5% on average, and was not affected by this
> > > > patch. Practical workloads could have a larger proportion of such pages
> > > > (e.g. Johannes observed around 10% [1]), so the performance improvement
> > > > should be larger.
> > > >
> > > > [1]https://lore.kernel.org/linux-mm/20240320210716.GH294822@cmpxchg.org/
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > >
> > > This is an interesting direction to pursue, but I actually think it
> > > doesn't go far enough. Either way, I think it needs more data.
> > >
> > > 1) How frequent are non-zero-same-filled pages? Difficult to
> > >    generalize, but if you could gather some from your fleet, that
> > >    would be useful. If you can devise a portable strategy, I'd also be
> > >    more than happy to gather this on ours (although I think you have
> > >    more widespread zswap use, whereas we have more disk swap.)
> >
> > I am trying to collect the data, but there are... hurdles. It would
> > take some time, so I was hoping the data could be collected elsewhere
> > if possible.
> >
> > The idea I had was to hook a BPF program to the entry of
> > zswap_fill_page() and create a histogram of the "value" argument. We
> > would get more coverage by hooking it to the return of
> > zswap_is_page_same_filled() and only updating the histogram if the
> > return value is true, as it includes pages in zswap that haven't been
> > swapped in.
> >
> > However, with zswap_is_page_same_filled() the BPF program will run in
> > all zswap stores, whereas for zswap_fill_page() it will only run when
> > needed. Not sure if this makes a practical difference tbh.
> >
> > >
> > > 2) The fact that we're doing any of this pattern analysis in zswap at
> > >    all strikes me as a bit misguided. Being efficient about repetitive
> > >    patterns is squarely in the domain of a compression algorithm. Do
> > >    we not trust e.g. zstd to handle this properly?
> >
> > I thought about this briefly, but I didn't follow through. I could try
> > to collect some data by swapping out different patterns and observing
> > how different compression algorithms react. That would be interesting
> > for sure.
> >
> > >
> > >    I'm guessing this goes back to inefficient packing from something
> > >    like zbud, which would waste half a page on one repeating byte.
> > >
> > >    But zsmalloc can do 32 byte objects. It's also a batching slab
> > >    allocator, where storing a series of small, same-sized objects is
> > >    quite fast.
> > >
> > >    Add to that the additional branches, the additional kmap, the extra
> > >    scanning of every single page for patterns - all in the fast path
> > >    of zswap, when we already know that the vast majority of incoming
> > >    pages will need to be properly compressed anyway.
> > >
> > >    Maybe it's time to get rid of the special handling entirely?
> >
> > We would still be wasting some memory (~96 bytes between zswap_entry
> > and zsmalloc object), and wasting cycles allocating them. This could
> > be made up for by cycles saved by removing the handling. We will be
> > saving some branches for sure. I am not worried about kmap as I think
> > it's a noop in most cases.
>
> Yes, true.
>
> > I am interested to see how much we could save by removing scanning for
> > patterns. We may not save much if we abort after reading a few words
> > in most cases, but I guess we could also be scanning a considerable
> > amount before aborting. On the other hand, we would be reading the
> > page contents into cache anyway for compression, so maybe it doesn't
> > really matter?
> >
> > I will try to collect some data about this. I will start by trying to
> > find out how the compression algorithms handle same-filled pages. If
> > they can compress it efficiently, then I will try to get more data on
> > the tradeoff from removing the handling.
>
> I do wonder if this could be overthinking it, too.
>
> Double checking the numbers on our fleet, an additional 96 bytes for
> each same-filled entry would result in a
>
> 1) p50 waste of 0.008% of total memory, and a
>
> 2) p99 waste of 0.06% of total memory.
>
> And this is without us having even thought about trying to make
> zsmalloc more efficient for this particular usecase - which might be
> the better point of attack, if we think it's actually worth it.
>
> So my take is that unless removing it would be outright horrible from
> a %sys POV (which seems pretty unlikely), IMO it would be fine to just
> delete it entirely with a "not worth the maintenance cost" argument.
>
> If you turn the argument around, and somebody would submit the code as
> it is today, with the numbers being what they are above, I'm not sure
> we would even accept it!

The context guy is here :)

Not arguing one way or another, but I did find the original patch
that introduced same-filled page handling:

https://github.com/torvalds/linux/commit/a85f878b443f8d2b91ba76f09da21ac0af22e07f

https://lore.kernel.org/all/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1/T/#u
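
For anyone without the source handy, here is a rough userspace sketch
of what "same-filled" means in this thread: a page made of one
word-sized value repeated end to end. This is not the kernel code.
The 4 KiB page size, the 8-byte word, and the helper name
same_filled_value() are assumptions made for illustration only.

```python
PAGE_SIZE = 4096   # assumed page size
WORD_SIZE = 8      # assumed word size (64-bit)


def same_filled_value(page: bytes):
    """Return the repeated word as an int if the page is same-filled, else None.

    Hypothetical helper: it only mirrors the idea of the check discussed
    in the thread, not the kernel's actual implementation.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one page of data")
    first_word = page[:WORD_SIZE]
    # A page is same-filled if it equals its first word repeated.
    if page == first_word * (PAGE_SIZE // WORD_SIZE):
        return int.from_bytes(first_word, "little")
    return None


zero_page = bytes(PAGE_SIZE)
ones_page = b"\xff" * PAGE_SIZE
mixed_page = bytes(PAGE_SIZE - 1) + b"\x01"

print(same_filled_value(zero_page))
print(same_filled_value(ones_page))
print(same_filled_value(mixed_page))
```

The RFC patch under discussion would keep only the first case (the
returned value being zero) and send everything else down the regular
compression path.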

The numbers look impressive, and there is some detail about the
experiment setup, but I can't seem to find which allocator +
compressor were used.

Which, as Johannes has pointed out, matters a lot. A good compressor
(which should work on arguably the most trivial data pattern there is)
+ a backend allocator that is capable of handling small objects well
could make this case really efficient, without resorting to special
handling at the zswap level.
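
A quick way to get a feel for this from userspace is to compress a few
synthetic pages and look at the output size. The sketch below is only
a stand-in: it uses Python's zlib (deflate), not the zstd or lzo
implementations the kernel would use, it assumes 4 KiB pages, and it
says nothing about allocator overhead or CPU cost. The page contents
and the helper name compressed_len() are made up for illustration.

```python
import random
import zlib

PAGE_SIZE = 4096  # assumed page size


def compressed_len(page: bytes) -> int:
    """Size in bytes of the page after zlib compression at level 6."""
    return len(zlib.compress(page, 6))


# A zero-filled page and a page filled with one repeated 8-byte word.
zero_page = bytes(PAGE_SIZE)
word_page = (0x0123456789ABCDEF).to_bytes(8, "little") * (PAGE_SIZE // 8)

# Pseudo-random bytes as a poorly compressible baseline (fixed seed so
# the run is repeatable).
rng = random.Random(0)
noise_page = bytes(rng.getrandbits(8) for _ in range(PAGE_SIZE))

for name, page in [("zero", zero_page), ("word", word_page), ("noise", noise_page)]:
    print(f"{name:5s} page: {PAGE_SIZE} -> {compressed_len(page)} bytes")
```

If the in-kernel compressors behave similarly on same-filled pages,
what is left to weigh is the per-entry overhead and the compression
cycles against the cost of the scan, which is the data Yosry said he
would try to collect.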


