From: Barry Song <21cnbao@gmail.com>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, axboe@kernel.dk,
	 bala.seshasayee@linux.intel.com, chrisl@kernel.org,
	david@redhat.com,  hannes@cmpxchg.org,
	kanchana.p.sridhar@intel.com, kasong@tencent.com,
	 linux-block@vger.kernel.org, minchan@kernel.org,
	nphamcs@gmail.com,  ryan.roberts@arm.com, surenb@google.com,
	terrelln@fb.com,  usamaarif642@gmail.com, v-songbaohua@oppo.com,
	wajdi.k.feghali@intel.com,  willy@infradead.org,
	ying.huang@intel.com, yosryahmed@google.com,  yuzhao@google.com,
	zhengtangquan@oppo.com, zhouchengming@bytedance.com
Subject: Re: [PATCH RFC v3 0/4] mTHP-friendly compression in zsmalloc and zram based on multi-pages
Date: Fri, 29 Nov 2024 09:40:09 +1300	[thread overview]
Message-ID: <CAGsJ_4wpHMoSk=6UXDNv_r+AR_-TwyQ5bM=sObvgcXinBOhFtA@mail.gmail.com> (raw)
In-Reply-To: <20241127045224.GF440697@google.com>

On Wed, Nov 27, 2024 at 5:52 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/11/27 09:20), Barry Song wrote:
> [..]
> > >    390 12736
> > >    395 13056
> > >    404 13632
> > >    410 14016
> > >    415 14336
> > >    418 14528
> > >    447 16384
> > >
> > > E.g. 13632 and 13056 are more than 500 bytes apart.
> > >
> > > > swap-out time(ms)       68711              49908
> > > > swap-in time(ms)        30687              20685
> > > > compression ratio       20.49%             16.9%
> > >
> > > These are not the only numbers to focus on, really important metrics
> > > are: zsmalloc pages-used and zsmalloc max-pages-used.  Then we can
> > > calculate the pool memory usage ratio (the size of compressed data vs
> > > the number of pages zsmalloc pool allocated to keep them).
> >
> > To address this, we plan to collect more data and get back to you
> > afterwards. From my understanding, we still have an opportunity
> > to refine the CHAIN_SIZE?
>
> Do you mean changing the value?  It's configurable.
>
> > Essentially, each small object might cause some waste within the
> > original PAGE_SIZE. Now, with 4 * PAGE_SIZE, there could be a
> > single instance of waste. If we can manage the ratio, this could be
> > optimized?
>
> All size classes work the same and we merge size-classes with equal
> characteristics.  So in the example above
>
>                 395 13056
>                 404 13632
>
> size-classes #396-403 are merged with size-class #404.  And #404 size-class
> splits the zspage into 13632-byte chunks; any smaller object (e.g. an object
> from size-class #396 (which can be just one byte larger than #395
> objects)) takes that entire chunk and the rest of the space in the chunk
> is just padding.
>
> CHAIN_SIZE is how we find the optimal balance.  The larger the zspage
> the more likely we squeeze some space for extra objects, which otherwise
> would have been just a waste.  With large CHAIN_SIZE we also change
> characteristics of many size classes, so we merge fewer classes and have
> more clusters.  The price, on the other hand, is more physical 0-order
> pages per zspage, which can be painful.  On all the tests I ran 8 or 10
> worked best.

Thanks very much for the explanation. We’ll gather more data on this and follow
up with you.
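
Just to check that I've understood the example: an object that is one byte
larger than a class-#395 object (13057 bytes) falls into class #396, which
is merged into #404, so it takes a whole 13632-byte chunk and
13632 - 13057 = 575 bytes of that chunk are padding, roughly 4.2% of the
chunk, if I'm reading the table correctly. When we collect the data we'll
also try a couple of CONFIG_ZSMALLOC_CHAIN_SIZE values (e.g. the 8 and 10
you mentioned) to see how the merged classes and the per-chunk waste change.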

>
> [..]
> > > another option might be to just use a faster algorithm and then utilize
> > > post-processing (re-compression with zstd or writeback) for memory
> > > savings?
> >
> > The concern lies in power consumption
>
> But the power consumption concern is also in "decompress just one middle
> page from very large object" case, and size-classes de-fragmentation

That's why we have "[patch 4/4] mm: fall back to four small folios if mTHP
allocation fails" to address the issue of "decompressing just one middle page
from a very large object."  I assume that recompression and writeback should
also focus on large objects if the original compression involves multiple pages?

> which requires moving around lots of objects in order to form more full
> zspage and release empty zspages.  There are concerns everywhere, how

I assume the cost of defragmentation is M * N, where:
* M is the number of objects,
* N is the size of the objects.

With large objects, M is reduced to 1/4 of the original number of objects.
Although N increases, the overall M * N becomes slightly smaller than
before, as N is just under 4 times the size of the original objects?
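
To put rough numbers on it: if the multi-page object size is N' = k * N and
the object count becomes M' = M / 4, then M' * N' = (k / 4) * M * N. Taking
the compression ratios above at face value (20.49% for 4KB pages vs 16.9%
for 16KB blocks), k ~= 4 * 16.9 / 20.49 ~= 3.3, so M' * N' ~= 0.83 * M * N,
i.e. somewhat less data to move around during defragmentation.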

> many of them are measured and analyzed and either ruled out or confirmed
> is another question.

In phone scenarios, if recompression uses zstd and the original compression
is based on lz4 with 4KB blocks, the cost to obtain zstd-compressed objects
would be:

* A: Compression of 4 × 4KB using lz4
* B: Decompression of 4 × 4KB using lz4
* C: Compression of 4 × 4KB using zstd

By leveraging the speed advantages of mTHP swap and zstd's large-block
compression, the cost becomes:

* D: Compression of 1 × 16KB using zstd

Since D is significantly smaller than C (D < C), it follows that:

D < A + B + C ?
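
If it helps, here is a quick userspace sketch (clearly not kernel code) that
one could use to sanity-check the D < A + B + C assumption with the stock
liblz4/libzstd APIs. The synthetic buffer and zstd level 1 are arbitrary
choices, so treat the numbers as indicative only; build with
"gcc bench.c -O2 -llz4 -lzstd":

#include <lz4.h>
#include <zstd.h>
#include <stdio.h>
#include <time.h>

#define PAGE_SZ  4096
#define NR_PAGES 4
#define ITERS    10000

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void)
{
	static char src[NR_PAGES * PAGE_SZ];
	static char dst[2 * NR_PAGES * PAGE_SZ];
	static char lz4_buf[NR_PAGES][LZ4_COMPRESSBOUND(PAGE_SZ)];
	int lz4_len[NR_PAGES];
	double t, A, B, C, D;
	int i, it;

	/* semi-compressible synthetic data, only for a rough comparison */
	for (i = 0; i < (int)sizeof(src); i++)
		src[i] = (i * 31) & 0x3f;

	t = now_ms();
	for (it = 0; it < ITERS; it++)
		for (i = 0; i < NR_PAGES; i++)
			lz4_len[i] = LZ4_compress_default(src + i * PAGE_SZ,
					lz4_buf[i], PAGE_SZ, sizeof(lz4_buf[i]));
	A = now_ms() - t;	/* A: 4 x 4KB lz4 compression */

	t = now_ms();
	for (it = 0; it < ITERS; it++)
		for (i = 0; i < NR_PAGES; i++)
			LZ4_decompress_safe(lz4_buf[i], dst, lz4_len[i], PAGE_SZ);
	B = now_ms() - t;	/* B: 4 x 4KB lz4 decompression */

	t = now_ms();
	for (it = 0; it < ITERS; it++)
		for (i = 0; i < NR_PAGES; i++)
			ZSTD_compress(dst, sizeof(dst), src + i * PAGE_SZ,
				      PAGE_SZ, 1);
	C = now_ms() - t;	/* C: 4 x 4KB zstd compression */

	t = now_ms();
	for (it = 0; it < ITERS; it++)
		ZSTD_compress(dst, sizeof(dst), src, sizeof(src), 1);
	D = now_ms() - t;	/* D: 1 x 16KB zstd compression */

	printf("A=%.1f B=%.1f C=%.1f D=%.1f  A+B+C=%.1f (ms)\n",
	       A, B, C, D, A + B + C);
	return 0;
}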

Thanks
Barry

