Re: [PATCH v4] mm: add zblock allocator

linux-mm.kvack.org archive mirror

From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Vitaly Wool <vitaly.wool@konsulko.se>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, Nhat Pham <nphamcs@gmail.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Igor Belousov <igor.b@beldev.am>,
	Minchan Kim <minchan@kernel.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: Re: [PATCH v4] mm: add zblock allocator
Date: Wed, 30 Apr 2025 05:27:18 -0700
Message-ID: <aBIXJrbxCmYSoCuz@Asmaa.>
In-Reply-To: <e764d05a-6a83-4563-9f28-3f1a3e28727b@konsulko.se>

On Wed, Apr 23, 2025 at 09:53:48PM +0200, Vitaly Wool wrote:
> On 4/22/25 12:46, Yosry Ahmed wrote:
> > I didn't look too closely but I generally agree that we should improve
> > zsmalloc where possible rather than add a new allocator. We are trying
> > not to repeat the zbud/z3fold or slub/slob stories here. Zsmalloc is
> > getting a lot of mileage from both zswap and zram, and is more-or-less
> > battle-tested. Let's work toward building upon that instead of starting
> > over.
> 
> The thing here is, zblock is using a very different approach to small object
> allocation. The idea is: we have an array of descriptors which correspond to
> multi-page blocks divided in chunks of equal size (block_size[i]). For each
> object of size x we find the descriptor n such that:
> 	block_size[n-1] < x <= block_size[n]
> and then we store that object in an empty slot in one of the blocks. Thus,
> the density is high, the search is fast (rbtree based) and there are no
> objects spanning two pages, so no extra memcpy is involved.
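
The class selection described above amounts to finding the smallest slot size that can hold the object. A minimal sketch, assuming a sorted array of slot sizes: the table values and the name find_class() are hypothetical, not taken from the zblock patch, and a plain binary search is used here for illustration. It only covers choosing the size class; picking a block with a free slot within that class (the rbtree part) is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot sizes in bytes, sorted ascending (not zblock's real table). */
static const size_t block_size[] = { 256, 512, 1024, 1536, 2048, 3072, 4096 };
#define NR_CLASSES (sizeof(block_size) / sizeof(block_size[0]))

/*
 * Return the index n of the smallest slot size that fits an object of
 * x bytes, i.e. block_size[n-1] < x <= block_size[n] (with block_size[-1]
 * taken as 0), or -1 if x is larger than the biggest slot size.
 */
static int find_class(size_t x)
{
	size_t lo = 0, hi = NR_CLASSES;

	assert(x > 0);
	/* Binary search for the first entry >= x (lower bound). */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (block_size[mid] < x)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < NR_CLASSES ? (int)lo : -1;
}
```

With this table, find_class(512) returns 1 and find_class(513) returns 2, so an object exactly the size of a slot stays in that slot's class.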

The block sizes seem to be similar in principle to the size classes in
zsmalloc. It seems to me that there are two apparent differentiating
properties of zblock:

- Block lookup uses an rbtree, so it's faster than zsmalloc's list
  iteration. On the other hand, zsmalloc divides each class into
  fullness groups and tries to allocate from almost-full groups first. I
  am not sure zblock's approach is strictly better.

- Zblock uses higher-order allocations, whereas zsmalloc always uses
  order-0 allocations. I think this may be the main advantage, and I remember
  asking if zsmalloc can support this. Always using order-0 pages is
  more reliable but may not always be the best choice.
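
To make the higher-order point concrete: with fixed-size slots, the bytes left over at the end of a block depend on the block size, and a larger block can shrink that tail. The sketch below is illustrative only; it assumes 4K pages, and the helpers block_tail_waste() and best_order() are hypothetical names, not code from either allocator. (zsmalloc attacks the same waste differently, by chaining several order-0 pages into a zspage and letting objects straddle page boundaries.)

```c
#include <assert.h>
#include <stddef.h>

#define ASSUMED_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Bytes left unused at the end of a block of 2^order pages split into slot_size-byte slots. */
static size_t block_tail_waste(size_t slot_size, unsigned int order)
{
	size_t block_bytes = ASSUMED_PAGE_SIZE << order;

	assert(slot_size > 0 && slot_size <= block_bytes);
	return block_bytes % slot_size;
}

/*
 * Pick the order in [0, max_order] whose block wastes the smallest fraction
 * of its bytes; ties go to the lower order, which is easier to allocate.
 */
static unsigned int best_order(size_t slot_size, unsigned int max_order)
{
	unsigned int order, best = 0;
	size_t best_waste = 0, best_bytes = 0;

	assert(slot_size > 0 && slot_size <= (ASSUMED_PAGE_SIZE << max_order));
	for (order = 0; order <= max_order; order++) {
		size_t bytes = ASSUMED_PAGE_SIZE << order;
		size_t waste;

		if (slot_size > bytes)
			continue;	/* a slot does not fit in a block this small */
		waste = block_tail_waste(slot_size, order);
		/* waste/bytes < best_waste/best_bytes, compared without floating point */
		if (best_bytes == 0 || waste * best_bytes < best_waste * bytes) {
			best = order;
			best_waste = waste;
			best_bytes = bytes;
		}
	}
	return best;
}
```

For 3072-byte slots, an order-0 block wastes 1024 of 4096 bytes (25%), while an order-2 block wastes 1024 of 16384 bytes (6.25%), so best_order(3072, 3) returns 2. The catch, as noted below, is that the higher-order allocation has to succeed in the first place.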

On the other hand, zblock is lacking in other regards. For example:
- The lack of compaction means that certain workloads will see a lot of
  fragmentation. It purely depends on the access patterns. We could end
  up with a lot of blocks each containing a single object and there is
  no way to recover AFAICT.

- Zblock will fail if a high-order allocation cannot be satisfied, which
  is more likely to happen under memory pressure, and memory pressure is
  exactly when zblock is needed in the first place.

- There's probably more, I didn't check too closely, and I am hoping
  that Minchan and Sergey will chime in here.
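
The fragmentation concern can be quantified with simple accounting. A minimal sketch, assuming each block has a fixed number of equal-size slots and can only be released once all of its objects are freed; struct frag_stats, block_frag() and blocks_after_compaction() are hypothetical names for illustration, and the "ideal compaction" count is a lower bound that ignores the cost of moving objects.

```c
#include <assert.h>
#include <stddef.h>

/* Memory held by non-empty blocks vs. memory used by live objects (slot bytes only; tail waste ignored). */
struct frag_stats {
	size_t pinned_bytes;
	size_t live_bytes;
};

/*
 * live[i] is the number of objects still allocated in block i; every block
 * has nr_slots slots of slot_size bytes. Without compaction, a block can only
 * be released once live[i] reaches zero.
 */
static struct frag_stats block_frag(const unsigned int *live, size_t nr_blocks,
				    unsigned int nr_slots, size_t slot_size)
{
	struct frag_stats s = { 0, 0 };
	size_t i;

	for (i = 0; i < nr_blocks; i++) {
		assert(live[i] <= nr_slots);
		if (live[i] == 0)
			continue;	/* an empty block can be freed */
		s.pinned_bytes += (size_t)nr_slots * slot_size;
		s.live_bytes += (size_t)live[i] * slot_size;
	}
	return s;
}

/* Minimum number of blocks needed if live objects could be moved (ideal compaction). */
static size_t blocks_after_compaction(const unsigned int *live, size_t nr_blocks,
				      unsigned int nr_slots)
{
	size_t total = 0, i;

	assert(nr_slots > 0);
	for (i = 0; i < nr_blocks; i++)
		total += live[i];
	return (total + nr_slots - 1) / nr_slots;
}
```

Four blocks of 16 slots holding one 256-byte object each pin 16384 bytes to keep 1024 bytes of data, while all four objects would fit in a single block if they could be moved.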

> 
> And with the latest zblock, we see that it has a clear advantage in
> performance over zsmalloc, retaining roughly the same allocation density for
> 4K pages and scoring better on 16K pages. E.g., on a kernel compilation:
> 
> * zsmalloc/zstd/make -j32 bzImage
> 	real	8m0.594s
> 	user	39m37.783s
> 	sys	8m24.262s
> 	Zswap:            200600 kB <-- after build completion
> 	Zswapped:         854072 kB <-- after build completion
> 	zswpin 309774
> 	zswpout 1538332
> 
> * zblock/zstd/make -j32 bzImage
> 	real	7m35.546s
> 	user	38m03.475s
> 	sys	7m47.407s
> 	Zswap:            250940 kB <-- after build completion
> 	Zswapped:         870660 kB <-- after build completion
> 	zswpin 248606
> 	zswpout 1277319
> 
> So what we see here is that zblock is definitely faster and at least not
> worse with regard to allocation density under heavy load. It has slightly
> worse _idle_ allocation density but since it will quickly catch up under
> load it is not really important. What is important is that its
> characteristics don't deteriorate over time. Overall, zblock is simple and
> efficient, and there is a /raison d'être/ for it.
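
The two runs above can be compared directly. A small sketch; the helper names are hypothetical, and Zswapped/Zswap is treated here as an effective compression ratio (original KiB stored per KiB of pool memory) taken from the after-build snapshot only. Worked through, real time drops by about 5.2% and sys time by about 7.3% with zblock, while the ratio is about 4.26 for zsmalloc and about 3.47 for zblock.

```c
#include <assert.h>

/* Convert a time(1) reading of the form <min>m<sec>.<msec>s to milliseconds. */
static long to_ms(long min, long sec, long msec)
{
	assert(min >= 0 && sec >= 0 && sec < 60 && msec >= 0 && msec < 1000);
	return (min * 60 + sec) * 1000 + msec;
}

/* Effective compression ratio: original KiB stored (Zswapped) per KiB of pool memory (Zswap). */
static double zswap_ratio(long zswapped_kb, long zswap_kb)
{
	assert(zswap_kb > 0);
	return (double)zswapped_kb / (double)zswap_kb;
}

/* Percentage by which 'after' is smaller than 'before' (negative if larger). */
static double pct_reduction(long before, long after)
{
	assert(before > 0);
	return 100.0 * (double)(before - after) / (double)before;
}
```

The two runs also report different zswpout counts (1538332 vs 1277319), so the after-build snapshots do not describe identical amounts of swapped data.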

Zblock is performing better for this specific workload, but as I
mentioned earlier there are other aspects that zblock is missing.
Zsmalloc has seen a very large range of workloads of different types,
and we cannot just dismiss this.

> 
> Now, it is indeed possible to partially rework zsmalloc using zblock's
> algorithm but this will be a rather substantial change, equal or bigger in
> effort to implementing the approach described above from scratch (and this
> is what we did), and with such drastic changes most of the testing that has
> been done with zsmalloc would be invalidated, and we'll be out in the wild
> anyway. So even though I see your point, I don't think it applies in this
> particular case.


Well, we should start by breaking down the differences and finding out
why zblock is performing better, as I mentioned above. If it's the
faster lookups or higher order allocations, we can work to support that
in zsmalloc. Similarly, if zsmalloc has unnecessary complexity it'd be
great to get rid of it rather than starting over.

Also, we don't have to do it all at once and invalidate the testing that
zsmalloc has seen. These can be incremental changes that get spread over
multiple releases, getting incremental exposure in the process.

> 
> ~Vitaly

Thread overview: 20+ messages
2025-04-12 15:42 Vitaly Wool
2025-04-12 19:25 ` Igor Belousov
2025-04-16 12:09 ` Johannes Weiner
2025-04-16 20:10   ` Vitaly
2025-04-17 14:16     ` Johannes Weiner
2025-04-18  7:43       ` Vitaly Wool
2025-04-18 10:52   ` David Hildenbrand
2025-04-18 10:56     ` Vitaly Wool
2025-04-18 11:03       ` David Hildenbrand
2025-04-22 10:46 ` Yosry Ahmed
2025-04-23 19:53   ` Vitaly Wool
2025-04-30 12:27     ` Yosry Ahmed [this message]
2025-05-01 12:41       ` Vitaly Wool
2025-05-01 23:43         ` Sergey Senozhatsky
2025-05-06 13:04         ` Yosry Ahmed
2025-06-11 17:11           ` Vitaly Wool
2025-05-01 23:49     ` Sergey Senozhatsky
2025-05-03  8:27       ` Vitaly
2025-05-04  9:26 ` Andrew Morton
2025-07-20  2:56 ` Andrew Morton
