From: Barry Song <21cnbao@gmail.com>
Date: Tue, 12 Nov 2024 09:31:09 +1300
Subject: Re: [PATCH RFC v2 0/2] mTHP-friendly compression in zsmalloc and zram based on multi-pages
To: Usama Arif
Cc: "Huang, Ying", linux-mm@kvack.org, akpm@linux-foundation.org, axboe@kernel.dk,
    bala.seshasayee@linux.intel.com, chrisl@kernel.org, david@redhat.com,
    hannes@cmpxchg.org, kanchana.p.sridhar@intel.com, kasong@tencent.com,
    linux-block@vger.kernel.org, minchan@kernel.org, nphamcs@gmail.com,
    senozhatsky@chromium.org, surenb@google.com, terrelln@fb.com,
    v-songbaohua@oppo.com, wajdi.k.feghali@intel.com, willy@infradead.org,
    yosryahmed@google.com, yuzhao@google.com, zhengtangquan@oppo.com,
    zhouchengming@bytedance.com, ryan.roberts@arm.com
In-Reply-To: <28446805-f533-44fe-988a-71dcbdb379ab@gmail.com>
References: <20241107101005.69121-1-21cnbao@gmail.com> <87iksy5mkh.fsf@yhuang6-desk2.ccr.corp.intel.com> <28446805-f533-44fe-988a-71dcbdb379ab@gmail.com>
On Tue, Nov 12, 2024 at 5:43 AM Usama Arif wrote:
>
>
>
> On 08/11/2024 06:51, Barry Song wrote:
> > On Fri, Nov 8, 2024 at 6:23 PM Huang, Ying wrote:
> >>
> >> Hi, Barry,
> >>
> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >>> From: Barry Song
> >>>
> >>> When large folios are compressed at a larger granularity, we observe
> >>> a notable reduction in CPU usage and a significant improvement in
> >>> compression ratios.
> >>>
> >>> mTHP's ability to be swapped out without splitting and swapped back in
> >>> as a whole allows compression and decompression at larger granularities.
> >>>
> >>> This patchset enhances zsmalloc and zram by adding support for dividing
> >>> large folios into multi-page blocks, typically configured with a
> >>> 2-order granularity. Without this patchset, a large folio is always
> >>> divided into `nr_pages` 4KiB blocks.
> >>>
> >>> The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
> >>> setting, where the default of 2 allows all anonymous THP to benefit.
> >>>
> >>> Examples include:
> >>> * A 16KiB large folio will be compressed and stored as a single 16KiB
> >>>   block.
> >>> * A 64KiB large folio will be compressed and stored as four 16KiB
> >>>   blocks.
> >>>
> >>> For example, swapping out and swapping in 100MiB of typical anonymous
> >>> data 100 times (with 16KB mTHP enabled) using zstd yields the following
> >>> results:
> >>>
> >>>                        w/o patches    w/ patches
> >>> swap-out time(ms)      68711          49908
> >>> swap-in time(ms)       30687          20685
> >>> compression ratio      20.49%         16.9%
> >>
> >> The data looks good.  Thanks!
> >>
> >> Have you considered the situation that the large folio fails to be
> >> allocated during swap-in?  It's possible because the memory may be very
> >> fragmented.
> >
> > That's correct, good question. On phones, we use a large folio pool to
> > maintain a relatively high allocation success rate. When mTHP allocation
> > fails, we have a workaround to allocate nr_pages of small folios and map
> > them together to avoid partial reads. This ensures that the benefits of
> > larger block compression and decompression are consistently maintained.
> > That was the code running on production phones.
> >
>
> Thanks for sending the v2!
>
> How is the large folio pool maintained? I don't think there is something in upstream

In production phones, we have extended the migration type for mTHP
separately during Linux boot[1].

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/page_alloc.c#L2089

These pageblocks have their own migration type, resulting in a separate
buddy free list. We prevent order-0 allocations from drawing memory from
this pool, ensuring a relatively high success rate for mTHP allocations.

In one instance, phones reported an mTHP allocation success rate of less
than 5% after running for a few hours without this kind of reservation
mechanism. Therefore, we need an upstream solution in the kernel to ensure
sustainable mTHP support across all scenarios.
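To make the reservation policy concrete, here is a minimal userspace toy
model (illustrative only; the pool sizes and function names are made up,
and this is not the page_alloc.c code linked in [1]): order-0 requests may
only take pages from the general pool, while order-2 (mTHP) requests may
additionally fall back to the reserve, so high-order allocations keep
succeeding even after order-0 pressure has drained the general pool.

/*
 * Toy model of the mTHP reservation idea: a dedicated pool of order-2
 * blocks that order-0 allocations can never fall back into.
 * Illustrative only -- not the kernel implementation.
 */
#include <stdbool.h>
#include <stdio.h>

#define MTHP_ORDER 2                  /* 2^2 pages = 16KiB with 4KiB pages */

static long normal_free   = 1000;     /* order-0 pages in the general pool */
static long reserved_free = 256;      /* order-2 blocks reserved for mTHP  */

/* order-0 (small folio) allocation: only the general pool is eligible */
static bool alloc_order0(void)
{
	if (normal_free > 0) {
		normal_free--;
		return true;
	}
	return false;                 /* never steals from the mTHP reserve */
}

/* order-2 (mTHP) allocation: general pool first, then the reserve */
static bool alloc_mthp(void)
{
	if (normal_free >= (1 << MTHP_ORDER)) {
		normal_free -= 1 << MTHP_ORDER;
		return true;
	}
	if (reserved_free > 0) {
		reserved_free--;
		return true;
	}
	return false;
}

int main(void)
{
	/* drain the general pool with order-0 allocations */
	while (alloc_order0())
		;
	/* an mTHP allocation still succeeds, thanks to the reserve */
	printf("mTHP allocation after order-0 pressure: %s\n",
	       alloc_mthp() ? "success" : "failure");
	return 0;
}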
> kernel for this? The only thing that I saw on the mailing list is TAO for pmd-mappable
> THPs only? I think that was about 7-8 months ago and wasn't merged?

TAO supports mTHP as long as it is configured through the bootcmd:

nomerge=25%,4

This means we are providing a 4-order mTHP pool with 25% of total memory
reserved. Note that the Android common kernel has already integrated
TAO[2][3], so we are trying to use TAO to replace our previous approach of
extending the migration type.

[2] https://android.googlesource.com/kernel/common/+/c1ff6dcf209e4abc23584d2cd117f725421bccac
[3] https://android.googlesource.com/kernel/common/+/066872d13d0c0b076785f0b794b650de0941c1c9

> The workaround to allocate nr_pages of small folios and map them
> together to avoid partial reads is also not upstream, right?

Correct. It's running on the phones[4][5], but I still don't know how to
handle it upstream properly.

[4] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L4656
[5] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/memory.c#L5439

>
> Do you have any data on how this would perform with the upstream kernel, i.e. without
> a large folio pool and the workaround, and if large granularity compression is worth
> having without those patches?

I'd say large granularity compression isn't a problem, but large
granularity decompression could be. The worst case would be if we swap out
a large block, such as 16KB, but end up swapping in 4 times due to
allocation failures, falling back to smaller folios. In this scenario, we
would need to perform three redundant decompressions. I will work with
Tangquan to provide this data this week.

But once we swap in small folios, they remain small (we can't collapse
them for mTHP). As a result, the next time, they will be swapped out and
swapped in as small folios. Therefore, this potential loss is one-time.
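To spell out that worst case, here is a back-of-the-envelope sketch (an
illustration only, assuming every small-folio fault decompresses the whole
16KB object and copies out a single 4KB subpage; it is not zram code):

/*
 * Worst-case accounting for a 16KB block swapped out as one compressed
 * object but swapped back in as four separate 4KB small folios.
 * Assumes each fault decompresses the whole object -- illustrative only.
 */
#include <stdio.h>

int main(void)
{
	int block_kb   = 16;                     /* multi-page compression unit  */
	int subpage_kb = 4;                      /* small-folio swap-in size     */
	int faults     = block_kb / subpage_kb;  /* 4 separate swap-in faults    */

	int large_swapin = 1;                    /* one fault, one decompression */
	int small_swapin = faults;               /* each fault decompresses the
	                                            whole 16KB object            */

	printf("redundant decompressions: %d\n",
	       small_swapin - large_swapin);     /* prints 3 */
	return 0;
}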
>
> Thanks,
> Usama
>
> > We also previously experimented with maintaining multiple buffers for
> > decompressed large blocks in zRAM, allowing upcoming do_swap_page()
> > calls to use them when falling back to small folios. In this setup,
> > the buffers achieved a high hit rate, though I don't recall the exact
> > number.
> >
> > I'm concerned that this fault-around-like fallback to nr_pages small
> > folios may not gain traction upstream. Do you have any suggestions
> > for improvement?
> >
> >>
> >>> -v2:
> >>> While it is not mature yet, I know some people are waiting for
> >>> an update :-)
> >>> * Fixed some stability issues.
> >>> * Rebased against the latest mm-unstable.
> >>> * Set the default order to 2, which benefits all anon mTHP.
> >>> * Multi-page ZsPageMovable is not supported yet.
> >>>
> >>> Tangquan Zheng (2):
> >>>   mm: zsmalloc: support objects compressed based on multiple pages
> >>>   zram: support compression at the granularity of multi-pages
> >>>
> >>>  drivers/block/zram/Kconfig    |   9 +
> >>>  drivers/block/zram/zcomp.c    |  17 +-
> >>>  drivers/block/zram/zcomp.h    |  12 +-
> >>>  drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++--
> >>>  drivers/block/zram/zram_drv.h |  45 ++++
> >>>  include/linux/zsmalloc.h      |  10 +-
> >>>  mm/Kconfig                    |  18 ++
> >>>  mm/zsmalloc.c                 | 232 +++++++++++++-----
> >>>  8 files changed, 699 insertions(+), 94 deletions(-)
> >>
> >> --
> >> Best Regards,
> >> Huang, Ying
>

Thanks
Barry