From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20250307120141.1566673-1-qun-wei.lin@mediatek.com> <5gqqbq67th4xiufiw6j3ewih6htdepa4u5lfirdeffrui7hcdn@ly3re3vgez2g>
From: Barry Song <21cnbao@gmail.com>
Date: Fri, 14 Mar 2025 09:37:04 +1300
Subject: Re: [PATCH 0/2] Improve Zram by separating compression context from kswapd
To: Minchan Kim
Cc: Sergey Senozhatsky, Qun-Wei Lin, Jens Axboe, Vishal Verma, Dan Williams, Dave Jiang, Ira Weiny, Andrew Morton, Matthias Brugger, AngeloGioacchino Del Regno, Chris Li, Ryan Roberts, "Huang, Ying", Kairui Song, Dan Schatzberg, Al Viro, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, nvdimm@lists.linux.dev, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, Casper Li, Chinwen Chang, Andrew Yang, James Hsu
Content-Type: text/plain; charset="UTF-8"

On Fri, Mar 14, 2025 at 6:33 AM Minchan Kim wrote:
>
> On Fri, Mar 14, 2025 at 05:58:00AM +1300, Barry Song wrote:
> > On Fri, Mar 14, 2025 at 5:07 AM Minchan Kim wrote:
> > >
> > > On Thu, Mar 13, 2025 at 04:45:54PM +1300, Barry Song wrote:
> > > > On Thu, Mar 13, 2025 at 4:09 PM Sergey Senozhatsky wrote:
> > > > >
> > > > > On (25/03/12 11:11), Minchan Kim wrote:
> > > > > > On Fri, Mar 07, 2025 at 08:01:02PM +0800, Qun-Wei Lin wrote:
> > > > > > > This patch series introduces a new mechanism called kcompressd to
> > > > > > > improve the efficiency of memory reclaiming in the operating system. The
> > > > > > > main goal is to separate the tasks of page scanning and page compression
> > > > > > > into distinct processes or threads, thereby reducing the load on the
> > > > > > > kswapd thread and enhancing overall system performance under high memory
> > > > > > > pressure conditions.
> > > > > > >
> > > > > > > Problem:
> > > > > > >   In the current system, the kswapd thread is responsible for both
> > > > > > >   scanning the LRU pages and compressing pages into the ZRAM. This
> > > > > > >   combined responsibility can lead to significant performance bottlenecks,
> > > > > > >   especially under high memory pressure. The kswapd thread becomes a
> > > > > > >   single point of contention, causing delays in memory reclaiming and
> > > > > > >   overall system performance degradation.
> > > > > >
> > > > > > Isn't it a general problem if the backend for swap is slow (but synchronous)?
> > > > > > I think zram needs to support asynchronous IO (it can introduce multiple
> > > > > > threads to compress batched pages) and not declare itself a synchronous
> > > > > > device for that case.
> > > > >
> > > > > The current conclusion is that kcompressd will sit above zram,
> > > > > because zram is not the only compressing swap backend we have.
> > >
> > > Then, how does this handle the file IO case?
> >
> > I didn't quite catch your question :-)
>
> Sorry for not being clear.
>
> What I meant was that zram is also used as fs backend storage, not only
> as a swap backend. Multiple simultaneous compressions can help that case,
> too.

I agree that multiple asynchronous threads might transparently improve
userspace read/write performance when there is just one thread or a very
few threads. However, it's unclear how genuine the requirement is.
On the other hand, in such cases, userspace can always optimize read/write
bandwidth itself, for example by using aio_write() or similar methods, if it
does care about bandwidth. Once the user already has multiple threads (close
to the number of CPU cores), asynchronous multi-threading won't offer any
benefit and will only result in increased context switching.

I guess that comes from the fundamental difference between zram and real
devices with hardware offload: zram always relies on the CPU and operates
synchronously (no offload, no interrupt from the hardware to notify the
completion of compression).

> > > >
> > > > also, it is not good to hack zram to be aware of whether it is kswapd,
> > > > direct reclaim, proactive reclaim, or a block device with a
> > > > mounted filesystem.
> > >
> > > Why shouldn't zram be aware of that instead of just introducing
> > > queues in the zram with multiple compression threads?
> > >
> >
> > My view is the opposite of yours :-)
> >
> > Integrating kswapd, direct reclaim, etc., into the zram driver
> > would violate layering principles. zram is purely a block device
>
> That's my question. What's the reason zram needs to know about
> kswapd, direct reclaim and so on?

I didn't understand your input. Qun-Wei's patch 2/2, which modifies the
zram driver, contains the following code within the zram driver:

+int schedule_bio_write(void *mem, struct bio *bio, compress_callback cb)
+{
+	...
+
+	if (!nr_kcompressd || !current_is_kswapd())
+		return -EBUSY;
+
+}

It's clear that Qun-Wei decided to disable asynchronous threading unless
the user is kswapd. Qun-Wei might be able to provide more insight on this
decision. My guess is:

1. Determining the optimal number of threads is challenging due to varying
CPU topologies and software workloads. For example, if there are 8 threads
writing to zram, the default 4 threads might be slower than using all 8
threads synchronously. For servers, we could have hundreds of CPUs.
On the other hand, if there is only one thread writing to zram, using 4
threads might interfere too much with other workloads and cause the phone
to heat up quickly.

2. kswapd is the user that truly benefits from asynchronous threads. Since
it handles asynchronous memory reclamation, speeding up its process reduces
the likelihood of entering the slowpath / direct reclamation. This is where
it has the greatest potential to make a positive impact.

> > driver, and how it is used should be handled separately. Callers have
> > greater flexibility to determine its usage, similar to how different
> > I/O models exist in user space.
> >
> > Currently, Qun-Wei's patch checks whether the current thread is kswapd.
> > If it is, compression is performed asynchronously by threads;
> > otherwise, it is done in the current thread. In the future, we may
>
> Okay, then, why should we do that without following normal asynchronous
> disk storage? The VM just puts the IO request and sometimes does
> congestion control. Why is other logic needed for this case?

It seems there is still some uncertainty about why current_is_kswapd()
is necessary, so let's get input from Qun-Wei as well.

Despite all the discussions, one important point remains: zswap might also
need this asynchronous thread. For months, Yosry and Nhat have been urging
the zram and zswap teams to collaborate on those shared requirements.
Having one per-node thread for each kswapd could be the low-hanging fruit
for both zswap and zram.

Additionally, I don't see how the prototype I proposed here [1] would
conflict with potential future optimizations in zram, particularly those
aimed at improving filesystem read/write performance through multiple
asynchronous threads, if that is indeed a valid requirement.

[1] https://lore.kernel.org/lkml/20250313093005.13998-1-21cnbao@gmail.com/

> > have additional reclaim threads, such as for damon or
> > madv_pageout, etc.
> >
> > > >
> > > > so i am thinking sth as below
> > > >
> > > > page_io.c
> > > >
> > > > if (sync_device or zswap_enabled())
> > > >     schedule swap_writepage to a separate per-node thread
> > >
> > > I am not sure that's a good idea to mix a feature to solve different
> > > layers. That wouldn't be only a swap problem. Such parallelism under
> > > the device is a common technique these days and it would help file IO
> > > cases.
> > >
> >
> > zswap and zram share the same needs, and handling this in page_io
> > can benefit both through common code. It is up to the callers to decide
> > the I/O model.
> >
> > I agree that "parallelism under the device" is a common technique,
> > but our case is different: the device achieves parallelism with
> > offload hardware, whereas we rely on CPUs, which can be scarce.
> > These threads may also preempt CPUs that are critically needed
> > by other non-compression tasks, and burst power consumption
> > can sometimes be difficult to control.
>
> That's a general problem for common resources in the system and always
> a trade-off domain in the workload areas. Eng folks have tried to tune
> them statically/dynamically depending on system behavior, considering
> what they prioritize.

Right, but haven't we yet taken on the task of tuning multi-threaded zram?

> >
> > > Furthermore, it would open the chance for zram to try compressing
> > > multiple pages at once.
> >
> > We are already in this situation when multiple callers use zram
> > simultaneously, such as during direct reclaim or with a mounted
> > filesystem.
> >
> > Of course, this allows multiple pages to be compressed simultaneously,
> > even if the user is single-threaded. However, determining when to enable
> > these threads and whether they will be effective is challenging, as it
> > depends on system load. For example, Qun-Wei's patch chose not to use
> > threads for direct reclaim as, I guess, it might be harmful.
> >
> Direct reclaim is already harmful and that's why the VM has logic
> to throttle writeback and other special logic for the kswapd or direct
> reclaim IO paths, which could be applied to zram, too.

I'm not entirely sure that the existing congestion or throttled-writeback
logic can automatically tune itself effectively with non-offload resources.
For offload resources, the number of queues and the bandwidth remain
stable, but for CPUs, they fluctuate based on changes in system workloads.

Thanks
Barry