From: Nhat Pham <nphamcs@gmail.com>
Date: Wed, 30 Apr 2025 10:05:59 -0700
Subject: Re: [PATCH] mm: Add Kcompressd for accelerated memory compression
To: Qun-Wei Lin
Cc: Andrew Morton, Mike Rapoport, Matthias Brugger, AngeloGioacchino Del Regno,
    Sergey Senozhatsky, Minchan Kim, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-mediatek@lists.infradead.org, Casper Li, Chinwen Chang, Andrew Yang,
    James Hsu, Barry Song <21cnbao@gmail.com>, Joshua Hahn
In-Reply-To: <20250430082651.3152444-1-qun-wei.lin@mediatek.com>

On Wed, Apr 30, 2025 at 1:27 AM Qun-Wei Lin wrote:
>
> This patch series introduces a new mechanism called kcompressd to
> improve the efficiency of memory reclaiming in the operating system.
>
> Problem:
> In the current system, the kswapd thread is responsible for both scanning
> the LRU pages and handling memory compression tasks (such as those
> involving ZSWAP/ZRAM, if enabled). This combined responsibility can lead
> to significant performance bottlenecks, especially under high memory
> pressure. The kswapd thread becomes a single point of contention, causing
> delays in memory reclaiming and overall system performance degradation.
>
> Solution:
> Introduced kcompressd to handle asynchronous compression during memory
> reclaim, improving efficiency by offloading compression tasks from
> kswapd. This allows kswapd to focus on its primary task of page reclaim
> without being burdened by the additional overhead of compression.
>
> In our handheld devices, we found that applying this mechanism under high
> memory pressure scenarios can increase the rate of pgsteal_anon per second
> by over 260% compared to the situation with only kswapd. Additionally, we
> observed a reduction of over 50% in page allocation stall occurrences,
> further demonstrating the effectiveness of kcompressd in alleviating memory
> pressure and improving system responsiveness.
>
> Co-developed-by: Barry Song <21cnbao@gmail.com>
> Signed-off-by: Barry Song <21cnbao@gmail.com>
> Signed-off-by: Qun-Wei Lin
> Reference: Re: [PATCH 0/2] Improve Zram by separating compression context from kswapd - Barry Song
>            https://lore.kernel.org/lkml/20250313093005.13998-1-21cnbao@gmail.com/
> ---
>  include/linux/mmzone.h |  6 ++++
>  mm/mm_init.c           |  1 +
>  mm/page_io.c           | 71 ++++++++++++++++++++++++++++++++++++++++++
>  mm/swap.h              |  6 ++++
>  mm/vmscan.c            | 25 +++++++++++++++
>  5 files changed, 109 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6ccec1bf2896..93c9195a54ae 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -23,6 +23,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  /* Free memory management - zoned buddy allocator. */
> @@ -1398,6 +1399,11 @@ typedef struct pglist_data {
>
>         int kswapd_failures;            /* Number of 'reclaimed == 0' runs */
>
> +#define KCOMPRESS_FIFO_SIZE 256
> +       wait_queue_head_t kcompressd_wait;
> +       struct task_struct *kcompressd;
> +       struct kfifo kcompress_fifo;
> +
>  #ifdef CONFIG_COMPACTION
>         int kcompactd_max_order;
>         enum zone_type kcompactd_highest_zoneidx;
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 9659689b8ace..49bae1dd4584 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1410,6 +1410,7 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
>         pgdat_init_kcompactd(pgdat);
>
>         init_waitqueue_head(&pgdat->kswapd_wait);
> +       init_waitqueue_head(&pgdat->kcompressd_wait);
>         init_waitqueue_head(&pgdat->pfmemalloc_wait);
>
>         for (i = 0; i < NR_VMSCAN_THROTTLE; i++)
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 4bce19df557b..d85deb494a6a 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -233,6 +233,38 @@ static void swap_zeromap_folio_clear(struct folio *folio)
>         }
> }
>
> +static bool swap_sched_async_compress(struct folio *folio)
> +{
> +       struct swap_info_struct *sis = swp_swap_info(folio->swap);
> +       int nid = numa_node_id();
> +       pg_data_t *pgdat = NODE_DATA(nid);
> +
> +       if (unlikely(!pgdat->kcompressd))
> +               return false;
> +
> +       if (!current_is_kswapd())
> +               return false;
> +
> +       if (!folio_test_anon(folio))
> +               return false;
> +       /*
> +        * This case needs to synchronously return AOP_WRITEPAGE_ACTIVATE
> +        */
> +       if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio)))
> +               return false;

Ah, this is unfortunate. At this point, we do not know whether the page
is compressible yet. If we decide to perform async compression here, and
the page turns out to be incompressible, and zswap writeback is
disabled, we risk not being able to activate it down the line, making it
more likely that we try it again too soon :(

Hopefully we can remove this limitation when Joshua's work to store
incompressible pages in the zswap LRU lands. Then, even if the page is
incompressible, we won't retry it and will just put it in the zswap
LRU...

> +
> +       sis = swp_swap_info(folio->swap);

There's a slight hitch here. Upstream-wise, zswap differs slightly from
zram: it is cgroup-controlled, and can be disabled on a per-cgroup
basis. This is useful, e.g., when we know for certain that a workload's
data are not compressible, and/or the workload is not latency-sensitive,
so we might as well use disk swap.

If the folio's cgroup has reached its zswap limit or has zswap disabled,
then we should fall back to disk swapping right away, instead of holding
the page. I think we should check that here. Maybe add a
mem_cgroup_may_zswap() helper (see obj_cgroup_may_zswap() for
implementation details - should be a simple-ish refactor), and check it
here, in addition to the zswap_is_enabled() check?

Something like:

if ((zswap_is_enabled() && mem_cgroup_may_zswap(folio_memcg(folio))) ||
    data_race(sis->flags & SWP_SYNCHRONOUS_IO))

Does that sound reasonable, Qun-Wei and Barry?
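For concreteness, here is a rough, untested sketch of the helper I have
in mind, modeled on the hierarchy walk in obj_cgroup_may_zswap(). The
helper name is the hypothetical one suggested above, and the exact
fields and accounting are assumptions based on the current zswap
charging code, not a final implementation:

```c
/*
 * Hypothetical helper, sketched after obj_cgroup_may_zswap():
 * returns false if any cgroup in the folio's hierarchy has zswap
 * disabled (zswap.max == 0) or has already hit its zswap limit,
 * in which case the caller should fall back to disk swap right
 * away instead of queueing the folio for async compression.
 */
static bool mem_cgroup_may_zswap(struct mem_cgroup *memcg)
{
	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
		unsigned long max = READ_ONCE(memcg->zswap_max);

		if (max == PAGE_COUNTER_MAX)	/* no limit set, keep walking */
			continue;
		if (max == 0)			/* zswap disabled for this cgroup */
			return false;
		if (memcg_page_state(memcg, MEMCG_ZSWAP_B) >= max)
			return false;		/* limit already reached */
	}
	return true;
}
```

(Refcounting and the root-cgroup special case are elided; the real
refactor would presumably share this walk with obj_cgroup_may_zswap()
rather than duplicate it.)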
> +       if (zswap_is_enabled() || data_race(sis->flags & SWP_SYNCHRONOUS_IO)) {
> +               if (kfifo_avail(&pgdat->kcompress_fifo) >= sizeof(folio) &&
> +                   kfifo_in(&pgdat->kcompress_fifo, &folio, sizeof(folio))) {
> +                       wake_up_interruptible(&pgdat->kcompressd_wait);
> +                       return true;
> +               }
> +       }
> +
> +       return false;
> +}
> +
>  /*
>   * We may have stale swap cache pages in memory: notice
>   * them here and get rid of the unnecessary final write.
> @@ -275,6 +307,15 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>          */
>                 swap_zeromap_folio_clear(folio);
>         }
> +
> +       /*
> +        * Compression within zswap and zram might block rmap, unmap
> +        * of both file and anon pages, try to do compression async
> +        * if possible
> +        */
> +       if (swap_sched_async_compress(folio))
> +               return 0;
> +
>         if (zswap_store(folio)) {
>                 count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT);
>                 folio_unlock(folio);
> @@ -289,6 +330,36 @@
>         return 0;
> }
>
> +int kcompressd(void *p)
> +{
> +       pg_data_t *pgdat = (pg_data_t *)p;
> +       struct folio *folio;
> +       struct writeback_control wbc = {
> +               .sync_mode = WB_SYNC_NONE,
> +               .nr_to_write = SWAP_CLUSTER_MAX,
> +               .range_start = 0,
> +               .range_end = LLONG_MAX,
> +               .for_reclaim = 1,
> +       };
> +
> +       while (!kthread_should_stop()) {
> +               wait_event_interruptible(pgdat->kcompressd_wait,
> +                               !kfifo_is_empty(&pgdat->kcompress_fifo));
> +
> +               while (!kfifo_is_empty(&pgdat->kcompress_fifo)) {
> +                       if (kfifo_out(&pgdat->kcompress_fifo, &folio, sizeof(folio))) {
> +                               if (zswap_store(folio)) {
> +                                       count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT);
> +                                       folio_unlock(folio);
> +                                       continue;
> +                               }
> +                               __swap_writepage(folio, &wbc);
> +                       }
> +               }
> +       }
> +       return 0;
> +}
> +
>  static inline void count_swpout_vm_event(struct folio *folio)
> {
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/mm/swap.h b/mm/swap.h
> index 6f4a3f927edb..3579da413dc2 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -22,6 +22,7 @@ static inline void swap_read_unplug(struct swap_iocb *plug)
>  void swap_write_unplug(struct swap_iocb *sio);
>  int swap_writepage(struct page *page, struct writeback_control *wbc);
>  void __swap_writepage(struct folio *folio, struct writeback_control *wbc);
> +int kcompressd(void *p);
>
>  /* linux/mm/swap_state.c */
>  /* One swap address space for each 64M swap space */
> @@ -199,6 +200,11 @@ static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
>         return 0;
> }
>
> +static inline int kcompressd(void *p)
> +{
> +       return 0;
> +}
> +
>  #endif /* CONFIG_SWAP */
>
> #endif /* _MM_SWAP_H */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3783e45bfc92..2d7b9167bfd6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -7420,6 +7420,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>  void __meminit kswapd_run(int nid)
> {
>         pg_data_t *pgdat = NODE_DATA(nid);
> +       int ret;
>
>         pgdat_kswapd_lock(pgdat);
>         if (!pgdat->kswapd) {
> @@ -7433,7 +7434,26 @@ void __meminit kswapd_run(int nid)
>         } else {
>                 wake_up_process(pgdat->kswapd);
>         }
> +               ret = kfifo_alloc(&pgdat->kcompress_fifo,
> +                               KCOMPRESS_FIFO_SIZE * sizeof(struct folio *),
> +                               GFP_KERNEL);
> +               if (ret) {
> +                       pr_err("%s: fail to kfifo_alloc\n", __func__);
> +                       goto out;
> +               }
> +
> +               pgdat->kcompressd = kthread_create_on_node(kcompressd, pgdat, nid,
> +                               "kcompressd%d", nid);
> +               if (IS_ERR(pgdat->kcompressd)) {
> +                       pr_err("Failed to start kcompressd on node %d, ret=%ld\n",
> +                               nid, PTR_ERR(pgdat->kcompressd));
> +                       pgdat->kcompressd = NULL;
> +                       kfifo_free(&pgdat->kcompress_fifo);
> +               } else {
> +                       wake_up_process(pgdat->kcompressd);
> +               }
>  }
> +out:
>         pgdat_kswapd_unlock(pgdat);
> }
>
> @@ -7452,6 +7472,11 @@ void __meminit kswapd_stop(int nid)
>                 kthread_stop(kswapd);
>                 pgdat->kswapd = NULL;
>         }
> +       if (pgdat->kcompressd) {
> +               kthread_stop(pgdat->kcompressd);
> +               pgdat->kcompressd = NULL;
> +               kfifo_free(&pgdat->kcompress_fifo);
> +       }
>         pgdat_kswapd_unlock(pgdat);
> }
>
> --
> 2.45.2
>