From: Barry Song <21cnbao@gmail.com>
To: minchan@kernel.org, qun-wei.lin@mediatek.com, senozhatsky@chromium.org, nphamcs@gmail.com
Cc: 21cnbao@gmail.com, akpm@linux-foundation.org, andrew.yang@mediatek.com, angelogioacchino.delregno@collabora.com, axboe@kernel.dk, casper.li@mediatek.com, chinwen.chang@mediatek.com, chrisl@kernel.org, dan.j.williams@intel.com, dave.jiang@intel.com, ira.weiny@intel.com, james.hsu@mediatek.com, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org, linux-mm@kvack.org, matthias.bgg@gmail.com, nvdimm@lists.linux.dev, ryan.roberts@arm.com, schatzberg.dan@gmail.com, viro@zeniv.linux.org.uk, vishal.l.verma@intel.com, ying.huang@intel.com
Subject: Re: [PATCH 0/2] Improve Zram by separating compression context from kswapd
Date: Thu, 13 Mar 2025 22:30:05 +1300
Message-Id: <20250313093005.13998-1-21cnbao@gmail.com>
On Thu, Mar 13, 2025 at 4:52 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Thu, Mar 13, 2025 at 4:09 PM Sergey Senozhatsky
> wrote:
> >
> > On (25/03/12 11:11), Minchan Kim wrote:
> > > On Fri, Mar 07, 2025 at 08:01:02PM +0800, Qun-Wei Lin wrote:
> > > > This patch series introduces a new mechanism called kcompressd to
> > > > improve the efficiency of memory reclaiming in the operating system.
> > > > The main goal is to separate the tasks of page scanning and page
> > > > compression into distinct processes or threads, thereby reducing the
> > > > load on the kswapd thread and enhancing overall system performance
> > > > under high memory pressure conditions.
> > > >
> > > > Problem:
> > > > In the current system, the kswapd thread is responsible for both
> > > > scanning the LRU pages and compressing pages into the ZRAM. This
> > > > combined responsibility can lead to significant performance
> > > > bottlenecks, especially under high memory pressure. The kswapd
> > > > thread becomes a single point of contention, causing delays in
> > > > memory reclaiming and overall system performance degradation.
> > >
> > > Isn't this a general problem whenever the swap backend is slow (but
> > > synchronous)? I think zram needs to support asynchronous IO (it could
> > > introduce multiple threads to compress batched pages) and stop
> > > declaring itself a synchronous device in that case.
> >
> > The current conclusion is that kcompressd will sit above zram,
> > because zram is not the only compressing swap backend we have.
>
> Also, it is not good to hack zram to be aware of whether it is serving
> kswapd, direct reclaim, proactive reclaim, or a block device with a
> mounted filesystem.
>
> so I am thinking of something as below
>
> page_io.c
>
> if (sync_device or zswap_enabled())
>         schedule swap_writepage to a separate per-node thread
>

Hi Qun-wei, Nhat, Sergey and Minchan,

I managed to find some time to prototype a kcompressd that supports
both zswap and zram, though it has only been build-tested.

Hi Qun-wei,
Apologies, but I'm quite busy with other tasks and don't have time to
debug or test it. Please feel free to test it. When you submit v2,
you're welcome to keep yourself as the author of the patch, as in v1.
If you're okay with it, you can also add me as a co-developer in the
changelog.

For the prototype below, I'd rather start with a per-node thread
approach. While this might not provide the greatest benefit, it
carries the least risk and helps avoid complex questions, such as how
to determine the number of threads. And we have actually observed a
significant reduction in allocstall by using a single thread to
asynchronously handle kswapd's compression, as I reported.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index dbb0ad69e17f..4f9ee2fb338d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -23,6 +23,7 @@
 #include <linux/page-flags.h>
 #include <linux/local_lock.h>
 #include <linux/zswap.h>
+#include <linux/kfifo.h>
 #include <asm/page.h>
 
 /* Free memory management - zoned buddy allocator.  */
@@ -1389,6 +1390,11 @@ typedef struct pglist_data {
 
 	int kswapd_failures;		/* Number of 'reclaimed == 0' runs */
 
+#define KCOMPRESS_FIFO_SIZE	256
+	wait_queue_head_t kcompressd_wait;
+	struct task_struct *kcompressd;
+	struct kfifo kcompress_fifo;
+
 #ifdef CONFIG_COMPACTION
 	int kcompactd_max_order;
 	enum zone_type kcompactd_highest_zoneidx;
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 281802a7a10d..8cd143f59e76 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1410,6 +1410,7 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 	pgdat_init_kcompactd(pgdat);
 
 	init_waitqueue_head(&pgdat->kswapd_wait);
+	init_waitqueue_head(&pgdat->kcompressd_wait);
 	init_waitqueue_head(&pgdat->pfmemalloc_wait);
 
 	for (i = 0; i < NR_VMSCAN_THROTTLE; i++)
diff --git a/mm/page_io.c b/mm/page_io.c
index 4bce19df557b..7bbd14991ffb 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -233,6 +233,33 @@ static void swap_zeromap_folio_clear(struct folio *folio)
 	}
 }
 
+static bool swap_sched_async_compress(struct folio *folio)
+{
+	struct swap_info_struct *sis = swp_swap_info(folio->swap);
+	int nid = numa_node_id();
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	if (unlikely(!pgdat->kcompressd))
+		return false;
+
+	if (!current_is_kswapd())
+		return false;
+
+	if (!folio_test_anon(folio))
+		return false;
+	/*
+	 * This case needs to synchronously return AOP_WRITEPAGE_ACTIVATE
+	 */
+	if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio)))
+		return false;
+
+	if (zswap_is_enabled() || data_race(sis->flags & SWP_SYNCHRONOUS_IO)) {
+		/* queue the folio pointer, then wake the per-node worker */
+		if (!kfifo_in(&pgdat->kcompress_fifo, &folio, sizeof(folio)))
+			return false;
+		wake_up_interruptible(&pgdat->kcompressd_wait);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * We may have stale swap cache pages in memory: notice
  * them here and get rid of the unnecessary final write.
@@ -275,6 +302,15 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		 */
 		swap_zeromap_folio_clear(folio);
 	}
+
+	/*
+	 * Compression within zswap and zram might block rmap and unmap
+	 * of both file and anon pages; try to do the compression
+	 * asynchronously if possible.
+	 */
+	if (swap_sched_async_compress(folio))
+		return 0;
+
 	if (zswap_store(folio)) {
 		count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT);
 		folio_unlock(folio);
@@ -289,6 +325,38 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 	return 0;
 }
 
+int kcompressd(void *p)
+{
+	pg_data_t *pgdat = (pg_data_t *)p;
+	struct folio *folio;
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_NONE,
+		.nr_to_write = SWAP_CLUSTER_MAX,
+		.range_start = 0,
+		.range_end = LLONG_MAX,
+		.for_reclaim = 1,
+	};
+
+	while (!kthread_should_stop()) {
+		wait_event_interruptible(pgdat->kcompressd_wait,
+				!kfifo_is_empty(&pgdat->kcompress_fifo) ||
+				kthread_should_stop());
+
+		if (kthread_should_stop())
+			break;
+
+		while (!kfifo_is_empty(&pgdat->kcompress_fifo)) {
+			if (kfifo_out(&pgdat->kcompress_fifo, &folio, sizeof(folio))) {
+				if (zswap_store(folio)) {
+					count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT);
+					folio_unlock(folio);
+					/* stored in zswap; move on to the next folio */
+					continue;
+				}
+				__swap_writepage(folio, &wbc);
+			}
+		}
+	}
+	return 0;
+}
+
 static inline void count_swpout_vm_event(struct folio *folio)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/swap.h b/mm/swap.h
index 0abb68091b4f..38d61c6a06f1 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -21,6 +21,7 @@ static inline void swap_read_unplug(struct swap_iocb *plug)
 void swap_write_unplug(struct swap_iocb *sio);
 int swap_writepage(struct page *page, struct writeback_control *wbc);
 void __swap_writepage(struct folio *folio, struct writeback_control *wbc);
+int kcompressd(void *p);
 
 /* linux/mm/swap_state.c */
 /* One swap address space for each 64M swap space */
@@ -198,6 +199,11 @@ static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
 	return 0;
 }
 
+static inline int kcompressd(void *p)
+{
+	return 0;
+}
+
 #endif	/* CONFIG_SWAP */
 
 #endif	/* _MM_SWAP_H */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2bc740637a6c..ba0245b74e45 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7370,6 +7370,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 void __meminit kswapd_run(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
+	int ret;
 
 	pgdat_kswapd_lock(pgdat);
 	if (!pgdat->kswapd) {
@@ -7383,7 +7384,23 @@ void __meminit kswapd_run(int nid)
 		} else {
 			wake_up_process(pgdat->kswapd);
 		}
+		ret = kfifo_alloc(&pgdat->kcompress_fifo,
+				  KCOMPRESS_FIFO_SIZE * sizeof(struct folio *),
+				  GFP_KERNEL);
+		if (ret)
+			goto out;
+		pgdat->kcompressd = kthread_create_on_node(kcompressd, pgdat, nid,
+							   "kcompressd%d", nid);
+		if (IS_ERR(pgdat->kcompressd)) {
+			pr_err("Failed to start kcompressd on node %d, ret=%ld\n",
+			       nid, PTR_ERR(pgdat->kcompressd));
+			pgdat->kcompressd = NULL;
+			kfifo_free(&pgdat->kcompress_fifo);
+		} else {
+			wake_up_process(pgdat->kcompressd);
+		}
 	}
+out:
 	pgdat_kswapd_unlock(pgdat);
 }
 
@@ -7402,6 +7419,11 @@ void __meminit kswapd_stop(int nid)
 		kthread_stop(kswapd);
 		pgdat->kswapd = NULL;
 	}
+	if (pgdat->kcompressd) {
+		kthread_stop(pgdat->kcompressd);
+		pgdat->kcompressd = NULL;
+		kfifo_free(&pgdat->kcompress_fifo);
+	}
 	pgdat_kswapd_unlock(pgdat);
 }

> btw, ran the current patchset with one thread (not the default 4)
> on phones and saw a 50%+ allocstall reduction, so the idea looks
> like a good direction to go.

Thanks
Barry
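P.S. One detail of the prototype worth calling out, since it is easy to
get wrong: the kfifo carries folio *pointers* by value, so the
byte-oriented kfifo_in()/kfifo_out() calls must be handed the address of
the local pointer variable (&folio) with length sizeof(folio). kfifo
also has a typed variant that makes this pattern harder to misuse; a
minimal sketch for illustration only (the names below are made up and
are not part of the patch):

#include <linux/kfifo.h>

struct folio;

/* A statically sized queue of folio pointers; each slot holds one pointer. */
static DEFINE_KFIFO(folio_fifo, struct folio *, 256);

static bool queue_folio(struct folio *folio)
{
	/* kfifo_put() copies the pointer value itself; returns 0 when full */
	return kfifo_put(&folio_fifo, folio);
}

static struct folio *dequeue_folio(void)
{
	struct folio *folio;

	/* kfifo_get() writes the stored pointer into 'folio'; returns 0 when empty */
	if (!kfifo_get(&folio_fifo, &folio))
		return NULL;
	return folio;
}

The patch keeps the untyped kfifo_alloc()/kfifo_in() form because the
fifo is embedded in pglist_data and allocated per node at kswapd_run()
time. The allocstall figures mentioned above are the allocstall_*
counters exposed in /proc/vmstat.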