From mboxrd@z Thu Jan  1 00:00:00 1970
From: "zhaoyang.huang" <zhaoyang.huang@unisoc.com>
To: Andrew Morton, Yu Zhao, Zhaoyang Huang
Subject: [RFC PATCH] mm: throttling the reclaim when LRUVEC is congested under MGLRU
Date: Mon, 12 May 2025 15:55:57 +0800
Message-ID: <20250512075557.2308397-1-zhaoyang.huang@unisoc.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Our v6.6-based Android system, with 4GB of RAM and per-PID memcg v2 enabled,
constantly sees its local watchdog process starve [1] during an extreme
fill-data test over the filesystem, which generates an enormous number of
dirty page-cache pages along with page faults from userspace. Furthermore, we
can see that 423 out of 507 uninterruptible (UN) tasks are blocked on the same
call stack, which indicates heavy IO pressure. The same test case passes under
the legacy LRU.

By further debugging, we find that about 90% of the folios under reclaim are
dirty [2], which makes them hard to reclaim and introduces extra IO through
page thrashing (clean, cold mapped pages get dropped and refault quickly). We
work around this for now by simulating the throttling that the legacy LRU
does. I think this patch works because reclaim_throttle() fires when all dirty
pages from one round of scanning are congested (under writeback and marked for
immediate reclaim), a condition that is easy to reach when memcgs are
configured at as small a granularity as ours (one memcg per process).
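For clarity, the check being mimicked reduces to the condition sketched below.
This is a minimal standalone userspace model, not kernel code: struct
reclaim_stats and should_throttle() are illustrative names standing in for
sc->nr in struct scan_control and for the decision that gates
reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED).

/*
 * Minimal model of the throttling condition borrowed from the legacy
 * (non-MGLRU) shrink_node() path. Names here are hypothetical; in the
 * kernel the counters live in sc->nr and the stall is performed by
 * reclaim_throttle().
 */
#include <stdbool.h>
#include <stdio.h>

struct reclaim_stats {
	unsigned long dirty;		/* dirty folios seen in one scan round */
	unsigned long congested;	/* dirty folios under writeback and
					   marked for immediate reclaim */
};

/*
 * Throttle a direct reclaimer only when every dirty folio scanned in this
 * round was congested, i.e. reclaim cannot make progress without waiting
 * for writeback; kswapd is never stalled so it keeps aging the LRU.
 */
static bool should_throttle(const struct reclaim_stats *nr, bool is_kswapd)
{
	if (is_kswapd)
		return false;
	return nr->dirty && nr->dirty == nr->congested;
}

int main(void)
{
	struct reclaim_stats nr = { .dirty = 128, .congested = 128 };

	printf("direct reclaim throttles: %d\n", should_throttle(&nr, false));
	printf("kswapd throttles:         %d\n", should_throttle(&nr, true));
	return 0;
}

Per-process memcgs keep the scanned set small, so a single burst of dirty
page-cache pages easily satisfies dirty == congested and triggers the stall.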
[1]
PID: 1384    TASK: ffffff80eae5e2c0  CPU: 4  COMMAND: "watchdog"
 #0 [ffffffc088e4b9f0] __switch_to at ffffffd0817a8d34
 #1 [ffffffc088e4ba50] __schedule at ffffffd0817a955c
 #2 [ffffffc088e4bab0] schedule at ffffffd0817a9a24
 #3 [ffffffc088e4bae0] io_schedule at ffffffd0817aa1b0
 #4 [ffffffc088e4bb90] folio_wait_bit_common at ffffffd08099fe98
 #5 [ffffffc088e4bc40] filemap_fault at ffffffd0809a36b0
 #6 [ffffffc088e4bd60] handle_mm_fault at ffffffd080a01a74
 #7 [ffffffc088e4bdc0] do_page_fault at ffffffd0817b5d38
 #8 [ffffffc088e4be20] do_translation_fault at ffffffd0817b5b1c
 #9 [ffffffc088e4be30] do_mem_abort at ffffffd0806e09f4
#10 [ffffffc088e4be70] el0_ia at ffffffd0817a0d94
#11 [ffffffc088e4bea0] el0t_64_sync_handler at ffffffd0817a0bfc
#12 [ffffffc088e4bfe0] el0t_64_sync at ffffffd0806b1584

[2]
crash_arm64_v8.0.4++> kmem -p|grep reclaim|wc -l
22184
crash_arm64_v8.0.4++> kmem -p|grep dirty|wc -l
20484
crash_arm64_v8.0.4++> kmem -p|grep "dirty.*reclaim"|wc -l
20151
crash_arm64_v8.0.4++> kmem -p|grep "writeback.*reclaim"|wc -l
123

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/vmscan.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3783e45bfc92..a863d5cb5281 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4698,6 +4698,11 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
 	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
 	sc->nr_reclaimed += reclaimed;
+	sc->nr.dirty += stat.nr_dirty;
+	sc->nr.congested += stat.nr_congested;
+	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
+	sc->nr.writeback += stat.nr_writeback;
+	sc->nr.immediate += stat.nr_immediate;
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
 			scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
@@ -6010,10 +6015,36 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	unsigned long nr_reclaimed, nr_scanned, nr_node_reclaimed;
 	struct lruvec *target_lruvec;
 	bool reclaimable = false;
+	unsigned long flags;
 
 	if (lru_gen_enabled() && root_reclaim(sc)) {
 		memset(&sc->nr, 0, sizeof(sc->nr));
 		lru_gen_shrink_node(pgdat, sc);
+		/*
+		 * Tag a node/memcg as congested if all the dirty pages were marked
+		 * for writeback and immediate reclaim (counted in nr.congested).
+		 *
+		 * Legacy memcg will stall in page writeback so avoid forcibly
+		 * stalling in reclaim_throttle().
+		 */
+		if (sc->nr.dirty && sc->nr.dirty == sc->nr.congested) {
+			set_bit(LRUVEC_CGROUP_CONGESTED, &flags);
+
+			if (current_is_kswapd())
+				set_bit(LRUVEC_NODE_CONGESTED, &flags);
+		}
+
+		/*
+		 * Stall direct reclaim for IO completions if the lruvec or
+		 * node is congested. Allow kswapd to continue until it
+		 * starts encountering unqueued dirty pages or cycling through
+		 * the LRU too quickly.
+		 */
+		if (!current_is_kswapd() && current_may_throttle() &&
+		    !sc->hibernation_mode &&
+		    (test_bit(LRUVEC_CGROUP_CONGESTED, &flags) ||
+		     test_bit(LRUVEC_NODE_CONGESTED, &flags)))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
 		return;
 	}
-- 
2.25.1