From: Qi Zheng <qi.zheng@linux.dev>
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	muchun.song@linux.dev, david@redhat.com, lorenzo.stoakes@oracle.com,
	ziy@nvidia.com, harry.yoo@oracle.com, baolin.wang@linux.alibaba.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
	akpm@linux-foundation.org, richard.weiyang@gmail.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, Muchun Song, Qi Zheng
Subject: [PATCH v6 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
Date: Mon, 10 Nov 2025 16:17:57 +0800
Message-ID: <59cb6b6fb5ffcff9d23b81890b252960139ad8e7.1762762324.git.zhengqi.arch@bytedance.com>
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Muchun Song

The maintenance of folio->_deferred_list is intricate because the list
is reused as a local list. Here are some peculiarities:

1) When a folio is removed from its split queue and added to a local
   on-stack list in deferred_split_scan(), ->split_queue_len is not
   updated, leading to an inconsistency between it and the actual
   number of folios in the split queue.
2) When the folio is split via split_folio() later, it is removed from
   the local list while holding the split queue lock, even though the
   lock protects nothing at that point.

3) To handle the race with a third party freeing or migrating the
   preceding folio, we must ensure there is always one safe folio
   (with a raised refcount) before it, by delaying its folio_put().
   See commit e66f3185fa04 ("mm/thp: fix deferred split queue not
   partially_mapped") for details.

This is all rather tricky. We can use the folio_batch infrastructure
to handle it cleanly. With it, ->split_queue_len stays consistent with
the real number of folios in the split queue, and whenever
list_empty(&folio->_deferred_list) returns false, the folio must be in
its split queue (never in a local list).

In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the split
queue to its parent first. So this patch prepares for that by using
folio_split_queue_lock_irqsave(), since the folio's memcg may change
by then.

Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Zi Yan
Acked-by: David Hildenbrand
Acked-by: Shakeel Butt
Reviewed-by: Wei Yang
Reviewed-by: Harry Yoo
---
 mm/huge_memory.c | 87 +++++++++++++++++++++++-------------------------
 1 file changed, 41 insertions(+), 46 deletions(-)
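[Not part of the patch: for readers unfamiliar with folio_batch, below
is a minimal, simplified sketch of the two-phase pattern that the new
deferred_split_scan() follows. The locking, the partially-mapped
statistics and the shrinker plumbing are omitted, and scan_one_queue()
and try_split() are hypothetical names used only for illustration.]

	#include <linux/list.h>
	#include <linux/mm.h>
	#include <linux/pagevec.h>

	/* Hypothetical stand-in for the split_folio() handling above. */
	static void try_split(struct folio *folio);

	static void scan_one_queue(struct list_head *queue)
	{
		struct folio *folio, *next;
		struct folio_batch fbatch;
		int i;

		folio_batch_init(&fbatch);

		/*
		 * Phase 1 (under the split queue lock in the real code):
		 * unqueue up to PAGEVEC_SIZE folios, taking a reference on
		 * each. Every visited folio leaves the queue here, so the
		 * queue length can be decremented in lock-step with the
		 * list itself.
		 */
		list_for_each_entry_safe(folio, next, queue, _deferred_list) {
			if (folio_try_get(folio))
				folio_batch_add(&fbatch, folio);
			/* Unqueue even if we lost the race with folio_put(). */
			list_del_init(&folio->_deferred_list);
			if (!folio_batch_space(&fbatch))
				break;
		}

		/*
		 * Phase 2 (lock dropped): operate on the pinned folios by
		 * array index. No folio sits on a local list any more, so a
		 * third party freeing or migrating a neighbouring folio can
		 * no longer corrupt a list walk; the old "keep one safe
		 * prev folio" trick becomes unnecessary.
		 */
		for (i = 0; i < folio_batch_count(&fbatch); i++)
			try_split(fbatch.folios[i]);

		/* Drop the references taken in phase 1 in one call. */
		folios_put(&fbatch);
	}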
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 858f896ef3bf3..db03853a73e3f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3901,21 +3901,22 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct lruvec *lruvec;
 		int expected_refs;
 
-		if (old_order > 1 &&
-		    !list_empty(&folio->_deferred_list)) {
-			ds_queue->split_queue_len--;
+		if (old_order > 1) {
+			if (!list_empty(&folio->_deferred_list)) {
+				ds_queue->split_queue_len--;
+				/*
+				 * Reinitialize page_deferred_list after removing the
+				 * page from the split_queue, otherwise a subsequent
+				 * split will see list corruption when checking the
+				 * page_deferred_list.
+				 */
+				list_del_init(&folio->_deferred_list);
+			}
 			if (folio_test_partially_mapped(folio)) {
 				folio_clear_partially_mapped(folio);
 				mod_mthp_stat(old_order,
 					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
-			/*
-			 * Reinitialize page_deferred_list after removing the
-			 * page from the split_queue, otherwise a subsequent
-			 * split will see list corruption when checking the
-			 * page_deferred_list.
-			 */
-			list_del_init(&folio->_deferred_list);
 		}
 		split_queue_unlock(ds_queue);
 		if (mapping) {
@@ -4314,35 +4315,40 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 {
 	struct deferred_split *ds_queue;
 	unsigned long flags;
-	LIST_HEAD(list);
-	struct folio *folio, *next, *prev = NULL;
-	int split = 0, removed = 0;
+	struct folio *folio, *next;
+	int split = 0, i;
+	struct folio_batch fbatch;
 
+	folio_batch_init(&fbatch);
+
+retry:
 	ds_queue = split_queue_lock_irqsave(sc->nid, sc->memcg, &flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
 							_deferred_list) {
 		if (folio_try_get(folio)) {
-			list_move(&folio->_deferred_list, &list);
-		} else {
+			folio_batch_add(&fbatch, folio);
+		} else if (folio_test_partially_mapped(folio)) {
 			/* We lost race with folio_put() */
-			if (folio_test_partially_mapped(folio)) {
-				folio_clear_partially_mapped(folio);
-				mod_mthp_stat(folio_order(folio),
-					MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-			}
-			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			folio_clear_partially_mapped(folio);
+			mod_mthp_stat(folio_order(folio),
+				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
+		list_del_init(&folio->_deferred_list);
+		ds_queue->split_queue_len--;
 		if (!--sc->nr_to_scan)
 			break;
+		if (!folio_batch_space(&fbatch))
+			break;
 	}
 	split_queue_unlock_irqrestore(ds_queue, flags);
 
-	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
 		bool did_split = false;
 		bool underused = false;
+		struct deferred_split *fqueue;
 
+		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			/*
 			 * See try_to_map_unused_to_zeropage(): we cannot
@@ -4365,38 +4371,27 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		}
 		folio_unlock(folio);
 next:
+		if (did_split || !folio_test_partially_mapped(folio))
+			continue;
 		/*
-		 * split_folio() removes folio from list on success.
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		if (did_split) {
-			; /* folio already removed from list */
-		} else if (!folio_test_partially_mapped(folio)) {
-			list_del_init(&folio->_deferred_list);
-			removed++;
-		} else {
-			/*
-			 * That unlocked list_del_init() above would be unsafe,
-			 * unless its folio is separated from any earlier folios
-			 * left on the list (which may be concurrently unqueued)
-			 * by one safe folio with refcount still raised.
-			 */
-			swap(folio, prev);
+		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
+		if (list_empty(&folio->_deferred_list)) {
+			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
+			fqueue->split_queue_len++;
 		}
-		if (folio)
-			folio_put(folio);
+		split_queue_unlock_irqrestore(fqueue, flags);
 	}
+	folios_put(&fbatch);
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
-	ds_queue->split_queue_len -= removed;
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
-
-	if (prev)
-		folio_put(prev);
+	if (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) {
+		cond_resched();
+		goto retry;
+	}
 
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
-- 
2.20.1