From: Muchun Song
Date: Tue, 18 Feb 2025 15:05:29 +0800
Subject: Re: [PATCH V2] mm/hugetlb: wait for hugepage folios to be freed
To: yangge1116@126.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, baolin.wang@linux.alibaba.com, osalvador@suse.de, liuzixing@hygon.cn
In-Reply-To: <1739604026-2258-1-git-send-email-yangge1116@126.com>
References: <1739604026-2258-1-git-send-email-yangge1116@126.com>

> On Feb 15, 2025, at 15:20, yangge1116@126.com wrote:
>
> From: Ge Yang
>
> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
> of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
> the allocation of contiguous memory through cma_alloc() may fail
> probabilistically.
>
> In the CMA allocation process, if it is found that the CMA area is occupied
> by in-use hugepage folios, these in-use hugepage folios need to be migrated
> to another location. When there are no available hugepage folios in the
> free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
> are allocated from the buddy system. A temporary state is set on the newly
> allocated folio. Upon completion of the hugepage folio migration, the
> temporary state is transferred from the new folios to the old folios.
> Normally, when the old folios with the temporary state are freed, it is
> directly released back to the buddy system. However, due to the deferred
> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
> to the failure of cma_alloc().
>
> Here is a simplified call trace illustrating the process:
> cma_alloc()
>     ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>         ->unmap_and_move_huge_page()
>             ->folio_putback_hugetlb() // Free old folios
>     ->test_pages_isolated()
>         ->__test_page_isolated_in_pageblock()
>              ->PageBuddy(page) // Check if the page is in buddy
>
> To resolve this issue, we have implemented a function named
> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_hugepage_folios_freed() before calling
> PageBuddy(), we ensure that PageBuddy() will succeed.
>
> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")

The commit that should actually be blamed is

commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context")

which was the first to introduce the delayed work for freeing hugetlb pages.
It was removed by commit db71ef79b59bb2 and then immediately brought back by
commit b65d4adbc0f0.

> Signed-off-by: Ge Yang
> ---
>
> V2:
> - flush all folios at once suggested by David
>
>  include/linux/hugetlb.h |  5 +++++
>  mm/hugetlb.c            |  8 ++++++++
>  mm/page_isolation.c     | 10 ++++++++++
>  3 files changed, 23 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6c6546b..04708b0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
>
>  int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>  int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
> +void wait_for_hugepage_folios_freed(void);
>  struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  				unsigned long addr, bool cow_from_owner);
>  struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
>  	return 0;
>  }
>
> +static inline void wait_for_hugepage_folios_freed(void)
> +{
> +}
> +
>  static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  					   unsigned long addr,
>  					   bool cow_from_owner)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 30bc34d..36dd3e4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>  	return ret;
>  }
>
> +void wait_for_hugepage_folios_freed(void)

We usually use the term "hugetlb" now instead of "huge_page" to differentiate
it from THP, so I suggest naming it wait_for_hugetlb_folios_freed().

> +{
> +	struct hstate *h;
> +
> +	for_each_hstate(h)
> +		flush_free_hpage_work(h);

Because all hstates share the same work item to defer the freeing of hugetlb
pages, we only need to flush once. Calling flush_work(&free_hpage_work)
directly is enough.

> +}
> +
>  typedef enum {
>  	/*
>  	 * For either 0/1: we checked the per-vma resv map, and one resv
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 8ed53ee0..f56cf02 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  	int ret;
>
>  	/*
> +	 * Due to the deferred freeing of HugeTLB folios, the hugepage folios may
> +	 * not immediately release to the buddy system. This can cause PageBuddy()
> +	 * to fail in __test_page_isolated_in_pageblock(). To ensure that the
> +	 * hugepage folios are properly released back to the buddy system, we

hugetlb folios.

Muchun,
Thanks.

> +	 * invoke the wait_for_hugepage_folios_freed() function to wait for the
> +	 * release to complete.
> +	 */
> +	wait_for_hugepage_folios_freed();
> +
> +	/*
>  	 * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
>  	 * pages are not aligned to pageblock_nr_pages.
>  	 * Then we just check migratetype first.
> --
> 2.7.4
>
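
To illustrate the point about flushing only once, here is a small userspace
toy model. It is not kernel code and has not been run against the real tree.
Every model_* name is made up for illustration, the single list stands in for
the one work item that all hstates share, and MODEL_NR_HSTATES = 2 is an
arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_HSTATES 2	/* hypothetical: two huge page sizes */

struct model_folio {
	bool in_buddy;			/* stands in for PageBuddy() */
	struct model_folio *next;	/* link on the shared deferred list */
};

/* One deferred-free list shared by every hstate (the shared work item). */
static struct model_folio *model_deferred;

/* Deferred free: only queue the folio; it is not back in buddy yet. */
static void model_free_folio(struct model_folio *f)
{
	f->in_buddy = false;
	f->next = model_deferred;
	model_deferred = f;
}

/* Stands in for flush_work(&free_hpage_work): drains the whole list. */
static void model_flush(void)
{
	while (model_deferred) {
		struct model_folio *f = model_deferred;

		model_deferred = f->next;
		f->next = NULL;
		f->in_buddy = true;
	}
	assert(model_deferred == NULL);
}

/* Stands in for test_pages_isolated(): every folio must be in buddy. */
static bool model_range_isolated(const struct model_folio *folios, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!folios[i].in_buddy)
			return false;
	return true;
}

/* Returns 0 if a single flush is enough for folios of all hstates. */
int model_demo(void)
{
	struct model_folio folios[MODEL_NR_HSTATES] = { 0 };

	for (size_t i = 0; i < MODEL_NR_HSTATES; i++)
		model_free_folio(&folios[i]);	/* one folio per hstate */

	if (model_range_isolated(folios, MODEL_NR_HSTATES))
		return -1;	/* unexpected: nothing has been flushed yet */

	model_flush();		/* a single flush, not one per hstate */

	return model_range_isolated(folios, MODEL_NR_HSTATES) ? 0 : -1;
}
```

In this model the isolation check fails while the folios are still queued and
passes after one model_flush() call, which is why looping over the hstates
adds nothing when the work item is shared.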