From: Muchun Song <muchun.song@linux.dev>
Message-Id: <574F9D6A-F370-4A8C-9044-BC0A6189F055@linux.dev>
Subject: Re: [PATCH V2] mm/hugetlb: wait for hugepage folios to be freed
Date: Tue, 18 Feb 2025 14:52:54 +0800
In-Reply-To: <1739604026-2258-1-git-send-email-yangge1116@126.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com,
 baolin.wang@linux.alibaba.com, osalvador@suse.de, liuzixing@hygon.cn,
 Ge Yang <yangge1116@126.com>
To: yangge1116@126.com
References: <1739604026-2258-1-git-send-email-yangge1116@126.com>

> On Feb 15, 2025, at 15:20, yangge1116@126.com wrote:
>
> From: Ge Yang <yangge1116@126.com>
>
> Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
> of HugeTLB pages"), which supports deferring the freeing of HugeTLB pages,
> the allocation of contiguous memory through cma_alloc() may fail
> probabilistically.
>
> In the CMA allocation process, if it is found that the CMA area is occupied
> by in-use hugepage folios, these in-use hugepage folios need to be migrated
> to another location. When there are no available hugepage folios in the
> free HugeTLB pool during the migration of in-use HugeTLB pages, new folios
> are allocated from the buddy system. A temporary state is set on the newly
> allocated folio. Upon completion of the hugepage folio migration, the
> temporary state is transferred from the new folios to the old folios.
> Normally, when the old folios with the temporary state are freed, it is
> directly released back to the buddy system. However, due to the deferred
> freeing of HugeTLB pages, the PageBuddy() check fails, ultimately leading
> to the failure of cma_alloc().
>
> Here is a simplified call trace illustrating the process:
> cma_alloc()
>     ->__alloc_contig_migrate_range() // Migrate in-use hugepage
>         ->unmap_and_move_huge_page()
>             ->folio_putback_hugetlb() // Free old folios
>     ->test_pages_isolated()
>         ->__test_page_isolated_in_pageblock()
>             ->PageBuddy(page) // Check if the page is in buddy
>
> To resolve this issue, we have implemented a function named
> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_hugepage_folios_freed() before calling
> PageBuddy(), we ensure that PageBuddy() will succeed.
>
> Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> ---
>
> V2:
> - flush all folios at once suggested by David
>
>  include/linux/hugetlb.h |  5 +++++
>  mm/hugetlb.c            |  8 ++++++++
>  mm/page_isolation.c     | 10 ++++++++++
>  3 files changed, 23 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6c6546b..04708b0 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
>
>  int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>  int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
> +void wait_for_hugepage_folios_freed(void);
>  struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  				unsigned long addr, bool cow_from_owner);
>  struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
>  	return 0;
>  }
>
> +static inline void wait_for_hugepage_folios_freed(void)
> +{
> +}
> +
>  static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  					unsigned long addr,
>  					bool cow_from_owner)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 30bc34d..36dd3e4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>  	return ret;
>  }
>
> +void wait_for_hugepage_folios_freed(void)

We usually use the "hugetlb" term now instead of "huge_page" to
differentiate it from THP, so I suggest naming this
wait_for_hugetlb_folios_freed().

> +{
> +	struct hstate *h;
> +
> +	for_each_hstate(h)
> +		flush_free_hpage_work(h);

Because all hstates use the shared work item to defer the freeing of
hugetlb pages, we only need to flush once. Directly using
flush_work(&free_hpage_work) is enough.
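
A minimal sketch of what I have in mind, assuming free_hpage_work is
still the single shared work item declared in mm/hugetlb.c:

/*
 * All hstates funnel their deferred frees through the shared
 * free_hpage_work item, so a single flush_work() is enough to wait
 * until every pending hugetlb folio has been handed back to the
 * buddy allocator.
 */
void wait_for_hugetlb_folios_freed(void)
{
	flush_work(&free_hpage_work);
}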
> +}
> +
>  typedef enum {
>  	/*
>  	 * For either 0/1: we checked the per-vma resv map, and one resv
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 8ed53ee0..f56cf02 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  	int ret;
>
>  	/*
> +	 * Due to the deferred freeing of HugeTLB folios, the hugepage folios may
> +	 * not immediately release to the buddy system. This can cause PageBuddy()
> +	 * to fail in __test_page_isolated_in_pageblock(). To ensure that the
> +	 * hugepage folios are properly released back to the buddy system, we

hugetlb folios, pls.

Thanks,
Muchun

> +	 * invoke the wait_for_hugepage_folios_freed() function to wait for the
> +	 * release to complete.
> +	 */
> +	wait_for_hugepage_folios_freed();
> +
> +	/*
>  	 * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
>  	 * pages are not aligned to pageblock_nr_pages.
>  	 * Then we just check migratetype first.
> --
> 2.7.4
>
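
With the rename and wording folded in, the new hunk in
test_pages_isolated() could read along these lines (just a sketch of
the direction; the exact wording is up to you):

	/*
	 * Deferred freeing means hugetlb folios may not have been returned
	 * to the buddy allocator yet, so the PageBuddy() check in
	 * __test_page_isolated_in_pageblock() can still fail on them.
	 * Flush the pending frees before checking.
	 */
	wait_for_hugetlb_folios_freed();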