From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH] replace free hugepage folios after migration
From: Ge Yang <yangge1116@126.com>
Date: Sun, 22 Dec 2024 16:13:43 +0800
To: David Hildenbrand, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, muchun.song@linux.dev, liuzixing@hygon.cn
In-Reply-To: <0ca35fe5-9799-4518-9fb1-701c88501a8d@redhat.com>
References: <1734503588-16254-1-git-send-email-yangge1116@126.com> <0ca35fe5-9799-4518-9fb1-701c88501a8d@redhat.com>

On 2024/12/21 22:35, David Hildenbrand wrote:
> On 18.12.24 07:33, yangge1116@126.com wrote:
>> From: yangge
>>
>> My machine has 4 NUMA nodes, each equipped with 32GB of memory. I
>> have configured each NUMA node with 16GB of CMA and 16GB of in-use
>> hugetlb pages. The allocation of contiguous memory via the
>> cma_alloc() function can fail probabilistically.
>>
>> The cma_alloc() function may fail if it sees an in-use hugetlb page
>> within the allocation range, even if that page has already been
>> migrated. When in-use hugetlb pages are migrated, they may simply
>> be released back into the free hugepage pool instead of being
>> returned to the buddy system. This can cause the
>> test_pages_isolated() check to fail, ultimately leading to the
>> failure of the cma_alloc() function:
>> cma_alloc()
>>     __alloc_contig_migrate_range() // migrate in-use hugepage
>>     test_pages_isolated()
>>         __test_page_isolated_in_pageblock()
>>             PageBuddy(page) // check if the page is in buddy
>>
>> To address this issue, we add a function named
>> replace_free_hugepage_folios(). This function replaces each
>> hugepage in the free hugepage pool with a new one and releases the
>> old one to the buddy system. After the migration of in-use hugetlb
>> pages is completed, we invoke replace_free_hugepage_folios() to
>> ensure that these hugepages are properly released to the buddy
>> system. Following this step, when test_pages_isolated() is executed
>> for inspection, it will successfully pass.
>>
>> Signed-off-by: yangge
>> ---
>>  include/linux/hugetlb.h |  6 ++++++
>>  mm/hugetlb.c            | 37 +++++++++++++++++++++++++++++++++++++
>>  mm/page_alloc.c         | 13 ++++++++++++-
>>  3 files changed, 55 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index ae4fe86..7d36ac8 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -681,6 +681,7 @@ struct huge_bootmem_page {
>>  };
>>  int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>> +int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
>>  struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>                  unsigned long addr, int avoid_reserve);
>>  struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
>> @@ -1059,6 +1060,11 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
>>      return -ENOMEM;
>>  }
>> +static inline int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>> +{
>> +    return 0;
>> +}
>> +
>>  static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>                         unsigned long addr,
>>                         int avoid_reserve)
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 8e1db80..a099c54 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2975,6 +2975,43 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
>>      return ret;
>>  }
>> +/*
>> + * replace_free_hugepage_folios - Replace free hugepage folios in a given pfn
>> + * range with new folios.
>> + * @start_pfn: start pfn of the given pfn range
>> + * @end_pfn: end pfn of the given pfn range
>> + * Returns 0 on success, otherwise negated error.
>> + */
>> +int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>> +{
>> +    struct hstate *h;
>> +    struct folio *folio;
>> +    int ret = 0;
>> +
>> +    LIST_HEAD(isolate_list);
>> +
>> +    while (start_pfn < end_pfn) {
>> +        folio = pfn_folio(start_pfn);
>> +        if (folio_test_hugetlb(folio)) {
>> +            h = folio_hstate(folio);
>> +        } else {
>> +            start_pfn++;
>> +            continue;
>> +        }
>> +
>> +        if (!folio_ref_count(folio)) {
>> +            ret = alloc_and_dissolve_hugetlb_folio(h, folio, &isolate_list);
>> +            if (ret)
>> +                break;
>> +
>> +            putback_movable_pages(&isolate_list);
>> +        }
>> +        start_pfn++;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>>  struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>                      unsigned long addr, int avoid_reserve)
>>  {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index dde19db..1dcea28 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6504,7 +6504,18 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>      ret = __alloc_contig_migrate_range(&cc, start, end, migratetype);
>>      if (ret && ret != -EBUSY)
>>          goto done;
>> -    ret = 0;
>> +
>> +    /*
>> +     * When in-use hugetlb pages are migrated, they may simply be
>> +     * released back into the free hugepage pool instead of being
>> +     * returned to the buddy system. After the migration of in-use
>> +     * huge pages is completed, we will invoke the
>> +     * replace_free_hugepage_folios() function to ensure that
>> +     * these hugepages are properly released to the buddy system.
>> +     */
>
> As mentioned in my other mail, what I don't like about this is, IIUC,
> the pages can get reallocated anytime after we successfully migrated
> them, or is there anything that prevents that?
>
The pages can get reallocated at any time after we successfully migrate
them. Currently, I haven't thought of a good way to prevent that.

> Did you ever try allocating a larger range with a single
> alloc_contig_range() call, that possibly has to migrate multiple hugetlb
> folios in one go (and maybe just allocates one of the just-freed hugetlb
> folios as migration target)?
>
I have tried using a single alloc_contig_range() call to allocate a
larger contiguous range, and it works properly. This is because, during
the window between __alloc_contig_migrate_range() and
isolate_freepages_range(), no one allocates a hugetlb folio from the
free hugetlb pool.
>