From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1241b567-88b6-462c-9088-8f72a45788b7@126.com>
Date: Fri, 20 Dec 2024 16:56:34 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] replace free hugepage folios after migration
To: David Hildenbrand, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org,
 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, muchun.song@linux.dev,
 liuzixing@hygon.cn, Oscar Salvador
References: <1734503588-16254-1-git-send-email-yangge1116@126.com>
 <0b41cc6b-5c93-408f-801f-edd9793cb979@redhat.com>
From: Ge Yang
In-Reply-To: <0b41cc6b-5c93-408f-801f-edd9793cb979@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2024/12/20 0:40, David Hildenbrand wrote:
> On 18.12.24 07:33, yangge1116@126.com wrote:
>> From: yangge
>
> CCing Oscar, who worked on migrating these pages during memory offlining
> and alloc_contig_range().
>
>>
>> My machine has 4 NUMA nodes, each equipped with 32GB of memory. I
>> have configured each NUMA node with 16GB of CMA and 16GB of in-use
>> hugetlb pages. The allocation of contiguous memory via the
>> cma_alloc() function can fail probabilistically.
>>
>> The cma_alloc() function may fail if it sees an in-use hugetlb page
>> within the allocation range, even if that page has already been
>> migrated. When in-use hugetlb pages are migrated, they may simply
>> be released back into the free hugepage pool instead of being
>> returned to the buddy system. This can cause the
>> test_pages_isolated() function check to fail, ultimately leading
>> to the failure of the cma_alloc() function:
>> cma_alloc()
>>      __alloc_contig_migrate_range() // migrate in-use hugepage
>>      test_pages_isolated()
>>          __test_page_isolated_in_pageblock()
>>               PageBuddy(page) // check if the page is in buddy
>
> I thought this would be working as expected, at least we tested it with
> alloc_contig_range / virtio-mem a while ago.
>
> On the memory_offlining path, we migrate hugetlb folios, but also
> dissolve any remaining free folios even if it means that we will be going
> below the requested number of hugetlb pages in our pool.
>
> During alloc_contig_range(), we only migrate them, to then free them up
> after migration.
>
> Under which circumstances does it apply that "they may simply be
> released back into the free hugepage pool instead of being returned to
> the buddy system"?
>

After migration, in-use hugetlb pages are only released back to the
hugetlb pool and are not returned to the buddy system.

The specific steps for reproduction are as follows:

1. Reserve hugetlb pages. Some of these hugetlb pages are allocated
   within the CMA area.

   echo 10240 > /proc/sys/vm/nr_hugepages

2. Put the hugetlb pages into an in-use state, for example by starting
   a guest that preallocates its memory from them.

   qemu-system-x86_64 \
     -mem-prealloc \
     -mem-path /dev/hugepage/ \
     ...

3. At this point, allocating contiguous memory with cma_alloc() may
   fail intermittently.

>>
>> To address this issue, we will add a function named
>> replace_free_hugepage_folios(). This function will replace the
>> hugepage in the free hugepage pool with a new one and release the
>> old one to the buddy system. After the migration of in-use hugetlb
>> pages is completed, we will invoke the replace_free_hugepage_folios()
>> function to ensure that these hugepages are properly released to
>> the buddy system. Following this step, when the test_pages_isolated()
>> function is executed for inspection, it will successfully pass.
>>
>> Signed-off-by: yangge
>> ---
>>   include/linux/hugetlb.h |  6 ++++++
>>   mm/hugetlb.c            | 37 +++++++++++++++++++++++++++++++++++++
>>   mm/page_alloc.c         | 13 ++++++++++++-
>>   3 files changed, 55 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index ae4fe86..7d36ac8 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -681,6 +681,7 @@ struct huge_bootmem_page {
>>   };
>>   int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>> +int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
>>   struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>                   unsigned long addr, int avoid_reserve);
>>   struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
>> @@ -1059,6 +1060,11 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
>>       return -ENOMEM;
>>   }
>> +int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>> +{
>> +    return 0;
>> +}
>> +
>>   static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>                          unsigned long addr,
>>                          int avoid_reserve)
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 8e1db80..a099c54 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2975,6 +2975,43 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
>>       return ret;
>>   }
>> +/*
>> + *  replace_free_hugepage_folios - Replace free hugepage folios in a given pfn
>> + *  range with new folios.
>> + *  @stat_pfn: start pfn of the given pfn range
>> + *  @end_pfn: end pfn of the given pfn range
>> + *  Returns 0 on success, otherwise negated error.
>> + */
>> +int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>> +{
>> +    struct hstate *h;
>> +    struct folio *folio;
>> +    int ret = 0;
>> +
>> +    LIST_HEAD(isolate_list);
>> +
>> +    while (start_pfn < end_pfn) {
>> +        folio = pfn_folio(start_pfn);
>> +        if (folio_test_hugetlb(folio)) {
>> +            h = folio_hstate(folio);
>> +        } else {
>> +            start_pfn++;
>> +            continue;
>> +        }
>> +
>> +        if (!folio_ref_count(folio)) {
>> +            ret = alloc_and_dissolve_hugetlb_folio(h, folio, &isolate_list);
>> +            if (ret)
>> +                break;
>> +
>> +            putback_movable_pages(&isolate_list);
>> +        }
>> +        start_pfn++;
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>>   struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>>                       unsigned long addr, int avoid_reserve)
>>   {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index dde19db..1dcea28 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6504,7 +6504,18 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>       ret = __alloc_contig_migrate_range(&cc, start, end, migratetype);
>>       if (ret && ret != -EBUSY)
>>           goto done;
>> -    ret = 0;
>> +
>> +    /*
>> +     * When in-use hugetlb pages are migrated, they may simply be
>> +     * released back into the free hugepage pool instead of being
>> +     * returned to the buddy system. After the migration of in-use
>> +     * huge pages is completed, we will invoke the
>> +     * replace_free_hugepage_folios() function to ensure that
>> +     * these hugepages are properly released to the buddy system.
>> +     */
>> +    ret = replace_free_hugepage_folios(start, end);
>> +    if (ret)
>> +        goto done;
>>       /*
>>        * Pages from [start, end) are within a pageblock_nr_pages
>
>