Message-ID: <80039e40-a518-a85b-b955-96fb048a2dd0@huawei.com>
Date: Mon, 12 May 2025 14:38:28 +0800
Subject: Re: [PATCH] mm/contig_alloc: fix alloc_contig_range when __GFP_COMP and order < MAX_ORDER
To: Zi Yan
CC: Andrew Morton
References: <20250312084705.2938220-1-tujinjiang@huawei.com> <20250417195935.2ac19ac5f92add5931b6fa5a@linux-foundation.org> <6E553AA1-5B53-4E52-9940-3B8E0DE36FC1@nvidia.com>
From: Jinjiang Tu

On 2025/4/19 9:32, Zi Yan wrote:
> On 18 Apr 2025, at 20:54, Jinjiang Tu wrote:
>
>> On 2025/4/19 5:32, Zi Yan wrote:
>>> Hi Jinjiang,
>>>
>>> On 17 Apr 2025, at 22:59, Andrew Morton wrote:
>>>
>>>> On Wed, 12 Mar 2025 16:47:05 +0800 Jinjiang Tu wrote:
>>>>
>>>>> When calling alloc_contig_range() with __GFP_COMP and the order of
>>>>> requested pfn range is pageblock_order, less than MAX_ORDER, I triggered
>>>>> WARNING as follows:
>>>>>
>>>>> PFN range: requested [2150105088, 2150105600), allocated [2150105088, 2150106112)
>>>>> WARNING: CPU: 3 PID: 580 at mm/page_alloc.c:6877 alloc_contig_range+0x280/0x340
>>> Basically, you are using alloc_contig_range() to allocate a compound page
>>> that can be allocated from buddy allocator, since order is < MAX_ORDER.
>>> What is the use case? Why is alloc_contig_range() used?
>> In CMA case, alloc_contig_range() is used to allocate from requested pfn range, and the order may
>> be < MAX_ORDER.
> But why do you need __GFP_COMP? I thought __GFP_COMP was only used for
> 1GB hugetlb.
>
>>>>> alloc_contig_range() marks pageblocks of the requested pfn range to be
>>>>> isolated, migrate these pages if they are in use and will be freed to
>>>>> MIGRATE_ISOLATED freelist.
>>>>>
>>>>> Suppose two alloc_contig_range() calls at the same time and the requested
>>>>> pfn range are [0x80280000, 0x80280200) and [0x80280200, 0x80280400)
>>>>> respectively. Suppose the two memory range are in use, then
>>>>> alloc_contig_range() will migrate and free these pages to MIGRATE_ISOLATED
>>>>> freelist. __free_one_page() will merge MIGRATE_ISOLATE buddy to larger
>>>>> buddy, resulting in a MAX_ORDER buddy. Finally, find_large_buddy() in
>>>>> alloc_contig_range() returns a MAX_ORDER buddy and results in WARNING.
>>>>>
>>>>> To fix it, call free_contig_range() to free the excess pfn range.
>>>> This has been in mm-hotfixes for a month without issue. Is there any
>>>> reviewer interest?
>>>>
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -6528,7 +6528,8 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>>>>  		goto done;
>>>>>  	}
>>>>>
>>>>> -	if (!(gfp_mask & __GFP_COMP)) {
>>>>> +	if (!(gfp_mask & __GFP_COMP) ||
>>>>> +	    (is_power_of_2(end - start) && ilog2(end - start) < MAX_PAGE_ORDER)) {
>>>>>  		split_free_pages(cc.freepages, gfp_mask);
>>> This does not look right to me. When a compound page is requested,
>>> alloc_contig_range() should give a compound page, but split_free_pages()
>>> will make the free page as a list of contiguous order-0 pages.
>>>
>>> I do not think we should keep this patch.
>>>
>>> Jinjiang, let me know if I miss anything.
>> After split_free_pages(), below code is executed to collapse the contiguous order-0 pages
>> to a compound page.

This is wrong. After split_free_pages(), these pages are order-0 pages
with refcount 1. Calling prep_new_page(head) and set_page_refcounted(head)
on them then triggers VM_BUG_ON_PAGE(page_ref_count(page), page).

> OK, got it. Since cc.freepages can be MAX_PAGE_ORDER and the requested
> PFN range is smaller than MAX_PAGE_ORDER. A more “right” way of handling
> this would be in the second if (the one shown below), you check order
> against the cc.freepages order and free out of requested range pages.
> But that might be too complicated. Your approach is simpler.

For the (outer_start != start) || (outer_end != end) case, we could split
the pages into an order-0 list, call post_alloc_hook()/set_page_refcounted()
only for the pages in the [outer_start, start) and [end, outer_end) ranges,
and then call free_contig_range() for them.

How about this approach?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ff19413e876..a80767621c52 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6625,6 +6625,37 @@ static void split_free_pages(struct list_head *list, gfp_t gfp_mask)
 	}
 }
 
+static void split_pages_to_order0(struct list_head *list)
+{
+	int order;
+
+	for (order = 0; order < NR_PAGE_ORDERS; order++) {
+		struct page *page, *next;
+		int nr_pages = 1 << order;
+
+		list_for_each_entry_safe(page, next, &list[order], lru) {
+			int i;
+
+			list_del(&page->lru);
+			for (i = 0; i < nr_pages; i++)
+				list_add_tail(&page[i].lru, &list[0]);
+		}
+	}
+}
+
+static void free_pfn_range(unsigned long start, unsigned long end, gfp_t gfp_mask)
+{
+	struct page *page;
+	unsigned long i;
+
+	page = pfn_to_page(start);
+	for (i = 0; i < end - start; ++i, ++page) {
+		post_alloc_hook(page, 0, gfp_mask);
+		set_page_refcounted(page);
+	}
+	free_contig_range(start, end - start);
+}
+
 static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
 {
 	const gfp_t reclaim_mask = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
@@ -6800,7 +6831,7 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 	 * isolated free pages can have higher order than the requested
 	 * one. Use split_free_pages() to free out of range pages.
 	 */
-	if (!(gfp_mask & __GFP_COMP) || range_order < MAX_PAGE_ORDER) {
+	if (!(gfp_mask & __GFP_COMP)) {
 		split_free_pages(cc.freepages, gfp_mask);
 
 		/* Free head and tail (if any) */
@@ -6809,8 +6840,17 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 		if (end != outer_end)
 			free_contig_range(end, outer_end - end);
 
-		outer_start = start;
-		outer_end = end;
+	} else if ((outer_start != start) || (end != outer_end)) {
+		struct page *page;
+		int i;
+
+		split_pages_to_order0(cc.freepages);
+
+		if (start != outer_start)
+			free_pfn_range(outer_start, start, gfp_mask);
+
+		if (end != outer_end)
+			free_pfn_range(end, outer_end, gfp_mask);
 	}

>
> Can you add a comment above
> “(is_power_of_2(end - start) && ilog2(end - start) < MAX_PAGE_ORDER)” to
> explain why __GFP_COMP needs split_free_pages()? Something like:
>
> With __GFP_COMP and the requested order < MAX_PAGE_ORDER, isolated free
> pages can have higher order than the requested one. Use split_free_pages()
> to free out of range pages.
>
>
>>   if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
> is_power_of_2(end - start) is used twice. Can you add a variable for it
> if that sounds good to you?
>
> Thanks.
>
>>         struct page *head = pfn_to_page(start);
>>         int order = ilog2(end - start);
>>
>>         check_new_pages(head, order);
>>         prep_new_page(head, order, gfp_mask, 0);
>>         set_page_refcounted(head);
>>
>>   }
>>
>> Thanks.
>>
>>>>>  		/* Free head and tail (if any) */
>>>>> @@ -6536,7 +6537,15 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>>>>  			free_contig_range(outer_start, start - outer_start);
>>>>>  		if (end != outer_end)
>>>>>  			free_contig_range(end, outer_end - end);
>>>>> -	} else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
>>>>> +
>>>>> +		outer_start = start;
>>>>> +		outer_end = end;
>>>>> +
>>>>> +		if (!(gfp_mask & __GFP_COMP))
>>>>> +			goto done;
>>>>> +	}
>>>>> +
>>>>> +	if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
>>>>>  		struct page *head = pfn_to_page(start);
>>>>>  		int order = ilog2(end - start);
>>>>>
>>> Best Regards,
>>> Yan, Zi
>
> Best Regards,
> Yan, Zi