Message-ID: <2de87220-82ab-40a9-84c8-24f24d70aa6e@huawei.com>
Date: Fri, 16 Aug 2024 08:56:27 +0800
Subject: Re: [PATCH RFC] mm: skip gigantic pages in isolate_single_pageblock() when mem offline
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Zi Yan
CC: Matthew Wilcox, Andrew Morton, David Hildenbrand, Oscar Salvador, linux-mm@kvack.org
References: <20240813125226.1478800-1-wangkefeng.wang@huawei.com>
 <92fedec5-62c9-4ec0-9d4c-a722b30aa63c@huawei.com>
 <905740F8-58C6-4333-8EA1-4A53C95CC1FE@nvidia.com>
 <50FEEE33-49CA-48B5-B4C5-964F1BE25D43@nvidia.com>
 <113f25e0-7eed-405b-9369-bc23b780d315@huawei.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed
On 2024/8/16 0:43, Zi Yan wrote:
> On 14 Aug 2024, at 22:58, Kefeng Wang wrote:
>
>> On 2024/8/14 22:53, Zi Yan wrote:
>>> On 13 Aug 2024, at 22:01, Kefeng Wang wrote:
>>>
>>>> On 2024/8/13 22:59, Zi Yan wrote:
>>>>> On 13 Aug 2024, at 10:46, Kefeng Wang wrote:
>>>>>
>>>>>> On 2024/8/13 22:03, Matthew Wilcox wrote:
>>>>>>> On Tue, Aug 13, 2024 at 08:52:26PM +0800, Kefeng Wang wrote:
>>>>>>>> The gigantic page size may be larger than the memory block size, so
>>>>>>>> memory offline always fails in this case after commit b2c9e2fbba32
>>>>>>>> ("mm: make alloc_contig_range work at pageblock granularity"):
>>>>>>>>
>>>>>>>> offline_pages
>>>>>>>>   start_isolate_page_range
>>>>>>>>     start_isolate_page_range(isolate_before=true)
>>>>>>>>       isolate [isolate_start, isolate_start + pageblock_nr_pages)
>>>>>>>>     start_isolate_page_range(isolate_before=false)
>>>>>>>>       isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock
>>>>>>>>         __alloc_contig_migrate_range
>>>>>>>>           isolate_migratepages_range
>>>>>>>>             isolate_migratepages_block
>>>>>>>>               isolate_or_dissolve_huge_page
>>>>>>>>                 if (hstate_is_gigantic(h))
>>>>>>>>                   return -ENOMEM;
>>>>>>>>
>>>>>>>> [   15.815756] memory offlining [mem 0x3c0000000-0x3c7ffffff] failed due to failure to isolate range
>>>>>>>>
>>>>>>>> Fix it by skipping __alloc_contig_migrate_range() when gigantic pages
>>>>>>>> are met during memory offline, which returns to the original logic
>>>>>>>> for handling gigantic pages.
>>>>>>>
>>>>>>> This seems like the wrong way to fix this.  The logic in the next
>>>>>>> PageHuge() section seems like it's specifically supposed to handle
>>>>>>> gigantic pages.  So you've just made that dead code, but instead of
>>>>>>> removing it, you've left it there to confuse everyone?
>>>>>>
>>>>>> isolate_single_pageblock() in start_isolate_page_range() is called
>>>>>> both from memory offline and from contig allocation
>>>>>> (alloc_contig_pages()); this change only restores the behavior for
>>>>>> the memory offline path, but we still fail in contig allocation.
>>>>>>
>>>>>> For memory offline, we have our own path to isolate/migrate pages or
>>>>>> dissolve free hugetlb folios, so I think we don't depend on
>>>>>> __alloc_contig_migrate_range().
>>>>>>>
>>>>>>> I admit to not understanding this code terribly well.
>>>>>>>
>>>>>> A quick search [1] shows that isolate_single_pageblock() was added
>>>>>> for contig allocation, but it has negative effects on memory hotplug.
>>>>>> Zi Yan, could you give some comments?
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-mm/20220425143118.2850746-1-zi.yan@sent.com/
>>>>>
>>>>> Probably we can isolate the hugetlb page and use migrate_page() instead of
>>>>> __alloc_contig_migrate_range() in the section below, since we are targeting
>>>>> only hugetlb pages here. It should solve the issue.
>>>>
>>>> For contig allocation, I think we must isolate/migrate pages in
>>>> __alloc_contig_migrate_range(), but for memory offline (especially for
>>>> gigantic hugepages), as mentioned above, we already have our own path
>>>> to isolate/migrate used pages and dissolve the free ones; there
>>>> start_isolate_page_range() only needs to mark the page range
>>>> MIGRATE_ISOLATE, which is what we did before b2c9e2fbba32:
>>>>
>>>> start_isolate_page_range
>>>> scan_movable_pages
>>>> do_migrate_range
>>>> dissolve_free_hugetlb_folios
>>>>
>>>> Do we really need to isolate/migrate the hugetlb page for the memory
>>>> offline path?
>>>
>>> For the memory offline path, there is do_migrate_range() to move the pages.
>>> For contig allocation, there is __alloc_contig_migrate_range() after
>>> isolation to migrate the pages.
>>>
>>> The migration code in isolate_single_pageblock() is not needed.
>>> Something like this would be OK: just skip the page and let either
>>> do_migrate_range() or __alloc_contig_migrate_range() handle it:
>>
>> Oh, right, for alloc_contig_range() we do have another
>> __alloc_contig_migrate_range() after start_isolate_page_range(), so we
>> could drop the following code,
>>
>>>
>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>> index 042937d5abe4..587d723711c5 100644
>>> --- a/mm/page_isolation.c
>>> +++ b/mm/page_isolation.c
>>> @@ -402,23 +402,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>>>
>>>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>>  		if (PageHuge(page)) {
>>> -			int page_mt = get_pageblock_migratetype(page);
>>> -			struct compact_control cc = {
>>> -				.nr_migratepages = 0,
>>> -				.order = -1,
>>> -				.zone = page_zone(pfn_to_page(head_pfn)),
>>> -				.mode = MIGRATE_SYNC,
>>> -				.ignore_skip_hint = true,
>>> -				.no_set_skip_hint = true,
>>> -				.gfp_mask = gfp_flags,
>>> -				.alloc_contig = true,
>>> -			};
>>> -			INIT_LIST_HEAD(&cc.migratepages);
>>> -
>>> -			ret = __alloc_contig_migrate_range(&cc, head_pfn,
>>> -						head_pfn + nr_pages, page_mt);
>>> -			if (ret)
>>> -				goto failed;
>>>  			pfn = head_pfn + nr_pages;
>>>  			continue;
>>>  		}
>>
>>
>> But we need to remove the CONFIG_COMPACTION/CMA too, though?
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 042937d5abe4..785c2d320631 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -395,30 +395,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>>  		unsigned long head_pfn = page_to_pfn(head);
>>  		unsigned long nr_pages = compound_nr(head);
>>
>> -		if (head_pfn + nr_pages <= boundary_pfn) {
>> -			pfn = head_pfn + nr_pages;
>> -			continue;
>> -		}
>> -
>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>> -		if (PageHuge(page)) {
>> -			int page_mt = get_pageblock_migratetype(page);
>> -			struct compact_control cc = {
>> -				.nr_migratepages = 0,
>> -				.order = -1,
>> -				.zone = page_zone(pfn_to_page(head_pfn)),
>> -				.mode = MIGRATE_SYNC,
>> -				.ignore_skip_hint = true,
>> -				.no_set_skip_hint = true,
>> -				.gfp_mask = gfp_flags,
>> -				.alloc_contig = true,
>> -			};
>> -			INIT_LIST_HEAD(&cc.migratepages);
>> -
>> -			ret = __alloc_contig_migrate_range(&cc, head_pfn,
>> -						head_pfn + nr_pages, page_mt);
>> -			if (ret)
>> -				goto failed;
>> +		if (head_pfn + nr_pages <= boundary_pfn ||
>> +		    PageHuge(page)) {
>>  			pfn = head_pfn + nr_pages;
>>  			continue;
>>  		}
>> @@ -432,7 +410,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>>  		 */
>>  		VM_WARN_ON_ONCE_PAGE(PageLRU(page), page);
>>  		VM_WARN_ON_ONCE_PAGE(__PageMovable(page), page);
>> -#endif
>>  		goto failed;
>>  	}
>
> That looks good to me.

Thanks for your comments, will send a new version soon.

> Best Regards,
> Yan, Zi