Subject: Re: [PATCH] replace free hugepage folios after migration
From: Ge Yang <yangge1116@126.com>
Date: Sun, 22 Dec 2024 19:50:45 +0800
To: David Hildenbrand, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org,
 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, muchun.song@linux.dev,
 liuzixing@hygon.cn, Oscar Salvador, Michal Hocko
In-Reply-To: <433fb64d-80e1-4d96-904e-10b51e40898d@redhat.com>
References: <1734503588-16254-1-git-send-email-yangge1116@126.com>
 <0b41cc6b-5c93-408f-801f-edd9793cb979@redhat.com>
 <1241b567-88b6-462c-9088-8f72a45788b7@126.com>
 <333e584c-2688-4a3f-bc1f-2e84d5215005@126.com>
 <433fb64d-80e1-4d96-904e-10b51e40898d@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 2024/12/21 22:32, David Hildenbrand wrote:
> On 21.12.24 13:04, Ge Yang wrote:
>>
>>
>> On 2024/12/21 0:30, David Hildenbrand wrote:
>>> On 20.12.24 09:56, Ge Yang wrote:
>>>>
>>>>
>>>> On 2024/12/20 0:40, David Hildenbrand wrote:
>>>>> On 18.12.24 07:33, yangge1116@126.com wrote:
>>>>>> From: yangge
>>>>>
>>>>> CCing Oscar, who worked on migrating these pages during memory
>>>>> offlining and alloc_contig_range().
>>>>>
>>>>>>
>>>>>> My machine has 4 NUMA nodes, each equipped with 32GB of memory. I
>>>>>> have configured each NUMA node with 16GB of CMA and 16GB of in-use
>>>>>> hugetlb pages. The allocation of contiguous memory via the
>>>>>> cma_alloc() function can fail probabilistically.
>>>>>>
>>>>>> The cma_alloc() function may fail if it sees an in-use hugetlb page
>>>>>> within the allocation range, even if that page has already been
>>>>>> migrated. When in-use hugetlb pages are migrated, they may simply
>>>>>> be released back into the free hugepage pool instead of being
>>>>>> returned to the buddy system. This can cause the
>>>>>> test_pages_isolated() function check to fail, ultimately leading
>>>>>> to the failure of the cma_alloc() function:
>>>>>> cma_alloc()
>>>>>>     __alloc_contig_migrate_range() // migrate in-use hugepage
>>>>>>     test_pages_isolated()
>>>>>>         __test_page_isolated_in_pageblock()
>>>>>>             PageBuddy(page) // check if the page is in buddy
>>>>>
>>>>> I thought this would be working as expected, at least we tested it
>>>>> with alloc_contig_range / virtio-mem a while ago.
>>>>>
>>>>> On the memory_offlining path, we migrate hugetlb folios, but also
>>>>> dissolve any remaining free folios even if it means that we will go
>>>>> below the requested number of hugetlb pages in our pool.
>>>>>
>>>>> During alloc_contig_range(), we only migrate them, to then free
>>>>> them up after migration.
>>>>>
>>>>> Under which circumstances does it apply that "they may simply be
>>>>> released back into the free hugepage pool instead of being returned
>>>>> to the buddy system"?
>>>>>
>>>>
>>>> After migration, in-use hugetlb pages are only released back to the
>>>> hugetlb pool and are not returned to the buddy system.
>>>
>>> We had
>>>
>>> commit ae37c7ff79f1f030e28ec76c46ee032f8fd07607
>>> Author: Oscar Salvador
>>> Date:   Tue May 4 18:35:29 2021 -0700
>>>
>>>     mm: make alloc_contig_range handle in-use hugetlb pages
>>>
>>>     alloc_contig_range() will fail if it finds a HugeTLB page within the
>>>     range, without a chance to handle them.  Since HugeTLB pages can be
>>>     migrated as any LRU or Movable page, it does not make sense to bail out
>>>     without trying.  Enable the interface to recognize in-use HugeTLB pages so
>>>     we can migrate them, and have much better chances to succeed the call.
>>>
>>> And I am trying to figure out if it never worked correctly, or if
>>> something changed that broke it.
>>>
>>> In start_isolate_page_range()->isolate_migratepages_block(), we do the
>>>
>>>     ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
>>>
>>> to add these folios to the cc->migratepages list.
>>>
>>> In __alloc_contig_migrate_range(), we migrate the pages using
>>> migrate_pages().
>>>
>>> After that, the src hugetlb folios should still be isolated?
>> Yes.
>>
>>> But I'm getting confused when these pages get un-isolated and putback
>>> to hugetlb/freed.
>>>
>> If the migration is successful, folio_putback_active_hugetlb() is called
>> to release the src hugetlb folios back to the free hugetlb pool.
>>
>> trace:
>> unmap_and_move_huge_page
>>     folio_putback_active_hugetlb
>>         folio_put
>>             free_huge_folio
>>
>> alloc_contig_range_noprof
>>     __alloc_contig_migrate_range
>>     if (test_pages_isolated())   // check whether the pages are now in the buddy
>>         isolate_freepages_range  // grab isolated pages from freelists
>>     else
>>         undo_isolate_page_range  // undo isolation
>
> Ah, now I remember, thanks.
>
> So when we free an ordinary page, we put it onto the buddy isolate list,
> from where we can grab it later and nobody can allocate it in the meantime.
>
> In case of hugetlb, we simply free it back to hugetlb, from where it can
> likely even get allocated immediately again.
>
> I think that can actually happen in your proposal: the now-free page
> will get reallocated, for example for migrating the next folio. Or some
> concurrent system activity can simply allocate the now-free folio. Or am
> I missing something that prevents these now-free hugetlb folios from
> getting re-allocated after migration succeeded?
>
> Conceptually, I think we would want migration code in the case of
> alloc_contig_range() to allocate a new folio from the buddy, and to free
> the old one back to the buddy immediately, without ever allowing
> re-allocation of it.
>
> What needs to be handled is detecting that
>
> (a) we want to allocate a fresh hugetlb folio as migration target
> (b) if migration succeeds, we have to free the hugetlb folio back to the
> buddy
> (c) if migration fails, we have to free the allocated hugetlb folio back
> to the buddy
>
> We could provide a custom alloc_migration_target that we pass to
> migrate_pages() to allocate a fresh hugetlb folio to handle (a). Using the
> put_new_folio callback we could handle (c). (b) would need some thought.

It seems that if we allocate a fresh hugetlb folio as the migration
target, the source hugetlb folio will be automatically released back to
the buddy system.

> Maybe we can also just mark the source folio as we isolate it, and
> enlighten migration+freeing code to handle it automatically?

Can we determine whether a hugetlb page is isolated when allocating it
from the free hugetlb pool? For example:

dequeue_hugetlb_folio_node_exact()
{
    list_for_each_entry(folio, &h->hugepage_freelists[nid], lru) {
        /* skip hugetlb folios whose pageblock is isolated */
        if (is_migrate_isolate_page(&folio->page))
            continue;
        /* ... rest of the existing dequeue logic ... */
    }
}

> Hoping to get some feedback from hugetlb maintainers.
>
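P.S. To make (a) and (c) above a bit more concrete, here is a rough sketch of
how the migrate_pages() callback pair could be wired up for the
alloc_contig_range() case. The names alloc_contig_migration_target(),
put_contig_migration_target(), alloc_fresh_contig_hugetlb_folio() and
free_contig_hugetlb_folio() are made up for illustration; the last two stand
for "allocate a fresh hugetlb folio directly from the buddy" and "return it to
the buddy", and (b) would still need separate handling when the source folio
is freed after a successful migration:

/*
 * Sketch only. alloc_fresh_contig_hugetlb_folio() and
 * free_contig_hugetlb_folio() are hypothetical placeholders, not existing
 * kernel functions.
 */
static struct folio *alloc_contig_migration_target(struct folio *src,
                                                   unsigned long private)
{
    struct migration_target_control *mtc = (void *)private;

    if (folio_test_hugetlb(src))
        /* (a): fresh hugetlb folio from the buddy, never from the pool */
        return alloc_fresh_contig_hugetlb_folio(folio_hstate(src),
                                                mtc->gfp_mask, mtc->nid);

    return alloc_migration_target(src, private);
}

static void put_contig_migration_target(struct folio *dst, unsigned long private)
{
    if (folio_test_hugetlb(dst))
        /* (c): migration failed, return the unused folio to the buddy */
        free_contig_hugetlb_folio(dst);
    else
        folio_put(dst);
}

/* in __alloc_contig_migrate_range(): */
ret = migrate_pages(&cc->migratepages, alloc_contig_migration_target,
                    put_contig_migration_target, (unsigned long)&mtc,
                    cc->mode, MR_CONTIG_RANGE, NULL);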