Subject: Re: [PATCH v2 07/11] hugetlb: perform vmemmap restoration on a list of pages
From: Muchun Song
Date: Mon, 11 Sep 2023 11:10:18 +0800
To: Mike Kravetz
Cc: Linux-MM, LKML, Muchun Song, Joao Martins, Oscar Salvador, David Hildenbrand, Miaohe Lin, David Rientjes, Anshuman Khandual, Naoya Horiguchi, Barry Song, Michal Hocko, Matthew Wilcox, Xiongchun Duan, Andrew Morton
In-Reply-To: <20230908205340.GA62663@monkey>
References: <20230905214412.89152-1-mike.kravetz@oracle.com> <20230905214412.89152-8-mike.kravetz@oracle.com> <5e091211-9a32-8480-55fb-faff6a0fadef@linux.dev> <38E2F051-E00B-4104-A616-0EEB2729386F@linux.dev> <20230906211234.GC3612@monkey> <20230907185421.GD3640@monkey> <20230908205340.GA62663@monkey>
> On Sep 9, 2023, at 04:53, Mike Kravetz wrote:
> 
> On 09/07/23 11:54, Mike Kravetz wrote:
>> On 09/07/23 11:33, Muchun Song wrote:
>>> 
>>>> On Sep 7, 2023, at 05:12, Mike Kravetz wrote:
>>>> 
>>>> On 09/06/23 16:07, Muchun Song wrote:
>>>>>> On Sep 6, 2023, at 15:33, Muchun Song wrote:
>>>>>> On 2023/9/6 05:44, Mike Kravetz wrote:
>>>>>>> When removing hugetlb pages from the pool, we first create a list
>>>>>>> of removed pages and then free those pages back to low level allocators.
>>>>>>> Part of the 'freeing process' is to restore vmemmap for all base pages
>>>>>>> if necessary. Pass this list of pages to a new routine
>>>>>>> hugetlb_vmemmap_restore_folios() so that vmemmap restoration can be
>>>>>>> performed in bulk.
>>>>>>> 
>>>>>>> Signed-off-by: Mike Kravetz
>>>>>>> ---
>>>>>>>  mm/hugetlb.c         |  3 +++
>>>>>>>  mm/hugetlb_vmemmap.c | 13 +++++++++++++
>>>>>>>  mm/hugetlb_vmemmap.h |  5 +++++
>>>>>>>  3 files changed, 21 insertions(+)
>>>>>>> 
>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>> index 554be94b07bd..dd2dbc256172 100644
>>>>>>> --- a/mm/hugetlb.c
>>>>>>> +++ b/mm/hugetlb.c
>>>>>>> @@ -1838,6 +1838,9 @@ static void update_and_free_pages_bulk(struct hstate *h, struct list_head *list)
>>>>>>>  {
>>>>>>>  	struct folio *folio, *t_folio;
>>>>>>> 
>>>>>>> +	/* First restore vmemmap for all pages on list. */
>>>>>>> +	hugetlb_vmemmap_restore_folios(h, list);
>>>>>>> +
>>>>>>>  	list_for_each_entry_safe(folio, t_folio, list, lru) {
>>>>>>>  		update_and_free_hugetlb_folio(h, folio, false);
>>>>>>>  		cond_resched();
>>>>>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>>>>>> index ac5577d372fe..79de984919ef 100644
>>>>>>> --- a/mm/hugetlb_vmemmap.c
>>>>>>> +++ b/mm/hugetlb_vmemmap.c
>>>>>>> @@ -481,6 +481,19 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
>>>>>>>  	return ret;
>>>>>>>  }
>>>>>>> 
>>>>>>> +/*
>>>>>>> + * This function will attempt to restore vmemmap for a list of folios. There
>>>>>>> + * is no guarantee that restoration will be successful for all or any folios.
>>>>>>> + * This is used in bulk operations, and no feedback is given to the caller.
>>>>>>> + */
>>>>>>> +void hugetlb_vmemmap_restore_folios(const struct hstate *h, struct list_head *folio_list)
>>>>>>> +{
>>>>>>> +	struct folio *folio;
>>>>>>> +
>>>>>>> +	list_for_each_entry(folio, folio_list, lru)
>>>>>>> +		(void)hugetlb_vmemmap_restore(h, &folio->page);
>>>>>> 
>>>>>> I am curious about the purpose of "void" here; it seems it is not necessary,
>>>>>> right? We could see so many places where we do not add the void if the caller
>>>>>> does not care about the return value of the callee.
>>>>> 
>>>>> Another question: should we stop restoring vmemmap pages when
>>>>> hugetlb_vmemmap_restore() fails? In that case, I suspect there is
>>>>> probably no memory available, so there is no need to continue, right?
>>>> 
>>>> Recall that the list of hugetlb pages may be from multiple nodes. My first
>>>> thought was that we should continue because memory allocation may fail on one
>>>> node but succeed on another. However, with
>>>> https://lore.kernel.org/linux-mm/20230905031312.91929-1-yuancan@huawei.com/
>>>> memory allocation should fall back to other nodes.
>>>> So, yes I do believe it
>>>> would make sense to stop when hugetlb_vmemmap_restore returns ENOMEM as
>>>> we are unlikely to make forward progress.
>>> 
>>> Agree.
>>> 
>>>> Today's behavior will try to restore vmemmap for all pages. No stopping
>>>> on error.
>>>> 
>>>> I have mixed thoughts on this. Quitting on error 'seems reasonable'.
>>>> However, if we continue we 'might' be able to allocate vmemmap for one
>>>> hugetlb page. And, if we free one hugetlb page that should provide
>>>> vmemmap for several more and we may be able to free most pages on the
>>>> list.
>>> 
>>> Yes. A good point. But there should be a non-optimized huge page being
>>> freed somewhere in parallel, otherwise we still cannot allocate memory.
>> 
>> It does not have to be another huge page being freed in parallel. It
>> could be that when allocating vmemmap for a 1G hugetlb page we were one
>> (4K) page short of what was required. If someone else frees a 4K page,
>> freeing the next 1G page may succeed.

Right. I missed that.

>> -- 
>> Mike Kravetz
>> 
>>> However, the freeing operation happens after hugetlb_vmemmap_restore_folios.
>>> If we want to handle this, we should rework update_and_free_pages_bulk()
>>> to do a retry when at least one huge page is freed.
> 
> This seemed familiar. Recall this patch which Muchun Reviewed and James Acked:
> https://lore.kernel.org/linux-mm/20230718004942.113174-3-mike.kravetz@oracle.com/
> 
> If we can not restore vmemmap for a page, then it must be turned into a
> surplus huge page. In this patch (not the previous one referenced), we
> will try to restore vmemmap one more time in a later call to
> update_and_free_hugetlb_folio. Certainly, we do not want to try twice!
> 
> My 'plan' is to include the previous patch as part of this series. With
> that patch in place, the list_for_each_entry calls to hugetlb_vmemmap_restore
> can be replaced with a call to hugetlb_vmemmap_restore_folios.
> We would
> change the behavior of hugetlb_vmemmap_restore_folios to return an error
> instead of being of type void. If an error is returned, then we will
> make another pass through the list looking for unoptimized pages and add
> them as surplus.
> 
> I think it best if we try to restore vmemmap at least once before
> converting to a surplus page.

Makes sense.

Muchun

> -- 
> Mike Kravetz