Subject: Re: [PATCH v4 4/8] hugetlb: perform vmemmap restoration on a list of pages
From: Muchun Song <muchun.song@linux.dev>
Date: Wed, 20 Sep 2023 10:56:17 +0800
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Linux-MM, LKML, Muchun Song, Joao Martins, Oscar Salvador,
 David Hildenbrand, Miaohe Lin, David Rientjes, Anshuman Khandual,
 Naoya Horiguchi, Barry Song <21cnbao@gmail.com>, Michal Hocko,
 Matthew Wilcox, Xiongchun Duan, Andrew Morton
In-Reply-To: <20230919205756.GB425719@monkey>
References: <20230918230202.254631-1-mike.kravetz@oracle.com>
 <20230918230202.254631-5-mike.kravetz@oracle.com>
 <20230919205756.GB425719@monkey>

> On Sep 20, 2023, at 04:57, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
> On 09/19/23 17:52, Muchun Song wrote:
>> 
>> 
>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>> The routine update_and_free_pages_bulk already performs vmemmap
>>> restoration on the list of hugetlb pages in a separate step.  In
>>> preparation for more functionality to be added in this step, create a
>>> new routine hugetlb_vmemmap_restore_folios() that will restore
>>> vmemmap for a list of folios.
>>> 
>>> This new routine must provide sufficient feedback about errors and
>>> actual restoration performed so that update_and_free_pages_bulk can
>>> perform optimally.
>>> 
>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>> ---
>>>  mm/hugetlb.c         | 36 ++++++++++++++++++------------------
>>>  mm/hugetlb_vmemmap.c | 37 +++++++++++++++++++++++++++++++++++++
>>>  mm/hugetlb_vmemmap.h | 11 +++++++++++
>>>  3 files changed, 66 insertions(+), 18 deletions(-)
>>> 
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index d6f3db3c1313..814bb1982274 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1836,36 +1836,36 @@ static void update_and_free_hugetlb_folio(struct hstate *h, struct folio *folio,
>>>  static void update_and_free_pages_bulk(struct hstate *h, struct list_head *list)
>>>  {
>>> +	int ret;
>>> +	unsigned long restored;
>>>  	struct folio *folio, *t_folio;
>>> -	bool clear_dtor = false;
>>> 
>>>  	/*
>>> -	 * First allocate required vmemmap (if necessary) for all folios on
>>> -	 * list.  If vmemmap can not be allocated, we can not free folio to
>>> -	 * lower level allocator, so add back as hugetlb surplus page.
>>> -	 * add_hugetlb_folio() removes the page from THIS list.
>>> -	 * Use clear_dtor to note if vmemmap was successfully allocated for
>>> -	 * ANY page on the list.
>>> +	 * First allocate required vmemmap (if necessary) for all folios.
>>>  	 */
>>> -	list_for_each_entry_safe(folio, t_folio, list, lru) {
>>> -		if (folio_test_hugetlb_vmemmap_optimized(folio)) {
>>> -			if (hugetlb_vmemmap_restore(h, &folio->page)) {
>>> -				spin_lock_irq(&hugetlb_lock);
>>> +	ret = hugetlb_vmemmap_restore_folios(h, list, &restored);
>>> +
>>> +	/*
>>> +	 * If there was an error restoring vmemmap for ANY folios on the list,
>>> +	 * add them back as surplus hugetlb pages.  add_hugetlb_folio() removes
>>> +	 * the folio from THIS list.
>>> +	 */
>>> +	if (ret < 0) {
>>> +		spin_lock_irq(&hugetlb_lock);
>>> +		list_for_each_entry_safe(folio, t_folio, list, lru)
>>> +			if (folio_test_hugetlb_vmemmap_optimized(folio))
>>>  				add_hugetlb_folio(h, folio, true);
>>> -				spin_unlock_irq(&hugetlb_lock);
>>> -			} else
>>> -				clear_dtor = true;
>>> -		}
>>> +		spin_unlock_irq(&hugetlb_lock);
>>>  	}
>>> 
>>>  	/*
>>> -	 * If vmemmap allocation was performed on any folio above, take lock
>>> -	 * to clear destructor of all folios on list.  This avoids the need to
>>> +	 * If vmemmap allocation was performed on ANY folio, take lock to
>>> +	 * clear destructor of all folios on list.  This avoids the need to
>>>  	 * lock/unlock for each individual folio.
>>>  	 * The assumption is vmemmap allocation was performed on all or none
>>>  	 * of the folios on the list.  This is true except in VERY rare cases.
>>>  	 */
>>> -	if (clear_dtor) {
>>> +	if (restored) {
>>>  		spin_lock_irq(&hugetlb_lock);
>>>  		list_for_each_entry(folio, list, lru)
>>>  			__clear_hugetlb_destructor(h, folio);
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index 4558b814ffab..463a4037ec6e 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -480,6 +480,43 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
>>>  	return ret;
>>>  }
>>> 
>>> +/**
>>> + * hugetlb_vmemmap_restore_folios - restore vmemmap for every folio on the list.
>>> + * @h:		struct hstate.
>>> + * @folio_list:	list of folios.
>>> + * @restored:	Set to number of folios for which vmemmap was restored
>>> + *		successfully if caller passes a non-NULL pointer.
>>> + *
>>> + * Return: %0 if vmemmap exists for all folios on the list.  If an error is
>>> + *		encountered restoring vmemmap for ANY folio, an error code
>>> + *		will be returned to the caller.  It is then the responsibility
>>> + *		of the caller to check the hugetlb vmemmap optimized flag of
>>> + *		each folio to determine if vmemmap was actually restored.
>>> + */
>>> +int hugetlb_vmemmap_restore_folios(const struct hstate *h,
>>> +				   struct list_head *folio_list,
>>> +				   unsigned long *restored)
>>> +{
>>> +	unsigned long num_restored;
>>> +	struct folio *folio;
>>> +	int ret = 0, t_ret;
>>> +
>>> +	num_restored = 0;
>>> +	list_for_each_entry(folio, folio_list, lru) {
>>> +		if (folio_test_hugetlb_vmemmap_optimized(folio)) {
>>> +			t_ret = hugetlb_vmemmap_restore(h, &folio->page);
>> 
>> I still think we should free a non-optimized HugeTLB page if we
>> encounter an OOM situation instead of continuing to restore
>> vmemmap pages.  Restoring vmemmap pages will only aggravate
>> the OOM situation.
>> The suitable approach is to free a non-optimized
>> HugeTLB page to satisfy our allocation of vmemmap pages.  What's
>> your opinion, Mike?
> 
> I agree.
> 
> As you mentioned previously, this may complicate this code path a bit.
> I will rewrite to make this happen.
> 
> -- 
> Mike Kravetz

Maybe we could pass two lists to update_and_free_pages_bulk (this would
be easy for its callers): one for non-optimized huge pages and another
for optimized ones.  In update_and_free_pages_bulk(), we could first
free the non-optimized huge pages and then restore vmemmap for the
optimized ones, in which case the code could stay simple.
hugetlb_vmemmap_restore_folios() does not need to add complexity: it can
continue to restore vmemmap pages, and simply stop once we encounter an
OOM situation.

Thanks.
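Something like the following untested sketch, just to illustrate the shape
of what I mean.  The non_optimized/optimized parameter names and the split
at the call sites are made up here, and I assume the rest of the restore
loop stays as in your v4 posting:

static void update_and_free_pages_bulk(struct hstate *h,
				       struct list_head *non_optimized,
				       struct list_head *optimized)
{
	unsigned long restored = 0;
	struct folio *folio, *t_folio;

	/*
	 * Free the folios whose vmemmap is already present first.
	 * This releases memory back to the buddy allocator that the
	 * vmemmap restoration below may need when memory is tight.
	 */
	list_for_each_entry_safe(folio, t_folio, non_optimized, lru) {
		list_del(&folio->lru);
		update_and_free_hugetlb_folio(h, folio, false);
		cond_resched();
	}

	/*
	 * Restore vmemmap for the optimized folios.  The helper would
	 * stop at the first -ENOMEM; any folio still marked
	 * vmemmap-optimized afterwards goes back as a surplus page.
	 */
	if (hugetlb_vmemmap_restore_folios(h, optimized, &restored) < 0) {
		spin_lock_irq(&hugetlb_lock);
		list_for_each_entry_safe(folio, t_folio, optimized, lru)
			if (folio_test_hugetlb_vmemmap_optimized(folio))
				add_hugetlb_folio(h, folio, true);
		spin_unlock_irq(&hugetlb_lock);
	}

	/* Clear the destructor of the successfully restored folios. */
	if (restored) {
		spin_lock_irq(&hugetlb_lock);
		list_for_each_entry(folio, optimized, lru)
			__clear_hugetlb_destructor(h, folio);
		spin_unlock_irq(&hugetlb_lock);
	}

	/* Finally free whatever remains on the optimized list. */
	list_for_each_entry_safe(folio, t_folio, optimized, lru) {
		list_del(&folio->lru);
		update_and_free_hugetlb_folio(h, folio, false);
		cond_resched();
	}
}

And in hugetlb_vmemmap_restore_folios(), the loop body would only need
one extra check, e.g.:

		t_ret = hugetlb_vmemmap_restore(h, &folio->page);
		if (t_ret) {
			ret = t_ret;
			/* Do not aggravate an OOM situation. */
			if (t_ret == -ENOMEM)
				break;
		} else {
			num_restored++;
		}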