From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v5 7/7] hugetlb: parallelize 1G hugetlb initialization
From: Muchun Song <muchun.song@linux.dev>
Date: Mon, 5 Feb 2024 17:09:15 +0800
To: Gang Li <gang.li@linux.dev>
Cc: David Hildenbrand, David Rientjes, Mike Kravetz, Andrew Morton, Tim Chen, Linux-MM, LKML, ligang.bdlg@bytedance.com
Message-Id: <6A148F29-68B2-4365-872C-E6AB599C55F6@linux.dev>
In-Reply-To: <277e0eed-918f-414f-b19d-219bd155ac14@linux.dev>
References: <20240126152411.1238072-1-gang.li@linux.dev> <20240126152411.1238072-8-gang.li@linux.dev> <277e0eed-918f-414f-b19d-219bd155ac14@linux.dev>

> On Feb 5, 2024, at 16:26, Gang Li wrote:
>
> On 2024/2/5 15:28, Muchun Song wrote:
>> On 2024/1/26 23:24, Gang Li wrote:
>>> @@ -3390,8 +3390,6 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
>>>  	/* Send list for bulk vmemmap optimization processing */
>>>  	hugetlb_vmemmap_optimize_folios(h, folio_list);
>>> -	/* Add all new pool pages to free lists in one lock cycle */
>>> -	spin_lock_irqsave(&hugetlb_lock, flags);
>>>  	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
>>>  		if (!folio_test_hugetlb_vmemmap_optimized(folio)) {
>>>  			/*
>>> @@ -3404,23 +3402,27 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
>>>  					HUGETLB_VMEMMAP_RESERVE_PAGES,
>>>  					pages_per_huge_page(h));
>>>  		}
>>> +		/* Subdivide locks to achieve better parallel performance */
>>> +		spin_lock_irqsave(&hugetlb_lock, flags);
>>>  		__prep_account_new_huge_page(h, folio_nid(folio));
>>>  		enqueue_hugetlb_folio(h, folio);
>>> +		spin_unlock_irqrestore(&hugetlb_lock, flags);
>>>  	}
>>> -	spin_unlock_irqrestore(&hugetlb_lock, flags);
>>>  }
>>>
>>>  /*
>>>   * Put bootmem huge pages into the standard lists after mem_map is up.
>>>   * Note: This only applies to gigantic (order > MAX_PAGE_ORDER) pages.
>>>   */
>>> -static void __init gather_bootmem_prealloc(void)
>>> +static void __init gather_bootmem_prealloc_node(unsigned long start, unsigned long end, void *arg)
>>> +
>>>  {
>>> +	int nid = start;
>>
>> Sorry for being so late to notice an issue here.
>> I have seen a comment from PADATA, which says:
>>
>>     @max_threads: Max threads to use for the job, actual number may be
>>                   less depending on task size and minimum chunk size.
>>
>> PADATA does not guarantee that gather_bootmem_prealloc_node() will be
>> called ->max_threads times (you have initialized it to the number of
>> NUMA nodes in gather_bootmem_prealloc). Therefore, we should add a loop
>> here to initialize multiple nodes, namely (@end - @start) of them.
>> Otherwise, we will miss initializing some nodes.
>>
>> Thanks.
>>
> In padata_do_multithreaded:
>
> ```
> /* Ensure at least one thread when size < min_chunk. */
> nworks = max(job->size / max(job->min_chunk, job->align), 1ul);
> nworks = min(nworks, job->max_threads);
>
> ps.nworks = padata_work_alloc_mt(nworks, &ps, &works);
> ```
>
> So we have nworks <= max_threads, but >= size/min_chunk.

On a 4-node system, the current implementation will schedule 4 threads,
each calling gather_bootmem_prealloc_node() for one node, so there is no
problem today. But what if PADATA schedules 2 threads and each thread is
meant to handle 2 nodes? I think that is possible for PADATA in the
future, because it would not break any semantics exposed to users. The
comment about @min_chunk says:

    The minimum chunk size in job-specific units. This allows the client
    to communicate the minimum amount of work that's appropriate for one
    worker thread to do at once.

It only defines the minimum chunk size, not a maximum, so it is legal for
each ->thread_fn invocation to handle multiple minimum-sized chunks.
Right? Therefore, I am not concerned about the current implementation of
PADATA, but about a future one. Maybe a separate patch is acceptable,
since it would be an improvement rather than a fix (at least there is no
bug currently).
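To make the suggestion concrete, here is a minimal, untested sketch. It
assumes the per-node work is factored out into a helper, say
gather_bootmem_prealloc_node(nid), that takes a single node id (in the
current patch that work sits directly in the ->thread_fn):

```
/*
 * Untested sketch: the PADATA ->thread_fn walks its whole
 * [start, end) range instead of assuming it is invoked exactly
 * once per node.
 */
static void __init gather_bootmem_prealloc_parallel(unsigned long start,
						    unsigned long end,
						    void *arg)
{
	unsigned long nid;

	for (nid = start; nid < end; nid++)
		gather_bootmem_prealloc_node(nid);
}
```

With something like this registered as ->thread_fn, correctness no
longer depends on PADATA scheduling exactly ->max_threads workers.

Thanks.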