From: Muchun Song <muchun.song@linux.dev>
To: Gang Li <gang.li@linux.dev>
Cc: David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Tim Chen <tim.c.chen@linux.intel.com>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
ligang.bdlg@bytedance.com
Subject: Re: [PATCH v5 7/7] hugetlb: parallelize 1G hugetlb initialization
Date: Mon, 5 Feb 2024 17:09:15 +0800
Message-ID: <6A148F29-68B2-4365-872C-E6AB599C55F6@linux.dev>
In-Reply-To: <277e0eed-918f-414f-b19d-219bd155ac14@linux.dev>
> On Feb 5, 2024, at 16:26, Gang Li <gang.li@linux.dev> wrote:
>
>
>
> On 2024/2/5 15:28, Muchun Song wrote:
>> On 2024/1/26 23:24, Gang Li wrote:
>>> @@ -3390,8 +3390,6 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
>>>  	/* Send list for bulk vmemmap optimization processing */
>>>  	hugetlb_vmemmap_optimize_folios(h, folio_list);
>>>  
>>> -	/* Add all new pool pages to free lists in one lock cycle */
>>> -	spin_lock_irqsave(&hugetlb_lock, flags);
>>>  	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
>>>  		if (!folio_test_hugetlb_vmemmap_optimized(folio)) {
>>>  			/*
>>> @@ -3404,23 +3402,27 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
>>>  					HUGETLB_VMEMMAP_RESERVE_PAGES,
>>>  					pages_per_huge_page(h));
>>>  		}
>>> +		/* Subdivide locks to achieve better parallel performance */
>>> +		spin_lock_irqsave(&hugetlb_lock, flags);
>>>  		__prep_account_new_huge_page(h, folio_nid(folio));
>>>  		enqueue_hugetlb_folio(h, folio);
>>> +		spin_unlock_irqrestore(&hugetlb_lock, flags);
>>>  	}
>>> -	spin_unlock_irqrestore(&hugetlb_lock, flags);
>>>  }
>>>  
>>>  /*
>>>   * Put bootmem huge pages into the standard lists after mem_map is up.
>>>   * Note: This only applies to gigantic (order > MAX_PAGE_ORDER) pages.
>>>   */
>>> -static void __init gather_bootmem_prealloc(void)
>>> +static void __init gather_bootmem_prealloc_node(unsigned long start, unsigned long end, void *arg)
>>> +
>>>  {
>>> +	int nid = start;
>> Sorry for being so late to notice an issue here. I have seen a comment in
>> PADATA, which says:
>> @max_threads: Max threads to use for the job, actual number may be less
>> depending on task size and minimum chunk size.
>> PADATA will not guarantee gather_bootmem_prealloc_node() will be called
>> ->max_threads times (you have initialized it to the number of NUMA nodes in
>> gather_bootmem_prealloc). Therefore, we should add a loop here to initialize
>> multiple nodes, namely (@end - @start) nodes. Otherwise, we will miss
>> initializing some nodes.
>> Thanks.
>>
> In padata_do_multithreaded:
>
> ```
> /* Ensure at least one thread when size < min_chunk. */
> nworks = max(job->size / max(job->min_chunk, job->align), 1ul);
> nworks = min(nworks, job->max_threads);
>
> ps.nworks = padata_work_alloc_mt(nworks, &ps, &works);
> ```
>
> So we have nworks <= max_threads, but nworks >= size / min_chunk.
Given a 4-node system, the current implementation will schedule
4 threads to call gather_bootmem_prealloc_node() respectively, and
there is no problem here. But what if PADATA schedules 2
threads and each thread aims to handle 2 nodes? I think
this is possible for PADATA in the future, because it does not
break any semantics exposed to users. The comment about @min_chunk says:

	The minimum chunk size in job-specific units. This
	allows the client to communicate the minimum amount
	of work that's appropriate for one worker thread to
	do at once.

It only defines the minimum chunk size, not a maximum, so it is
possible to let each ->thread_fn handle multiple minimum-size
chunks. Right? Therefore, I am not concerned about the current
implementation of PADATA but about a future one.
Maybe a separate patch is acceptable, since it would be an
improvement rather than a fix (at least there is no bug currently).
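Something like the following is what I have in mind. This is just an
untested sketch; it assumes gather_bootmem_prealloc_node() would be
reworked to take a single nid, with the range loop living in a small
wrapper that is used as the PADATA thread_fn:

```
/*
 * Untested sketch: the PADATA thread function walks the whole
 * [start, end) node range it is given instead of assuming it is
 * invoked exactly once per node.
 */
static void __init gather_bootmem_prealloc_parallel(unsigned long start,
						    unsigned long end,
						    void *arg)
{
	int nid;

	for (nid = start; nid < end; nid++)
		gather_bootmem_prealloc_node(nid);
}
```

Then, even if PADATA hands a single worker more than one node, no node
would be skipped.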
Thanks.
Thread overview: 16+ messages
2024-01-26 15:24 [PATCH v5 0/7] hugetlb: parallelize hugetlb page init on boot Gang Li
2024-01-26 15:24 ` [PATCH v5 1/7] hugetlb: code clean for hugetlb_hstate_alloc_pages Gang Li
2024-01-26 15:24 ` [PATCH v5 2/7] hugetlb: split hugetlb_hstate_alloc_pages Gang Li
2024-01-26 15:24 ` [PATCH v5 3/7] padata: dispatch works on different nodes Gang Li
2024-01-26 22:23 ` Tim Chen
2024-01-26 15:24 ` [PATCH v5 4/7] hugetlb: pass *next_nid_to_alloc directly to for_each_node_mask_to_alloc Gang Li
2024-01-26 15:24 ` [PATCH v5 5/7] hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA Gang Li
2024-01-26 15:24 ` [PATCH v5 6/7] hugetlb: parallelize 2M hugetlb allocation and initialization Gang Li
2024-01-29 3:44 ` Muchun Song
2024-01-26 15:24 ` [PATCH v5 7/7] hugetlb: parallelize 1G hugetlb initialization Gang Li
2024-01-29 3:56 ` Muchun Song
2024-02-05 7:28 ` Muchun Song
2024-02-05 8:26 ` Gang Li
2024-02-05 9:09 ` Muchun Song [this message]
2024-02-07 1:53 ` Jane Chu
2024-02-09 17:17 ` Daniel Jordan