From: "Yajun Deng" <yajun.deng@linux.dev>
To: "Mike Rapoport" <rppt@kernel.org>
Cc: "Greg KH" <gregkh@linuxfoundation.org>,
rafael@kernel.org, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm/mm_init.c: remove spinlock in early_pfn_to_nid()
Date: Thu, 15 Jun 2023 06:36:48 +0000 [thread overview]
Message-ID: <d8a0a36fbf88497e051ce7610678ce5c@linux.dev> (raw)
In-Reply-To: <20230615062021.GI52412@kernel.org>
June 15, 2023 2:20 PM, "Mike Rapoport" <rppt@kernel.org> wrote:
> On Thu, Jun 15, 2023 at 03:02:58AM +0000, Yajun Deng wrote:
>
>> June 14, 2023 7:53 PM, "Mike Rapoport" <rppt@kernel.org> wrote:
>>
>> Hi,
>>
>> On Wed, Jun 14, 2023 at 11:28:32AM +0000, Yajun Deng wrote:
>>
>> June 14, 2023 7:09 PM, "Greg KH" <gregkh@linuxfoundation.org> wrote:
>>
>> On Wed, Jun 14, 2023 at 07:03:24PM +0800, Yajun Deng wrote:
>>
>> When the system boots, only one cpu is enabled before smp_init().
>> So the spinlock is not needed in most cases, remove it.
>>
>> Add spinlock in get_nid_for_pfn() because it is after smp_init().
>>
>> So this is two different things at once in the same patch?
>>
>> Or are they the same problem and both need to go in to solve it?
>>
>> And if a spinlock is not needed at early boot, is it really causing any
>> problems?
>>
>> They are the same problem.
>> I added pr_info in early_pfn_to_nid(), found get_nid_for_pfn() is the only
>> case need to add spinlock.
>> This patch tested on my x86 system.
>>
>> Are you sure it'll work on !x86?
>>
>> I'm probably sure of that, although I don't have a !x86 machine.
>>
>> early_pfn_to_nid() is called in smp_init() and kasan_init() on
>> different architectures. If it works well on x86, it'll work on
>> !x86.
>
> This is often not true. Please verify that other architectures do not call
> early_pfn_to_nid() after smp_init(). The explanation why it is safe should
> be a part of the changelog.
>
>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
>> ---
>> drivers/base/node.c | 11 +++++++++--
>> mm/mm_init.c | 18 +++---------------
>> 2 files changed, 12 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 9de524e56307..844102570ff2 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -748,8 +748,15 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>> static int __ref get_nid_for_pfn(unsigned long pfn)
>> {
>> #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> - if (system_state < SYSTEM_RUNNING)
>> - return early_pfn_to_nid(pfn);
>> + static DEFINE_SPINLOCK(early_pfn_lock);
>> + int nid;
>> +
>> + if (system_state < SYSTEM_RUNNING) {
>> + spin_lock(&early_pfn_lock);
>> + nid = early_pfn_to_nid(pfn);
>> + spin_unlock(&early_pfn_lock);
>>
>> Adding an external lock for when you call a function is VERY dangerous
>> as you did not document this anywhere, and there's no way to enforce it
>> properly at all.
>>
>> I should add a comment before early_pfn_to_nid().
>>
>> Does your change actually result in any boot time changes? How was this
>> tested?
>>
>> Just a bit.
>>
>> Just a bit tested? Or just a bit of boot time changes?
>> For the latter, do you have numbers?
>>
>> For the latter, the most beneficial function is memmap_init_reserved_pages(),
>> the boot time changes depending on whether DEFERRED_STRUCT_PAGE_INIT
>> is defined or not.
>>
>> -->memmap_init_reserved_pages()
>> -->for_each_reserved_mem_range()
>> reserve_bootmem_region()
>> -->for()
>> init_reserved_page()
>> --> early_pfn_to_nid()
>
> A better solution would be to pass nid to reserve_bootmem_range() and drop
> the call to early_pfn_to_nid() in init_reserved_page().
>
> Then there won't be lock contention and no need for fragile changes in the
> locking.
>
Great, I will try it.
>> If define CONFIG_DEFERRED_STRUCT_PAGE_INIT:
>>
>> before:
>> memmap_init_reserved_pages() 1.87 seconds
>> after:
>> memmap_init_reserved_pages() 1.27 seconds
>>
>> 32% time reduction.
>
> These measurements should be part of the changelog.
>
>> If not define CONFIG_DEFERRED_STRUCT_PAGE_INIT:
>>
>> early_pfn_to_nid() is called by few,
>> boot time didn't change.
>>
>> By the way, this machine has 190GB RAM.
>>
>> --
>> Sincerely yours,
>> Mike.
>
> --
> Sincerely yours,
> Mike.
prev parent reply other threads:[~2023-06-15 6:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-14 11:03 Yajun Deng
2023-06-14 11:09 ` Greg KH
2023-06-14 11:28 ` Yajun Deng
2023-06-14 11:53 ` Mike Rapoport
2023-06-15 3:02 ` Yajun Deng
2023-06-15 6:20 ` Mike Rapoport
2023-06-15 6:36 ` Yajun Deng [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d8a0a36fbf88497e051ce7610678ce5c@linux.dev \
--to=yajun.deng@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rafael@kernel.org \
--cc=rppt@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox