From: Yinghai Lu <yinghai@kernel.org>
To: Russ Anderson <rja@sgi.com>, Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>,
David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com
Subject: Re: [patch] mm: speedup in __early_pfn_to_nid
Date: Sat, 23 Mar 2013 13:37:37 -0700
Message-ID: <CAE9FiQUjVRUs02-ymmtO+5+SgqTWK8Ae6jJwD08uRbgR=eLJgw@mail.gmail.com>
In-Reply-To: <20130323152948.GA3036@sgi.com>
[-- Attachment #1: Type: text/plain, Size: 4440 bytes --]
On Sat, Mar 23, 2013 at 8:29 AM, Russ Anderson <rja@sgi.com> wrote:
> On Fri, Mar 22, 2013 at 08:25:32AM +0100, Ingo Molnar wrote:
> ------------------------------------------------------------
> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
>
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid. __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid. This does
> not scale well. On a 4 TB (single rack) system there are 308
> memory ranges to scan. The higher the PFN, the more time is spent
> sequentially scanning the memory ranges.
>
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first. If it is in the same range, return that nid.
> If not, scan the list as before.
>
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code. This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
>
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.
Interesting, but there are only 308 entries in memblock.
Did you try extending memblock_search() to return the nid as well?
Something like the attached patch. That should save even more time.
>
> Signed-off-by: Russ Anderson <rja@sgi.com>
> ---
> arch/ia64/mm/numa.c | 15 ++++++++++++++-
> mm/page_alloc.c | 15 ++++++++++++++-
> 2 files changed, 28 insertions(+), 2 deletions(-)
>
> Index: linux/mm/page_alloc.c
> ===================================================================
> --- linux.orig/mm/page_alloc.c 2013-03-19 16:09:03.736450861 -0500
> +++ linux/mm/page_alloc.c 2013-03-22 17:07:43.895405617 -0500
> @@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
> {
> unsigned long start_pfn, end_pfn;
> int i, nid;
> + /*
> + NOTE: The following SMP-unsafe globals are only used early
> + in boot when the kernel is running single-threaded.
> + */
> + static unsigned long last_start_pfn, last_end_pfn;
> + static int last_nid;
> +
> + if (last_start_pfn <= pfn && pfn < last_end_pfn)
> + return last_nid;
>
> for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> - if (start_pfn <= pfn && pfn < end_pfn)
> + if (start_pfn <= pfn && pfn < end_pfn) {
> + last_start_pfn = start_pfn;
> + last_end_pfn = end_pfn;
> + last_nid = nid;
> return nid;
> + }
> /* This is a memory hole */
> return -1;
> }
> Index: linux/arch/ia64/mm/numa.c
> ===================================================================
> --- linux.orig/arch/ia64/mm/numa.c 2013-02-25 15:49:44.000000000 -0600
> +++ linux/arch/ia64/mm/numa.c 2013-03-22 16:09:44.662268239 -0500
> @@ -61,13 +61,26 @@ paddr_to_nid(unsigned long paddr)
> int __meminit __early_pfn_to_nid(unsigned long pfn)
> {
> int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec;
> + /*
> + NOTE: The following SMP-unsafe globals are only used early
> + in boot when the kernel is running single-threaded.
> + */
> + static unsigned long last_start_pfn, last_end_pfn;
Should these be last_ssec and last_esec? Those are the names used below.
> + static int last_nid;
> +
> + if (section >= last_ssec && section < last_esec)
> + return last_nid;
>
> for (i = 0; i < num_node_memblks; i++) {
> ssec = node_memblk[i].start_paddr >> PA_SECTION_SHIFT;
> esec = (node_memblk[i].start_paddr + node_memblk[i].size +
> ((1L << PA_SECTION_SHIFT) - 1)) >> PA_SECTION_SHIFT;
> - if (section >= ssec && section < esec)
> + if (section >= ssec && section < esec) {
> + last_ssec = ssec;
> + last_esec = esec;
> + last_nid = node_memblk[i].nid;
> return node_memblk[i].nid;
> + }
> }
>
> return -1;
>
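The ia64 lookup above compares section numbers rather than PFNs: ssec rounds a memory block's start down to a section boundary and esec rounds its end up, so a section that a block only partly covers still maps to that block's node. A minimal userspace sketch of that arithmetic, assuming a hypothetical DEMO_SECTION_SHIFT of 30 (1 GiB sections) chosen only for illustration; first_section(), end_section() and section_in_block() are made-up helper names, not ia64 kernel functions.

```c
#include <assert.h>

/* Hypothetical section size for illustration only: 2^30 bytes (1 GiB). */
#define DEMO_SECTION_SHIFT 30

/* First section touched by a block starting at start_paddr (round down). */
static unsigned long first_section(unsigned long start_paddr)
{
	return start_paddr >> DEMO_SECTION_SHIFT;
}

/* One past the last section touched by [start_paddr, start_paddr + size) (round up). */
static unsigned long end_section(unsigned long start_paddr, unsigned long size)
{
	/*
	 * An empty block must not be passed in: with an unaligned start it
	 * would still claim the section containing start_paddr.
	 */
	assert(size > 0);
	return (start_paddr + size + ((1UL << DEMO_SECTION_SHIFT) - 1))
		>> DEMO_SECTION_SHIFT;
}

/* Same test as the ia64 loop: section >= ssec && section < esec. */
static int section_in_block(unsigned long section, unsigned long start_paddr,
			    unsigned long size)
{
	return section >= first_section(start_paddr) &&
	       section < end_section(start_paddr, size);
}
```

For a block at 1 GiB with a size of 1.25 GiB, first_section() gives 1 and end_section() gives 3, so sections 1 and 2 both map to the block even though only the first quarter of section 2 is backed by it.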
Also, it looks like you forgot to put the IA64 maintainers in the To list.
Maybe put the ia64 part in a separate patch?
Thanks
Yinghai
[-- Attachment #2: memblock_search_pfn_nid.patch --]
[-- Type: application/octet-stream, Size: 2370 bytes --]
---
include/linux/memblock.h | 2 ++
mm/memblock.c | 18 ++++++++++++++++++
mm/page_alloc.c | 14 ++++++++------
3 files changed, 28 insertions(+), 6 deletions(-)
Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -60,6 +60,8 @@ int memblock_reserve(phys_addr_t base, p
void memblock_trim_memory(phys_addr_t align);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
+ unsigned long *end_pfn);
void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
unsigned long *out_end_pfn, int *out_nid);
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -910,6 +910,24 @@ int __init_memblock memblock_is_memory(p
return memblock_search(&memblock.memory, addr) != -1;
}
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
+ unsigned long *start_pfn, unsigned long *end_pfn)
+{
+ struct memblock_type *type = &memblock.memory;
+ int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);
+
+ if (mid == -1)
+ return -1;
+
+ *start_pfn = type->regions[mid].base >> PAGE_SHIFT;
+ *end_pfn = (type->regions[mid].base + type->regions[mid].size)
+ >> PAGE_SHIFT;
+
+ return type->regions[mid].nid;
+}
+#endif
+
/**
* memblock_is_region_memory - check if a region is a subset of memory
* @base: base of region to check
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4160,13 +4160,15 @@ int __meminit init_currently_empty_zone(
int __meminit __early_pfn_to_nid(unsigned long pfn)
{
unsigned long start_pfn, end_pfn;
- int i, nid;
+ int nid;
- for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
- if (start_pfn <= pfn && pfn < end_pfn)
- return nid;
- /* This is a memory hole */
- return -1;
+ nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn);
+
+ if (nid != -1) {
+ /* save start_pfn, and end_pfn ?*/
+ }
+
+ return nid;
}
#endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
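The `/* save start_pfn, and end_pfn ?*/` placeholder leaves open what to do with the returned bounds. One option is to combine both ideas from this thread: keep the last-range cache from Russ's patch and fall back to the binary search only on a miss. A minimal userspace sketch, assuming the ranges are sorted by start PFN and do not overlap (as memblock keeps its memory regions); struct pfn_range, search_pfn_nid(), early_pfn_to_nid() and the nr_searches counter are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region: [start_pfn, end_pfn) on node nid. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

/* Binary search over sorted, non-overlapping ranges; -1 means a memory hole. */
static int search_pfn_nid(const struct pfn_range *r, size_t nr, unsigned long pfn,
			  unsigned long *start_pfn, unsigned long *end_pfn)
{
	size_t lo = 0, hi = nr;		/* search window is [lo, hi) */

	assert(start_pfn != NULL && end_pfn != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pfn < r[mid].start_pfn)
			hi = mid;
		else if (pfn >= r[mid].end_pfn)
			lo = mid + 1;
		else {
			*start_pfn = r[mid].start_pfn;
			*end_pfn = r[mid].end_pfn;
			return r[mid].nid;
		}
	}
	return -1;
}

/* Counts full searches so the effect of the cache can be observed. */
static unsigned long nr_searches;

/*
 * Check the last matching range first and search only on a miss.
 * Like the statics in Russ's patch, this cache is not thread-safe and
 * assumes single-threaded use (early boot in the kernel case).
 */
static int early_pfn_to_nid(const struct pfn_range *r, size_t nr, unsigned long pfn)
{
	static unsigned long last_start_pfn, last_end_pfn;	/* starts empty */
	static int last_nid;
	unsigned long start_pfn, end_pfn;
	int nid;

	if (last_start_pfn <= pfn && pfn < last_end_pfn)
		return last_nid;

	nr_searches++;
	nid = search_pfn_nid(r, nr, pfn, &start_pfn, &end_pfn);
	if (nid != -1) {	/* only cache real ranges, never holes */
		last_start_pfn = start_pfn;
		last_end_pfn = end_pfn;
		last_nid = nid;
	}
	return nid;
}
```

Walking PFNs 0 through 199 over ranges [0,100) and [100,200) triggers only two searches; every other lookup is a cache hit. A miss that lands in a hole returns -1 and leaves the cached range untouched.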