From: Yinghai Lu <yinghai@kernel.org>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave@sr71.net>,
Seth Jennings <sjenning@linux.vnet.ibm.com>,
Nathan Fontenot <nfont@linux.vnet.ibm.com>,
Cody P Schafer <cody@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>
Subject: Re: [RFC][PATCH] drivers: base: dynamic memory block creation
Date: Wed, 14 Aug 2013 14:37:26 -0700
Message-ID: <CAE9FiQUz6Ev0nbCoSbH7E=+zeJr6GKwR4B-z8+zJTRDPeF=jeA@mail.gmail.com>
In-Reply-To: <20130814203546.GA6200@kroah.com>
[-- Attachment #1: Type: text/plain, Size: 2132 bytes --]
On Wed, Aug 14, 2013 at 1:35 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Wed, Aug 14, 2013 at 01:05:33PM -0700, Dave Hansen wrote:
>> On 08/14/2013 12:43 PM, Greg Kroah-Hartman wrote:
>> > On Wed, Aug 14, 2013 at 02:31:45PM -0500, Seth Jennings wrote:
>> >> ppc64 has a normal memory block size of 256M (however sometimes as low
>> >> as 16M depending on the system LMB size), and (I think) x86 is 128M. With
>> >> 1TB of RAM and a 256M block size, that's 4k memory blocks with 20 sysfs
>> >> entries per block, that's around 80k items that need to be created at boot
>> >> time in sysfs. Some systems go up to 16TB where the issue is even more
>> >> severe.
>> >
>> > The x86 developers are working with larger memory sizes and they haven't
>> > seen the problem in this area, for them it's in other places, as I
>> > referred to in my other email.
>>
>> The SGI guys don't run normal distro kernels and don't turn on memory
>> hotplug, so they don't see this. I do the same in my testing of
>> large-memory x86 systems to speed up my boots. I'll go stick it back in
>> there and see if I can generate some numbers for a 1TB machine.
>>
>> But, the problem on x86 is at _worst_ 1/8 of the problem on ppc64 since
>> the SECTION_SIZE is 8x bigger by default.
>>
>> Also, the cost of creating sections on ppc is *MUCH* higher than x86
>> when amortized across the number of pages that you're initializing. A
>> section on ppc64 has to be created for each (2^24/2^16)=256 pages while
>> one on x86 is created for each (2^27/2^12)=32768 pages.
>>
>> Thus, x86 folks with our small pages and large sections tend to be
>> focused on per-page costs. The ppc folks with their small sections and
>> larger pages tend to be focused on the per-section costs.
>
> Ah, thanks for the explanation, now it makes more sense why they are
> both optimizing in different places.
I have a local patch that I sent before; it probes the memory block size
for generic x86_64.
Setting it to 2G looks more reasonable for systems with 1T+ of RAM.
Also, could we add block_size under that /sys directory, so that fewer
entries get generated?
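To put rough numbers on the entry counts discussed above, here is a small
userspace C sketch (not kernel code). The helper names nr_memory_blocks()
and nr_sysfs_items() are made up for illustration, and the figure of 20
sysfs entries per block is simply Seth's estimate quoted above, so treat
the results as ballpark values only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of memory block directories needed to
 * cover ram_bytes of RAM with blocks of block_bytes each. */
static uint64_t nr_memory_blocks(uint64_t ram_bytes, uint64_t block_bytes)
{
	assert(block_bytes != 0);
	/* Round up so that a partial tail block is still counted. */
	return (ram_bytes + block_bytes - 1) / block_bytes;
}

/* Hypothetical helper: estimated sysfs items, assuming about 20
 * entries per memory block (the estimate quoted in this thread). */
static uint64_t nr_sysfs_items(uint64_t ram_bytes, uint64_t block_bytes)
{
	return nr_memory_blocks(ram_bytes, block_bytes) * 20;
}
```

With 1T of RAM, 256M blocks give 4096 blocks (about 80k items), 128M
blocks give 8192 blocks, and 2G blocks give 512 blocks.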
Thanks
Yinghai
[-- Attachment #2: block_size_x86_64.patch --]
[-- Type: application/octet-stream, Size: 1824 bytes --]
Subject: [PATCH -v2] x86, mm: Probe memory block size for generic x86 64bit
Usually, if the system supports memory remapping to get back the memory
covered by the mmio range, we will have a 128M ... 2G chunk at the end.
Try to probe that size and use it as the memory block size,
so that we get fewer entries in /sys/devices/system/memory/.
-v2: don't probe it every time /sys/../block_size_bytes is shown...
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/mm/init_64.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1263,17 +1263,43 @@ const char *arch_vma_name(struct vm_area
 	return NULL;
 }
-#ifdef CONFIG_X86_UV
-unsigned long memory_block_size_bytes(void)
+static unsigned long probe_memory_block_size(void)
 {
+	/* start from 2g */
+	unsigned long bz = 1UL<<31;
+
+#ifdef CONFIG_X86_UV
 	if (is_uv_system()) {
 		printk(KERN_INFO "UV: memory block size 2GB\n");
 		return 2UL * 1024 * 1024 * 1024;
 	}
-	return MIN_MEMORY_BLOCK_SIZE;
-}
 #endif
+	/* less than 64g installed */
+	if ((max_pfn << PAGE_SHIFT) < (16UL << 32))
+		return MIN_MEMORY_BLOCK_SIZE;
+
+	/* get the tail size */
+	while (bz > MIN_MEMORY_BLOCK_SIZE) {
+		if (!((max_pfn << PAGE_SHIFT) & (bz - 1)))
+			break;
+		bz >>= 1;
+	}
+
+	printk(KERN_DEBUG "memory block size : %ldMB\n", bz >> 20);
+
+	return bz;
+}
+
+static unsigned long memory_block_size_probed;
+unsigned long memory_block_size_bytes(void)
+{
+	if (!memory_block_size_probed)
+		memory_block_size_probed = probe_memory_block_size();
+
+	return memory_block_size_probed;
+}
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
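
For anyone who wants to try the probing logic outside the kernel, the
loop in the patch above can be approximated in userspace roughly as
follows. This is only a sketch: sketch_probe_block_size(),
SKETCH_MIN_BLOCK and SKETCH_MAX_BLOCK are hypothetical names, the 128M
minimum is assumed to match MIN_MEMORY_BLOCK_SIZE on x86_64, the end of
RAM is passed in as a byte address instead of being derived from max_pfn,
and the UV special case is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for MIN_MEMORY_BLOCK_SIZE on x86_64 (128M). */
#define SKETCH_MIN_BLOCK (128ULL << 20)
/* The patch starts probing from 2G. */
#define SKETCH_MAX_BLOCK (2ULL << 30)

static_assert((SKETCH_MIN_BLOCK & (SKETCH_MIN_BLOCK - 1)) == 0,
	      "block sizes must be powers of two");

/* Hypothetical userspace version of the probe: pick the largest
 * power-of-two block size (2G down to 128M) that the end of RAM is
 * aligned to. */
static uint64_t sketch_probe_block_size(uint64_t end_of_ram)
{
	uint64_t bz = SKETCH_MAX_BLOCK;

	/* Less than 64G installed: keep the minimum block size. */
	if (end_of_ram < (64ULL << 30))
		return SKETCH_MIN_BLOCK;

	/* Halve the candidate until the end of RAM is aligned to it. */
	while (bz > SKETCH_MIN_BLOCK) {
		if (!(end_of_ram & (bz - 1)))
			break;
		bz >>= 1;
	}

	return bz;
}
```

For example, with RAM ending at exactly 1T the sketch returns 2G, while
an end address of 1T + 512M yields 512M, and anything below 64G stays at
128M.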