* Re: [patch 2/2] x86_64: Configure stack size
  [not found] <Pine.LNX.4.64.0711121147350.27017@schroedinger.engr.sgi.com>
From: Mike Travis @ 2007-11-19 18:19 UTC
To: Andi Kleen
Cc: Andrew Morton, Christoph Lameter, apw, Jack Steiner, Paul Jackson,
    linux-mm, Mike Travis

Andi Kleen writes:
>
>> What else can we do?  Change all sites to do some dynamic allocation if
>> (NR_CPUS >= lots), I guess.
>
> I think that's a reasonable alternative. Perhaps push one or two into
> task_struct and grab them from there, then go dynamic. Only issue
> is error handling and making it look nice in the source.

I've been looking into this issue of cpumasks quite closely.  The idea
of having one or two "scratch" cpumask variables available is a good one.
Integrating it into the current cpumask API is a big issue.

One of the problem areas is cpumask_of_cpu().  It not only pushes a large
array onto the stack; zeroing all but 1 bit is also expensive in cpu
cycles.  The predominant uses center on the following (usage counts are
based only on x86 and ia64 at the moment - 78 total references):

    * Modifying a task's CPU affinity: (29 usages)

        set_cpus_allowed(current, cpumask_of_cpu(cpu))

    * Initialization of arrays: (32 usages)

        = cpumask_of_cpu(0)
        = cpumask_of_cpu(cpu)
        = cpumask_of_cpu(smp_processor_id())

    * Other random instances in balance_irq, smp_send_reschedule, the
      !SMP target_cpu replacement macro, etc.

I think adding another API call or an optional interface that takes a
scalar cpu # avoids this fairly easily.  Whether the cpumask primitives
need this optional scalar operation is still a bit unclear.

>
>> As for timing: we might as well merge it now so that 2.6.25 has at least a
>> chance of running on 16384-way.
>
> x86 is still limited to 256 virtual CPUs. What makes you think that changed?
> With x2APIC from Intel it will be higher, but I haven't seen code for
> that yet.

Yes, there will be more support needed for this new APIC as well as new
ACPI tables.

>
>> otoh, I doubt if anyone will actually ship an NR_CPUS=16384 kernel, so it
>> isn't terribly pointful.

Ideally, NR_CPUS would just go away, and become a startup initialization
problem... ;-)

> NR_CPUS==4096 might happen. Of course that still needs eliminating
> a lot of NR_CPUS arrays and fixing up of NR_INTERRUPTS and some other
> things.

I've also looked at the irq problems with cpumask in the irq_desc and
irq_cfg arrays all being on node 0.  The code in ia64 seems to be a fairly
good model to base changes on...?

Thanks,
Mike

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

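The "scalar cpu #" interface floated in the message above can be pictured with a small user-space sketch. This is not kernel code and does not come from any posted patch: NR_CPUS is fixed at 4096 purely for illustration, and mask_of_cpu(), set_allowed_mask() and set_allowed_cpu() are hypothetical names standing in for cpumask_of_cpu(), set_cpus_allowed() and a possible scalar variant.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NR_CPUS 4096                    /* assumption, for illustration only */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct { unsigned long bits[MASK_LONGS]; } cpumask_t;

/* Hypothetical stand-in for the part of task_struct that matters here. */
struct task { cpumask_t cpus_allowed; };

/* Current style: the caller materialises a full mask (512 bytes here)
 * as a temporary, zeroing every word just to set one bit. */
static cpumask_t mask_of_cpu(unsigned int cpu)
{
    cpumask_t m;

    memset(&m, 0, sizeof(m));
    m.bits[cpu / BITS_PER_LONG] = 1UL << (cpu % BITS_PER_LONG);
    return m;
}

static void set_allowed_mask(struct task *t, cpumask_t m)
{
    t->cpus_allowed = m;
}

/* Hypothetical scalar variant: the cpu number is passed directly and the
 * destination mask is written in place, so no temporary mask is needed. */
static void set_allowed_cpu(struct task *t, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    memset(&t->cpus_allowed, 0, sizeof(t->cpus_allowed));
    t->cpus_allowed.bits[cpu / BITS_PER_LONG] = 1UL << (cpu % BITS_PER_LONG);
}
```

Both paths leave the same bits in the task's mask; the difference is only that the scalar variant avoids the by-value cpumask_t on the caller's stack.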
* Re: [patch 2/2] x86_64: Configure stack size
From: Andi Kleen @ 2007-11-19 20:05 UTC
To: Mike Travis
Cc: Andrew Morton, Christoph Lameter, apw, Jack Steiner, Paul Jackson,
    linux-mm

> * Modifying a task's CPU affinity: (29 usages)
>
>     set_cpus_allowed(current, cpumask_of_cpu(cpu))

They're usually matched with a set_cpus_allowed(current, oldmask),
with oldmask being a full arbitrary mask. So eliminating them
would not directly help. But I suppose just passing a pointer would work.

-Andi

* Re: [patch 2/2] x86_64: Configure stack size
From: Christoph Lameter @ 2007-11-19 22:35 UTC
To: Mike Travis
Cc: Andi Kleen, Andrew Morton, apw, Jack Steiner, Paul Jackson, linux-mm

Here is a simple patch to use a per cpu cpumask instead of constructing
one on the stack. I have been running with this one for a while:


Do not use stack to allocate cpumask for cpumask_of_cpu

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/cpumask.h |   12 +-----------
 include/linux/percpu.h  |    2 ++
 kernel/sched.c          |    6 ++++++
 3 files changed, 9 insertions(+), 11 deletions(-)

Index: linux-2.6/include/linux/cpumask.h
===================================================================
--- linux-2.6.orig/include/linux/cpumask.h	2007-11-17 17:10:13.508534650 -0800
+++ linux-2.6/include/linux/cpumask.h	2007-11-17 17:11:34.816785513 -0800
@@ -222,17 +222,7 @@ int __next_cpu(int n, const cpumask_t *s
 #define next_cpu(n, src)	1
 #endif
 
-#define cpumask_of_cpu(cpu)                                             \
-({                                                                      \
-        typeof(_unused_cpumask_arg_) m;                                 \
-        if (sizeof(m) == sizeof(unsigned long)) {                       \
-                m.bits[0] = 1UL<<(cpu);                                 \
-        } else {                                                        \
-                cpus_clear(m);                                          \
-                cpu_set((cpu), m);                                      \
-        }                                                               \
-        m;                                                              \
-})
+#define cpumask_of_cpu(cpu) per_cpu(cpu_mask, cpu)
 
 #define CPU_MASK_LAST_WORD BITMAP_LAST_WORD_MASK(NR_CPUS)
 
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h	2007-11-17 17:10:13.516534409 -0800
+++ linux-2.6/include/linux/percpu.h	2007-11-17 17:11:34.816785513 -0800
@@ -21,6 +21,8 @@
         (__per_cpu_end - __per_cpu_start + PERCPU_MODULE_RESERVE)
 #endif  /* PERCPU_ENOUGH_ROOM */
 
+DECLARE_PER_CPU(cpumask_t, cpu_mask);
+
 /*
  * Must be an lvalue. Since @var must be a simple identifier,
  * we force a syntax error here if it isn't.
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c	2007-11-17 17:10:13.524534454 -0800
+++ linux-2.6/kernel/sched.c	2007-11-17 17:11:34.816785513 -0800
@@ -6725,6 +6725,9 @@ static void init_cfs_rq(struct cfs_rq *c
         cfs_rq->min_vruntime = (u64)(-(1LL << 20));
 }
 
+DEFINE_PER_CPU(cpumask_t, cpu_mask);
+EXPORT_PER_CPU_SYMBOL(cpu_mask);
+
 void __init sched_init(void)
 {
         int highest_cpu = 0;
@@ -6734,6 +6737,9 @@ void __init sched_init(void)
                 struct rt_prio_array *array;
                 struct rq *rq;
 
+                /* This makes cpumask_of_cpu work */
+                cpu_set(i, per_cpu(cpu_mask, i));
+
                 rq = cpu_rq(i);
                 spin_lock_init(&rq->lock);
                 lockdep_set_class(&rq->lock, &rq->rq_lock_key);

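The idea behind the patch above, precomputing one single-bit mask per CPU and handing out a reference instead of building a mask on the stack, can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: a plain static array stands in for the per-cpu variable, NR_CPUS is set to 256 to keep the table small, and init_cpu_masks(), cpumask_of_cpu_ptr() and cpu_isset_in() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS 256                     /* assumption, for illustration only */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct { unsigned long bits[MASK_LONGS]; } cpumask_t;

/* One precomputed mask per CPU; a static array stands in for the
 * per-cpu variable cpu_mask that the patch defines. Zero-initialised. */
static cpumask_t cpu_mask_table[NR_CPUS];

/* Must run once before any lookup, mirroring the cpu_set() call that the
 * patch adds to the per-CPU loop in sched_init(). */
static void init_cpu_masks(void)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_mask_table[cpu].bits[cpu / BITS_PER_LONG] |=
            1UL << (cpu % BITS_PER_LONG);
}

/* Lookup is a table index: nothing is zeroed and nothing is copied. */
static const cpumask_t *cpumask_of_cpu_ptr(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return &cpu_mask_table[cpu];
}

/* Returns non-zero when bit `cpu` is set in mask `m`. */
static int cpu_isset_in(unsigned int cpu, const cpumask_t *m)
{
    return (int)((m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
}
```

The trade-off in this model is memory for stack and cycles: the table costs NR_CPUS * NR_CPUS / 8 bytes in total (8 KiB here, and 2 MiB if NR_CPUS were 4096), and lookups are only valid after the one-time initialisation has run.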
* [patch 0/2] X86_64 configurable stack size
From: clameter @ 2007-11-07 0:43 UTC
To: akpm
Cc: linux-mm

These two patches allow the configuration of the stack size on x86_64.

Prior discussion on these (this version does not provide a fallback):

http://marc.info/?l=linux-mm&m=119147073128193&w=2
http://marc.info/?l=linux-mm&m=119147072506052&w=2

* [patch 2/2] x86_64: Configure stack size
From: clameter @ 2007-11-07 0:43 UTC
To: akpm
Cc: linux-mm, ak, travis

[-- Attachment #1: x86_64_config_stack_size --]
[-- Type: text/plain, Size: 2621 bytes --]

Make the stack size configurable. SGI NUMA configurations may need
more stack because cpumasks and nodemasks are at times kept on the stack.
This patch allows running with 16k or 32k kernel stacks.

Cc: ak@suse.de
Cc: travis@sgi.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 arch/x86/Kconfig.x86_64          |    6 ++++++
 include/asm-x86/page_64.h        |    3 +--
 include/asm-x86/thread_info_64.h |    4 ++--
 3 files changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/Kconfig.x86_64
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig.x86_64	2007-11-06 12:34:13.000000000 -0800
+++ linux-2.6/arch/x86/Kconfig.x86_64	2007-11-06 15:44:36.000000000 -0800
@@ -378,6 +378,12 @@ config NODES_SHIFT
         default "6"
         depends on NEED_MULTIPLE_NODES
 
+config THREAD_ORDER
+        int "Kernel stack size (in page order)"
+        default "1"
+        help
+          Page order for the thread stack.
+
 # Dummy CONFIG option to select ACPI_NUMA from drivers/acpi/Kconfig.
 
 config X86_64_ACPI_NUMA
Index: linux-2.6/include/asm-x86/page_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/page_64.h	2007-10-17 13:35:53.000000000 -0700
+++ linux-2.6/include/asm-x86/page_64.h	2007-11-06 15:44:36.000000000 -0800
@@ -9,8 +9,7 @@
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 #define PHYSICAL_PAGE_MASK	(~(PAGE_SIZE-1) & __PHYSICAL_MASK)
 
-#define THREAD_ORDER 1
-#define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)
+#define THREAD_SIZE  (PAGE_SIZE << CONFIG_THREAD_ORDER)
 #define CURRENT_MASK (~(THREAD_SIZE-1))
 
 #define EXCEPTION_STACK_ORDER 0
Index: linux-2.6/include/asm-x86/thread_info_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/thread_info_64.h	2007-11-06 15:44:31.000000000 -0800
+++ linux-2.6/include/asm-x86/thread_info_64.h	2007-11-06 15:44:36.000000000 -0800
@@ -80,9 +80,9 @@ static inline struct thread_info *stack_
 #endif
 
 #define alloc_thread_info(tsk) \
-        ((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER))
+        ((struct thread_info *) __get_free_pages(THREAD_FLAGS, CONFIG_THREAD_ORDER))
 
-#define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)
+#define free_thread_info(ti) free_pages((unsigned long) (ti), CONFIG_THREAD_ORDER)
 
 #else /* !__ASSEMBLY__ */
 

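How the page order in the patch above maps to a stack size can be checked with a short user-space sketch. It assumes 4 KiB pages (PAGE_SHIFT of 12, as on x86_64); thread_size() and current_mask() are hypothetical helper functions that mirror the THREAD_SIZE and CURRENT_MASK macros rather than kernel interfaces.

```c
#include <assert.h>

#define PAGE_SHIFT 12                   /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Mirrors THREAD_SIZE: the stack spans 2^order contiguous pages. */
static unsigned long thread_size(unsigned int order)
{
    return PAGE_SIZE << order;
}

/* Mirrors CURRENT_MASK: clears the offset bits of an address within a
 * stack, giving the base of the (naturally aligned) stack area. */
static unsigned long current_mask(unsigned int order)
{
    return ~(thread_size(order) - 1);
}
```

Under that assumption, order 1 (the default) gives 8k, order 2 gives 16k and order 3 gives 32k, which matches the "16k or 32k kernel stacks" in the patch description. Because the mask is derived from the size, the stack must be allocated as one naturally aligned block of that order.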
* Re: [patch 2/2] x86_64: Configure stack size
From: Andy Whitcroft @ 2007-11-07 19:14 UTC
To: clameter
Cc: akpm, linux-mm, ak, travis

On Tue, Nov 06, 2007 at 04:43:59PM -0800, clameter@sgi.com wrote:
> Make the stack size configurable. SGI NUMA configurations may need
> more stack because cpumasks and nodemasks are at times kept on the stack.
> This patch allows running with 16k or 32k kernel stacks.
>
> Cc: ak@suse.de
> Cc: travis@sgi.com
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> ---
>  arch/x86/Kconfig.x86_64          |    6 ++++++
>  include/asm-x86/page_64.h        |    3 +--
>  include/asm-x86/thread_info_64.h |    4 ++--
>  3 files changed, 9 insertions(+), 4 deletions(-)
>
> Index: linux-2.6/arch/x86/Kconfig.x86_64
> ===================================================================
> --- linux-2.6.orig/arch/x86/Kconfig.x86_64	2007-11-06 12:34:13.000000000 -0800
> +++ linux-2.6/arch/x86/Kconfig.x86_64	2007-11-06 15:44:36.000000000 -0800
> @@ -378,6 +378,12 @@ config NODES_SHIFT
>          default "6"
>          depends on NEED_MULTIPLE_NODES
>
> +config THREAD_ORDER
> +        int "Kernel stack size (in page order)"
> +        default "1"
> +        help
> +          Page order for the thread stack.
> +
>  # Dummy CONFIG option to select ACPI_NUMA from drivers/acpi/Kconfig.
>
>  config X86_64_ACPI_NUMA
> Index: linux-2.6/include/asm-x86/page_64.h
> ===================================================================
> --- linux-2.6.orig/include/asm-x86/page_64.h	2007-10-17 13:35:53.000000000 -0700
> +++ linux-2.6/include/asm-x86/page_64.h	2007-11-06 15:44:36.000000000 -0800
> @@ -9,8 +9,7 @@
>  #define PAGE_MASK	(~(PAGE_SIZE-1))
>  #define PHYSICAL_PAGE_MASK	(~(PAGE_SIZE-1) & __PHYSICAL_MASK)
>
> -#define THREAD_ORDER 1
> -#define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)
> +#define THREAD_SIZE  (PAGE_SIZE << CONFIG_THREAD_ORDER)
>  #define CURRENT_MASK (~(THREAD_SIZE-1))
>
>  #define EXCEPTION_STACK_ORDER 0
> Index: linux-2.6/include/asm-x86/thread_info_64.h
> ===================================================================
> --- linux-2.6.orig/include/asm-x86/thread_info_64.h	2007-11-06 15:44:31.000000000 -0800
> +++ linux-2.6/include/asm-x86/thread_info_64.h	2007-11-06 15:44:36.000000000 -0800
> @@ -80,9 +80,9 @@ static inline struct thread_info *stack_
>  #endif
>
>  #define alloc_thread_info(tsk) \
> -        ((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER))
> +        ((struct thread_info *) __get_free_pages(THREAD_FLAGS, CONFIG_THREAD_ORDER))
>
> -#define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)
> +#define free_thread_info(ti) free_pages((unsigned long) (ti), CONFIG_THREAD_ORDER)
>
>  #else /* !__ASSEMBLY__ */

We seem to be growing two different mechanisms here for 32bit and 64bit.
This does seem a better option than that in 32bit CONFIG_4KSTACKS etc.
IMO when these two merge we should consolidate on this version.

Reviewed-by: Andy Whitcroft <apw@shadowen.org>

-apw

* Re: [patch 2/2] x86_64: Configure stack size
From: Andi Kleen @ 2007-11-07 23:12 UTC
To: Andy Whitcroft
Cc: clameter, akpm, linux-mm, travis

> We seem to be growing two different mechanisms here for 32bit and 64bit.
> This does seem a better option than that in 32bit CONFIG_4KSTACKS etc.
> IMO when these two merge we should consolidate on this version.

Best would be to not change it at all for 64bit for now.

We can worry about the 16k CPU systems when they appear, but shorter term
it would just lead to other crappy kernel code relying on large stacks when
it shouldn't.

-Andi

* Re: [patch 2/2] x86_64: Configure stack size
From: Christoph Lameter @ 2007-11-08 0:42 UTC
To: Andi Kleen
Cc: Andy Whitcroft, akpm, linux-mm, travis

On Thu, 8 Nov 2007, Andi Kleen wrote:
>
> > We seem to be growing two different mechanisms here for 32bit and 64bit.
> > This does seem a better option than that in 32bit CONFIG_4KSTACKS etc.
> > IMO when these two merge we should consolidate on this version.
>
> Best would be to not change it at all for 64bit for now.
>
> We can worry about the 16k CPU systems when they appear, but shorter term
> it would just lead to other crappy kernel code relying on large stacks when
> it shouldn't.

Well, we cannot really test these systems without these patches, and when
they become officially available then it's too late for merging.

* Re: [patch 2/2] x86_64: Configure stack size
From: Andrew Morton @ 2007-11-09 20:13 UTC
To: Christoph Lameter
Cc: ak, apw, linux-mm, travis

On Wed, 7 Nov 2007 16:42:04 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 8 Nov 2007, Andi Kleen wrote:
> >
> > > We seem to be growing two different mechanisms here for 32bit and 64bit.
> > > This does seem a better option than that in 32bit CONFIG_4KSTACKS etc.
> > > IMO when these two merge we should consolidate on this version.
> >
> > Best would be to not change it at all for 64bit for now.
> >
> > We can worry about the 16k CPU systems when they appear, but shorter term
> > it would just lead to other crappy kernel code relying on large stacks when
> > it shouldn't.
>
> Well, we cannot really test these systems without these patches, and when
> they become officially available then it's too late for merging.

It doesn't take many 2kb cpumasks to use up a lot of stack.

What else can we do?  Change all sites to do some dynamic allocation if
(NR_CPUS >= lots), I guess.

As for timing: we might as well merge it now so that 2.6.25 has at least a
chance of running on 16384-way.

otoh, I doubt if anyone will actually ship an NR_CPUS=16384 kernel, so it
isn't terribly pointful.

So I'm wobbly.  Could we please examine the alternatives before proceeding?

Is there any plan in anyone's mind to fix this problem in a better but
probably more intrusive fashion?

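The "2kb cpumasks" figure in the message above follows from one bit per possible CPU: 16384 bits is 2048 bytes. The arithmetic can be sketched as below; cpumask_bytes() and masks_per_stack() are hypothetical helpers written for this illustration, and the calculation ignores everything else that lives on a kernel stack.

```c
#include <assert.h>
#include <limits.h>

/* Size of a bitmap with one bit per CPU, rounded up to whole longs,
 * which is how a cpumask_t is laid out. */
static unsigned long cpumask_bytes(unsigned long nr_cpus)
{
    unsigned long bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    unsigned long longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

    return longs * sizeof(unsigned long);
}

/* Upper bound on how many such masks fit in a stack of the given size. */
static unsigned long masks_per_stack(unsigned long stack_bytes,
                                     unsigned long nr_cpus)
{
    return stack_bytes / cpumask_bytes(nr_cpus);
}
```

With NR_CPUS=16384, four on-stack masks would already consume an entire 8k stack, while NR_CPUS=4096 gives 512-byte masks. A handful of nested functions that each hold a local cpumask_t is enough to make the default stack size a problem.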
* Re: [patch 2/2] x86_64: Configure stack size
From: Christoph Lameter @ 2007-11-09 20:45 UTC
To: Andrew Morton
Cc: ak, apw, linux-mm, travis

On Fri, 9 Nov 2007, Andrew Morton wrote:

> otoh, I doubt if anyone will actually ship an NR_CPUS=16384 kernel, so it
> isn't terribly pointful.

Our competition (Cray) just announced a product featuring up to 21k
cpus although that is a cluster. We are definitely getting there...

> So I'm wobbly.  Could we please examine the alternatives before proceeding?

This works fine with a 32k stack on IA64 with 4k processors. So I tend to
think of this as a solution that is already working on another platform.
An 8k stack is also going to be tough with 4k processors on x86_64 which
we will have soon.

* Re: [patch 2/2] x86_64: Configure stack size
From: Andrew Morton @ 2007-11-09 21:10 UTC
To: Christoph Lameter
Cc: ak, apw, linux-mm, travis

On Fri, 9 Nov 2007 12:45:06 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 9 Nov 2007, Andrew Morton wrote:
>
> > otoh, I doubt if anyone will actually ship an NR_CPUS=16384 kernel, so it
> > isn't terribly pointful.
>
> Our competition (Cray) just announced a product featuring up to 21k
> cpus although that is a cluster. We are definitely getting there...

I'm talking about software, not hardware.  I'd expect that you'll have
trouble talking RH/suse/etc into general shipping of an NR_CPUS=16384
kernel.

If I'm correct then I'd have thought that this will be a significant
problem for SGI, so we should find other solutions.

> > So I'm wobbly.  Could we please examine the alternatives before proceeding?
>
> This works fine with a 32k stack on IA64 with 4k processors.

yeah, but that's an order-1 allocation on ia64, not an order-3.

> So I tend to
> think of this as a solution that is already working on another platform.
> An 8k stack is also going to be tough with 4k processors on x86_64 which
> we will have soon.
>

* Re: [patch 2/2] x86_64: Configure stack size
From: Christoph Lameter @ 2007-11-09 21:19 UTC
To: Andrew Morton
Cc: ak, apw, linux-mm, travis

On Fri, 9 Nov 2007, Andrew Morton wrote:

> I'm talking about software, not hardware.  I'd expect that you'll have
> trouble talking RH/suse/etc into general shipping of an NR_CPUS=16384
> kernel.

Yeah, that is one reason why I removed all percpu arrays from the kernel
with the cpu_alloc patchset. I think we can get to a point where this does
not hurt that much. We want to be as close to a distro kernel as possible.

>
> If I'm correct then I'd have thought that this will be a significant
> problem for SGI, so we should find other solutions.

Maybe a special kernel from the distros is unavoidable, but then they have
done that in the past for us too. Certainly we do not want to have the
kernel patches just for HPC apps. This is an option after all and not a
default. Mike Travis is working on reducing the per cpu overhead in the
x86_64 arch code. So we should be getting to a pretty good situation even
if we have to leave the cpumasks alone.

> > > So I'm wobbly.  Could we please examine the alternatives before proceeding?
> >
> > This works fine with a 32k stack on IA64 with 4k processors.
>
> yeah, but that's an order-1 allocation on ia64, not an order-3.

Well, the default is also an order-1 allocation on x86_64. The order-3
alloc is not going to be that much of a problem if you have a system with
several terabytes of RAM.

* Re: [patch 2/2] x86_64: Configure stack size
From: Andrew Morton @ 2007-11-09 21:46 UTC
To: Christoph Lameter
Cc: ak, apw, linux-mm, travis

On Fri, 9 Nov 2007 13:19:11 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 9 Nov 2007, Andrew Morton wrote:
>
> > I'm talking about software, not hardware.  I'd expect that you'll have
> > trouble talking RH/suse/etc into general shipping of an NR_CPUS=16384
> > kernel.
>
> Yeah, that is one reason why I removed all percpu arrays from the kernel
> with the cpu_alloc patchset. I think we can get to a point where this does
> not hurt that much. We want to be as close to a distro kernel as possible.
>
> >
> > If I'm correct then I'd have thought that this will be a significant
> > problem for SGI, so we should find other solutions.
>
> Maybe a special kernel from the distros is unavoidable, but then they have
> done that in the past for us too. Certainly we do not want to have the
> kernel patches just for HPC apps. This is an option after all and not a
> default. Mike Travis is working on reducing the per cpu overhead in the
> x86_64 arch code. So we should be getting to a pretty good situation even
> if we have to leave the cpumasks alone.
>
> > > > So I'm wobbly.  Could we please examine the alternatives before proceeding?
> > >
> > > This works fine with a 32k stack on IA64 with 4k processors.
> >
> > yeah, but that's an order-1 allocation on ia64, not an order-3.
>
> Well, the default is also an order-1 allocation on x86_64. The order-3
> alloc is not going to be that much of a problem if you have a system with
> several terabytes of RAM.

Fair enough.

Did you consider making the stack size a calculated-in-Kconfig-arithmetic
thing rather than an offered-to-humans thing?  Derive it from CONFIG_NR_CPUS?

* Re: [patch 2/2] x86_64: Configure stack size
From: Christoph Lameter @ 2007-11-09 21:50 UTC
To: Andrew Morton
Cc: ak, apw, linux-mm, travis

On Fri, 9 Nov 2007, Andrew Morton wrote:

> Did you consider making the stack size a calculated-in-Kconfig-arithmetic
> thing rather than an offered-to-humans thing?  Derive it from CONFIG_NR_CPUS?

Estimating stack use based on NR_CPUS is a difficult thing. The estimates
likely have to change as the use of the stack changes in the kernel. I'd
rather have a constant there now. Maybe in the future we can come up with
such a scheme.

* Re: [patch 2/2] x86_64: Configure stack size
From: Andi Kleen @ 2007-11-10 17:21 UTC
To: Andrew Morton
Cc: Christoph Lameter, ak, apw, linux-mm, travis

> What else can we do?  Change all sites to do some dynamic allocation if
> (NR_CPUS >= lots), I guess.

I think that's a reasonable alternative. Perhaps push one or two into
task_struct and grab them from there, then go dynamic. Only issue
is error handling and making it look nice in the source.

> As for timing: we might as well merge it now so that 2.6.25 has at least a
> chance of running on 16384-way.

x86 is still limited to 256 virtual CPUs. What makes you think that changed?
With x2APIC from Intel it will be higher, but I haven't seen code for
that yet.

> otoh, I doubt if anyone will actually ship an NR_CPUS=16384 kernel, so it
> isn't terribly pointful.

NR_CPUS==4096 might happen. Of course that still needs eliminating
a lot of NR_CPUS arrays and fixing up of NR_INTERRUPTS and some other
things.

-Andi

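The scheme suggested in the message above, a couple of scratch masks kept in task_struct with dynamic allocation as the fallback, might look roughly like the following user-space sketch. Nothing here is from a posted patch: struct task, NR_SCRATCH, get_scratch_mask() and put_scratch_mask() are hypothetical, malloc() stands in for a kernel allocator, and NR_CPUS is fixed at 4096 for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4096                    /* assumption, for illustration only */
typedef struct {
    unsigned long bits[NR_CPUS / (8 * sizeof(unsigned long))];
} cpumask_t;

#define NR_SCRATCH 2                    /* "one or two" embedded masks */

/* Hypothetical stand-in for the relevant part of task_struct. */
struct task {
    cpumask_t scratch[NR_SCRATCH];
    unsigned char scratch_busy[NR_SCRATCH];
};

/* Hand out an embedded scratch mask if one is free, else go dynamic.
 * Contents are unspecified; the caller clears the mask before use.
 * Returns NULL if the fallback allocation fails, which is the error
 * path every call site would have to handle. */
static cpumask_t *get_scratch_mask(struct task *t)
{
    for (int i = 0; i < NR_SCRATCH; i++) {
        if (!t->scratch_busy[i]) {
            t->scratch_busy[i] = 1;
            return &t->scratch[i];
        }
    }
    return malloc(sizeof(cpumask_t));
}

/* Release a mask obtained from get_scratch_mask() on the same task. */
static void put_scratch_mask(struct task *t, cpumask_t *m)
{
    for (int i = 0; i < NR_SCRATCH; i++) {
        if (m == &t->scratch[i]) {
            t->scratch_busy[i] = 0;
            return;
        }
    }
    free(m);                            /* it came from the fallback path */
}
```

The sketch shows where the "error handling" concern comes from: code that previously declared a local cpumask_t cannot fail, whereas every get_scratch_mask() caller needs a NULL check and a matching put. It also costs NR_SCRATCH masks of memory per task whether or not they are ever used.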
Thread overview: 14+ messages (newest: 2007-11-19 22:35 UTC)
     [not found] <Pine.LNX.4.64.0711121147350.27017@schroedinger.engr.sgi.com>
2007-11-19 18:19 ` [patch 2/2] x86_64: Configure stack size Mike Travis
2007-11-19 20:05   ` Andi Kleen
2007-11-19 22:35   ` Christoph Lameter
2007-11-07  0:43 [patch 0/2] X86_64 configurable " clameter
2007-11-07  0:43 ` [patch 2/2] x86_64: Configure " clameter
2007-11-07 19:14   ` Andy Whitcroft
2007-11-07 23:12     ` Andi Kleen
2007-11-08  0:42       ` Christoph Lameter
2007-11-09 20:13         ` Andrew Morton
2007-11-09 20:45           ` Christoph Lameter
2007-11-09 21:10             ` Andrew Morton
2007-11-09 21:19               ` Christoph Lameter
2007-11-09 21:46                 ` Andrew Morton
2007-11-09 21:50                   ` Christoph Lameter
2007-11-10 17:21           ` Andi Kleen