From: Hugh Dickins <hughd@google.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Christoph Lameter <cl@linux.com>,
linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: 4.12-rc ppc64 4k-page needs costly allocations
Date: Thu, 1 Jun 2017 09:57:30 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.1706010952100.3014@eggly.anvils>
In-Reply-To: <87wp8wpcg9.fsf@skywalker.in.ibm.com>

On Thu, 1 Jun 2017, Aneesh Kumar K.V wrote:
> Hugh Dickins <hughd@google.com> writes:
>
> > Since f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB")
> > I find that swapping loads on ppc64 on G5 with 4k pages are failing:
> >
> > SLUB: Unable to allocate memory on node -1, gfp=0x14000c0(GFP_KERNEL)
> > cache: pgtable-2^12, object size: 32768, buffer size: 65536, default order: 4, min order: 4
> > pgtable-2^12 debugging increased min order, use slub_debug=O to disable.
> > node 0: slabs: 209, objs: 209, free: 8
> > gcc: page allocation failure: order:4, mode:0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
> > CPU: 1 PID: 6225 Comm: gcc Not tainted 4.12.0-rc2 #1
> > Call Trace:
> > [c00000000090b5c0] [c0000000004f8478] .dump_stack+0xa0/0xcc (unreliable)
> > [c00000000090b650] [c0000000000eb194] .warn_alloc+0xf0/0x178
> > [c00000000090b710] [c0000000000ebc9c] .__alloc_pages_nodemask+0xa04/0xb00
> > [c00000000090b8b0] [c00000000013921c] .new_slab+0x234/0x608
> > [c00000000090b980] [c00000000013b59c] .___slab_alloc.constprop.64+0x3dc/0x564
> > [c00000000090bad0] [c0000000004f5a84] .__slab_alloc.isra.61.constprop.63+0x54/0x70
> > [c00000000090bb70] [c00000000013b864] .kmem_cache_alloc+0x140/0x288
> > [c00000000090bc30] [c00000000004d934] .mm_init.isra.65+0x128/0x1c0
> > [c00000000090bcc0] [c000000000157810] .do_execveat_common.isra.39+0x294/0x690
> > [c00000000090bdb0] [c000000000157e70] .SyS_execve+0x28/0x38
> > [c00000000090be30] [c00000000000a118] system_call+0x38/0xfc
> >
> > I did try booting with slub_debug=O as the message suggested, but that
> > made no difference: it still hoped for but failed on order:4 allocations.
> >
> > I wanted to try removing CONFIG_SLUB_DEBUG, but didn't succeed in that:
> > it seemed to be a hard requirement for something, but I didn't find what.
> >
> > I did try CONFIG_SLAB=y instead of SLUB: that lowers these allocations to
> > the expected order:3, which then results in OOM-killing rather than direct
> > allocation failure, because of the PAGE_ALLOC_COSTLY_ORDER 3 cutoff. But
> > makes no real difference to the outcome: swapping loads still abort early.
> >
> > Relying on order:3 or order:4 allocations is just too optimistic: ppc64
> > with 4k pages would do better not to expect to support a 128TB userspace.
> >
> > I tried the obvious partial revert below, but it's not good enough:
> > the system did not boot beyond
> >
> > Starting init: /sbin/init exists but couldn't execute it (error -7)
> > Starting init: /bin/sh exists but couldn't execute it (error -7)
> > Kernel panic - not syncing: No working init found. ...
> >
>
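
For anyone following the arithmetic in the report above, here is a minimal
sketch of where order:3 and order:4 come from. It assumes 4k pages and 8-byte
page table entries; the ex_ names are hypothetical helpers written for this
note, not kernel functions. The only kernel constant relied on is
PAGE_ALLOC_COSTLY_ORDER, which is 3.

```c
#include <stddef.h>

/* Illustrative constants for a 4k-page configuration (assumption). */
#define EX_PAGE_SHIFT              12
#define EX_PAGE_ALLOC_COSTLY_ORDER 3   /* same value as the kernel's constant */

/* Bytes occupied by a table with 2^index_bits entries of entry_size bytes. */
static size_t ex_table_bytes(unsigned int index_bits, size_t entry_size)
{
	return entry_size << index_bits;
}

/* Smallest order such that (page size << order) covers size bytes. */
static unsigned int ex_order_for_size(size_t size)
{
	unsigned int order = 0;

	while (((size_t)1 << (EX_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Orders above the costly cutoff can fail outright instead of retrying. */
static int ex_is_costly(unsigned int order)
{
	return order > EX_PAGE_ALLOC_COSTLY_ORDER;
}
```

With a 12-bit pgd index, ex_table_bytes(12, 8) is 32768, which matches the
"object size" in the SLUB message and needs order 3. The 65536-byte buffer
that the log reports with debugging enabled needs order 4, which is above the
costly cutoff. A 9-bit index gives a 4096-byte table, i.e. order 0.
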
> Can you try this patch.

Thanks! By the time I got to try it, you'd sent another later in the
day. It is fractionally different; I didn't spend any time working out
whether the difference was significant or cosmetic, and just tried that
second one instead. No problems with it so far: it hasn't been running
long, but long enough to say that it definitely fixes the problems
I was getting. Thank you.

Hugh

>
> commit fc55c0dc8b23446f937c1315aa61e74673de5ee6
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date: Thu Jun 1 08:06:40 2017 +0530
>
> powerpc/mm/4k: Limit 4k page size to 64TB
>
> Supporting 512TB requires us to do an order 3 allocation for the level 1 page
> table (pgd). Limit 4k to 64TB for now.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index b4b5e6b671ca..0c4e470571ca 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -8,7 +8,7 @@
> #define H_PTE_INDEX_SIZE 9
> #define H_PMD_INDEX_SIZE 7
> #define H_PUD_INDEX_SIZE 9
> -#define H_PGD_INDEX_SIZE 12
> +#define H_PGD_INDEX_SIZE 9
>
> #ifndef __ASSEMBLY__
> #define H_PTE_TABLE_SIZE (sizeof(pte_t) << H_PTE_INDEX_SIZE)
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index a2123f291ab0..5de3271026f1 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -110,13 +110,15 @@ void release_thread(struct task_struct *);
> #define TASK_SIZE_128TB (0x0000800000000000UL)
> #define TASK_SIZE_512TB (0x0002000000000000UL)
>
> -#ifdef CONFIG_PPC_BOOK3S_64
> +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
> /*
> * Max value currently used:
> */
> -#define TASK_SIZE_USER64 TASK_SIZE_512TB
> +#define TASK_SIZE_USER64 TASK_SIZE_512TB
> +#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_128TB
> #else
> -#define TASK_SIZE_USER64 TASK_SIZE_64TB
> +#define TASK_SIZE_USER64 TASK_SIZE_64TB
> +#define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_64TB
> #endif
>
> /*
> @@ -132,7 +134,7 @@ void release_thread(struct task_struct *);
> * space during mmap's.
> */
> #define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(TASK_SIZE_USER32 / 4))
> -#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(TASK_SIZE_128TB / 4))
> +#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(DEFAULT_MAP_WINDOW_USER64 / 4))
>
> #define TASK_UNMAPPED_BASE ((is_32bit_task()) ? \
> TASK_UNMAPPED_BASE_USER32 : TASK_UNMAPPED_BASE_USER64 )
> @@ -143,8 +145,8 @@ void release_thread(struct task_struct *);
> * with 128TB and conditionally enable upto 512TB
> */
> #ifdef CONFIG_PPC_BOOK3S_64
> -#define DEFAULT_MAP_WINDOW ((is_32bit_task()) ? \
> - TASK_SIZE_USER32 : TASK_SIZE_128TB)
> +#define DEFAULT_MAP_WINDOW ((is_32bit_task()) ? \
> + TASK_SIZE_USER32 : DEFAULT_MAP_WINDOW_USER64)
> #else
> #define DEFAULT_MAP_WINDOW TASK_SIZE
> #endif
> @@ -153,7 +155,7 @@ void release_thread(struct task_struct *);
>
> #ifdef CONFIG_PPC_BOOK3S_64
> /* Limit stack to 128TB */
> -#define STACK_TOP_USER64 TASK_SIZE_128TB
> +#define STACK_TOP_USER64 DEFAULT_MAP_WINDOW_USER64
> #else
> #define STACK_TOP_USER64 TASK_SIZE_USER64
> #endif
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 8389ff5ac002..77062461c469 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -921,7 +921,7 @@ void __init setup_arch(char **cmdline_p)
>
> #ifdef CONFIG_PPC_MM_SLICES
> #ifdef CONFIG_PPC64
> - init_mm.context.addr_limit = TASK_SIZE_128TB;
> + init_mm.context.addr_limit = DEFAULT_MAP_WINDOW_USER64;
> #else
> #error "context.addr_limit not initialized."
> #endif
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> index c6dca2ae78ef..a3edf813d455 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -99,7 +99,7 @@ static int hash__init_new_context(struct mm_struct *mm)
> * mm->context.addr_limit. Default to max task size so that we copy the
> * default values to paca which will help us to handle slb miss early.
> */
> - mm->context.addr_limit = TASK_SIZE_128TB;
> + mm->context.addr_limit = DEFAULT_MAP_WINDOW_USER64;
>
> /*
> * The old code would re-promote on fork, we don't do that when using
>
>
>
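
As a cross-check of the numbers in the patch above, the following sketch adds
up the index sizes from the hash-4k.h hunk to get the user address range
before and after the change, and evaluates the PAGE_ALIGN(window / 4)
expression used for the mmap base. The struct and the ex_ functions are
hypothetical, written only for this illustration; the index sizes and the
4k page shift are the only values taken from the thread.

```c
#include <stdint.h>

/* Index widths of each page table level plus the page offset width. */
struct ex_hash_layout {
	unsigned int pte_bits;
	unsigned int pmd_bits;
	unsigned int pud_bits;
	unsigned int pgd_bits;
	unsigned int page_shift;
};

/* Number of virtual address bits covered by one full set of tables. */
static unsigned int ex_va_bits(const struct ex_hash_layout *l)
{
	return l->pte_bits + l->pmd_bits + l->pud_bits +
	       l->pgd_bits + l->page_shift;
}

/* Size in bytes of the address range those bits can address. */
static uint64_t ex_va_bytes(const struct ex_hash_layout *l)
{
	return (uint64_t)1 << ex_va_bits(l);
}

/* A quarter of the map window, rounded up to a page boundary. */
static uint64_t ex_unmapped_base(uint64_t window, unsigned int page_shift)
{
	uint64_t page = (uint64_t)1 << page_shift;

	return (window / 4 + page - 1) & ~(page - 1);
}
```

With H_PGD_INDEX_SIZE 12 the sum is 9 + 7 + 9 + 12 + 12 = 49 bits, which is
the 0x0002000000000000 of TASK_SIZE_512TB. With H_PGD_INDEX_SIZE 9 it is 46
bits, i.e. 64TB, and a 64TB window puts the 64-bit mmap base at 16TB.
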