* [PATCH 0/4] no MAX_ARG_PAGES
@ 2007-06-05 15:05 Peter Zijlstra
From: Peter Zijlstra @ 2007-06-05 15:05 UTC
To: linux-kernel, parisc-linux, linux-mm, linux-arch
Cc: Ollie Wild, Peter Zijlstra, Andrew Morton, Ingo Molnar, Andi Kleen
This patch set aims to remove the current limit on argv+env space, a.k.a.
MAX_ARG_PAGES.
The new mm is created before the binfmt code runs, and the stack is placed at
the highest address supported by the architecture.
The argv+env data is then copied from the old mm into the new mm (which is
not active at that time; this introduces some cache-coherency issues).
Then we run the binfmt code, which computes the final stack address. The
existing stack is then moved downwards (or upwards on PA-RISC) to the desired
place.
This 'trick' heavily relies on the MMU, so for no-MMU archs we stay with the
old approach.
Tested-on: i386, ia64, frv, parisc, ppc64, um, x86_64
Patches against: .22-rc4
--
Changes since last time:
- fixed ia64
  ( va-space constraints )
- fixed ia64 CONFIG_IA32_SUPPORT
  ( brown paper bags and more va-space constraints )
- fixed powerpc
  ( address limit on stack extension )
- removed audit_kill() in favour of audit_panic()
- nicely split out the patches
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/4] arch: personality independent stack top
@ 2007-06-05 15:05 Peter Zijlstra
From: Peter Zijlstra @ 2007-06-05 15:05 UTC
To: linux-kernel, parisc-linux, linux-mm, linux-arch
Cc: Ollie Wild, Peter Zijlstra, Andrew Morton, Ingo Molnar, Andi Kleen
[-- Attachment #1: stack_top_max.patch --]
[-- Type: text/plain, Size: 11537 bytes --]
New arch macro STACK_TOP_MAX: it gives the largest valid stack address for the
architecture in question. It differs from STACK_TOP in that it does not
distinguish between personalities but always returns the largest possible
address.
This is used to create the initial stack on execve, which we will move down to
the proper location once the binfmt code has figured out where that is.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ollie Wild <aaw@google.com>
---
 fs/exec.c                    |    2 +-
 include/asm-alpha/a.out.h    |    2 ++
 include/asm-arm/a.out.h      |    1 +
 include/asm-arm26/a.out.h    |    1 +
 include/asm-avr32/a.out.h    |    1 +
 include/asm-cris/a.out.h     |    1 +
 include/asm-frv/mem-layout.h |    1 +
 include/asm-h8300/a.out.h    |    1 +
 include/asm-i386/a.out.h     |    1 +
 include/asm-ia64/ustack.h    |    1 +
 include/asm-m32r/a.out.h     |    1 +
 include/asm-m68k/a.out.h     |    1 +
 include/asm-mips/a.out.h     |    1 +
 include/asm-parisc/a.out.h   |    1 +
 include/asm-powerpc/a.out.h  |    3 +++
 include/asm-s390/a.out.h     |    1 +
 include/asm-sh/a.out.h       |    1 +
 include/asm-sh64/a.out.h     |    1 +
 include/asm-sparc/a.out.h    |    1 +
 include/asm-sparc64/a.out.h  |    2 ++
 include/asm-um/a.out.h       |    2 ++
 include/asm-x86_64/a.out.h   |    3 ++-
 include/asm-xtensa/a.out.h   |    1 +
 23 files changed, 29 insertions(+), 2 deletions(-)
Index: linux-2.6-2/include/asm-alpha/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-alpha/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-alpha/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -101,6 +101,8 @@ struct exec
 #define STACK_TOP \
   (current->personality & ADDR_LIMIT_32BIT ? 0x80000000 : 0x00120000000UL)
+#define STACK_TOP_MAX 0x00120000000UL
+
 #endif
 #endif /* __A_OUT_GNU_H__ */
Index: linux-2.6-2/include/asm-arm/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-arm/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-arm/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -30,6 +30,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP ((current->personality == PER_LINUX_32BIT) ? \
                    TASK_SIZE : TASK_SIZE_26)
+#define STACK_TOP_MAX TASK_SIZE
 #endif
 #ifndef LIBRARY_START_TEXT
Index: linux-2.6-2/include/asm-arm26/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-arm26/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-arm26/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -29,6 +29,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif
 #ifndef LIBRARY_START_TEXT
Index: linux-2.6-2/include/asm-avr32/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-avr32/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-avr32/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -20,6 +20,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif
Index: linux-2.6-2/include/asm-cris/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-cris/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-cris/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -8,6 +8,7 @@
 /* grabbed from the intel stuff */
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 struct exec
Index: linux-2.6-2/include/asm-frv/mem-layout.h
===================================================================
--- linux-2.6-2.orig/include/asm-frv/mem-layout.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-frv/mem-layout.h	2007-06-01 10:27:30.000000000 +0200
@@ -60,6 +60,7 @@
  */
 #define BRK_BASE __UL(2 * 1024 * 1024 + PAGE_SIZE)
 #define STACK_TOP __UL(2 * 1024 * 1024)
+#define STACK_TOP_MAX STACK_TOP
 /* userspace process size */
 #ifdef CONFIG_MMU
Index: linux-2.6-2/include/asm-h8300/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-h8300/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-h8300/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -20,6 +20,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif
Index: linux-2.6-2/include/asm-i386/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-i386/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-i386/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -20,6 +20,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif
Index: linux-2.6-2/include/asm-ia64/ustack.h
===================================================================
--- linux-2.6-2.orig/include/asm-ia64/ustack.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-ia64/ustack.h	2007-06-01 10:27:30.000000000 +0200
@@ -11,6 +11,7 @@
 /* The absolute hard limit for stack size is 1/2 of the mappable space in the region */
 #define MAX_USER_STACK_SIZE (RGN_MAP_LIMIT/2)
 #define STACK_TOP (0x6000000000000000UL + RGN_MAP_LIMIT)
+#define STACK_TOP_MAX STACK_TOP
 #endif
 /* Make a default stack size of 2GiB */
Index: linux-2.6-2/include/asm-m32r/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-m32r/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-m32r/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -20,6 +20,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif
Index: linux-2.6-2/include/asm-m68k/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-m68k/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-m68k/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -20,6 +20,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif
Index: linux-2.6-2/include/asm-mips/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-mips/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-mips/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -40,6 +40,7 @@ struct exec
 #ifdef CONFIG_64BIT
 #define STACK_TOP (current->thread.mflags & MF_32BIT_ADDR ? TASK_SIZE32 : TASK_SIZE)
 #endif
+#define STACK_TOP_MAX TASK_SIZE
 #endif
Index: linux-2.6-2/include/asm-parisc/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-parisc/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-parisc/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -23,6 +23,7 @@ struct exec
  * prumpf
  */
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX DEFAULT_TASK_SIZE
 #endif
Index: linux-2.6-2/include/asm-powerpc/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-powerpc/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-powerpc/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -26,9 +26,12 @@ struct exec
 #define STACK_TOP (test_thread_flag(TIF_32BIT) ? \
 		   STACK_TOP_USER32 : STACK_TOP_USER64)
+#define STACK_TOP_MAX STACK_TOP_USER64
+
 #else /* __powerpc64__ */
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif /* __powerpc64__ */
 #endif /* __KERNEL__ */
Index: linux-2.6-2/include/asm-s390/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-s390/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-s390/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -32,6 +32,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX DEFAULT_TASK_SIZE
 #endif
Index: linux-2.6-2/include/asm-sh/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-sh/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-sh/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -20,6 +20,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif
Index: linux-2.6-2/include/asm-sh64/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-sh64/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-sh64/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -31,6 +31,7 @@ struct exec
 #ifdef __KERNEL__
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 #endif
Index: linux-2.6-2/include/asm-sparc/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-sparc/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-sparc/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -92,6 +92,7 @@ struct relocation_info /* used when head
 #include <asm/page.h>
 #define STACK_TOP (PAGE_OFFSET - PAGE_SIZE)
+#define STACK_TOP_MAX STACK_TOP
 #endif /* __KERNEL__ */
Index: linux-2.6-2/include/asm-sparc64/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-sparc64/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-sparc64/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -101,6 +101,8 @@ struct relocation_info /* used when head
 #define STACK_TOP (test_thread_flag(TIF_32BIT) ? \
 		   STACK_TOP32 : STACK_TOP64)
+#define STACK_TOP_MAX STACK_TOP64
+
 #endif
 #endif /* !(__ASSEMBLY__) */
Index: linux-2.6-2/include/asm-um/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-um/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-um/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -16,4 +16,6 @@ extern int honeypot;
 #define STACK_TOP \
 	CHOOSE_MODE((honeypot ? host_task_size : task_size), task_size)
+#define STACK_TOP_MAX STACK_TOP
+
 #endif
Index: linux-2.6-2/include/asm-x86_64/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-x86_64/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-x86_64/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -21,7 +21,8 @@ struct exec
 #ifdef __KERNEL__
 #include <linux/thread_info.h>
-#define STACK_TOP TASK_SIZE
+#define STACK_TOP	TASK_SIZE
+#define STACK_TOP_MAX	TASK_SIZE64
 #endif
 #endif /* __A_OUT_GNU_H__ */
Index: linux-2.6-2/include/asm-xtensa/a.out.h
===================================================================
--- linux-2.6-2.orig/include/asm-xtensa/a.out.h	2007-06-01 10:27:27.000000000 +0200
+++ linux-2.6-2/include/asm-xtensa/a.out.h	2007-06-01 10:27:30.000000000 +0200
@@ -17,6 +17,7 @@
 /* Note: the kernel needs the a.out definitions, even if only ELF is used. */
 #define STACK_TOP TASK_SIZE
+#define STACK_TOP_MAX STACK_TOP
 struct exec
 {
--
* [PATCH 2/4] audit: rework execve audit
@ 2007-06-05 15:05 Peter Zijlstra
From: Peter Zijlstra @ 2007-06-05 15:05 UTC
To: linux-kernel, parisc-linux, linux-mm, linux-arch
Cc: Ollie Wild, Peter Zijlstra, Andrew Morton, Ingo Molnar, Andi Kleen, linux-audit
[-- Attachment #1: execve_audit.patch --]
[-- Type: text/plain, Size: 6561 bytes --]
The purpose of audit_bprm() is to log the argv array to a userspace daemon at
the end of the execve system call. Since user space hasn't had time to run,
this array is still in a pristine state on the process's stack, so there is no
need to copy it; we can just grab it from there.
In order to minimize the damage to audit_log_*(), copy each string into a
temporary kernel buffer first.
Currently the audit code requires that the full argument vector fit in a
single packet, so the argv size is clipped to a (sysctl) limit, but only when
execve auditing is enabled.
If the audit protocol is extended to allow multiple packets, this check can be
removed.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ollie Wild <aaw@google.com>
Cc: linux-audit@redhat.com
---
 fs/exec.c               |    3 +
 include/linux/binfmts.h |    1 
 include/linux/sysctl.h  |    1 
 kernel/audit.c          |   16 +++++++++
 kernel/audit.h          |    1 
 kernel/auditsc.c        |   82 ++++++++++++++++++++++++++++++++----------------
 kernel/sysctl.c         |   11 ++++++
 7 files changed, 89 insertions(+), 26 deletions(-)
Index: linux-2.6-2/kernel/auditsc.c
===================================================================
--- linux-2.6-2.orig/kernel/auditsc.c	2007-06-05 09:51:53.000000000 +0200
+++ linux-2.6-2/kernel/auditsc.c	2007-06-05 10:03:31.000000000 +0200
@@ -156,7 +156,7 @@ struct audit_aux_data_execve {
 	struct audit_aux_data	d;
 	int argc;
 	int envc;
-	char mem[0];
+	struct mm_struct *mm;
 };
 
 struct audit_aux_data_socketcall {
@@ -834,6 +834,47 @@ static int audit_log_pid_context(struct
 	return rc;
 }
 
+static void audit_log_execve_info(struct audit_buffer *ab,
+		struct audit_aux_data_execve *axi)
+{
+	int i;
+	long len;
+	const char __user *p = (const char __user *)axi->mm->arg_start;
+
+	if (axi->mm != current->mm)
+		return; /* execve failed, no additional info */
+
+	for (i = 0; i < axi->argc; i++, p += len) {
+		long ret;
+		char *tmp;
+
+		len = strnlen_user(p, MAX_ARG_PAGES*PAGE_SIZE);
+		/*
+		 * We just created this mm, if we can't find the strings
+		 * we just copied in something is _very_ wrong.
+		 */
+		BUG_ON(!len);
+
+		tmp = kmalloc(len, GFP_KERNEL);
+		if (!tmp) {
+			audit_panic("out of memory for argv string\n");
+			break;
+		}
+
+		ret = copy_from_user(tmp, p, len);
+		/*
+		 * There is no reason for this copy to be short.
+		 */
+		BUG_ON(ret);
+
+		audit_log_format(ab, "a%d=", i);
+		audit_log_untrustedstring(ab, tmp);
+		audit_log_format(ab, "\n");
+
+		kfree(tmp);
+	}
+}
+
 static void audit_log_exit(struct audit_context *context, struct task_struct *tsk)
 {
 	int i, call_panic = 0;
@@ -974,13 +1016,7 @@ static void audit_log_exit(struct audit_
 
 		case AUDIT_EXECVE: {
 			struct audit_aux_data_execve *axi = (void *)aux;
-			int i;
-			const char *p;
-			for (i = 0, p = axi->mem; i < axi->argc; i++) {
-				audit_log_format(ab, "a%d=", i);
-				p = audit_log_untrustedstring(ab, p);
-				audit_log_format(ab, "\n");
-			}
+			audit_log_execve_info(ab, axi);
 			break; }
 
 		case AUDIT_SOCKETCALL: {
@@ -1824,32 +1860,31 @@ int __audit_ipc_set_perm(unsigned long q
 	return 0;
 }
 
+int audit_argv_kb = 32;
+
 int audit_bprm(struct linux_binprm *bprm)
 {
 	struct audit_aux_data_execve *ax;
 	struct audit_context *context = current->audit_context;
-	unsigned long p, next;
-	void *to;
 
 	if (likely(!audit_enabled || !context || context->dummy))
 		return 0;
 
-	ax = kmalloc(sizeof(*ax) + PAGE_SIZE * MAX_ARG_PAGES - bprm->p,
-				GFP_KERNEL);
+	/*
+	 * Even though the stack code doesn't limit the arg+env size any more,
+	 * the audit code requires that _all_ arguments be logged in a single
+	 * netlink skb. Hence cap it :-(
+	 */
+	if (bprm->argv_len > (audit_argv_kb << 10))
+		return -E2BIG;
+
+	ax = kmalloc(sizeof(*ax), GFP_KERNEL);
 	if (!ax)
 		return -ENOMEM;
 
 	ax->argc = bprm->argc;
 	ax->envc = bprm->envc;
-	for (p = bprm->p, to = ax->mem; p < MAX_ARG_PAGES*PAGE_SIZE; p = next) {
-		struct page *page = bprm->page[p / PAGE_SIZE];
-		void *kaddr = kmap(page);
-		next = (p + PAGE_SIZE) & ~(PAGE_SIZE - 1);
-		memcpy(to, kaddr + (p & (PAGE_SIZE - 1)), next - p);
-		to += next - p;
-		kunmap(page);
-	}
-
+	ax->mm = bprm->mm;
 	ax->d.type = AUDIT_EXECVE;
 	ax->d.next = context->aux;
 	context->aux = (void *)ax;
Index: linux-2.6-2/fs/exec.c
===================================================================
--- linux-2.6-2.orig/fs/exec.c	2007-06-05 09:51:42.000000000 +0200
+++ linux-2.6-2/fs/exec.c	2007-06-05 10:03:11.000000000 +0200
@@ -1154,6 +1154,7 @@ int do_execve(char * filename,
 {
 	struct linux_binprm *bprm;
 	struct file *file;
+	unsigned long tmp;
 	int retval;
 	int i;
 
@@ -1208,9 +1209,11 @@ int do_execve(char * filename,
 	if (retval < 0)
 		goto out;
 
+	tmp = bprm->p;
 	retval = copy_strings(bprm->argc, argv, bprm);
 	if (retval < 0)
 		goto out;
+	bprm->argv_len = tmp - bprm->p;
 
 	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
Index: linux-2.6-2/include/linux/binfmts.h
===================================================================
--- linux-2.6-2.orig/include/linux/binfmts.h	2007-06-05 09:51:44.000000000 +0200
+++ linux-2.6-2/include/linux/binfmts.h	2007-06-05 10:03:11.000000000 +0200
@@ -40,6 +40,7 @@ struct linux_binprm{
 	unsigned interp_flags;
 	unsigned interp_data;
 	unsigned long loader, exec;
+	unsigned long argv_len;
 };
 
 #define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
Index: linux-2.6-2/kernel/sysctl.c
===================================================================
--- linux-2.6-2.orig/kernel/sysctl.c	2007-06-05 09:51:53.000000000 +0200
+++ linux-2.6-2/kernel/sysctl.c	2007-06-05 10:04:05.000000000 +0200
@@ -78,6 +78,7 @@ extern int percpu_pagelist_fraction;
 extern int compat_log;
 extern int maps_protect;
 extern int sysctl_stat_interval;
+extern int audit_argv_kb;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -615,6 +616,16 @@ static ctl_table kern_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+#ifdef CONFIG_AUDITSYSCALL
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "audit_argv_kb",
+		.data		= &audit_argv_kb,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
 
 	{ .ctl_name = 0 }
 };
--
* Re: [PATCH 2/4] audit: rework execve audit
@ 2007-06-05 23:39 Andrew Morton
From: Andrew Morton @ 2007-06-05 23:39 UTC
To: Peter Zijlstra
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen, linux-audit
On Tue, 05 Jun 2007 17:05:25 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> The purpose of audit_bprm() is to log the argv array to a userspace daemon at
> the end of the execve system call. Since user-space hasn't had time to run,
> this array is still in pristine state on the process' stack; so no need to copy
> it, we can just grab it from there.
>
> In order to minimize the damage to audit_log_*() copy each string into a
> temporary kernel buffer first.
>
> Currently the audit code requires that the full argument vector fits in a
> single packet. So currently it does clip the argv size to a (sysctl) limit, but
> only when execve auditing is enabled.
>
> If the audit protocol gets extended to allow for multiple packets this check
> can be removed.
>
> ...
>
Please try to avoid trigger-happiness with the BUG_ON()s..
> struct audit_aux_data_socketcall {
> @@ -834,6 +834,47 @@ static int audit_log_pid_context(struct
>  	return rc;
>  }
>
> +static void audit_log_execve_info(struct audit_buffer *ab,
> +		struct audit_aux_data_execve *axi)
> +{
> +	int i;
> +	long len;
> +	const char __user *p = (const char __user *)axi->mm->arg_start;
> +
> +	if (axi->mm != current->mm)
> +		return; /* execve failed, no additional info */
> +
> +	for (i = 0; i < axi->argc; i++, p += len) {
> +		long ret;
> +		char *tmp;
> +
> +		len = strnlen_user(p, MAX_ARG_PAGES*PAGE_SIZE);
> +		/*
> +		 * We just created this mm, if we can't find the strings
> +		 * we just copied in something is _very_ wrong.
> +		 */
> +		BUG_ON(!len);
> +
> +		tmp = kmalloc(len, GFP_KERNEL);
> +		if (!tmp) {
> +			audit_panic("out of memory for argv string\n");
> +			break;
> +		}
> +
> +		ret = copy_from_user(tmp, p, len);
> +		/*
> +		 * There is no reason for this copy to be short.
> +		 */
> +		BUG_ON(ret);
You sure? What happens if another thread does munmap() in parallel?
I think I'll make this WARN_ON just out of principle.
> +		audit_log_format(ab, "a%d=", i);
> +		audit_log_untrustedstring(ab, tmp);
> +		audit_log_format(ab, "\n");
> +
> +		kfree(tmp);
> +	}
> +}
> +
>
> ...
>
> ===================================================================
> --- linux-2.6-2.orig/fs/exec.c	2007-06-05 09:51:42.000000000 +0200
> +++ linux-2.6-2/fs/exec.c	2007-06-05 10:03:11.000000000 +0200
> @@ -1154,6 +1154,7 @@ int do_execve(char * filename,
>  {
>  	struct linux_binprm *bprm;
>  	struct file *file;
> +	unsigned long tmp;
>  	int retval;
>  	int i;
>
> @@ -1208,9 +1209,11 @@ int do_execve(char * filename,
>  	if (retval < 0)
>  		goto out;
>
> +	tmp = bprm->p;
>  	retval = copy_strings(bprm->argc, argv, bprm);
>  	if (retval < 0)
>  		goto out;
> +	bprm->argv_len = tmp - bprm->p;
--- a/include/linux/kernel.h~a
+++ a/include/linux/kernel.h
@@ -5,6 +5,8 @@
  * 'kernel.h' contains some often-used function prototypes etc
  */
 
+#define tmp don't call your variables tmp!
+
 #ifdef __KERNEL__
 
 #include <stdarg.h>
_
* Re: [PATCH 2/4] audit: rework execve audit
@ 2007-06-06 5:52 Peter Zijlstra
From: Peter Zijlstra @ 2007-06-06 5:52 UTC
To: Andrew Morton
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen, linux-audit
On Tue, 2007-06-05 at 16:39 -0700, Andrew Morton wrote:
> On Tue, 05 Jun 2007 17:05:25 +0200
> Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > The purpose of audit_bprm() is to log the argv array to a userspace daemon at
> > the end of the execve system call. Since user-space hasn't had time to run,
> > this array is still in pristine state on the process' stack; so no need to copy
> > it, we can just grab it from there.
> >
> > In order to minimize the damage to audit_log_*() copy each string into a
> > temporary kernel buffer first.
> >
> > Currently the audit code requires that the full argument vector fits in a
> > single packet. So currently it does clip the argv size to a (sysctl) limit, but
> > only when execve auditing is enabled.
> >
> > If the audit protocol gets extended to allow for multiple packets this check
> > can be removed.
> >
> > ...
> >
>
> Please try to avoid trigger-happiness with the BUG_ON()s..
>
> > struct audit_aux_data_socketcall {
> > @@ -834,6 +834,47 @@ static int audit_log_pid_context(struct
> >  	return rc;
> >  }
> >
> > +static void audit_log_execve_info(struct audit_buffer *ab,
> > +		struct audit_aux_data_execve *axi)
> > +{
> > +	int i;
> > +	long len;
> > +	const char __user *p = (const char __user *)axi->mm->arg_start;
> > +
> > +	if (axi->mm != current->mm)
> > +		return; /* execve failed, no additional info */
> > +
> > +	for (i = 0; i < axi->argc; i++, p += len) {
> > +		long ret;
> > +		char *tmp;
> > +
> > +		len = strnlen_user(p, MAX_ARG_PAGES*PAGE_SIZE);
> > +		/*
> > +		 * We just created this mm, if we can't find the strings
> > +		 * we just copied in something is _very_ wrong.
> > +		 */
> > +		BUG_ON(!len);
> > +
> > +		tmp = kmalloc(len, GFP_KERNEL);
> > +		if (!tmp) {
> > +			audit_panic("out of memory for argv string\n");
> > +			break;
> > +		}
> > +
> > +		ret = copy_from_user(tmp, p, len);
> > +		/*
> > +		 * There is no reason for this copy to be short.
> > +		 */
> > +		BUG_ON(ret);
>
> You sure? What happens if another thread does munmap() in parallel?
>
> I think I'll make this WARN_ON just out of principle.
This is right after the execve call, and before we've hit userspace, so at
this time there is no runnable context with access to the memory (except this
one).
> > @@ -1208,9 +1209,11 @@ int do_execve(char * filename,
> >  	if (retval < 0)
> >  		goto out;
> >
> > +	tmp = bprm->p;
> >  	retval = copy_strings(bprm->argc, argv, bprm);
> >  	if (retval < 0)
> >  		goto out;
> > +	bprm->argv_len = tmp - bprm->p;
>
> --- a/include/linux/kernel.h~a
> +++ a/include/linux/kernel.h
> @@ -5,6 +5,8 @@
>   * 'kernel.h' contains some often-used function prototypes etc
>   */
>
> +#define tmp don't call your variables tmp!
> +
>  #ifdef __KERNEL__
>
>  #include <stdarg.h>
Fair enough. :-/
* [PATCH 3/4] mm: move_page_tables{,_up} 2007-06-05 15:05 [PATCH 0/4] no MAX_ARG_PAGES Peter Zijlstra 2007-06-05 15:05 ` [PATCH 1/4] arch: personality independent stack top Peter Zijlstra 2007-06-05 15:05 ` [PATCH 2/4] audit: rework execve audit Peter Zijlstra @ 2007-06-05 15:05 ` Peter Zijlstra 2007-06-05 19:46 ` Christoph Lameter ` (2 more replies) 2007-06-05 15:05 ` [PATCH 4/4] mm: variable length argument support Peter Zijlstra, Ollie Wild 3 siblings, 3 replies; 26+ messages in thread From: Peter Zijlstra @ 2007-06-05 15:05 UTC (permalink / raw) To: linux-kernel, parisc-linux, linux-mm, linux-arch Cc: Ollie Wild, Peter Zijlstra, Andrew Morton, Ingo Molnar, Andi Kleen [-- Attachment #1: move_page_tables_up.patch --] [-- Type: text/plain, Size: 6050 bytes --] Provide functions for moving page tables upwards. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ollie Wild <aaw@google.com> --- include/linux/mm.h | 7 +++ mm/mremap.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 110 insertions(+), 2 deletions(-) Index: linux-2.6-2/include/linux/mm.h =================================================================== --- linux-2.6-2.orig/include/linux/mm.h 2007-06-01 10:50:58.000000000 +0200 +++ linux-2.6-2/include/linux/mm.h 2007-06-01 10:57:26.000000000 +0200 @@ -788,6 +787,12 @@ int FASTCALL(set_page_dirty(struct page int set_page_dirty_lock(struct page *page); int clear_page_dirty_for_io(struct page *page); +extern unsigned long move_page_tables(struct vm_area_struct *vma, + unsigned long old_addr, struct vm_area_struct *new_vma, + unsigned long new_addr, unsigned long len); +extern unsigned long move_page_tables_up(struct vm_area_struct *vma, + unsigned long old_addr, struct vm_area_struct *new_vma, + unsigned long new_addr, unsigned long len); extern unsigned long do_mremap(unsigned long addr, unsigned long old_len, unsigned long new_len, unsigned long flags, unsigned long new_addr); Index: linux-2.6-2/mm/mremap.c 
=================================================================== --- linux-2.6-2.orig/mm/mremap.c 2007-06-01 10:50:58.000000000 +0200 +++ linux-2.6-2/mm/mremap.c 2007-06-01 10:57:45.000000000 +0200 @@ -118,9 +118,63 @@ static void move_ptes(struct vm_area_str spin_unlock(&mapping->i_mmap_lock); } +static void move_ptes_up(struct vm_area_struct *vma, pmd_t *old_pmd, + unsigned long old_addr, unsigned long old_end, + struct vm_area_struct *new_vma, pmd_t *new_pmd, + unsigned long new_addr) +{ + struct address_space *mapping = NULL; + struct mm_struct *mm = vma->vm_mm; + pte_t *old_pte, *new_pte, pte; + spinlock_t *old_ptl, *new_ptl; + unsigned long new_end = new_addr + (old_end - old_addr); + + if (vma->vm_file) { + /* + * Subtle point from Rajesh Venkatasubramanian: before + * moving file-based ptes, we must lock vmtruncate out, + * since it might clean the dst vma before the src vma, + * and we propagate stale pages into the dst afterward. + */ + mapping = vma->vm_file->f_mapping; + spin_lock(&mapping->i_mmap_lock); + if (new_vma->vm_truncate_count && + new_vma->vm_truncate_count != vma->vm_truncate_count) + new_vma->vm_truncate_count = 0; + } + + /* + * We don't have to worry about the ordering of src and dst + * pte locks because exclusive mmap_sem prevents deadlock. 
+	 */
+	old_pte = pte_offset_map_lock(mm, old_pmd, old_end-1, &old_ptl);
+	new_pte = pte_offset_map_nested(new_pmd, new_end-1);
+	new_ptl = pte_lockptr(mm, new_pmd);
+	if (new_ptl != old_ptl)
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+	arch_enter_lazy_mmu_mode();
+
+	for (; old_end > old_addr; old_pte--, old_end -= PAGE_SIZE,
+				   new_pte--, new_end -= PAGE_SIZE) {
+		if (pte_none(*old_pte))
+			continue;
+		pte = ptep_clear_flush(vma, old_end-1, old_pte);
+		pte = move_pte(pte, new_vma->vm_page_prot, old_end-1, new_end-1);
+		set_pte_at(mm, new_end-1, new_pte, pte);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	if (new_ptl != old_ptl)
+		spin_unlock(new_ptl);
+	pte_unmap_nested(new_pte - 1);
+	pte_unmap_unlock(old_pte - 1, old_ptl);
+	if (mapping)
+		spin_unlock(&mapping->i_mmap_lock);
+}
+
 #define LATENCY_LIMIT	(64 * PAGE_SIZE)
 
-static unsigned long move_page_tables(struct vm_area_struct *vma,
+unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len)
 {
@@ -132,21 +186,25 @@ static unsigned long move_page_tables(st
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
+
 		next = (old_addr + PMD_SIZE) & PMD_MASK;
 		if (next - 1 > old_end)
 			next = old_end;
 		extent = next - old_addr;
+
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;
 		new_pmd = alloc_new_pmd(vma->vm_mm, new_addr);
 		if (!new_pmd)
 			break;
+
 		next = (new_addr + PMD_SIZE) & PMD_MASK;
 		if (extent > next - new_addr)
 			extent = next - new_addr;
 		if (extent > LATENCY_LIMIT)
 			extent = LATENCY_LIMIT;
+
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent,
 				new_vma, new_pmd, new_addr);
 	}
@@ -154,6 +212,51 @@ static unsigned long move_page_tables(st
 	return len + old_addr - old_end;	/* how much done */
 }
 
+unsigned long move_page_tables_up(struct vm_area_struct *vma,
+		unsigned long old_addr, struct vm_area_struct *new_vma,
+		unsigned long new_addr, unsigned long len)
+{
+	unsigned long extent, prev, old_end, new_end;
+	pmd_t *old_pmd, *new_pmd;
+
+	old_end = old_addr + len;
+	new_end = new_addr + len;
+	flush_cache_range(vma, old_addr, old_end);
+
+	for (; old_end > old_addr; old_end -= extent, new_end -= extent) {
+		cond_resched();
+
+		/*
+		 * calculate how far till prev PMD boundary for old
+		 */
+		prev = (old_end - 1) & PMD_MASK;
+		if (prev < old_addr)
+			prev = old_addr;
+		extent = old_end - prev;
+
+		old_pmd = get_old_pmd(vma->vm_mm, old_end-1);
+		if (!old_pmd)
+			continue;
+		new_pmd = alloc_new_pmd(vma->vm_mm, new_end-1);
+		if (!new_pmd)
+			break;
+
+		/*
+		 * calculate and clip to prev PMD boundary for new
+		 */
+		prev = (new_end - 1) & PMD_MASK;
+		if (extent > new_end - prev)
+			extent = new_end - prev;
+		if (extent > LATENCY_LIMIT)
+			extent = LATENCY_LIMIT;
+
+		move_ptes_up(vma, old_pmd, old_end - extent, old_end,
+				new_vma, new_pmd, new_end - extent);
+	}
+
+	return old_addr + len - old_end;
+}
+
 static unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr)

--
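The extent arithmetic in move_page_tables_up() walks the range from the top down: each step is clipped to the previous PMD boundary of the source, then of the destination, then to LATENCY_LIMIT. The following user-space sketch models only that arithmetic, under stated assumptions: the SIM_* constants are hypothetical values (4 KiB pages, 2 MiB PMDs), sim_chunks_up() is an illustrative name, and the `continue` path taken when get_old_pmd() finds no PMD (which skips the destination clip) is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants: 4 KiB pages, 512 PTEs per PMD (2 MiB). */
#define SIM_PAGE_SIZE     4096UL
#define SIM_PMD_SIZE      (512 * SIM_PAGE_SIZE)
#define SIM_PMD_MASK      (~(SIM_PMD_SIZE - 1))
#define SIM_LATENCY_LIMIT (64 * SIM_PAGE_SIZE)

/* Walk [old_addr, old_addr + len) from the top down, recording the size
 * of every chunk that would be handed to move_ptes_up(). Returns the
 * number of chunks stored in extents[]. */
static size_t sim_chunks_up(unsigned long old_addr, unsigned long new_addr,
			    unsigned long len, unsigned long *extents,
			    size_t max)
{
	unsigned long old_end = old_addr + len;
	unsigned long new_end = new_addr + len;
	size_t n = 0;

	while (old_end > old_addr) {
		unsigned long prev, extent;

		/* distance to the previous PMD boundary of the source */
		prev = (old_end - 1) & SIM_PMD_MASK;
		if (prev < old_addr)
			prev = old_addr;
		extent = old_end - prev;

		/* clip to the previous PMD boundary of the destination */
		prev = (new_end - 1) & SIM_PMD_MASK;
		if (extent > new_end - prev)
			extent = new_end - prev;
		if (extent > SIM_LATENCY_LIMIT)
			extent = SIM_LATENCY_LIMIT;

		assert(n < max);
		extents[n++] = extent;
		old_end -= extent;
		new_end -= extent;
	}
	return n;
}
```

For example, moving two pages to a destination that straddles a 2 MiB boundary (new_addr = 0x1FF000) yields two one-page chunks, the upper one first; a 1 MiB move is split into four LATENCY_LIMIT-sized chunks.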
* Re: [PATCH 3/4] mm: move_page_tables{,_up}
2007-06-05 15:05 ` [PATCH 3/4] mm: move_page_tables{,_up} Peter Zijlstra
@ 2007-06-05 19:46 ` Christoph Lameter
2007-06-05 23:39 ` Andrew Morton
2007-06-06 19:06 ` Ollie Wild
2 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2007-06-05 19:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Andrew Morton, Ingo Molnar, Andi Kleen, Mel Gorman

On Tue, 5 Jun 2007, Peter Zijlstra wrote:

> Provide functions for moving page tables upwards.

Could you make this more general so that it allows moving arbitrary page
table pages? That would be useful for Mel's memory defragmentation since it
increases the types of pages that can be moved.
* Re: [PATCH 3/4] mm: move_page_tables{,_up}
2007-06-05 15:05 ` [PATCH 3/4] mm: move_page_tables{,_up} Peter Zijlstra
2007-06-05 19:46 ` Christoph Lameter
@ 2007-06-05 23:39 ` Andrew Morton
2007-06-06 19:06 ` Ollie Wild
2 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2007-06-05 23:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Tue, 05 Jun 2007 17:05:26 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> Provide functions for moving page tables upwards.
>
> ...
>
> +extern unsigned long move_page_tables(struct vm_area_struct *vma,
> +		unsigned long old_addr, struct vm_area_struct *new_vma,
> +		unsigned long new_addr, unsigned long len);
> +extern unsigned long move_page_tables_up(struct vm_area_struct *vma,
> +		unsigned long old_addr, struct vm_area_struct *new_vma,
> +		unsigned long new_addr, unsigned long len);
>  extern unsigned long do_mremap(unsigned long addr,
>  				unsigned long old_len, unsigned long new_len,
>  				unsigned long flags, unsigned long new_addr);

They become kernel-wide.

> +static void move_ptes_up(struct vm_area_struct *vma, pmd_t *old_pmd,
> +		unsigned long old_addr, unsigned long old_end,
> +		struct vm_area_struct *new_vma, pmd_t *new_pmd,
> +		unsigned long new_addr)

So some documentation might be in order...
* Re: [PATCH 3/4] mm: move_page_tables{,_up}
2007-06-05 15:05 ` [PATCH 3/4] mm: move_page_tables{,_up} Peter Zijlstra
2007-06-05 19:46 ` Christoph Lameter
2007-06-05 23:39 ` Andrew Morton
@ 2007-06-06 19:06 ` Ollie Wild
2007-06-06 19:12 ` Peter Zijlstra
2 siblings, 1 reply; 26+ messages in thread
From: Ollie Wild @ 2007-06-06 19:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Andrew Morton, Ingo Molnar, Andi Kleen

On 6/5/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Provide functions for moving page tables upwards.

Now that we're initializing the temporary stack location to
STACK_TOP_MAX, do we still need move_page_tables_up() for variable
length argument support? I originally added it into shift_arg_pages()
to support 32-bit apps exec'ing 64-bit apps when we were using
TASK_SIZE as our temporary location.

Maybe we should decouple this patch from the others and submit it as
an enhancement to support memory defragmentation.

Ollie
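Ollie's argument can be checked with a little arithmetic. If the temporary stack ends at STACK_TOP_MAX, the largest stack address for any personality, then the final stack_top chosen by the binfmt code cannot lie above it. The relocation offset computed in the grows-down branch of setup_arg_pages() (stack_shift = stack_top - vma->vm_end) therefore always describes a move toward lower addresses. The sketch below is a hypothetical user-space model: struct sim_stack and sim_relocate() are illustrative names, 4 KiB pages are assumed, and arch_align_stack() randomisation is omitted. As in the patch, the shift is kept in an unsigned long, so a downward move wraps around and adding it back wraps again.

```c
#include <assert.h>

#define SIM_PAGE_SIZE 4096UL
#define SIM_PAGE_ALIGN(x) (((x) + SIM_PAGE_SIZE - 1) & ~(SIM_PAGE_SIZE - 1))

/* Hypothetical model of the temporary stack VMA plus bprm->p. */
struct sim_stack {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long p;	/* lowest byte of the copied strings */
};

/* Relocate the stack so that its top lands on stack_top, adding one
 * unsigned shift to every address. Returns 1 if the move was downwards,
 * 0 if the stack did not move (an upward move is not expected here). */
static int sim_relocate(struct sim_stack *s, unsigned long stack_top)
{
	unsigned long shift;
	int down;

	assert(s->vm_start < s->vm_end);
	stack_top = SIM_PAGE_ALIGN(stack_top);
	down = stack_top < s->vm_end;
	shift = stack_top - s->vm_end;	/* wraps modulo 2^N when moving down */

	s->vm_start += shift;
	s->vm_end += shift;
	s->p += shift;
	return down;
}
```

If the temporary top could lie below the final stack_top (as with a 32-bit caller's TASK_SIZE and a 64-bit image), the shift would be upward and shift_arg_pages() would need its move_page_tables_up() branch; with STACK_TOP_MAX as the temporary top that case does not arise for a downward-growing stack.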
* Re: [PATCH 3/4] mm: move_page_tables{,_up}
2007-06-06 19:06 ` Ollie Wild
@ 2007-06-06 19:12 ` Peter Zijlstra
2007-06-06 19:50 ` Ollie Wild
0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2007-06-06 19:12 UTC (permalink / raw)
To: Ollie Wild
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Andrew Morton, Ingo Molnar, Andi Kleen

On Wed, 2007-06-06 at 12:06 -0700, Ollie Wild wrote:
> On 6/5/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > Provide functions for moving page tables upwards.
>
> Now that we're initializing the temporary stack location to
> STACK_TOP_MAX, do we still need move_page_tables_up() for variable
> length argument support? I originally added it into shift_arg_pages()
> to support 32-bit apps exec'ing 64-bit apps when we were using
> TASK_SIZE as our temporary location.
>
> Maybe we should decouple this patch from the others and submit it as
> an enhancement to support memory defragmentation.

PA-RISC will still need it, right?

On the defrag thingy, I talked with Mel today, and neither of us can see a
useful application of these functions to his defrag work.
* Re: [PATCH 3/4] mm: move_page_tables{,_up}
2007-06-06 19:12 ` Peter Zijlstra
@ 2007-06-06 19:50 ` Ollie Wild
2007-06-06 19:53 ` Peter Zijlstra
0 siblings, 1 reply; 26+ messages in thread
From: Ollie Wild @ 2007-06-06 19:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Andrew Morton, Ingo Molnar, Andi Kleen

On 6/6/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> PA-RISC will still need it, right?

Originally, I thought since the PA-RISC stack grows up, we'd want to
place the stack at the bottom of memory and have copy_strings() and
friends work in the opposite direction. It turns out, though, that
this ends up being way more headache than it's worth, so I just
manually grow the stack down with expand_downwards().
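With this approach, even on an upward-growing stack (CONFIG_STACK_GROWSUP), the temporary stack is extended toward lower addresses before each destination page is touched. In the patch, get_arg_page() calls expand_downwards(bprm->vma, pos) for this purpose. The sketch below is a hypothetical user-space model of that check, not the kernel function. sim_expand_downwards() and struct sim_vma are illustrative names, and the byte `limit` is an assumed stand-in for the stack rlimit check. In the patch, a failure at this point is reported to execve() as -E2BIG.

```c
#include <assert.h>

#define SIM_PAGE_SIZE 4096UL
#define SIM_PAGE_MASK (~(SIM_PAGE_SIZE - 1))

/* Hypothetical stand-in for a VMA covering [vm_start, vm_end). */
struct sim_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Lower vm_start to the page containing addr, refusing to let the
 * area grow beyond limit bytes in total. Returns 0 on success and -1
 * if the (assumed) limit would be exceeded. */
static int sim_expand_downwards(struct sim_vma *vma, unsigned long addr,
				unsigned long limit)
{
	unsigned long start = addr & SIM_PAGE_MASK;

	assert(vma->vm_start < vma->vm_end);
	if (start >= vma->vm_start)
		return 0;		/* already covered: nothing to do */
	if (vma->vm_end - start > limit)
		return -1;		/* growing would exceed the limit */
	vma->vm_start = start;
	return 0;
}
```

Because copy_strings() walks pos downward, each call either finds the page already covered or extends the area by the pages needed to cover it, so the area stays contiguous below vm_end.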
* Re: [PATCH 3/4] mm: move_page_tables{,_up}
2007-06-06 19:50 ` Ollie Wild
@ 2007-06-06 19:53 ` Peter Zijlstra
0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2007-06-06 19:53 UTC (permalink / raw)
To: Ollie Wild
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Andrew Morton, Ingo Molnar, Andi Kleen

On Wed, 2007-06-06 at 12:50 -0700, Ollie Wild wrote:
> On 6/6/07, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > PA-RISC will still need it, right?
>
> Originally, I thought since the PA-RISC stack grows up, we'd want to
> place the stack at the bottom of memory and have copy_strings() and
> friends work in the opposite direction. It turns out, though, that
> this ends up being way more headache than it's worth, so I just
> manually grow the stack down with expand_downwards().

Ah, ok. I'll drop this whole patch then.
* [PATCH 4/4] mm: variable length argument support
2007-06-05 15:05 [PATCH 0/4] no MAX_ARG_PAGES Peter Zijlstra
` (2 preceding siblings ...)
2007-06-05 15:05 ` [PATCH 3/4] mm: move_page_tables{,_up} Peter Zijlstra
@ 2007-06-05 15:05 ` Peter Zijlstra, Ollie Wild
2007-06-05 23:39 ` Andrew Morton
2007-06-06 8:36 ` Andrew Morton
3 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra, Ollie Wild @ 2007-06-05 15:05 UTC (permalink / raw)
To: linux-kernel, parisc-linux, linux-mm, linux-arch
Cc: Ollie Wild, Peter Zijlstra, Andrew Morton, Ingo Molnar, Andi Kleen

[-- Attachment #1: no_MAX_ARG_PAGES.patch --]
[-- Type: text/plain, Size: 43729 bytes --]

Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly
from the old mm into the new mm.

We create the new mm before the binfmt code runs, and place the new stack
at the very top of the address space. Once the binfmt code runs and figures
out where the stack should be, we move it downwards.

It is a bit peculiar in that we have one task with two mm's, one of which is
inactive.
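The patched copy_strings() fills the new stack from the top down. Each string is copied in page-bounded chunks, starting from its last byte, so every chunk lands in exactly one destination page. The sketch below is a hypothetical user-space model of that loop. A flat buffer stands in for the new mm's stack pages, the 16-byte page size is an assumption chosen so the chunking is visible, and get_arg_page()/kmap() are replaced by plain pointer arithmetic. sim_copy_backwards() is an illustrative name.

```c
#include <assert.h>
#include <string.h>

#define SIM_PAGE_SIZE 16UL	/* tiny pages so the chunking is visible */

/* Copy len bytes of src so that they end at stack offset *p, walking
 * backwards one page-bounded chunk at a time. *p is lowered by len, and
 * *chunks is incremented once per chunk copied. */
static void sim_copy_backwards(char *stack, unsigned long *p,
			       const char *src, unsigned long len, int *chunks)
{
	unsigned long pos = *p;
	const char *str = src + len;	/* start just past the last byte */

	assert(len <= pos);
	*p -= len;

	while (len > 0) {
		unsigned long offset = pos % SIM_PAGE_SIZE;
		unsigned long bytes_to_copy;
		char *kaddr;

		/* pos on a page boundary: the byte before it is in the page below */
		if (offset == 0)
			offset = SIM_PAGE_SIZE;

		bytes_to_copy = offset;
		if (bytes_to_copy > len)
			bytes_to_copy = len;

		offset -= bytes_to_copy;
		pos -= bytes_to_copy;
		str -= bytes_to_copy;
		len -= bytes_to_copy;

		/* kaddr plays the role of kmap() of the page containing pos */
		kaddr = stack + (pos - offset);
		assert((pos - offset) % SIM_PAGE_SIZE == 0);
		memcpy(kaddr + offset, str, bytes_to_copy);
		(*chunks)++;
	}
}
```

For example, copying a 20-byte string (including its NUL) so that it ends at offset 40 takes two chunks: bytes 32..39 first, then 20..31. In the patch, the destination page is only looked up again when `pos & PAGE_MASK` changes. If the lookup fails, the error is reported as -E2BIG, which the patch's comments attribute to exceeding the stack rlimit.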
Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/ia64/ia32/binfmt_elf32.c     |   61 ----
 arch/um/kernel/trap.c             |    2
 arch/x86_64/ia32/ia32_aout.c      |    2
 arch/x86_64/ia32/ia32_binfmt.c    |   58 ---
 fs/binfmt_elf.c                   |   28 +
 fs/binfmt_elf_fdpic.c             |    8
 fs/binfmt_misc.c                  |    4
 fs/binfmt_script.c                |    4
 fs/compat.c                       |  130 +++----
 fs/exec.c                         |  566 +++++++++++++++++++++++---------------
 include/asm-um/processor-i386.h   |    3
 include/asm-um/processor-x86_64.h |    3
 include/linux/binfmts.h           |   17 -
 include/linux/mm.h                |    7
 kernel/auditsc.c                  |    2
 mm/mmap.c                         |   56 ++-
 mm/mprotect.c                     |    2
 17 files changed, 495 insertions(+), 458 deletions(-)

Index: linux-2.6-2/arch/um/kernel/trap.c
===================================================================
--- linux-2.6-2.orig/arch/um/kernel/trap.c	2007-06-05 16:23:16.000000000 +0200
+++ linux-2.6-2/arch/um/kernel/trap.c	2007-06-05 16:29:45.000000000 +0200
@@ -61,8 +61,6 @@ int handle_page_fault(unsigned long addr
 		goto good_area;
 	else if(!(vma->vm_flags & VM_GROWSDOWN))
 		goto out;
-	else if(is_user && !ARCH_IS_STACKGROW(address))
-		goto out;
 	else if(expand_stack(vma, address))
 		goto out;
 
Index: linux-2.6-2/arch/x86_64/ia32/ia32_binfmt.c
===================================================================
--- linux-2.6-2.orig/arch/x86_64/ia32/ia32_binfmt.c	2007-06-05 16:23:16.000000000 +0200
+++ linux-2.6-2/arch/x86_64/ia32/ia32_binfmt.c	2007-06-05 16:29:45.000000000 +0200
@@ -232,9 +232,6 @@ do {							\
 
 #define load_elf_binary load_elf32_binary
 #define ELF_PLAT_INIT(r, load_addr)	elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
-	ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack);
 
 #undef start_thread
 #define start_thread(regs,new_rip,new_rsp) do { \
@@ -286,61 +283,6 @@ static void elf32_init(struct pt_regs *r
 	me->thread.es = __USER_DS;
 }
 
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
-			 int executable_stack)
-{
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
-	struct mm_struct *mm = current->mm;
-	int i, ret;
-
-	stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
-
-	bprm->p += stack_base;
-	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
-	if (!mpnt)
-		return -ENOMEM;
-
-	down_write(&mm->mmap_sem);
-	{
-		mpnt->vm_mm = mm;
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
-		mpnt->vm_end = stack_top;
-		if (executable_stack == EXSTACK_ENABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC;
-		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
-		mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ?
-			PAGE_COPY_EXEC : PAGE_COPY;
-		if ((ret = insert_vm_struct(mm, mpnt))) {
-			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
-			return ret;
-		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	}
-
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
-		}
-		stack_base += PAGE_SIZE;
-	}
-	up_write(&mm->mmap_sem);
-
-	return 0;
-}
-EXPORT_SYMBOL(ia32_setup_arg_pages);
-
 #ifdef CONFIG_SYSCTL
 /* Register vsyscall32 into the ABI table */
 #include <linux/sysctl.h>
Index: linux-2.6-2/fs/binfmt_elf.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_elf.c	2007-06-05 16:23:16.000000000 +0200
+++ linux-2.6-2/fs/binfmt_elf.c	2007-06-05 16:29:45.000000000 +0200
@@ -148,6 +148,7 @@ create_elf_tables(struct linux_binprm *b
 	elf_addr_t *elf_info;
 	int ei_index = 0;
 	struct task_struct *tsk = current;
+	struct vm_area_struct *vma;
 
 	/*
 	 * If this architecture has a platform capability string, copy it
@@ -234,6 +235,15 @@ create_elf_tables(struct linux_binprm *b
 	sp = (elf_addr_t __user *)bprm->p;
 #endif
 
+
+	/*
+	 * Grow the stack manually; some architectures have a limit on how
+	 * far ahead a user-space access may be in order to grow the stack.
+	 */
+	vma = find_extend_vma(current->mm, bprm->p);
+	if (!vma)
+		return -EFAULT;
+
 	/* Now, let's put argc (and argv, envp if appropriate) on the stack */
 	if (__put_user(argc, sp++))
 		return -EFAULT;
@@ -254,8 +264,8 @@ create_elf_tables(struct linux_binprm *b
 		size_t len;
 		if (__put_user((elf_addr_t)p, argv++))
 			return -EFAULT;
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -266,8 +276,8 @@ create_elf_tables(struct linux_binprm *b
 		size_t len;
 		if (__put_user((elf_addr_t)p, envp++))
 			return -EFAULT;
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -777,10 +787,6 @@ static int load_elf_binary(struct linux_
 	}
 
 	/* OK, This is the point of no return */
-	current->mm->start_data = 0;
-	current->mm->end_data = 0;
-	current->mm->end_code = 0;
-	current->mm->mmap = NULL;
 	current->flags &= ~PF_FORKNOEXEC;
 	current->mm->def_flags = def_flags;
 
@@ -988,9 +994,13 @@ static int load_elf_binary(struct linux_
 
 	compute_creds(bprm);
 	current->flags &= ~PF_FORKNOEXEC;
-	create_elf_tables(bprm, &loc->elf_ex,
+	retval = create_elf_tables(bprm, &loc->elf_ex,
 			  (interpreter_type == INTERPRETER_AOUT),
 			  load_addr, interp_load_addr);
+	if (retval < 0) {
+		send_sig(SIGKILL, current, 0);
+		goto out;
+	}
 	/* N.B. passed_fileno might not be initialized? */
 	if (interpreter_type == INTERPRETER_AOUT)
 		current->mm->arg_start += strlen(passed_fileno) + 1;
Index: linux-2.6-2/fs/binfmt_elf_fdpic.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_elf_fdpic.c	2007-06-05 16:23:16.000000000 +0200
+++ linux-2.6-2/fs/binfmt_elf_fdpic.c	2007-06-05 16:29:45.000000000 +0200
@@ -621,8 +621,8 @@ static int create_elf_fdpic_tables(struc
 	p = (char __user *) current->mm->arg_start;
 	for (loop = bprm->argc; loop > 0; loop--) {
 		__put_user((elf_caddr_t) p, argv++);
-		len = strnlen_user(p, PAGE_SIZE * MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE * MAX_ARG_PAGES)
+		len = strnlen_user(p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return -EINVAL;
 		p += len;
 	}
@@ -633,8 +633,8 @@ static int create_elf_fdpic_tables(struc
 	current->mm->env_start = (unsigned long) p;
 	for (loop = bprm->envc; loop > 0; loop--) {
 		__put_user((elf_caddr_t)(unsigned long) p, envp++);
-		len = strnlen_user(p, PAGE_SIZE * MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE * MAX_ARG_PAGES)
+		len = strnlen_user(p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return -EINVAL;
 		p += len;
 	}
Index: linux-2.6-2/fs/binfmt_misc.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_misc.c	2007-06-05 16:23:16.000000000 +0200
+++ linux-2.6-2/fs/binfmt_misc.c	2007-06-05 16:29:45.000000000 +0200
@@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
 		goto _ret;
 
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
-		remove_arg_zero(bprm);
+		retval = remove_arg_zero(bprm);
+		if (retval)
+			goto _ret;
 	}
 
 	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
Index: linux-2.6-2/fs/binfmt_script.c
===================================================================
--- linux-2.6-2.orig/fs/binfmt_script.c	2007-06-05 16:23:16.000000000 +0200
+++ linux-2.6-2/fs/binfmt_script.c	2007-06-05 16:29:45.000000000 +0200
@@ -67,7 +67,9 @@ static int load_script(struct linux_binp
 	 * This is done in reverse order, because of how the
 	 * user environment and arguments are stored.
 	 */
-	remove_arg_zero(bprm);
+	retval = remove_arg_zero(bprm);
+	if (retval)
+		return retval;
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0) return retval;
 	bprm->argc++;
Index: linux-2.6-2/fs/compat.c
===================================================================
--- linux-2.6-2.orig/fs/compat.c	2007-06-05 16:23:16.000000000 +0200
+++ linux-2.6-2/fs/compat.c	2007-06-05 16:29:45.000000000 +0200
@@ -1257,6 +1257,7 @@ static int compat_copy_strings(int argc,
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -1265,92 +1266,84 @@ static int compat_copy_strings(int argc,
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-		    !(len = strnlen_user(compat_ptr(str), bprm->p))) {
+		    !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len) {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwards. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+#ifdef CONFIG_STACK_GROWSUP
+				ret = expand_downwards(bprm->vma, pos);
+				if (ret < 0) {
+					/* We've exceeded the stack rlimit. */
+					ret = -E2BIG;
+					goto out;
+				}
+#endif
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceeded the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
+					flush_kernel_dcache_page(kmapped_page);
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
+				flush_cache_page(bprm->vma, kpos,
+						 page_to_pfn(kmapped_page));
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, compat_ptr(str),
-						bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, compat_ptr(str),
+						bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
+		flush_kernel_dcache_page(kmapped_page);
 		kunmap(kmapped_page);
-	return ret;
-}
-
-#ifdef CONFIG_MMU
-
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
+		put_page(kmapped_page);
 	}
+	return ret;
 }
 
-#endif /* CONFIG_MMU */
-
 /*
  * compat_do_execve() is mostly a copy of do_execve(), with the exception
  * that it processes 32 bit argv and envp pointers.
@@ -1363,7 +1356,6 @@ int compat_do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1377,24 +1369,19 @@ int compat_do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
+	bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
+	bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1419,10 +1406,8 @@ int compat_do_execve(char * filename,
 	if (retval < 0)
 		goto out;
 
-	retval = search_binary_handler(bprm, regs);
+	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1431,19 +1416,12 @@ int compat_do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
Index: linux-2.6-2/fs/exec.c
===================================================================
--- linux-2.6-2.orig/fs/exec.c	2007-06-05 16:29:41.000000000 +0200
+++ linux-2.6-2/fs/exec.c	2007-06-05 16:29:45.000000000 +0200
@@ -54,6 +54,7 @@
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
+#include <asm/tlb.h>
 
 #ifdef CONFIG_KMOD
 #include <linux/kmod.h>
@@ -178,6 +179,157 @@ exit:
 	goto out;
 }
 
+#ifdef CONFIG_MMU
+
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+		int write)
+{
+	struct page *page;
+	int ret;
+
+#ifdef CONFIG_STACK_GROWSUP
+	if (write) {
+		ret = expand_downwards(bprm->vma, pos);
+		if (ret < 0)
+			return NULL;
+	}
+#endif
+	ret = get_user_pages(current, bprm->mm, pos,
+			1, write, 1, &page, NULL);
+	if (ret <= 0)
+		return NULL;
+
+	return page;
+}
+
+static void put_arg_page(struct page *page)
+{
+	put_page(page);
+}
+
+static void free_arg_page(struct linux_binprm *bprm, int i)
+{
+}
+
+static void free_arg_pages(struct linux_binprm *bprm)
+{
+}
+
+#else
+
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+		int write)
+{
+	struct page *page;
+
+	page = bprm->page[pos / PAGE_SIZE];
+	if (!page && write) {
+		page = alloc_page(GFP_HIGHUSER|__GFP_ZERO);
+		if (!page)
+			return NULL;
+		bprm->page[pos / PAGE_SIZE] = page;
+	}
+
+	return page;
+}
+
+static void put_arg_page(struct page *page)
+{
+}
+
+static void free_arg_page(struct linux_binprm *bprm, int i)
+{
+	if (bprm->page[i]) {
+		__free_page(bprm->page[i]);
+		bprm->page[i] = NULL;
+	}
+}
+
+static void free_arg_pages(struct linux_binprm *bprm)
+{
+	int i;
+
+	for (i = 0; i < MAX_ARG_PAGES; i++)
+		free_arg_page(bprm, i);
+}
+
+#endif /* CONFIG_MMU */
+
+/*
+ * Create a new mm_struct and populate it with a temporary stack
+ * vm_area_struct.  We don't have enough context at this point to set the stack
+ * flags, permissions, and offset, so we use temporary values.  We'll update
+ * them later in setup_arg_pages().
+ */
+int bprm_mm_init(struct linux_binprm *bprm)
+{
+	int err;
+	struct mm_struct *mm = NULL;
+	struct vm_area_struct *vma = NULL;
+
+	bprm->mm = mm = mm_alloc();
+	err = -ENOMEM;
+	if (!mm)
+		goto err;
+
+	if ((err = init_new_context(current, mm)))
+		goto err;
+
+#ifdef CONFIG_MMU
+	bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
+	err = -ENOMEM;
+	if (!vma)
+		goto err;
+
+	down_write(&mm->mmap_sem);
+	{
+		vma->vm_mm = mm;
+
+		/*
+		 * Place the stack at the top of user memory.  Later, we'll
+		 * move this to an appropriate place.  We don't use STACK_TOP
+		 * because that can depend on attributes which aren't
+		 * configured yet.
+		 */
+		vma->vm_end = STACK_TOP_MAX;
+		vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+		vma->vm_flags = VM_STACK_FLAGS;
+		vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+		if ((err = insert_vm_struct(mm, vma))) {
+			up_write(&mm->mmap_sem);
+			goto err;
+		}
+
+		mm->stack_vm = mm->total_vm = 1;
+	}
+	up_write(&mm->mmap_sem);
+
+	bprm->p = vma->vm_end - sizeof(void *);
+#else
+	bprm->p = PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *);
+#endif
+
+	return 0;
+
+err:
+#ifdef CONFIG_MMU
+	if (vma) {
+		bprm->vma = NULL;
+		kmem_cache_free(vm_area_cachep, vma);
+	}
+#endif
+
+	if (mm) {
+		bprm->mm = NULL;
+		mmdrop(mm);
+	}
+
+	return err;
+}
+
+EXPORT_SYMBOL(bprm_mm_init);
+
 /*
  * count() counts the number of strings in array ARGV.
  */
@@ -203,15 +355,16 @@ static int count(char __user * __user *
 }
 
 /*
- * 'copy_strings()' copies argument/environment strings from user
- * memory to free pages in kernel mem. These are in a format ready
- * to be put directly into the top of new user memory.
+ * 'copy_strings()' copies argument/environment strings from the old
+ * process's memory to the new process's stack.  The call to get_user_pages()
+ * ensures the destination page is created and not swapped out.
  */
 static int copy_strings(int argc, char __user * __user * argv,
 			struct linux_binprm *bprm)
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -220,69 +373,77 @@ static int copy_strings(int argc, char _
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-				!(len = strnlen_user(str, bprm->p))) {
+				!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len) {
+#ifdef CONFIG_MMU
+		if (MAX_ARG_STRLEN < len) {
+			ret = -E2BIG;
+			goto out;
+		}
+#else
+		if (bprm->p < len) {
 			ret = -E2BIG;
 			goto out;
 		}
+#endif
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwards. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				page = get_arg_page(bprm, pos, 1);
 				if (!page) {
-					ret = -ENOMEM;
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
+					flush_kernel_dcache_page(kmapped_page);
 					kunmap(kmapped_page);
+					put_arg_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
+				flush_cache_page(bprm->vma, kpos,
+						 page_to_pfn(kmapped_page));
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, str, bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
+		flush_kernel_dcache_page(kmapped_page);
 		kunmap(kmapped_page);
+		put_arg_page(kmapped_page);
+	}
 	return ret;
 }
 
@@ -302,154 +463,157 @@ int copy_strings_kernel(int argc,char **
 EXPORT_SYMBOL(copy_strings_kernel);
 
 #ifdef CONFIG_MMU
-/*
- * This routine is used to map in a page into an address space: needed by
- * execve() for the initial stack and environment pages.
- *
- * vma->vm_mm->mmap_sem is held for writing.
- */
-void install_arg_page(struct vm_area_struct *vma,
-			struct page *page, unsigned long address)
+
+static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	pte_t * pte;
-	spinlock_t *ptl;
+	unsigned long old_start = vma->vm_start;
+	unsigned long old_end = vma->vm_end;
+	unsigned long length = old_end - old_start;
+	unsigned long new_start = old_start + shift;
+	unsigned long new_end = old_end + shift;
+	struct mmu_gather *tlb;
+
+	BUG_ON(new_start > new_end);
+
+	if (new_start < old_start) {
+		if (vma != find_vma(mm, new_start))
+			return -EFAULT;
+
+		vma_adjust(vma, new_start, old_end,
+			   vma->vm_pgoff - (-shift >> PAGE_SHIFT), NULL);
+
+		if (length != move_page_tables(vma, old_start,
+					       vma, new_start, length))
+			return -ENOMEM;
+
+		lru_add_drain();
+		tlb = tlb_gather_mmu(mm, 0);
+		if (new_end > old_start)
+			free_pgd_range(&tlb, new_end, old_end, new_end,
+				vma->vm_next ? vma->vm_next->vm_start : 0);
+		else
+			free_pgd_range(&tlb, old_start, old_end, new_end,
+				vma->vm_next ? vma->vm_next->vm_start : 0);
+		tlb_finish_mmu(tlb, new_end, old_end);
 
-	if (unlikely(anon_vma_prepare(vma)))
-		goto out;
+		vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
+	} else {
+		struct vm_area_struct *tmp, *prev;
 
-	flush_dcache_page(page);
-	pte = get_locked_pte(mm, address, &ptl);
-	if (!pte)
-		goto out;
-	if (!pte_none(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		goto out;
+		tmp = find_vma_prev(mm, new_end, &prev);
+		if ((tmp && tmp->vm_start < new_end) || prev != vma)
+			return -EFAULT;
+
+		find_vma_prev(mm, vma->vm_start, &prev);
+
+		vma_adjust(vma, old_start, new_end, vma->vm_pgoff, NULL);
+
+		if (length != move_page_tables_up(vma, old_start,
+						  vma, new_start, length))
+			return -ENOMEM;
+
+		lru_add_drain();
+		tlb = tlb_gather_mmu(mm, 0);
+		free_pgd_range(&tlb, old_start, new_start,
+			       prev ? prev->vm_end: 0, new_start);
+		tlb_finish_mmu(tlb, old_start, new_start);
+
+		vma_adjust(vma, new_start, new_end,
+			   vma->vm_pgoff + (shift >> PAGE_SHIFT), NULL);
 	}
-	inc_mm_counter(mm, anon_rss);
-	lru_cache_add_active(page);
-	set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
-					page, vma->vm_page_prot))));
-	page_add_new_anon_rmap(page, vma, address);
-	pte_unmap_unlock(pte, ptl);
 
-	/* no need for flush_tlb */
-	return;
-out:
-	__free_page(page);
-	force_sig(SIGKILL, current);
+	return 0;
 }
 
 #define EXTRA_STACK_VM_PAGES	20	/* random */
 
+/* Finalizes the stack vm_area_struct. The flags and permissions are updated,
+ * the stack is optionally relocated, and some extra space is added.
+ */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
 		    int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
+	unsigned long ret;
+	unsigned long stack_base, stack_shift;
 	struct mm_struct *mm = current->mm;
-	int i, ret;
-	long arg_size;
+	struct vm_area_struct *vma = bprm->vma;
 
 #ifdef CONFIG_STACK_GROWSUP
-	/* Move the argument and environment strings to the bottom of the
-	 * stack space.
-	 */
-	int offset, j;
-	char *to, *from;
-
-	/* Start by shifting all the pages down */
-	i = 0;
-	for (j = 0; j < MAX_ARG_PAGES; j++) {
-		struct page *page = bprm->page[j];
-		if (!page)
-			continue;
-		bprm->page[i++] = page;
-	}
-
-	/* Now move them within their pages */
-	offset = bprm->p % PAGE_SIZE;
-	to = kmap(bprm->page[0]);
-	for (j = 1; j < i; j++) {
-		memmove(to, to + offset, PAGE_SIZE - offset);
-		from = kmap(bprm->page[j]);
-		memcpy(to + PAGE_SIZE - offset, from, offset);
-		kunmap(bprm->page[j - 1]);
-		to = from;
-	}
-	memmove(to, to + offset, PAGE_SIZE - offset);
-	kunmap(bprm->page[j - 1]);
-
 	/* Limit stack size to 1GB */
 	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
 	if (stack_base > (1 << 30))
 		stack_base = 1 << 30;
-	stack_base = PAGE_ALIGN(stack_top - stack_base);
 
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = stack_base + PAGE_SIZE * i - offset;
+	/* Make sure we didn't let the argument array grow too large. */
+	if (vma->vm_end - vma->vm_start > stack_base)
+		return -ENOMEM;
 
-	mm->arg_start = stack_base;
-	arg_size = i << PAGE_SHIFT;
+	stack_base = PAGE_ALIGN(stack_top - stack_base);
 
-	/* zero pages that were copied above */
-	while (i < MAX_ARG_PAGES)
-		bprm->page[i++] = NULL;
+	stack_shift = stack_base - vma->vm_start;
+	mm->arg_start = bprm->p + stack_shift;
+	bprm->p = vma->vm_end + stack_shift;
 #else
-	stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE);
-	stack_base = PAGE_ALIGN(stack_base);
-	bprm->p += stack_base;
+	BUG_ON(stack_top & ~PAGE_MASK);
+
+	stack_top = arch_align_stack(stack_top);
+	stack_top = PAGE_ALIGN(stack_top);
+	stack_shift = stack_top - vma->vm_end;
+
+	bprm->p += stack_shift;
 	mm->arg_start = bprm->p;
-	arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start);
 #endif
 
-	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
 
-	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
-	if (!mpnt)
-
return -ENOMEM; + bprm->loader += stack_shift; + bprm->exec += stack_shift; down_write(&mm->mmap_sem); { - mpnt->vm_mm = mm; -#ifdef CONFIG_STACK_GROWSUP - mpnt->vm_start = stack_base; - mpnt->vm_end = stack_base + arg_size; -#else - mpnt->vm_end = stack_top; - mpnt->vm_start = mpnt->vm_end - arg_size; -#endif + struct vm_area_struct *prev = NULL; + unsigned long vm_flags = vma->vm_flags; + /* Adjust stack execute permissions; explicitly enable * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X * and leave alone (arch default) otherwise. */ if (unlikely(executable_stack == EXSTACK_ENABLE_X)) - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC; + vm_flags |= VM_EXEC; else if (executable_stack == EXSTACK_DISABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC; - else - mpnt->vm_flags = VM_STACK_FLAGS; - mpnt->vm_flags |= mm->def_flags; - mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7]; - if ((ret = insert_vm_struct(mm, mpnt))) { + vm_flags &= ~VM_EXEC; + vm_flags |= mm->def_flags; + + ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, + vm_flags); + if (ret) { up_write(&mm->mmap_sem); - kmem_cache_free(vm_area_cachep, mpnt); return ret; } - mm->stack_vm = mm->total_vm = vma_pages(mpnt); - } + BUG_ON(prev != vma); - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page *page = bprm->page[i]; - if (page) { - bprm->page[i] = NULL; - install_arg_page(mpnt, page, stack_base); + /* Move stack pages down in memory. 
*/ + if (stack_shift) { + ret = shift_arg_pages(vma, stack_shift); + if (ret) { + up_write(&mm->mmap_sem); + return ret; + } + } + +#ifdef CONFIG_STACK_GROWSUP + if (expand_stack(vma, vma->vm_end + + EXTRA_STACK_VM_PAGES * PAGE_SIZE)) { + up_write(&mm->mmap_sem); + return -EFAULT; + } +#else + if (expand_stack(vma, vma->vm_start - + EXTRA_STACK_VM_PAGES * PAGE_SIZE)) { + up_write(&mm->mmap_sem); + return -EFAULT; } - stack_base += PAGE_SIZE; +#endif } up_write(&mm->mmap_sem); @@ -458,21 +622,6 @@ int setup_arg_pages(struct linux_binprm EXPORT_SYMBOL(setup_arg_pages); -#define free_arg_pages(bprm) do { } while (0) - -#else - -static inline void free_arg_pages(struct linux_binprm *bprm) -{ - int i; - - for (i = 0; i < MAX_ARG_PAGES; i++) { - if (bprm->page[i]) - __free_page(bprm->page[i]); - bprm->page[i] = NULL; - } -} - #endif /* CONFIG_MMU */ struct file *open_exec(const char *name) @@ -1000,44 +1149,44 @@ EXPORT_SYMBOL(compute_creds); * points to; chop off the first by relocating brpm->p to right after * the first '\0' encountered. */ -void remove_arg_zero(struct linux_binprm *bprm) +int remove_arg_zero(struct linux_binprm *bprm) { - if (bprm->argc) { - char ch; - - do { - unsigned long offset; - unsigned long index; - char *kaddr; - struct page *page; - - offset = bprm->p & ~PAGE_MASK; - index = bprm->p >> PAGE_SHIFT; - - page = bprm->page[index]; - kaddr = kmap_atomic(page, KM_USER0); - - /* run through page until we reach end or find NUL */ - do { - ch = *(kaddr + offset); - - /* discard that character... 
*/ - bprm->p++; - offset++; - } while (offset < PAGE_SIZE && ch != '\0'); - - kunmap_atomic(kaddr, KM_USER0); - - /* free the old page */ - if (offset == PAGE_SIZE) { - __free_page(page); - bprm->page[index] = NULL; - } - } while (ch != '\0'); + int ret = 0; + unsigned long offset; + char *kaddr; + struct page *page; + + if (!bprm->argc) + return 0; + + do { + offset = bprm->p & ~PAGE_MASK; + page = get_arg_page(bprm, bprm->p, 0); + if (!page) { + ret = -EFAULT; + goto out; + } + kaddr = kmap_atomic(page, KM_USER0); + + for (; offset < PAGE_SIZE && kaddr[offset]; + offset++, bprm->p++) + ; + + kunmap_atomic(kaddr, KM_USER0); + put_arg_page(page); + + if (offset == PAGE_SIZE) + free_arg_page(bprm, (bprm->p >> PAGE_SHIFT) - 1); + } while (offset == PAGE_SIZE); + + bprm->p++; + bprm->argc--; + ret = 0; - bprm->argc--; - } +out: + return ret; } + EXPORT_SYMBOL(remove_arg_zero); /* @@ -1062,7 +1211,7 @@ int search_binary_handler(struct linux_b fput(bprm->file); bprm->file = NULL; - loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); + loader = bprm->vma->vm_end - sizeof(void *); file = open_exec("/sbin/loader"); retval = PTR_ERR(file); @@ -1156,7 +1305,6 @@ int do_execve(char * filename, struct file *file; unsigned long tmp; int retval; - int i; retval = -ENOMEM; bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); @@ -1170,25 +1318,19 @@ int do_execve(char * filename, sched_exec(); - bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); - bprm->file = file; bprm->filename = filename; bprm->interp = filename; - bprm->mm = mm_alloc(); - retval = -ENOMEM; - if (!bprm->mm) - goto out_file; - retval = init_new_context(current, bprm->mm); - if (retval < 0) - goto out_mm; + retval = bprm_mm_init(bprm); + if (retval) + goto out_file; - bprm->argc = count(argv, bprm->p / sizeof(void *)); + bprm->argc = count(argv, MAX_ARG_STRINGS); if ((retval = bprm->argc) < 0) goto out_mm; - bprm->envc = count(envp, bprm->p / sizeof(void *)); + bprm->envc = count(envp, MAX_ARG_STRINGS); if ((retval = 
bprm->envc) < 0) goto out_mm; @@ -1217,9 +1359,8 @@ int do_execve(char * filename, retval = search_binary_handler(bprm,regs); if (retval >= 0) { - free_arg_pages(bprm); - /* execve success */ + free_arg_pages(bprm); security_bprm_free(bprm); acct_update_integrals(current); kfree(bprm); @@ -1227,26 +1368,19 @@ int do_execve(char * filename, } out: - /* Something went wrong, return the inode and free the argument pages*/ - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page * page = bprm->page[i]; - if (page) - __free_page(page); - } - + free_arg_pages(bprm); if (bprm->security) security_bprm_free(bprm); out_mm: if (bprm->mm) - mmdrop(bprm->mm); + mmput (bprm->mm); out_file: if (bprm->file) { allow_write_access(bprm->file); fput(bprm->file); } - out_kfree: kfree(bprm); Index: linux-2.6-2/include/asm-um/processor-i386.h =================================================================== --- linux-2.6-2.orig/include/asm-um/processor-i386.h 2007-06-05 16:23:16.000000000 +0200 +++ linux-2.6-2/include/asm-um/processor-i386.h 2007-06-05 16:29:45.000000000 +0200 @@ -67,9 +67,6 @@ static inline void rep_nop(void) #define current_text_addr() \ ({ void *pc; __asm__("movl $1f,%0\n1:":"=g" (pc)); pc; }) -#define ARCH_IS_STACKGROW(address) \ - (address + 32 >= UPT_SP(&current->thread.regs.regs)) - #define KSTK_EIP(tsk) KSTK_REG(tsk, EIP) #define KSTK_ESP(tsk) KSTK_REG(tsk, UESP) #define KSTK_EBP(tsk) KSTK_REG(tsk, EBP) Index: linux-2.6-2/include/asm-um/processor-x86_64.h =================================================================== --- linux-2.6-2.orig/include/asm-um/processor-x86_64.h 2007-06-05 16:23:16.000000000 +0200 +++ linux-2.6-2/include/asm-um/processor-x86_64.h 2007-06-05 16:29:45.000000000 +0200 @@ -44,9 +44,6 @@ static inline void arch_copy_thread(stru #define current_text_addr() \ ({ void *pc; __asm__("movq $1f,%0\n1:":"=g" (pc)); pc; }) -#define ARCH_IS_STACKGROW(address) \ - (address + 128 >= UPT_SP(&current->thread.regs.regs)) - #define KSTK_EIP(tsk) KSTK_REG(tsk, RIP)
#define KSTK_ESP(tsk) KSTK_REG(tsk, RSP) Index: linux-2.6-2/include/linux/binfmts.h =================================================================== --- linux-2.6-2.orig/include/linux/binfmts.h 2007-06-05 16:29:41.000000000 +0200 +++ linux-2.6-2/include/linux/binfmts.h 2007-06-05 16:29:45.000000000 +0200 @@ -5,12 +5,9 @@ struct pt_regs; -/* - * MAX_ARG_PAGES defines the number of pages allocated for arguments - * and envelope for the new program. 32 should suffice, this gives - * a maximum env+arg of 128kB w/4KB pages! - */ -#define MAX_ARG_PAGES 32 +/* FIXME: Find real limits, or none. */ +#define MAX_ARG_STRLEN (PAGE_SIZE * 32) +#define MAX_ARG_STRINGS 0x7FFFFFFF /* sizeof(linux_binprm->buf) */ #define BINPRM_BUF_SIZE 128 @@ -24,7 +21,12 @@ struct pt_regs; */ struct linux_binprm{ char buf[BINPRM_BUF_SIZE]; +#ifdef CONFIG_MMU + struct vm_area_struct *vma; +#else +# define MAX_ARG_PAGES 32 struct page *page[MAX_ARG_PAGES]; +#endif struct mm_struct *mm; unsigned long p; /* current top of mem */ int sh_bang; @@ -69,7 +71,7 @@ extern int register_binfmt(struct linux_ extern int unregister_binfmt(struct linux_binfmt *); extern int prepare_binprm(struct linux_binprm *); -extern void remove_arg_zero(struct linux_binprm *); +extern int __must_check remove_arg_zero(struct linux_binprm *); extern int search_binary_handler(struct linux_binprm *,struct pt_regs *); extern int flush_old_exec(struct linux_binprm * bprm); @@ -86,6 +88,7 @@ extern int suid_dumpable; extern int setup_arg_pages(struct linux_binprm * bprm, unsigned long stack_top, int executable_stack); +extern int bprm_mm_init(struct linux_binprm *bprm); extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm); extern void compute_creds(struct linux_binprm *binprm); extern int do_coredump(long signr, int exit_code, struct pt_regs * regs); Index: linux-2.6-2/include/linux/mm.h =================================================================== --- linux-2.6-2.orig/include/linux/mm.h 
2007-06-05 16:29:43.000000000 +0200 +++ linux-2.6-2/include/linux/mm.h 2007-06-05 16:29:45.000000000 +0200 @@ -786,7 +786,6 @@ static inline int handle_mm_fault(struct extern int make_pages_present(unsigned long addr, unsigned long end); extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write); -void install_arg_page(struct vm_area_struct *, struct page *, unsigned long); int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, int len, int write, int force, struct page **pages, struct vm_area_struct **vmas); @@ -812,6 +811,9 @@ extern unsigned long move_page_tables_up extern unsigned long do_mremap(unsigned long addr, unsigned long old_len, unsigned long new_len, unsigned long flags, unsigned long new_addr); +extern int mprotect_fixup(struct vm_area_struct *vma, + struct vm_area_struct **pprev, unsigned long start, + unsigned long end, unsigned long newflags); /* * Prototype to add a shrinker callback for ageable caches. @@ -1130,6 +1132,9 @@ extern int expand_stack(struct vm_area_s #ifdef CONFIG_IA64 extern int expand_upwards(struct vm_area_struct *vma, unsigned long address); #endif +#ifdef CONFIG_STACK_GROWSUP +extern int expand_downwards(struct vm_area_struct *vma, unsigned long address); +#endif /* Look up the first VMA which satisfies addr < vm_end, NULL if none. 
*/ extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr); Index: linux-2.6-2/mm/mmap.c =================================================================== --- linux-2.6-2.orig/mm/mmap.c 2007-06-05 16:23:16.000000000 +0200 +++ linux-2.6-2/mm/mmap.c 2007-06-05 16:29:45.000000000 +0200 @@ -1557,33 +1557,13 @@ int expand_upwards(struct vm_area_struct } #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */ -#ifdef CONFIG_STACK_GROWSUP -int expand_stack(struct vm_area_struct *vma, unsigned long address) -{ - return expand_upwards(vma, address); -} - -struct vm_area_struct * -find_extend_vma(struct mm_struct *mm, unsigned long addr) -{ - struct vm_area_struct *vma, *prev; - - addr &= PAGE_MASK; - vma = find_vma_prev(mm, addr, &prev); - if (vma && (vma->vm_start <= addr)) - return vma; - if (!prev || expand_stack(prev, addr)) - return NULL; - if (prev->vm_flags & VM_LOCKED) { - make_pages_present(addr, prev->vm_end); - } - return prev; -} -#else /* * vma is the first one with address < vma->vm_start. Have to extend vma. 
*/ -int expand_stack(struct vm_area_struct *vma, unsigned long address) +#ifndef CONFIG_STACK_GROWSUP +static inline +#endif +int expand_downwards(struct vm_area_struct *vma, unsigned long address) { int error; @@ -1620,6 +1600,34 @@ int expand_stack(struct vm_area_struct * return error; } +#ifdef CONFIG_STACK_GROWSUP +int expand_stack(struct vm_area_struct *vma, unsigned long address) +{ + return expand_upwards(vma, address); +} + +struct vm_area_struct * +find_extend_vma(struct mm_struct *mm, unsigned long addr) +{ + struct vm_area_struct *vma, *prev; + + addr &= PAGE_MASK; + vma = find_vma_prev(mm, addr, &prev); + if (vma && (vma->vm_start <= addr)) + return vma; + if (!prev || expand_stack(prev, addr)) + return NULL; + if (prev->vm_flags & VM_LOCKED) { + make_pages_present(addr, prev->vm_end); + } + return prev; +} +#else +int expand_stack(struct vm_area_struct *vma, unsigned long address) +{ + return expand_downwards(vma, address); +} + struct vm_area_struct * find_extend_vma(struct mm_struct * mm, unsigned long addr) { Index: linux-2.6-2/mm/mprotect.c =================================================================== --- linux-2.6-2.orig/mm/mprotect.c 2007-06-05 16:23:16.000000000 +0200 +++ linux-2.6-2/mm/mprotect.c 2007-06-05 16:29:45.000000000 +0200 @@ -128,7 +128,7 @@ static void change_protection(struct vm_ flush_tlb_range(vma, start, end); } -static int +int mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, unsigned long start, unsigned long end, unsigned long newflags) { Index: linux-2.6-2/arch/ia64/ia32/binfmt_elf32.c =================================================================== --- linux-2.6-2.orig/arch/ia64/ia32/binfmt_elf32.c 2007-06-05 16:23:16.000000000 +0200 +++ linux-2.6-2/arch/ia64/ia32/binfmt_elf32.c 2007-06-05 16:29:45.000000000 +0200 @@ -195,62 +195,23 @@ ia64_elf32_init (struct pt_regs *regs) ia32_load_state(current); } +#undef setup_arg_pages + int ia32_setup_arg_pages (struct linux_binprm *bprm, int 
executable_stack) { - unsigned long stack_base; - struct vm_area_struct *mpnt; - struct mm_struct *mm = current->mm; - int i, ret; - - stack_base = IA32_STACK_TOP - MAX_ARG_PAGES*PAGE_SIZE; - mm->arg_start = bprm->p + stack_base; - - bprm->p += stack_base; - if (bprm->loader) - bprm->loader += stack_base; - bprm->exec += stack_base; - - mpnt = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL); - if (!mpnt) - return -ENOMEM; - - down_write(&current->mm->mmap_sem); - { - mpnt->vm_mm = current->mm; - mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p; - mpnt->vm_end = IA32_STACK_TOP; - if (executable_stack == EXSTACK_ENABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC; - else if (executable_stack == EXSTACK_DISABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC; - else - mpnt->vm_flags = VM_STACK_FLAGS; - mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC)? - PAGE_COPY_EXEC: PAGE_COPY; - if ((ret = insert_vm_struct(current->mm, mpnt))) { - up_write(&current->mm->mmap_sem); - kmem_cache_free(vm_area_cachep, mpnt); - return ret; - } - current->mm->stack_vm = current->mm->total_vm = vma_pages(mpnt); - } + int ret; - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page *page = bprm->page[i]; - if (page) { - bprm->page[i] = NULL; - install_arg_page(mpnt, page, stack_base); - } - stack_base += PAGE_SIZE; + ret = setup_arg_pages(bprm, IA32_STACK_TOP, executable_stack); + if (!ret) { + /* + * Can't do it in ia64_elf32_init(). Needs to be done before + * calls to elf32_map() + */ + current->thread.ppl = ia32_init_pp_list(); } - up_write(&current->mm->mmap_sem); - /* Can't do it in ia64_elf32_init().
Needs to be done before calls to - elf32_map() */ - current->thread.ppl = ia32_init_pp_list(); - - return 0; + return ret; } static void Index: linux-2.6-2/arch/x86_64/ia32/ia32_aout.c =================================================================== --- linux-2.6-2.orig/arch/x86_64/ia32/ia32_aout.c 2007-06-05 16:23:16.000000000 +0200 +++ linux-2.6-2/arch/x86_64/ia32/ia32_aout.c 2007-06-05 16:29:45.000000000 +0200 @@ -404,7 +404,7 @@ beyond_if: set_brk(current->mm->start_brk, current->mm->brk); - retval = ia32_setup_arg_pages(bprm, IA32_STACK_TOP, EXSTACK_DEFAULT); + retval = setup_arg_pages(bprm, IA32_STACK_TOP, EXSTACK_DEFAULT); if (retval < 0) { /* Someone check-me: is this error path enough? */ send_sig(SIGKILL, current, 0); Index: linux-2.6-2/kernel/auditsc.c =================================================================== --- linux-2.6-2.orig/kernel/auditsc.c 2007-06-05 16:29:41.000000000 +0200 +++ linux-2.6-2/kernel/auditsc.c 2007-06-05 16:31:28.000000000 +0200 @@ -848,7 +848,7 @@ static void audit_log_execve_info(struct long ret; char *tmp; - len = strnlen_user(p, MAX_ARG_PAGES*PAGE_SIZE); + len = strnlen_user(p, MAX_ARG_STRLEN); /* * We just created this mm, if we can't find the strings * we just copied in something is _very_ wrong. --
* Re: [PATCH 4/4] mm: variable length argument support 2007-06-05 15:05 ` [PATCH 4/4] mm: variable length argument support Peter Zijlstra, Ollie Wild @ 2007-06-05 23:39 ` Andrew Morton 2007-06-06 0:48 ` Ollie Wild 2007-06-06 6:02 ` Peter Zijlstra 2007-06-06 8:36 ` Andrew Morton 1 sibling, 2 replies; 26+ messages in thread From: Andrew Morton @ 2007-06-05 23:39 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen On Tue, 05 Jun 2007 17:05:27 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > From: Ollie Wild <aaw@google.com> > > Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly > from the old mm into the new mm. > > We create the new mm before the binfmt code runs, and place the new stack > at the very top of the address space. Once the binfmt code runs and figures > out where the stack should be, we move it downwards. > > It is a bit peculiar in that we have one task with two mm's, one of which is > inactive. > > .. > > > Index: linux-2.6-2/fs/binfmt_elf.c > =================================================================== > --- linux-2.6-2.orig/fs/binfmt_elf.c 2007-06-05 16:23:16.000000000 +0200 > +++ linux-2.6-2/fs/binfmt_elf.c 2007-06-05 16:29:45.000000000 +0200 > @@ -148,6 +148,7 @@ create_elf_tables(struct linux_binprm *b > elf_addr_t *elf_info; > int ei_index = 0; > struct task_struct *tsk = current; > + struct vm_area_struct *vma; > > /* > * If this architecture has a platform capability string, copy it > @@ -234,6 +235,15 @@ create_elf_tables(struct linux_binprm *b > sp = (elf_addr_t __user *)bprm->p; > #endif > > + > + /* > + * Grow the stack manually; some architectures have a limit on how > + * far ahead a user-space access may be in order to grow the stack. 
> + */ > + vma = find_extend_vma(current->mm, bprm->p); > + if (!vma) > + return -EFAULT; > + > /* Now, let's put argc (and argv, envp if appropriate) on the stack */ > if (__put_user(argc, sp++)) > return -EFAULT; > @@ -254,8 +264,8 @@ create_elf_tables(struct linux_binprm *b > size_t len; > if (__put_user((elf_addr_t)p, argv++)) > return -EFAULT; > - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES); > - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES) > + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN); > + if (!len || len > MAX_ARG_STRLEN) strnlen_user() is a scary function. Please do remember that if the memory we just strlen'ed is writeable by any user thread then that thread can at any time invalidate the number which the kernel now holds. > return 0; > p += len; > } > @@ -266,8 +276,8 @@ create_elf_tables(struct linux_binprm *b > size_t len; > if (__put_user((elf_addr_t)p, envp++)) > return -EFAULT; > - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES); > - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES) > + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN); > + if (!len || len > MAX_ARG_STRLEN) > return 0; > p += len; > } > > ... 
> > Index: linux-2.6-2/fs/compat.c > =================================================================== > --- linux-2.6-2.orig/fs/compat.c 2007-06-05 16:23:16.000000000 +0200 > +++ linux-2.6-2/fs/compat.c 2007-06-05 16:29:45.000000000 +0200 > @@ -1257,6 +1257,7 @@ static int compat_copy_strings(int argc, > { > struct page *kmapped_page = NULL; > char *kaddr = NULL; > + unsigned long kpos = 0; > int ret; > > while (argc-- > 0) { > @@ -1265,92 +1266,84 @@ static int compat_copy_strings(int argc, > unsigned long pos; > > if (get_user(str, argv+argc) || > - !(len = strnlen_user(compat_ptr(str), bprm->p))) { > + !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) { > ret = -EFAULT; > goto out; > } > > - if (bprm->p < len) { > + if (MAX_ARG_STRLEN < len) { > ret = -E2BIG; > goto out; > } Do we have an off-by-one here? Should it be <=? If not, then this code is relying upon the string's terminating \0 coming from userspace? If so, that's buggy: userspace can overwrite the \0 after we ran the strnlen_user(), perhaps, and confound the kernel? I could be complete crap, but please check all this very closely. > +/* > + * Create a new mm_struct and populate it with a temporary stack > + * vm_area_struct. We don't have enough context at this point to set the stack > + * flags, permissions, and offset, so we use temporary values. We'll update > + * them later in setup_arg_pages(). > + */ > +int bprm_mm_init(struct linux_binprm *bprm) > +{ > + int err; > + struct mm_struct *mm = NULL; > + struct vm_area_struct *vma = NULL; > + > + bprm->mm = mm = mm_alloc(); > + err = -ENOMEM; > + if (!mm) > + goto err; > + > + if ((err = init_new_context(current, mm))) > + goto err; err = init_new_context(current, mm); if (err) goto err; > +#ifdef CONFIG_MMU > + bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL); > + err = -ENOMEM; > + if (!vma) > + goto err; > + > + down_write(&mm->mmap_sem); > + { > + vma->vm_mm = mm; Let's lose the unneeded brace and indent here?
> + /* > + * Place the stack at the top of user memory. Later, we'll > + * move this to an appropriate place. We don't use STACK_TOP > + * because that can depend on attributes which aren't > + * configured yet. > + */ > + vma->vm_end = STACK_TOP_MAX; > + vma->vm_start = vma->vm_end - PAGE_SIZE; > + > + vma->vm_flags = VM_STACK_FLAGS; > + vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; > + if ((err = insert_vm_struct(mm, vma))) { > + up_write(&mm->mmap_sem); > + goto err; > + } > + > + mm->stack_vm = mm->total_vm = 1; > + } > + up_write(&mm->mmap_sem); > + > + bprm->p = vma->vm_end - sizeof(void *); > +#else > + bprm->p = PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *); > +#endif > + > + return 0; > + > +err: > +#ifdef CONFIG_MMU > + if (vma) { > + bprm->vma = NULL; > + kmem_cache_free(vm_area_cachep, vma); > + } > +#endif > + > + if (mm) { > + bprm->mm = NULL; > + mmdrop(mm); > + } > + > + return err; > +} > + > +EXPORT_SYMBOL(bprm_mm_init); Preferred style is to have zero blank lines between the ^}$ and the EXPORT_SYMBOL(). > /* > - * 'copy_strings()' copies argument/environment strings from user > - * memory to free pages in kernel mem. These are in a format ready > - * to be put directly into the top of new user memory. > + * 'copy_strings()' copies argument/environment strings from the old > + * processes's memory to the new process's stack. The call to get_user_pages() > + * ensures the destination page is created and not swapped out. 
> */ > static int copy_strings(int argc, char __user * __user * argv, > struct linux_binprm *bprm) > { > struct page *kmapped_page = NULL; > char *kaddr = NULL; > + unsigned long kpos = 0; > int ret; > > while (argc-- > 0) { > @@ -220,69 +373,77 @@ static int copy_strings(int argc, char _ > unsigned long pos; > > if (get_user(str, argv+argc) || > - !(len = strnlen_user(str, bprm->p))) { > + !(len = strnlen_user(str, MAX_ARG_STRLEN))) { > ret = -EFAULT; > goto out; > } > > - if (bprm->p < len) { > +#ifdef CONFIG_MMU > + if (MAX_ARG_STRLEN < len) { > + ret = -E2BIG; > + goto out; > + } strnlen_user() scariness. Please check for off-by-ones. > +#else > + if (bprm->p < len) { > ret = -E2BIG; > goto out; > } > +#endif > > ... > > EXPORT_SYMBOL(copy_strings_kernel); > > #ifdef CONFIG_MMU > -/* > - * This routine is used to map in a page into an address space: needed by > - * execve() for the initial stack and environment pages. > - * > - * vma->vm_mm->mmap_sem is held for writing. > - */ > -void install_arg_page(struct vm_area_struct *vma, > - struct page *page, unsigned long address) > + > +static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift) > { Needs a comment explaining why it exists, what it does, how it does it. For maintainability. > struct mm_struct *mm = vma->vm_mm; > - pte_t * pte; > - spinlock_t *ptl; > + unsigned long old_start = vma->vm_start; > + unsigned long old_end = vma->vm_end; > + unsigned long length = old_end - old_start; > + unsigned long new_start = old_start + shift; > + unsigned long new_end = old_end + shift; > + struct mmu_gather *tlb; > + > + BUG_ON(new_start > new_end); > + > + if (new_start < old_start) { > + if (vma != find_vma(mm, new_start)) > + return -EFAULT; > + > + vma_adjust(vma, new_start, old_end, > + vma->vm_pgoff - (-shift >> PAGE_SHIFT), NULL); hm, a right-shift of a negated unsigned value. That's pretty unusual. 
I hope you know what you're doing ;) > + if (length != move_page_tables(vma, old_start, > + vma, new_start, length)) > + return -ENOMEM; > + > + lru_add_drain(); > + tlb = tlb_gather_mmu(mm, 0); > + if (new_end > old_start) > + free_pgd_range(&tlb, new_end, old_end, new_end, > + vma->vm_next ? vma->vm_next->vm_start : 0); > + else > + free_pgd_range(&tlb, old_start, old_end, new_end, > + vma->vm_next ? vma->vm_next->vm_start : 0); > + tlb_finish_mmu(tlb, new_end, old_end); > > - if (unlikely(anon_vma_prepare(vma))) > - goto out; > + vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL); > + } else { > + struct vm_area_struct *tmp, *prev; > > - flush_dcache_page(page); > - pte = get_locked_pte(mm, address, &ptl); > - if (!pte) > - goto out; > - if (!pte_none(*pte)) { > - pte_unmap_unlock(pte, ptl); > - goto out; > + tmp = find_vma_prev(mm, new_end, &prev); > + if ((tmp && tmp->vm_start < new_end) || prev != vma) > + return -EFAULT; This reader is all lost. Perhaps a few comments explaining what tests like the above are doing would help him regain his bearings. Not having an overall description of what this code is doing doesn't help. > + find_vma_prev(mm, vma->vm_start, &prev); > + > + vma_adjust(vma, old_start, new_end, vma->vm_pgoff, NULL); > + > + if (length != move_page_tables_up(vma, old_start, > + vma, new_start, length)) > + return -ENOMEM; > + > + lru_add_drain(); > + tlb = tlb_gather_mmu(mm, 0); > + free_pgd_range(&tlb, old_start, new_start, > + prev ? 
prev->vm_end: 0, new_start); > + tlb_finish_mmu(tlb, old_start, new_start); > + > + vma_adjust(vma, new_start, new_end, > + vma->vm_pgoff + (shift >> PAGE_SHIFT), NULL); > } > - inc_mm_counter(mm, anon_rss); > - lru_cache_add_active(page); > - set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte( > - page, vma->vm_page_prot)))); > - page_add_new_anon_rmap(page, vma, address); > - pte_unmap_unlock(pte, ptl); > > - /* no need for flush_tlb */ > - return; > -out: > - __free_page(page); > - force_sig(SIGKILL, current); > + return 0; > } > > #define EXTRA_STACK_VM_PAGES 20 /* random */ > > +/* Finalizes the stack vm_area_struct. The flags and permissions are updated, > + * the stack is optionally relocated, and some extra space is added. > + */ That's better. But what extra space is added, and why? > int setup_arg_pages(struct linux_binprm *bprm, > unsigned long stack_top, > int executable_stack) > { > - unsigned long stack_base; > - struct vm_area_struct *mpnt; > + unsigned long ret; > + unsigned long stack_base, stack_shift; > struct mm_struct *mm = current->mm; > - int i, ret; > - long arg_size; > + struct vm_area_struct *vma = bprm->vma; > > #ifdef CONFIG_STACK_GROWSUP > - /* Move the argument and environment strings to the bottom of the > - * stack space. 
> - */ > - int offset, j; > - char *to, *from; > - > - /* Start by shifting all the pages down */ > - i = 0; > - for (j = 0; j < MAX_ARG_PAGES; j++) { > - struct page *page = bprm->page[j]; > - if (!page) > - continue; > - bprm->page[i++] = page; > - } > - > - /* Now move them within their pages */ > - offset = bprm->p % PAGE_SIZE; > - to = kmap(bprm->page[0]); > - for (j = 1; j < i; j++) { > - memmove(to, to + offset, PAGE_SIZE - offset); > - from = kmap(bprm->page[j]); > - memcpy(to + PAGE_SIZE - offset, from, offset); > - kunmap(bprm->page[j - 1]); > - to = from; > - } > - memmove(to, to + offset, PAGE_SIZE - offset); > - kunmap(bprm->page[j - 1]); > - > /* Limit stack size to 1GB */ > stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max; > if (stack_base > (1 << 30)) > stack_base = 1 << 30; > - stack_base = PAGE_ALIGN(stack_top - stack_base); > > - /* Adjust bprm->p to point to the end of the strings. */ > - bprm->p = stack_base + PAGE_SIZE * i - offset; > + /* Make sure we didn't let the argument array grow too large. */ > + if (vma->vm_end - vma->vm_start > stack_base) > + return -ENOMEM; > > - mm->arg_start = stack_base; > - arg_size = i << PAGE_SHIFT; > + stack_base = PAGE_ALIGN(stack_top - stack_base); > > - /* zero pages that were copied above */ > - while (i < MAX_ARG_PAGES) > - bprm->page[i++] = NULL; > + stack_shift = stack_base - vma->vm_start; > + mm->arg_start = bprm->p + stack_shift; > + bprm->p = vma->vm_end + stack_shift; > #else > - stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE); > - stack_base = PAGE_ALIGN(stack_base); > - bprm->p += stack_base; > + BUG_ON(stack_top & ~PAGE_MASK); Is there much point in this BUG_ON, given that we're about to align stack_top anyway? 
> + stack_top = arch_align_stack(stack_top); > + stack_top = PAGE_ALIGN(stack_top); > + stack_shift = stack_top - vma->vm_end; > + > + bprm->p += stack_shift; > mm->arg_start = bprm->p; > - arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start); > #endif > > - arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE; > - > if (bprm->loader) > - bprm->loader += stack_base; > - bprm->exec += stack_base; > - > - mpnt = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL); > - if (!mpnt) > - return -ENOMEM; > + bprm->loader += stack_shift; > + bprm->exec += stack_shift; > > down_write(&mm->mmap_sem); > { > - mpnt->vm_mm = mm; > -#ifdef CONFIG_STACK_GROWSUP > - mpnt->vm_start = stack_base; > - mpnt->vm_end = stack_base + arg_size; > -#else > - mpnt->vm_end = stack_top; > - mpnt->vm_start = mpnt->vm_end - arg_size; > -#endif > + struct vm_area_struct *prev = NULL; > + unsigned long vm_flags = vma->vm_flags; > + > /* Adjust stack execute permissions; explicitly enable > * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X > * and leave alone (arch default) otherwise. 
> */
> if (unlikely(executable_stack == EXSTACK_ENABLE_X))
> - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC;
> + vm_flags |= VM_EXEC;
> else if (executable_stack == EXSTACK_DISABLE_X)
> - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
> - else
> - mpnt->vm_flags = VM_STACK_FLAGS;
> - mpnt->vm_flags |= mm->def_flags;
> - mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
> - if ((ret = insert_vm_struct(mm, mpnt))) {
> + vm_flags &= ~VM_EXEC;
> + vm_flags |= mm->def_flags;
> +
> + ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
> + vm_flags);
> + if (ret) {
> up_write(&mm->mmap_sem);
> - kmem_cache_free(vm_area_cachep, mpnt);
> return ret;
> }
> - mm->stack_vm = mm->total_vm = vma_pages(mpnt);
> - }
> + BUG_ON(prev != vma);

:(

> - for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> - struct page *page = bprm->page[i];
> - if (page) {
> - bprm->page[i] = NULL;
> - install_arg_page(mpnt, page, stack_base);
> + /* Move stack pages down in memory. */
> + if (stack_shift) {
> + ret = shift_arg_pages(vma, stack_shift);
> + if (ret) {
> + up_write(&mm->mmap_sem);
> + return ret;
> + }
> + }
> +
> +#ifdef CONFIG_STACK_GROWSUP
> + if (expand_stack(vma, vma->vm_end +
> + EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
> + up_write(&mm->mmap_sem);
> + return -EFAULT;
> + }
> +#else
> + if (expand_stack(vma, vma->vm_start -
> + EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
> + up_write(&mm->mmap_sem);
> + return -EFAULT;
> }
> - stack_base += PAGE_SIZE;
> +#endif
> }
>
> ...
>
> Index: linux-2.6-2/include/linux/binfmts.h
> ===================================================================
> --- linux-2.6-2.orig/include/linux/binfmts.h 2007-06-05 16:29:41.000000000 +0200
> +++ linux-2.6-2/include/linux/binfmts.h 2007-06-05 16:29:45.000000000 +0200
> @@ -5,12 +5,9 @@
>
> struct pt_regs;
>
> -/*
> - * MAX_ARG_PAGES defines the number of pages allocated for arguments
> - * and envelope for the new program. 32 should suffice, this gives
> - * a maximum env+arg of 128kB w/4KB pages!
> - */
> -#define MAX_ARG_PAGES 32
> +/* FIXME: Find real limits, or none. */
> +#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
> +#define MAX_ARG_STRINGS 0x7FFFFFFF

DOCUMENTME!

> /* sizeof(linux_binprm->buf) */
> #define BINPRM_BUF_SIZE 128
> @@ -24,7 +21,12 @@ struct pt_regs;
> */
>
> ...
>
> +#ifdef CONFIG_STACK_GROWSUP
> +extern int expand_downwards(struct vm_area_struct *vma, unsigned long address);
> +#endif

We don't (or shouldn't) need the ifdefs here.

> - return NULL;
> - if (prev->vm_flags & VM_LOCKED) {
> - make_pages_present(addr, prev->vm_end);
> - }
> - return prev;
> -}
> -#else
> /*
> * vma is the first one with address < vma->vm_start. Have to extend vma.
> */
> -int expand_stack(struct vm_area_struct *vma, unsigned long address)
> +#ifndef CONFIG_STACK_GROWSUP
> +static inline
> +#endif
> +int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> {

eww, that was a bit rude. Can we just leave this as static int then do int expand_stack_downwards(....) ? That's a better name anyway.

> int error;
>
> @@ -1620,6 +1600,34 @@ int expand_stack(struct vm_area_struct *
> return error;
> }
>
> +#ifdef CONFIG_STACK_GROWSUP
> +int expand_stack(struct vm_area_struct *vma, unsigned long address)
> +{
> + return expand_upwards(vma, address);
> +}
> +
> +struct vm_area_struct *
> +find_extend_vma(struct mm_struct *mm, unsigned long addr)
> +{
> + struct vm_area_struct *vma, *prev;
> +
> + addr &= PAGE_MASK;
> + vma = find_vma_prev(mm, addr, &prev);
> + if (vma && (vma->vm_start <= addr))
> + return vma;
> + if (!prev || expand_stack(prev, addr))
> + return NULL;
> + if (prev->vm_flags & VM_LOCKED) {
> + make_pages_present(addr, prev->vm_end);
> + }

unneeded braces

We really should check and propagate the make_pages_present() return value when appropriate. It can fail under -ENOMEM, ulimit exceeded, overcommit, etc.

> + return prev;
> +}
> +#else
> +int expand_stack(struct vm_area_struct *vma, unsigned long address)
> +{
> + return expand_downwards(vma, address);
> +}
> +
> struct vm_area_struct *
> find_extend_vma(struct mm_struct * mm, unsigned long addr)
> {
> Index: linux-2.6-2/mm/mprotect.c
> ===================================================================
> --- linux-2.6-2.orig/mm/mprotect.c 2007-06-05 16:23:16.000000000 +0200
> +++ linux-2.6-2/mm/mprotect.c 2007-06-05 16:29:45.000000000 +0200
> @@ -128,7 +128,7 @@ static void change_protection(struct vm_
> flush_tlb_range(vma, start, end);
> }
>
> -static int
> +int
> mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> unsigned long start, unsigned long end, unsigned long newflags)
> {
> Index: linux-2.6-2/arch/ia64/ia32/binfmt_elf32.c
> ===================================================================
> --- linux-2.6-2.orig/arch/ia64/ia32/binfmt_elf32.c 2007-06-05 16:23:16.000000000 +0200
> +++ linux-2.6-2/arch/ia64/ia32/binfmt_elf32.c 2007-06-05 16:29:45.000000000 +0200
> @@ -195,62 +195,23 @@ ia64_elf32_init (struct pt_regs *regs)
> ia32_load_state(current);
> }
>
> +#undef setup_arg_pages

What's this for?
* Re: [PATCH 4/4] mm: variable length argument support
From: Ollie Wild @ 2007-06-06 0:48 UTC
To: Andrew Morton
Cc: Peter Zijlstra, linux-kernel, parisc-linux, linux-mm, linux-arch, Ingo Molnar, Andi Kleen

OK. It sounds like a healthy dose of comments is in order. I'll clean things up and send out a new patch sometime tonight or tomorrow.

Additional comments inline below:

> > - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
> > - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
> > + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> > + if (!len || len > MAX_ARG_STRLEN)
> strnlen_user() is a scary function. Please do remember that if the memory
> we just strlen'ed is writeable by any user thread then that thread can at
> any time invalidate the number which the kernel now holds.

At this point, we've already called setup_arg_pages(), so the user memory is our own private copy. No other threads can access it.

> > - !(len = strnlen_user(compat_ptr(str), bprm->p))) {
> > + !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
> > ret = -EFAULT;
> > goto out;
> > }
> >
> > - if (bprm->p < len) {
> > + if (MAX_ARG_STRLEN < len) {
> > ret = -E2BIG;
> > goto out;
> > }
>
> Do we have an off-by-one here? Should it be <=?

No, strnlen_user() returns N+1 (where N==MAX_ARG_STRLEN) if the string is too large.

> If not, then this code is relying upon the string's terminating \0 coming
> from userspace? If so, that's buggy: userspace can overwrite the \0 after
> we ran the strnlen_user(), perhaps, and confound the kernel?

If that's the case, then we will fail to copy the null terminator, and the string will munge into the following string. Since we always access this data via the various userspace access routines, we will either return an error on a later operation, or the new process will segfault shortly upon starting.

> > + vma_adjust(vma, new_start, old_end,
> > + vma->vm_pgoff - (-shift >> PAGE_SHIFT), NULL);
>
> hm, a right-shift of a negated unsigned value. That's pretty unusual. I
> hope you know what you're doing ;)

This is correct. In this case, shift is already populated with a negative, wrapped unsigned value. The -shift is needed to make it positive before the bitwise shift.

> > #define EXTRA_STACK_VM_PAGES 20 /* random */
> >
> > +/* Finalizes the stack vm_area_struct. The flags and permissions are updated,
> > + * the stack is optionally relocated, and some extra space is added.
> > + */
>
> That's better.
>
> But what extra space is added, and why?

We add EXTRA_STACK_VM_PAGES. To be honest, I think neither of us know why this is done. It's just what the old code did, so we preserved it.

Ollie
* Re: [PATCH 4/4] mm: variable length argument support
From: Peter Zijlstra @ 2007-06-06 6:02 UTC
To: Andrew Morton
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Tue, 2007-06-05 at 16:39 -0700, Andrew Morton wrote:
> > @@ -1620,6 +1600,34 @@ int expand_stack(struct vm_area_struct *
> > return error;
> > }
> >
> > +#ifdef CONFIG_STACK_GROWSUP
> > +int expand_stack(struct vm_area_struct *vma, unsigned long address)
> > +{
> > + return expand_upwards(vma, address);
> > +}
> > +
> > +struct vm_area_struct *
> > +find_extend_vma(struct mm_struct *mm, unsigned long addr)
> > +{
> > + struct vm_area_struct *vma, *prev;
> > +
> > + addr &= PAGE_MASK;
> > + vma = find_vma_prev(mm, addr, &prev);
> > + if (vma && (vma->vm_start <= addr))
> > + return vma;
> > + if (!prev || expand_stack(prev, addr))
> > + return NULL;
> > + if (prev->vm_flags & VM_LOCKED) {
> > + make_pages_present(addr, prev->vm_end);
> > + }
>
> unneeded braces
>
> We really should check and propagate the make_pages_present() return value
> when appropriate. It can fail under -ENOMEM, ulimit exceeded, overcommit,
> etc.

Right, this is not new ugliness, but indeed a good opportunity to clean up.

> > + return prev;
> > +}
> > +#else
> > +int expand_stack(struct vm_area_struct *vma, unsigned long address)
> > +{
> > + return expand_downwards(vma, address);
> > +}
> > +
> > struct vm_area_struct *
> > find_extend_vma(struct mm_struct * mm, unsigned long addr)
> > {
> > Index: linux-2.6-2/mm/mprotect.c
> > ===================================================================
> > --- linux-2.6-2.orig/mm/mprotect.c 2007-06-05 16:23:16.000000000 +0200
> > +++ linux-2.6-2/mm/mprotect.c 2007-06-05 16:29:45.000000000 +0200
> > @@ -128,7 +128,7 @@ static void change_protection(struct vm_
> > flush_tlb_range(vma, start, end);
> > }
> >
> > -static int
> > +int
> > mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> > unsigned long start, unsigned long end, unsigned long newflags)
> > {
> > Index: linux-2.6-2/arch/ia64/ia32/binfmt_elf32.c
> > ===================================================================
> > --- linux-2.6-2.orig/arch/ia64/ia32/binfmt_elf32.c 2007-06-05 16:23:16.000000000 +0200
> > +++ linux-2.6-2/arch/ia64/ia32/binfmt_elf32.c 2007-06-05 16:29:45.000000000 +0200
> > @@ -195,62 +195,23 @@ ia64_elf32_init (struct pt_regs *regs)
> > ia32_load_state(current);
> > }
> >
> > +#undef setup_arg_pages
>
> What's this for?

That file reads:

#define setup_arg_pages(bprm,tos,exec) ia32_setup_arg_pages(bprm,exec)

....

#include "../../../fs/binfmt_elf.c"

....

int ia32_setup_arg_pages (struct linux_binprm *bprm, int executable_stack)
{
	int ret;

	ret = setup_arg_pages(bprm, IA32_STACK_TOP, executable_stack);
/---------------^

We really want to call the real setup_arg_pages() here, not ourselves please.

	if (!ret) {
		/*
		 * Can't do it in ia64_elf32_init(). Needs to be done before
		 * calls to elf32_map()
		 */
		current->thread.ppl = ia32_init_pp_list();
	}

	return ret;
}
* Re: [PATCH 4/4] mm: variable length argument support
From: Andrew Morton @ 2007-06-06 8:36 UTC
To: Peter Zijlstra
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Tue, 05 Jun 2007 17:05:27 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> From: Ollie Wild <aaw@google.com>
>
> Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly
> from the old mm into the new mm.
>
> We create the new mm before the binfmt code runs, and place the new stack
> at the very top of the address space. Once the binfmt code runs and figures
> out where the stack should be, we move it downwards.
>
> It is a bit peculiar in that we have one task with two mm's, one of which is
> inactive.
>
> ...
>
> + flush_cache_page(bprm->vma, kpos,
> + page_to_pfn(kmapped_page));

Breaks SuperH:

fs/exec.c: In function `bprm_mm_init':
fs/exec.c:268: warning: unused variable `vma'
fs/exec.c: In function `copy_strings':
fs/exec.c:431: error: structure has no member named `vma'
* Re: [PATCH 4/4] mm: variable length argument support
From: Paul Mundt @ 2007-06-06 8:44 UTC
To: Andrew Morton
Cc: Peter Zijlstra, linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Wed, Jun 06, 2007 at 01:36:58AM -0700, Andrew Morton wrote:
> On Tue, 05 Jun 2007 17:05:27 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > From: Ollie Wild <aaw@google.com>
> >
> > Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly
> > from the old mm into the new mm.
> >
> > We create the new mm before the binfmt code runs, and place the new stack
> > at the very top of the address space. Once the binfmt code runs and figures
> > out where the stack should be, we move it downwards.
> >
> > It is a bit peculiar in that we have one task with two mm's, one of which is
> > inactive.
> >
> > ...
> >
> > + flush_cache_page(bprm->vma, kpos,
> > + page_to_pfn(kmapped_page));
>
> Breaks SuperH:
>
> fs/exec.c: In function `bprm_mm_init':
> fs/exec.c:268: warning: unused variable `vma'
> fs/exec.c: In function `copy_strings':
> fs/exec.c:431: error: structure has no member named `vma'
>

More pointedly, bprm->vma doesn't exist if CONFIG_MMU=n, which Andrew's config seems to have ;-)
* Re: [PATCH 4/4] mm: variable length argument support
From: Peter Zijlstra @ 2007-06-06 8:54 UTC
To: Andrew Morton
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Wed, 2007-06-06 at 01:36 -0700, Andrew Morton wrote:
> On Tue, 05 Jun 2007 17:05:27 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > From: Ollie Wild <aaw@google.com>
> >
> > Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly
> > from the old mm into the new mm.
> >
> > We create the new mm before the binfmt code runs, and place the new stack
> > at the very top of the address space. Once the binfmt code runs and figures
> > out where the stack should be, we move it downwards.
> >
> > It is a bit peculiar in that we have one task with two mm's, one of which is
> > inactive.
> >
> > ...
> >
> > + flush_cache_page(bprm->vma, kpos,
> > + page_to_pfn(kmapped_page));

Bah, and my frv cross build bums out on an unrelated change,..
I'll see if I can get a noMMU arch building, in the mean time, would you
try this:

---

Since no-MMU doesn't do the fancy inactive mm access there is no need to flush cache.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---

Index: linux-2.6-2/fs/exec.c
===================================================================
--- linux-2.6-2.orig/fs/exec.c 2007-06-05 16:48:52.000000000 +0200
+++ linux-2.6-2/fs/exec.c 2007-06-06 10:49:19.000000000 +0200
@@ -428,8 +428,10 @@ static int copy_strings(int argc, char _
 kmapped_page = page;
 kaddr = kmap(kmapped_page);
 kpos = pos & PAGE_MASK;
+#ifdef CONFIG_MMU
 flush_cache_page(bprm->vma, kpos,
 page_to_pfn(kmapped_page));
+#endif
 }
 if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 ret = -EFAULT;
* Re: [PATCH 4/4] mm: variable length argument support
From: Andrew Morton @ 2007-06-06 9:06 UTC
To: Peter Zijlstra
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Wed, 06 Jun 2007 10:54:21 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > It is a bit peculiar in that we have one task with two mm's, one of which is
> > > inactive.
> > >
> > > ...
> > >
> > > + flush_cache_page(bprm->vma, kpos,
> > > + page_to_pfn(kmapped_page));
>
> Bah, and my frv cross build bums out on an unrelated change,..
> I'll see if I can get a noMMU arch building, in the mean time, would you
> try this:
>
> ---
>
> Since no-MMU doesn't do the fancy inactive mm access there is no need to
> flush cache.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>
> Index: linux-2.6-2/fs/exec.c
> ===================================================================
> --- linux-2.6-2.orig/fs/exec.c 2007-06-05 16:48:52.000000000 +0200
> +++ linux-2.6-2/fs/exec.c 2007-06-06 10:49:19.000000000 +0200
> @@ -428,8 +428,10 @@ static int copy_strings(int argc, char _
> kmapped_page = page;
> kaddr = kmap(kmapped_page);
> kpos = pos & PAGE_MASK;
> +#ifdef CONFIG_MMU
> flush_cache_page(bprm->vma, kpos,
> page_to_pfn(kmapped_page));
> +#endif
> }
> if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
> ret = -EFAULT;
>

I think the same problem will happen on NOMMU && STACK_GROWS_UP. There are several new references to bprm->vma in there, not all inside CONFIG_MMU.
* Re: [PATCH 4/4] mm: variable length argument support
From: Peter Zijlstra @ 2007-06-06 9:12 UTC
To: Andrew Morton
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Wed, 2007-06-06 at 02:06 -0700, Andrew Morton wrote:
> On Wed, 06 Jun 2007 10:54:21 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > > > It is a bit peculiar in that we have one task with two mm's, one of which is
> > > > inactive.
> > > >
> > > > ...
> > > >
> > > > + flush_cache_page(bprm->vma, kpos,
> > > > + page_to_pfn(kmapped_page));
> >
> > Bah, and my frv cross build bums out on an unrelated change,..
> > I'll see if I can get a noMMU arch building, in the mean time, would you
> > try this:
> >
> > ---
> >
> > Since no-MMU doesn't do the fancy inactive mm access there is no need to
> > flush cache.
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> >
> > Index: linux-2.6-2/fs/exec.c
> > ===================================================================
> > --- linux-2.6-2.orig/fs/exec.c 2007-06-05 16:48:52.000000000 +0200
> > +++ linux-2.6-2/fs/exec.c 2007-06-06 10:49:19.000000000 +0200
> > @@ -428,8 +428,10 @@ static int copy_strings(int argc, char _
> > kmapped_page = page;
> > kaddr = kmap(kmapped_page);
> > kpos = pos & PAGE_MASK;
> > +#ifdef CONFIG_MMU
> > flush_cache_page(bprm->vma, kpos,
> > page_to_pfn(kmapped_page));
> > +#endif
> > }
> > if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
> > ret = -EFAULT;
> >
>
> I think the same problem will happen on NOMMU && STACK_GROWS_UP. There are
> several new references to bprm->vma in there, not all inside CONFIG_MMU.

Right, which archs have that combo? I'll go gather cross compilers.

Perhaps I'd better create a flush_arg_page() function and stick that in the mmu/nommu section somewhere earlier on in that file. Patch in a few.

A related question; does anybody know of a no-MMU arch that uses fs/compat.c ? If there is such a beast, that would need some work.
* Re: [parisc-linux] Re: [PATCH 4/4] mm: variable length argument support
From: Grant Grundler @ 2007-06-06 14:40 UTC
To: Peter Zijlstra
Cc: Andrew Morton, linux-arch, linux-kernel, Andi Kleen, linux-mm, Ollie Wild, Ingo Molnar, parisc-linux

On Wed, Jun 06, 2007 at 11:12:09AM +0200, Peter Zijlstra wrote:
...
> > I think the same problem will happen on NOMMU && STACK_GROWS_UP. There are
> > several new references to bprm->vma in there, not all inside CONFIG_MMU.
>
> Right, which archs have that combo? I'll go gather cross compilers.

parisc only supports with MMU.
I don't know who elses uses STACK_GROWS_UP.

hth,
grant

>
> Perhaps I'd better create a flush_arg_page() function and stick that in
> the mmu/nommu section somewhere earlier on in that file. Patch in a few.
>
> A related question; does anybody know of a no-MMU arch that uses
> fs/compat.c ? If there is such a beast, that would need some work.
>
> _______________________________________________
> parisc-linux mailing list
> parisc-linux@lists.parisc-linux.org
> http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* Re: [PATCH 4/4] mm: variable length argument support
From: Peter Zijlstra @ 2007-06-06 9:34 UTC
To: Andrew Morton
Cc: linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Wed, 2007-06-06 at 02:06 -0700, Andrew Morton wrote:
> I think the same problem will happen on NOMMU && STACK_GROWS_UP. There are
> several new references to bprm->vma in there, not all inside CONFIG_MMU.

I found two: one in setup_arg_pages() and one in get_arg_page() both are under CONFIG_MMU.

---

fix a no-MMU compile error on flush_cache_page() and clean up the no-MMU code a bit by placing more #ifdef CONFIG_MMU stuff into their own function.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 fs/exec.c | 112 +++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 68 insertions(+), 44 deletions(-)

Index: linux-2.6-2/fs/exec.c
===================================================================
--- linux-2.6-2.orig/fs/exec.c 2007-06-05 16:48:52.000000000 +0200
+++ linux-2.6-2/fs/exec.c 2007-06-06 11:21:35.000000000 +0200
@@ -215,6 +215,58 @@ static void free_arg_pages(struct linux_
 {
 }
+static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
+ struct page *page)
+{
+ flush_cache_page(bprm->vma, pos, page_to_pfn(page));
+}
+
+static int __bprm_mm_init(struct linux_binprm *bprm)
+{
+ int err = -ENOMEM;
+ struct vm_area_struct *vma = NULL;
+ struct mm_struct *mm = bprm->mm;
+
+ bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
+ if (!vma)
+ goto err;
+
+ down_write(&mm->mmap_sem);
+ vma->vm_mm = mm;
+
+ /*
+ * Place the stack at the top of user memory. Later, we'll
+ * move this to an appropriate place. We don't use STACK_TOP
+ * because that can depend on attributes which aren't
+ * configured yet.
+ */
+ vma->vm_end = STACK_TOP_MAX;
+ vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+ vma->vm_flags = VM_STACK_FLAGS;
+ vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+ err = insert_vm_struct(mm, vma);
+ if (err) {
+ up_write(&mm->mmap_sem);
+ goto err;
+ }
+
+ mm->stack_vm = mm->total_vm = 1;
+ up_write(&mm->mmap_sem);
+
+ bprm->p = vma->vm_end - sizeof(void *);
+
+ return 0;
+
+err:
+ if (vma) {
+ bprm->vma = NULL;
+ kmem_cache_free(vm_area_cachep, vma);
+ }
+
+ return err;
+}
+
 #else
 static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
@@ -253,6 +305,17 @@ static void free_arg_pages(struct linux_
 free_arg_page(bprm, i);
 }
+static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
+ struct page *page)
+{
+}
+
+static int __bprm_mm_init(struct linux_binprm *bprm)
+{
+ bprm->p = PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *);
+ return 0;
+}
+
 #endif /* CONFIG_MMU */
 /*
@@ -265,61 +328,23 @@ int bprm_mm_init(struct linux_binprm *bp
 {
 int err;
 struct mm_struct *mm = NULL;
- struct vm_area_struct *vma = NULL;
 bprm->mm = mm = mm_alloc();
 err = -ENOMEM;
 if (!mm)
 goto err;
- if ((err = init_new_context(current, mm)))
+ err = init_new_context(current, mm);
+ if (err)
 goto err;
-#ifdef CONFIG_MMU
- bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
- err = -ENOMEM;
- if (!vma)
+ err = __bprm_mm_init(bprm);
+ if (err)
 goto err;
- down_write(&mm->mmap_sem);
- {
- vma->vm_mm = mm;
-
- /*
- * Place the stack at the top of user memory. Later, we'll
- * move this to an appropriate place. We don't use STACK_TOP
- * because that can depend on attributes which aren't
- * configured yet.
- */
- vma->vm_end = STACK_TOP_MAX;
- vma->vm_start = vma->vm_end - PAGE_SIZE;
-
- vma->vm_flags = VM_STACK_FLAGS;
- vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
- if ((err = insert_vm_struct(mm, vma))) {
- up_write(&mm->mmap_sem);
- goto err;
- }
-
- mm->stack_vm = mm->total_vm = 1;
- }
- up_write(&mm->mmap_sem);
-
- bprm->p = vma->vm_end - sizeof(void *);
-#else
- bprm->p = PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *);
-#endif
-
 return 0;
 err:
-#ifdef CONFIG_MMU
- if (vma) {
- bprm->vma = NULL;
- kmem_cache_free(vm_area_cachep, vma);
- }
-#endif
-
 if (mm) {
 bprm->mm = NULL;
 mmdrop(mm);
@@ -428,8 +453,7 @@ static int copy_strings(int argc, char _
 kmapped_page = page;
 kaddr = kmap(kmapped_page);
 kpos = pos & PAGE_MASK;
- flush_cache_page(bprm->vma, kpos,
- page_to_pfn(kmapped_page));
+ flush_arg_page(bprm, kpos, kmapped_page);
 }
 if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 ret = -EFAULT;
* Re: [PATCH 4/4] mm: variable length argument support
From: Paul Mundt @ 2007-06-06 9:44 UTC
To: Peter Zijlstra
Cc: Andrew Morton, linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Wed, Jun 06, 2007 at 11:34:33AM +0200, Peter Zijlstra wrote:
> +static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
> + struct page *page)
> +{
> + flush_cache_page(bprm->vma, pos, page_to_pfn(page));
> +}
> +
[snip]

> @@ -253,6 +305,17 @@ static void free_arg_pages(struct linux_
> free_arg_page(bprm, i);
> }
>
> +static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
> + struct page *page)
> +{
> +}
> +
inline?
* Re: [PATCH 4/4] mm: variable length argument support
From: Peter Zijlstra @ 2007-06-06 9:47 UTC
To: Paul Mundt
Cc: Andrew Morton, linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar, Andi Kleen

On Wed, 2007-06-06 at 18:44 +0900, Paul Mundt wrote:
> On Wed, Jun 06, 2007 at 11:34:33AM +0200, Peter Zijlstra wrote:
> > +static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > + struct page *page)
> > +{
> > + flush_cache_page(bprm->vma, pos, page_to_pfn(page));
> > +}
> > +
> [snip]
>
> > @@ -253,6 +305,17 @@ static void free_arg_pages(struct linux_
> > free_arg_page(bprm, i);
> > }
> >
> > +static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > + struct page *page)
> > +{
> > +}
> > +
> inline?

could do I guess, but doesn't this modern gcc thing auto inline statics that are so small?
* Re: [PATCH 4/4] mm: variable length argument support
From: Andi Kleen @ 2007-06-06 9:53 UTC
To: Peter Zijlstra
Cc: Paul Mundt, Andrew Morton, linux-kernel, parisc-linux, linux-mm, linux-arch, Ollie Wild, Ingo Molnar

> could do I guess, but doesn't this modern gcc thing auto inline statics
> that are so small?

Yes it does.

-Andi
Thread overview: 26+ messages
2007-06-05 15:05 [PATCH 0/4] no MAX_ARG_PAGES Peter Zijlstra
2007-06-05 15:05 ` [PATCH 1/4] arch: personality independent stack top Peter Zijlstra
2007-06-05 15:05 ` [PATCH 2/4] audit: rework execve audit Peter Zijlstra
2007-06-05 23:39 ` Andrew Morton
2007-06-06 5:52 ` Peter Zijlstra
2007-06-05 15:05 ` [PATCH 3/4] mm: move_page_tables{,_up} Peter Zijlstra
2007-06-05 19:46 ` Christoph Lameter
2007-06-05 23:39 ` Andrew Morton
2007-06-06 19:06 ` Ollie Wild
2007-06-06 19:12 ` Peter Zijlstra
2007-06-06 19:50 ` Ollie Wild
2007-06-06 19:53 ` Peter Zijlstra
2007-06-05 15:05 ` [PATCH 4/4] mm: variable length argument support Peter Zijlstra, Ollie Wild
2007-06-05 23:39 ` Andrew Morton
2007-06-06 0:48 ` Ollie Wild
2007-06-06 6:02 ` Peter Zijlstra
2007-06-06 8:36 ` Andrew Morton
2007-06-06 8:44 ` Paul Mundt
2007-06-06 8:54 ` Peter Zijlstra
2007-06-06 9:06 ` Andrew Morton
2007-06-06 9:12 ` Peter Zijlstra
2007-06-06 14:40 ` [parisc-linux] " Grant Grundler
2007-06-06 9:34 ` Peter Zijlstra
2007-06-06 9:44 ` Paul Mundt
2007-06-06 9:47 ` Peter Zijlstra
2007-06-06 9:53 ` Andi Kleen