* Removing MAX_ARG_PAGES (request for comments/assistance)
@ 2006-10-11 0:05 Ollie Wild
2006-10-11 8:00 ` Arjan van de Ven
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Ollie Wild @ 2006-10-11 0:05 UTC (permalink / raw)
To: linux-kernel, parisc-linux
Cc: Linus Torvalds, Arjan van de Ven, Ingo Molnar, linux-mm,
Andrew Morton, Andi Kleen, linux-arch, David Howells
[-- Attachment #1: Type: text/plain, Size: 2213 bytes --]
Hi,
A few months back, there was some discussion about increasing
MAX_ARG_PAGES (http://thread.gmane.org/gmane.linux.kernel/418858).
Since this is a problem which Google seems to bump into on a regular
basis -- we just keep jacking up the limit -- I thought I'd have a
stab at implementing the preferred solution as outlined by Linus.
I've attached a patch which replaces the pages array from the
linux_binprm struct with a stack vm_area added to the new process's
mm_struct. The region grows using the normal page fault / stack
extension mechanism. Since the vm_area is swappable -- which I've
confirmed via testing -- we don't have to worry about memory
starvation.
I've tested this on the um/i386, i386, and x86_64 architectures with
the elf and script binfmts. I'd like to get feedback from the Linux
community. If anyone would like to test/implement the areas I've
missed, I'd greatly appreciate it.
Some outstanding issues / concerns:
- I haven't addressed the CONFIG_STACK_GROWSUP (parisc) case. Would
someone on the parisc list be willing to have a go at this? Grant
Grundler has offered to provide me access to hardware, but I'd prefer
to let a pa-risc guru have a go.
- I haven't tested this on a NOMMU architecture. Could someone please
validate this?
- What, if any command line limits should we enforce? The do_execve()
callchain has plenty of hooks where the calling process can sleep, so
CPU starvation shouldn't be a problem. Memory consumption is limited
by the RLIMIT_STACK setting. Note, however, that I'm not providing
any guarantee on the availability of space for the actual stack.
- Inside setup_arg_pages(), I have to move the stack region.
Different linux_binfmt handlers place it in different locations
relative to STACK_TOP, and STACK_TOP itself may have different values
depending on the personality of the executable. When executing
scripts, the arg strings need to be accessed before we know which
binary format is being executed, so we can't generally make the
correct determination beforehand.
- There are bunch of linux_binfmt handlers which are either obsolete
or specific to particular platforms. Would someone be willing to look
at these?
Thanks,
Ollie
[-- Attachment #2: no_MAX_ARG_PAGES.patch --]
[-- Type: text/x-patch, Size: 26707 bytes --]
diff --git a/arch/x86_64/ia32/ia32_binfmt.c b/arch/x86_64/ia32/ia32_binfmt.c
index 82ef182..2e5a934 100644
--- a/arch/x86_64/ia32/ia32_binfmt.c
+++ b/arch/x86_64/ia32/ia32_binfmt.c
@@ -279,9 +279,6 @@ #define exit_elf_binfmt exit_elf32_bin
#define load_elf_binary load_elf32_binary
#define ELF_PLAT_INIT(r, load_addr) elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
- ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack);
#undef start_thread
#define start_thread(regs,new_rip,new_rsp) do { \
@@ -338,57 +335,7 @@ static void elf32_init(struct pt_regs *r
int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
int executable_stack)
{
- unsigned long stack_base;
- struct vm_area_struct *mpnt;
- struct mm_struct *mm = current->mm;
- int i, ret;
-
- stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
- mm->arg_start = bprm->p + stack_base;
-
- bprm->p += stack_base;
- if (bprm->loader)
- bprm->loader += stack_base;
- bprm->exec += stack_base;
-
- mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
- if (!mpnt)
- return -ENOMEM;
-
- memset(mpnt, 0, sizeof(*mpnt));
-
- down_write(&mm->mmap_sem);
- {
- mpnt->vm_mm = mm;
- mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
- mpnt->vm_end = stack_top;
- if (executable_stack == EXSTACK_ENABLE_X)
- mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC;
- else if (executable_stack == EXSTACK_DISABLE_X)
- mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
- else
- mpnt->vm_flags = VM_STACK_FLAGS;
- mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ?
- PAGE_COPY_EXEC : PAGE_COPY;
- if ((ret = insert_vm_struct(mm, mpnt))) {
- up_write(&mm->mmap_sem);
- kmem_cache_free(vm_area_cachep, mpnt);
- return ret;
- }
- mm->stack_vm = mm->total_vm = vma_pages(mpnt);
- }
-
- for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
- struct page *page = bprm->page[i];
- if (page) {
- bprm->page[i] = NULL;
- install_arg_page(mpnt, page, stack_base);
- }
- stack_base += PAGE_SIZE;
- }
- up_write(&mm->mmap_sem);
-
- return 0;
+ return setup_arg_pages(bprm, stack_top, executable_stack);
}
EXPORT_SYMBOL(ia32_setup_arg_pages);
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 06435f3..d0e2292 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -255,8 +255,8 @@ #endif
while (argc-- > 0) {
size_t len;
__put_user((elf_addr_t)p, argv++);
- len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
- if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+ len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+ if (!len || len > MAX_ARG_STRLEN)
return 0;
p += len;
}
@@ -266,8 +266,8 @@ #endif
while (envc-- > 0) {
size_t len;
__put_user((elf_addr_t)p, envp++);
- len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
- if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+ len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+ if (!len || len > MAX_ARG_STRLEN)
return 0;
p += len;
}
@@ -769,10 +769,6 @@ static int load_elf_binary(struct linux_
}
/* OK, This is the point of no return */
- current->mm->start_data = 0;
- current->mm->end_data = 0;
- current->mm->end_code = 0;
- current->mm->mmap = NULL;
current->flags &= ~PF_FORKNOEXEC;
current->mm->def_flags = def_flags;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 1713c48..1378ba3 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
goto _ret;
if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
- remove_arg_zero(bprm);
+ retval = remove_arg_zero(bprm);
+ if (retval)
+ goto _ret;
}
if (fmt->flags & MISC_FMT_OPEN_BINARY) {
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 1edbcca..5b6309f 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -68,7 +68,9 @@ static int load_script(struct linux_binp
* This is done in reverse order, because of how the
* user environment and arguments are stored.
*/
- remove_arg_zero(bprm);
+ retval = remove_arg_zero(bprm);
+ if (retval)
+ return retval;
retval = copy_strings_kernel(1, &bprm->interp, bprm);
if (retval < 0) return retval;
bprm->argc++;
diff --git a/fs/compat.c b/fs/compat.c
index 4d3fbcb..3528afa 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1385,6 +1385,7 @@ static int compat_copy_strings(int argc,
{
struct page *kmapped_page = NULL;
char *kaddr = NULL;
+ unsigned long kpos = 0;
int ret;
while (argc-- > 0) {
@@ -1393,92 +1394,72 @@ static int compat_copy_strings(int argc,
unsigned long pos;
if (get_user(str, argv+argc) ||
- !(len = strnlen_user(compat_ptr(str), bprm->p))) {
+ !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
ret = -EFAULT;
goto out;
}
- if (bprm->p < len) {
+ if (MAX_ARG_STRLEN < len) {
ret = -E2BIG;
goto out;
}
- bprm->p -= len;
- /* XXX: add architecture specific overflow check here. */
+ /* We're going to work our way backwords. */
pos = bprm->p;
+ str += len;
+ bprm->p -= len;
while (len > 0) {
- int i, new, err;
int offset, bytes_to_copy;
- struct page *page;
offset = pos % PAGE_SIZE;
- i = pos/PAGE_SIZE;
- page = bprm->page[i];
- new = 0;
- if (!page) {
- page = alloc_page(GFP_HIGHUSER);
- bprm->page[i] = page;
- if (!page) {
- ret = -ENOMEM;
+ if (offset == 0)
+ offset = PAGE_SIZE;
+
+ bytes_to_copy = offset;
+ if (bytes_to_copy > len)
+ bytes_to_copy = len;
+
+ offset -= bytes_to_copy;
+ pos -= bytes_to_copy;
+ str -= bytes_to_copy;
+ len -= bytes_to_copy;
+
+ if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+ struct page *page;
+
+ ret = get_user_pages(current, bprm->mm, pos,
+ 1, 1, 1, &page, NULL);
+ if (ret <= 0) {
+ /* We've exceed the stack rlimit. */
+ ret = -E2BIG;
goto out;
}
- new = 1;
- }
- if (page != kmapped_page) {
- if (kmapped_page)
+ if (kmapped_page) {
kunmap(kmapped_page);
+ put_page(kmapped_page);
+ }
kmapped_page = page;
kaddr = kmap(kmapped_page);
+ kpos = pos & PAGE_MASK;
}
- if (new && offset)
- memset(kaddr, 0, offset);
- bytes_to_copy = PAGE_SIZE - offset;
- if (bytes_to_copy > len) {
- bytes_to_copy = len;
- if (new)
- memset(kaddr+offset+len, 0,
- PAGE_SIZE-offset-len);
- }
- err = copy_from_user(kaddr+offset, compat_ptr(str),
- bytes_to_copy);
- if (err) {
+ if (copy_from_user(kaddr+offset, compat_ptr(str),
+ bytes_to_copy)) {
ret = -EFAULT;
goto out;
}
-
- pos += bytes_to_copy;
- str += bytes_to_copy;
- len -= bytes_to_copy;
}
}
ret = 0;
out:
- if (kmapped_page)
+ if (kmapped_page) {
kunmap(kmapped_page);
- return ret;
-}
-
-#ifdef CONFIG_MMU
-
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
- int i;
-
- for (i = 0; i < MAX_ARG_PAGES; i++) {
- if (bprm->page[i])
- __free_page(bprm->page[i]);
- bprm->page[i] = NULL;
+ put_page(kmapped_page);
}
+ return ret;
}
-#endif /* CONFIG_MMU */
-
/*
* compat_do_execve() is mostly a copy of do_execve(), with the exception
* that it processes 32 bit argv and envp pointers.
@@ -1491,7 +1472,6 @@ int compat_do_execve(char * filename,
struct linux_binprm *bprm;
struct file *file;
int retval;
- int i;
retval = -ENOMEM;
bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1505,24 +1485,19 @@ int compat_do_execve(char * filename,
sched_exec();
- bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
bprm->file = file;
bprm->filename = filename;
bprm->interp = filename;
- bprm->mm = mm_alloc();
- retval = -ENOMEM;
- if (!bprm->mm)
- goto out_file;
- retval = init_new_context(current, bprm->mm);
- if (retval < 0)
- goto out_mm;
+ retval = bprm_mm_init(bprm);
+ if (retval)
+ goto out_file;
- bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
+ bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
if ((retval = bprm->argc) < 0)
goto out_mm;
- bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
+ bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
if ((retval = bprm->envc) < 0)
goto out_mm;
@@ -1547,10 +1522,8 @@ int compat_do_execve(char * filename,
if (retval < 0)
goto out;
- retval = search_binary_handler(bprm, regs);
+ retval = search_binary_handler(bprm,regs);
if (retval >= 0) {
- free_arg_pages(bprm);
-
/* execve success */
security_bprm_free(bprm);
acct_update_integrals(current);
@@ -1559,19 +1532,12 @@ int compat_do_execve(char * filename,
}
out:
- /* Something went wrong, return the inode and free the argument pages*/
- for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
- struct page * page = bprm->page[i];
- if (page)
- __free_page(page);
- }
-
if (bprm->security)
security_bprm_free(bprm);
out_mm:
if (bprm->mm)
- mmdrop(bprm->mm);
+ mmput (bprm->mm);
out_file:
if (bprm->file) {
diff --git a/fs/exec.c b/fs/exec.c
index d993ea1..23a334c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -172,6 +172,79 @@ exit:
goto out;
}
+#ifdef CONFIG_STACK_GROWSUP
+#error I broke your build because I rearchitected the stack code, and I \
+ don't have access to an architecture where CONFIG_STACK_GROWSUP is \
+ set. Please fixe this or send me a machine which I can test this on. \
+ \
+ -- Ollie Wild <aaw@google.com>
+#endif
+
+/* Create a new mm_struct and populate it with a temporary stack
+ * vm_area_struct. We don't have enough context at this point to set the
+ * stack flags, permissions, and offset, so we use temporary values. We'll
+ * update them later in setup_arg_pages(). */
+int bprm_mm_init(struct linux_binprm *bprm)
+{
+ int err;
+ struct mm_struct *mm = NULL;
+ struct vm_area_struct *vma = NULL;
+
+ bprm->mm = mm = mm_alloc();
+ err = -ENOMEM;
+ if (!mm)
+ goto err;
+
+ if ((err = init_new_context(current, mm)))
+ goto err;
+
+ bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, SLAB_KERNEL);
+ err = -ENOMEM;
+ if (!vma)
+ goto err;
+
+ down_write(&mm->mmap_sem);
+ {
+ vma->vm_mm = mm;
+
+ /* Place the stack at the top of user memory. Later, we'll
+ * move this to an appropriate place. We don't use STACK_TOP
+ * because that can depend on attributes which aren't
+ * configured yet. */
+ vma->vm_end = TASK_SIZE;
+ vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+ vma->vm_flags = VM_STACK_FLAGS;
+ vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+ if ((err = insert_vm_struct(mm, vma))) {
+ up_write(&mm->mmap_sem);
+ goto err;
+ }
+
+ mm->stack_vm = mm->total_vm = 1;
+ }
+ up_write(&mm->mmap_sem);
+
+ bprm->p = vma->vm_end - sizeof(void *);
+
+ return 0;
+
+err:
+ if (vma) {
+ bprm->vma = NULL;
+ kmem_cache_free(vm_area_cachep, vma);
+ }
+
+ if (mm) {
+ bprm->mm = NULL;
+ mmdrop(mm);
+ }
+
+ return err;
+}
+
+EXPORT_SYMBOL(bprm_mm_init);
+
/*
* count() counts the number of strings in array ARGV.
*/
@@ -197,15 +270,16 @@ static int count(char __user * __user *
}
/*
- * 'copy_strings()' copies argument/environment strings from user
- * memory to free pages in kernel mem. These are in a format ready
- * to be put directly into the top of new user memory.
+ * 'copy_strings()' copies argument/environment strings from the old
+ * processes's memory to the new process's stack. The call to get_user_pages()
+ * ensures the destination page is created and not swapped out.
*/
static int copy_strings(int argc, char __user * __user * argv,
struct linux_binprm *bprm)
{
struct page *kmapped_page = NULL;
char *kaddr = NULL;
+ unsigned long kpos = 0;
int ret;
while (argc-- > 0) {
@@ -214,69 +288,68 @@ static int copy_strings(int argc, char _
unsigned long pos;
if (get_user(str, argv+argc) ||
- !(len = strnlen_user(str, bprm->p))) {
+ !(len = strnlen_user(str, MAX_ARG_STRLEN))) {
ret = -EFAULT;
goto out;
}
- if (bprm->p < len) {
+ if (MAX_ARG_STRLEN < len) {
ret = -E2BIG;
goto out;
}
- bprm->p -= len;
- /* XXX: add architecture specific overflow check here. */
+ /* We're going to work our way backwords. */
pos = bprm->p;
+ str += len;
+ bprm->p -= len;
while (len > 0) {
- int i, new, err;
int offset, bytes_to_copy;
- struct page *page;
offset = pos % PAGE_SIZE;
- i = pos/PAGE_SIZE;
- page = bprm->page[i];
- new = 0;
- if (!page) {
- page = alloc_page(GFP_HIGHUSER);
- bprm->page[i] = page;
- if (!page) {
- ret = -ENOMEM;
+ if (offset == 0)
+ offset = PAGE_SIZE;
+
+ bytes_to_copy = offset;
+ if (bytes_to_copy > len)
+ bytes_to_copy = len;
+
+ offset -= bytes_to_copy;
+ pos -= bytes_to_copy;
+ str -= bytes_to_copy;
+ len -= bytes_to_copy;
+
+ if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+ struct page *page;
+
+ ret = get_user_pages(current, bprm->mm, pos,
+ 1, 1, 1, &page, NULL);
+ if (ret <= 0) {
+ /* We've exceed the stack rlimit. */
+ ret = -E2BIG;
goto out;
}
- new = 1;
- }
- if (page != kmapped_page) {
- if (kmapped_page)
+ if (kmapped_page) {
kunmap(kmapped_page);
+ put_page(kmapped_page);
+ }
kmapped_page = page;
kaddr = kmap(kmapped_page);
+ kpos = pos & PAGE_MASK;
}
- if (new && offset)
- memset(kaddr, 0, offset);
- bytes_to_copy = PAGE_SIZE - offset;
- if (bytes_to_copy > len) {
- bytes_to_copy = len;
- if (new)
- memset(kaddr+offset+len, 0,
- PAGE_SIZE-offset-len);
- }
- err = copy_from_user(kaddr+offset, str, bytes_to_copy);
- if (err) {
+ if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
ret = -EFAULT;
goto out;
}
-
- pos += bytes_to_copy;
- str += bytes_to_copy;
- len -= bytes_to_copy;
}
}
ret = 0;
out:
- if (kmapped_page)
+ if (kmapped_page) {
kunmap(kmapped_page);
+ put_page(kmapped_page);
+ }
return ret;
}
@@ -295,157 +368,74 @@ int copy_strings_kernel(int argc,char **
EXPORT_SYMBOL(copy_strings_kernel);
-#ifdef CONFIG_MMU
-/*
- * This routine is used to map in a page into an address space: needed by
- * execve() for the initial stack and environment pages.
- *
- * vma->vm_mm->mmap_sem is held for writing.
- */
-void install_arg_page(struct vm_area_struct *vma,
- struct page *page, unsigned long address)
-{
- struct mm_struct *mm = vma->vm_mm;
- pte_t * pte;
- spinlock_t *ptl;
-
- if (unlikely(anon_vma_prepare(vma)))
- goto out;
-
- flush_dcache_page(page);
- pte = get_locked_pte(mm, address, &ptl);
- if (!pte)
- goto out;
- if (!pte_none(*pte)) {
- pte_unmap_unlock(pte, ptl);
- goto out;
- }
- inc_mm_counter(mm, anon_rss);
- lru_cache_add_active(page);
- set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
- page, vma->vm_page_prot))));
- page_add_new_anon_rmap(page, vma, address);
- pte_unmap_unlock(pte, ptl);
-
- /* no need for flush_tlb */
- return;
-out:
- __free_page(page);
- force_sig(SIGKILL, current);
-}
-
#define EXTRA_STACK_VM_PAGES 20 /* random */
+/* Finalizes the stack vm_area_struct. The flags and permissions are updated,
+ * the stack is optionally relocated, and some extra space is added.
+ */
int setup_arg_pages(struct linux_binprm *bprm,
unsigned long stack_top,
int executable_stack)
{
- unsigned long stack_base;
- struct vm_area_struct *mpnt;
+ unsigned long ret;
+ unsigned long stack_base, stack_shift;
struct mm_struct *mm = current->mm;
- int i, ret;
- long arg_size;
-#ifdef CONFIG_STACK_GROWSUP
- /* Move the argument and environment strings to the bottom of the
- * stack space.
- */
- int offset, j;
- char *to, *from;
-
- /* Start by shifting all the pages down */
- i = 0;
- for (j = 0; j < MAX_ARG_PAGES; j++) {
- struct page *page = bprm->page[j];
- if (!page)
- continue;
- bprm->page[i++] = page;
- }
+ BUG_ON(stack_top > TASK_SIZE);
+ BUG_ON(stack_top & ~PAGE_MASK);
- /* Now move them within their pages */
- offset = bprm->p % PAGE_SIZE;
- to = kmap(bprm->page[0]);
- for (j = 1; j < i; j++) {
- memmove(to, to + offset, PAGE_SIZE - offset);
- from = kmap(bprm->page[j]);
- memcpy(to + PAGE_SIZE - offset, from, offset);
- kunmap(bprm->page[j - 1]);
- to = from;
- }
- memmove(to, to + offset, PAGE_SIZE - offset);
- kunmap(bprm->page[j - 1]);
-
- /* Limit stack size to 1GB */
- stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
- if (stack_base > (1 << 30))
- stack_base = 1 << 30;
- stack_base = PAGE_ALIGN(stack_top - stack_base);
-
- /* Adjust bprm->p to point to the end of the strings. */
- bprm->p = stack_base + PAGE_SIZE * i - offset;
-
- mm->arg_start = stack_base;
- arg_size = i << PAGE_SHIFT;
-
- /* zero pages that were copied above */
- while (i < MAX_ARG_PAGES)
- bprm->page[i++] = NULL;
-#else
- stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE);
+ stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
stack_base = PAGE_ALIGN(stack_base);
- bprm->p += stack_base;
- mm->arg_start = bprm->p;
- arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start);
-#endif
- arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_shift = (bprm->p & PAGE_MASK) - stack_base;
+ BUG_ON(stack_shift < 0);
+ bprm->p -= stack_shift;
+ mm->arg_start = bprm->p;
if (bprm->loader)
- bprm->loader += stack_base;
- bprm->exec += stack_base;
-
- mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
- if (!mpnt)
- return -ENOMEM;
-
- memset(mpnt, 0, sizeof(*mpnt));
+ bprm->loader -= stack_shift;
+ bprm->exec -= stack_shift;
down_write(&mm->mmap_sem);
{
- mpnt->vm_mm = mm;
-#ifdef CONFIG_STACK_GROWSUP
- mpnt->vm_start = stack_base;
- mpnt->vm_end = stack_base + arg_size;
-#else
- mpnt->vm_end = stack_top;
- mpnt->vm_start = mpnt->vm_end - arg_size;
-#endif
+ struct vm_area_struct *vma = bprm->vma;
+
/* Adjust stack execute permissions; explicitly enable
* for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
* and leave alone (arch default) otherwise. */
if (unlikely(executable_stack == EXSTACK_ENABLE_X))
- mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC;
+ vma->vm_flags |= VM_EXEC;
else if (executable_stack == EXSTACK_DISABLE_X)
- mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
- else
- mpnt->vm_flags = VM_STACK_FLAGS;
- mpnt->vm_flags |= mm->def_flags;
- mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
- if ((ret = insert_vm_struct(mm, mpnt))) {
- up_write(&mm->mmap_sem);
- kmem_cache_free(vm_area_cachep, mpnt);
- return ret;
+ vma->vm_flags &= ~VM_EXEC;
+ // FIXME: Are the next two lines sufficient, or do I need to
+ // do some additional magic?
+ vma->vm_flags |= mm->def_flags;
+ vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+
+ /* Move stack pages down in memory. */
+ if (stack_shift) {
+ // FIXME: Verify the shift is OK.
+
+ /* This should be safe even with overlap because we
+ * are shifting down. */
+ ret = move_vma(vma, vma->vm_start,
+ vma->vm_end - vma->vm_start,
+ vma->vm_end - vma->vm_start,
+ vma->vm_start - stack_shift);
+ if (ret & ~PAGE_MASK) {
+ up_write(&mm->mmap_sem);
+ return ret;
+ }
}
- mm->stack_vm = mm->total_vm = vma_pages(mpnt);
- }
- for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
- struct page *page = bprm->page[i];
- if (page) {
- bprm->page[i] = NULL;
- install_arg_page(mpnt, page, stack_base);
+ // Expand the stack.
+ vma = find_vma(mm, bprm->p);
+ BUG_ON(!vma || bprm->p < vma->vm_start);
+ if (expand_stack(vma, stack_base -
+ EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
+ up_write(&mm->mmap_sem);
+ return -EFAULT;
}
- stack_base += PAGE_SIZE;
}
up_write(&mm->mmap_sem);
@@ -454,23 +444,6 @@ #endif
EXPORT_SYMBOL(setup_arg_pages);
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
- int i;
-
- for (i = 0; i < MAX_ARG_PAGES; i++) {
- if (bprm->page[i])
- __free_page(bprm->page[i]);
- bprm->page[i] = NULL;
- }
-}
-
-#endif /* CONFIG_MMU */
-
struct file *open_exec(const char *name)
{
struct nameidata nd;
@@ -986,8 +959,10 @@ void compute_creds(struct linux_binprm *
EXPORT_SYMBOL(compute_creds);
-void remove_arg_zero(struct linux_binprm *bprm)
+int remove_arg_zero(struct linux_binprm *bprm)
{
+ int ret = 0;
+
if (bprm->argc) {
unsigned long offset;
char * kaddr;
@@ -1001,13 +976,23 @@ void remove_arg_zero(struct linux_binprm
continue;
offset = 0;
kunmap_atomic(kaddr, KM_USER0);
+ put_page(page);
inside:
- page = bprm->page[bprm->p/PAGE_SIZE];
+ ret = get_user_pages(current, bprm->mm, bprm->p,
+ 1, 0, 1, &page, NULL);
+ if (ret <= 0) {
+ ret = -EFAULT;
+ goto out;
+ }
kaddr = kmap_atomic(page, KM_USER0);
}
kunmap_atomic(kaddr, KM_USER0);
bprm->argc--;
+ ret = 0;
}
+
+out:
+ return ret;
}
EXPORT_SYMBOL(remove_arg_zero);
@@ -1034,7 +1019,7 @@ #ifdef __alpha__
fput(bprm->file);
bprm->file = NULL;
- loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
+ loader = bprm->vma->vm_end - sizeof(void *);
file = open_exec("/sbin/loader");
retval = PTR_ERR(file);
@@ -1127,7 +1112,6 @@ int do_execve(char * filename,
struct linux_binprm *bprm;
struct file *file;
int retval;
- int i;
retval = -ENOMEM;
bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1141,25 +1125,19 @@ int do_execve(char * filename,
sched_exec();
- bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
-
bprm->file = file;
bprm->filename = filename;
bprm->interp = filename;
- bprm->mm = mm_alloc();
- retval = -ENOMEM;
- if (!bprm->mm)
- goto out_file;
- retval = init_new_context(current, bprm->mm);
- if (retval < 0)
- goto out_mm;
+ retval = bprm_mm_init(bprm);
+ if (retval)
+ goto out_file;
- bprm->argc = count(argv, bprm->p / sizeof(void *));
+ bprm->argc = count(argv, MAX_ARG_STRINGS);
if ((retval = bprm->argc) < 0)
goto out_mm;
- bprm->envc = count(envp, bprm->p / sizeof(void *));
+ bprm->envc = count(envp, MAX_ARG_STRINGS);
if ((retval = bprm->envc) < 0)
goto out_mm;
@@ -1186,8 +1164,6 @@ int do_execve(char * filename,
retval = search_binary_handler(bprm,regs);
if (retval >= 0) {
- free_arg_pages(bprm);
-
/* execve success */
security_bprm_free(bprm);
acct_update_integrals(current);
@@ -1196,19 +1172,12 @@ int do_execve(char * filename,
}
out:
- /* Something went wrong, return the inode and free the argument pages*/
- for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
- struct page * page = bprm->page[i];
- if (page)
- __free_page(page);
- }
-
if (bprm->security)
security_bprm_free(bprm);
out_mm:
if (bprm->mm)
- mmdrop(bprm->mm);
+ mmput (bprm->mm);
out_file:
if (bprm->file) {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index c1e82c5..3e11591 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -5,12 +5,9 @@ #include <linux/capability.h>
struct pt_regs;
-/*
- * MAX_ARG_PAGES defines the number of pages allocated for arguments
- * and envelope for the new program. 32 should suffice, this gives
- * a maximum env+arg of 128kB w/4KB pages!
- */
-#define MAX_ARG_PAGES 32
+/* FIXME: Find real limits, or none. */
+#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
+#define MAX_ARG_STRINGS 0x7FFFFFFF
/* sizeof(linux_binprm->buf) */
#define BINPRM_BUF_SIZE 128
@@ -22,7 +19,7 @@ #ifdef __KERNEL__
*/
struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
- struct page *page[MAX_ARG_PAGES];
+ struct vm_area_struct *vma;
struct mm_struct *mm;
unsigned long p; /* current top of mem */
int sh_bang;
@@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
extern int unregister_binfmt(struct linux_binfmt *);
extern int prepare_binprm(struct linux_binprm *);
-extern void remove_arg_zero(struct linux_binprm *);
+extern int __must_check remove_arg_zero(struct linux_binprm *);
extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
extern int flush_old_exec(struct linux_binprm * bprm);
@@ -82,6 +79,7 @@ #define EXSTACK_ENABLE_X 2 /* Enable ex
extern int setup_arg_pages(struct linux_binprm * bprm,
unsigned long stack_top,
int executable_stack);
+extern int bprm_mm_init(struct linux_binprm *bprm);
extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
extern void compute_creds(struct linux_binprm *binprm);
extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2614662..4f30e28 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -738,7 +738,6 @@ #endif
extern int make_pages_present(unsigned long addr, unsigned long end);
extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
-void install_arg_page(struct vm_area_struct *, struct page *, unsigned long);
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
@@ -754,6 +753,9 @@ int FASTCALL(set_page_dirty(struct page
int set_page_dirty_lock(struct page *page);
int clear_page_dirty_for_io(struct page *page);
+extern unsigned long move_vma(struct vm_area_struct *vma,
+ unsigned long old_addr, unsigned long old_len,
+ unsigned long new_len, unsigned long new_addr);
extern unsigned long do_mremap(unsigned long addr,
unsigned long old_len, unsigned long new_len,
unsigned long flags, unsigned long new_addr);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 42f2f11..d25c73e 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1748,6 +1748,10 @@ int __audit_ipc_set_perm(unsigned long q
int audit_bprm(struct linux_binprm *bprm)
{
+ /* FIXME: Don't do anything for now until I figure out how to handle
+ * this. With the latest changes, kmalloc could well fail under good
+ * scenarios. */
+#if 0
struct audit_aux_data_execve *ax;
struct audit_context *context = current->audit_context;
unsigned long p, next;
@@ -1775,6 +1779,7 @@ int audit_bprm(struct linux_binprm *bprm
ax->d.type = AUDIT_EXECVE;
ax->d.next = context->aux;
context->aux = (void *)ax;
+#endif
return 0;
}
diff --git a/mm/mremap.c b/mm/mremap.c
index 9c769fa..957b6af 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
return len + old_addr - old_end; /* how much done */
}
-static unsigned long move_vma(struct vm_area_struct *vma,
+unsigned long move_vma(struct vm_area_struct *vma,
unsigned long old_addr, unsigned long old_len,
unsigned long new_len, unsigned long new_addr)
{
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Removing MAX_ARG_PAGES (request for comments/assistance) 2006-10-11 0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild @ 2006-10-11 8:00 ` Arjan van de Ven 2006-10-11 17:13 ` Ollie Wild 2006-10-11 13:14 ` Peter Zijlstra 2007-01-02 17:52 ` Removing MAX_ARG_PAGES (request for comments/assistance) David Howells 2 siblings, 1 reply; 16+ messages in thread From: Arjan van de Ven @ 2006-10-11 8:00 UTC (permalink / raw) To: Ollie Wild Cc: linux-kernel, parisc-linux, Linus Torvalds, Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells On Tue, 2006-10-10 at 17:05 -0700, Ollie Wild wrote: on first sight it looks like you pin the entire userspace buffer at the same time (but I can misread the code; this stuff is a bit of a spaghetti by nature); that would be a DoS scenario if true... -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Removing MAX_ARG_PAGES (request for comments/assistance) 2006-10-11 8:00 ` Arjan van de Ven @ 2006-10-11 17:13 ` Ollie Wild 0 siblings, 0 replies; 16+ messages in thread From: Ollie Wild @ 2006-10-11 17:13 UTC (permalink / raw) To: Arjan van de Ven Cc: linux-kernel, parisc-linux, Linus Torvalds, Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells > on first sight it looks like you pin the entire userspace buffer at the > same time (but I can misread the code; this stuff is a bit of a > spaghetti by nature); that would be a DoS scenario if true... I'm not sure I understand. Could you please elaborate? Thanks, Ollie -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Removing MAX_ARG_PAGES (request for comments/assistance) 2006-10-11 0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild 2006-10-11 8:00 ` Arjan van de Ven @ 2006-10-11 13:14 ` Peter Zijlstra 2006-10-11 21:48 ` Ollie Wild 2007-01-02 17:52 ` Removing MAX_ARG_PAGES (request for comments/assistance) David Howells 2 siblings, 1 reply; 16+ messages in thread From: Peter Zijlstra @ 2006-10-11 13:14 UTC (permalink / raw) To: Ollie Wild Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells On Tue, 2006-10-10 at 17:05 -0700, Ollie Wild wrote: > + vma->vm_flags &= ~VM_EXEC; > + // FIXME: Are the next two lines sufficient, or do I need to > + // do some additional magic? > + vma->vm_flags |= mm->def_flags; > + vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; Yeah, you'll need to change the PTEs for those pages you created by calling get_user_page() by calling an mprotect like function; perhaps something like: struct vm_area_struct *prev; unsigned long vm_flags = vma->vm_flags; s/vma->vm_flags/vm_flags/g err = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags); BUG_ON(prev != vma); mprotect_fixup will then set the new protection on all PTEs and update vma->vm_flags and vma->vm_page_prot. > + /* Move stack pages down in memory. */ > + if (stack_shift) { > + // FIXME: Verify the shift is OK. > + What exactly are you wondering about? the call to move_vma looks sane to me > + /* This should be safe even with overlap because we > + * are shifting down. */ > + ret = move_vma(vma, vma->vm_start, > + vma->vm_end - vma->vm_start, > + vma->vm_end - vma->vm_start, > + vma->vm_start - stack_shift); > + if (ret & ~PAGE_MASK) { > + up_write(&mm->mmap_sem); > + return ret; > + } > } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Removing MAX_ARG_PAGES (request for comments/assistance) 2006-10-11 13:14 ` Peter Zijlstra @ 2006-10-11 21:48 ` Ollie Wild 2006-10-24 17:48 ` Ollie Wild 0 siblings, 1 reply; 16+ messages in thread From: Ollie Wild @ 2006-10-11 21:48 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells [-- Attachment #1: Type: text/plain, Size: 745 bytes --] > Yeah, you'll need to change the PTEs for those pages you created by > calling get_user_page() by calling an mprotect like function; perhaps > something like: Thanks. I've incorporated your changes (updated patch attached). > > + /* Move stack pages down in memory. */ > > + if (stack_shift) { > > + // FIXME: Verify the shift is OK. > > + > > What exactly are you wondering about? the call to move_vma looks sane to > me My concern was that the binfmt handler may have setup other vm areas which overlap the new range. The move_vma() function doesn't do overlap checking. I'm not sure if this is something I need to guard against, or if it falls in the "Don't do that!" category. Ollie [-- Attachment #2: no_MAX_ARG_PAGES.patch --] [-- Type: application/octet-stream, Size: 27310 bytes --] diff --git a/arch/x86_64/ia32/ia32_binfmt.c b/arch/x86_64/ia32/ia32_binfmt.c index 82ef182..2e5a934 100644 --- a/arch/x86_64/ia32/ia32_binfmt.c +++ b/arch/x86_64/ia32/ia32_binfmt.c @@ -279,9 +279,6 @@ #define exit_elf_binfmt exit_elf32_bin #define load_elf_binary load_elf32_binary #define ELF_PLAT_INIT(r, load_addr) elf32_init(r) -#define setup_arg_pages(bprm, stack_top, exec_stack) \ - ia32_setup_arg_pages(bprm, stack_top, exec_stack) -int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack); #undef start_thread #define start_thread(regs,new_rip,new_rsp) do { \ @@ -338,57 +335,7 @@ static void elf32_init(struct pt_regs *r int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack) { - unsigned long stack_base; - struct vm_area_struct *mpnt; - struct mm_struct *mm = current->mm; - int i, ret; - - stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE; - mm->arg_start = bprm->p + stack_base; - - bprm->p += stack_base; - if (bprm->loader) - bprm->loader += stack_base; - bprm->exec += stack_base; - - mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); - if (!mpnt) - return -ENOMEM; - - memset(mpnt, 0, sizeof(*mpnt)); - - down_write(&mm->mmap_sem); - { - mpnt->vm_mm = mm; - mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p; - mpnt->vm_end = stack_top; - if (executable_stack == EXSTACK_ENABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC; - else if (executable_stack == EXSTACK_DISABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC; - else - mpnt->vm_flags = VM_STACK_FLAGS; - mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? - PAGE_COPY_EXEC : PAGE_COPY; - if ((ret = insert_vm_struct(mm, mpnt))) { - up_write(&mm->mmap_sem); - kmem_cache_free(vm_area_cachep, mpnt); - return ret; - } - mm->stack_vm = mm->total_vm = vma_pages(mpnt); - } - - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page *page = bprm->page[i]; - if (page) { - bprm->page[i] = NULL; - install_arg_page(mpnt, page, stack_base); - } - stack_base += PAGE_SIZE; - } - up_write(&mm->mmap_sem); - - return 0; + return setup_arg_pages(bprm, stack_top, executable_stack); } EXPORT_SYMBOL(ia32_setup_arg_pages); diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 06435f3..d0e2292 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -255,8 +255,8 @@ #endif while (argc-- > 0) { size_t len; __put_user((elf_addr_t)p, argv++); - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES); - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES) + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN); + if (!len || len > MAX_ARG_STRLEN) return 0; p += len; } @@ -266,8 +266,8 @@ #endif while (envc-- > 0) { size_t len; __put_user((elf_addr_t)p, envp++); - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES); - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES) + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN); + if (!len || len > MAX_ARG_STRLEN) return 0; p += len; } @@ -769,10 +769,6 @@ static int load_elf_binary(struct linux_ } /* OK, This is the point of no return */ - current->mm->start_data = 0; - current->mm->end_data = 0; - current->mm->end_code = 0; - current->mm->mmap = NULL; current->flags &= ~PF_FORKNOEXEC; current->mm->def_flags = def_flags; diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index 1713c48..1378ba3 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -126,7 +126,9 @@ static int load_misc_binary(struct linux goto _ret; if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) { - remove_arg_zero(bprm); + retval = remove_arg_zero(bprm); + if (retval) + goto _ret; } if (fmt->flags & MISC_FMT_OPEN_BINARY) { diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c index 1edbcca..5b6309f 100644 --- a/fs/binfmt_script.c +++ b/fs/binfmt_script.c @@ -68,7 +68,9 @@ static int load_script(struct linux_binp * This is done in reverse order, because of how the * user environment and arguments are stored. */ - remove_arg_zero(bprm); + retval = remove_arg_zero(bprm); + if (retval) + return retval; retval = copy_strings_kernel(1, &bprm->interp, bprm); if (retval < 0) return retval; bprm->argc++; diff --git a/fs/compat.c b/fs/compat.c index 50624d4..359836b 100644 --- a/fs/compat.c +++ b/fs/compat.c @@ -1385,6 +1385,7 @@ static int compat_copy_strings(int argc, { struct page *kmapped_page = NULL; char *kaddr = NULL; + unsigned long kpos = 0; int ret; while (argc-- > 0) { @@ -1393,92 +1394,72 @@ static int compat_copy_strings(int argc, unsigned long pos; if (get_user(str, argv+argc) || - !(len = strnlen_user(compat_ptr(str), bprm->p))) { + !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) { ret = -EFAULT; goto out; } - if (bprm->p < len) { + if (MAX_ARG_STRLEN < len) { ret = -E2BIG; goto out; } - bprm->p -= len; - /* XXX: add architecture specific overflow check here. */ + /* We're going to work our way backwords. */ pos = bprm->p; + str += len; + bprm->p -= len; while (len > 0) { - int i, new, err; int offset, bytes_to_copy; - struct page *page; offset = pos % PAGE_SIZE; - i = pos/PAGE_SIZE; - page = bprm->page[i]; - new = 0; - if (!page) { - page = alloc_page(GFP_HIGHUSER); - bprm->page[i] = page; - if (!page) { - ret = -ENOMEM; + if (offset == 0) + offset = PAGE_SIZE; + + bytes_to_copy = offset; + if (bytes_to_copy > len) + bytes_to_copy = len; + + offset -= bytes_to_copy; + pos -= bytes_to_copy; + str -= bytes_to_copy; + len -= bytes_to_copy; + + if (!kmapped_page || kpos != (pos & PAGE_MASK)) { + struct page *page; + + ret = get_user_pages(current, bprm->mm, pos, + 1, 1, 1, &page, NULL); + if (ret <= 0) { + /* We've exceed the stack rlimit. */ + ret = -E2BIG; goto out; } - new = 1; - } - if (page != kmapped_page) { - if (kmapped_page) + if (kmapped_page) { kunmap(kmapped_page); + put_page(kmapped_page); + } kmapped_page = page; kaddr = kmap(kmapped_page); + kpos = pos & PAGE_MASK; } - if (new && offset) - memset(kaddr, 0, offset); - bytes_to_copy = PAGE_SIZE - offset; - if (bytes_to_copy > len) { - bytes_to_copy = len; - if (new) - memset(kaddr+offset+len, 0, - PAGE_SIZE-offset-len); - } - err = copy_from_user(kaddr+offset, compat_ptr(str), - bytes_to_copy); - if (err) { + if (copy_from_user(kaddr+offset, compat_ptr(str), + bytes_to_copy)) { ret = -EFAULT; goto out; } - - pos += bytes_to_copy; - str += bytes_to_copy; - len -= bytes_to_copy; } } ret = 0; out: - if (kmapped_page) + if (kmapped_page) { kunmap(kmapped_page); - return ret; -} - -#ifdef CONFIG_MMU - -#define free_arg_pages(bprm) do { } while (0) - -#else - -static inline void free_arg_pages(struct linux_binprm *bprm) -{ - int i; - - for (i = 0; i < MAX_ARG_PAGES; i++) { - if (bprm->page[i]) - __free_page(bprm->page[i]); - bprm->page[i] = NULL; + put_page(kmapped_page); } + return ret; } -#endif /* CONFIG_MMU */ - /* * compat_do_execve() is mostly a copy of do_execve(), with the exception * that it processes 32 bit argv and envp pointers. @@ -1491,7 +1472,6 @@ int compat_do_execve(char * filename, struct linux_binprm *bprm; struct file *file; int retval; - int i; retval = -ENOMEM; bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); @@ -1505,24 +1485,19 @@ int compat_do_execve(char * filename, sched_exec(); - bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); bprm->file = file; bprm->filename = filename; bprm->interp = filename; - bprm->mm = mm_alloc(); - retval = -ENOMEM; - if (!bprm->mm) - goto out_file; - retval = init_new_context(current, bprm->mm); - if (retval < 0) - goto out_mm; + retval = bprm_mm_init(bprm); + if (retval) + goto out_file; - bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t)); + bprm->argc = compat_count(argv, MAX_ARG_STRINGS); if ((retval = bprm->argc) < 0) goto out_mm; - bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t)); + bprm->envc = compat_count(envp, MAX_ARG_STRINGS); if ((retval = bprm->envc) < 0) goto out_mm; @@ -1547,10 +1522,8 @@ int compat_do_execve(char * filename, if (retval < 0) goto out; - retval = search_binary_handler(bprm, regs); + retval = search_binary_handler(bprm,regs); if (retval >= 0) { - free_arg_pages(bprm); - /* execve success */ security_bprm_free(bprm); acct_update_integrals(current); @@ -1559,19 +1532,12 @@ int compat_do_execve(char * filename, } out: - /* Something went wrong, return the inode and free the argument pages*/ - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page * page = bprm->page[i]; - if (page) - __free_page(page); - } - if (bprm->security) security_bprm_free(bprm); out_mm: if (bprm->mm) - mmdrop(bprm->mm); + mmput (bprm->mm); out_file: if (bprm->file) { diff --git a/fs/exec.c b/fs/exec.c index d993ea1..3739c26 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -172,6 +172,79 @@ exit: goto out; } +#ifdef CONFIG_STACK_GROWSUP +#error I broke your build because I rearchitected the stack code, and I \ + don't have access to an architecture where CONFIG_STACK_GROWSUP is \ + set. Please fixe this or send me a machine which I can test this on. \ + \ + -- Ollie Wild <aaw@google.com> +#endif + +/* Create a new mm_struct and populate it with a temporary stack + * vm_area_struct. We don't have enough context at this point to set the + * stack flags, permissions, and offset, so we use temporary values. We'll + * update them later in setup_arg_pages(). */ +int bprm_mm_init(struct linux_binprm *bprm) +{ + int err; + struct mm_struct *mm = NULL; + struct vm_area_struct *vma = NULL; + + bprm->mm = mm = mm_alloc(); + err = -ENOMEM; + if (!mm) + goto err; + + if ((err = init_new_context(current, mm))) + goto err; + + bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, SLAB_KERNEL); + err = -ENOMEM; + if (!vma) + goto err; + + down_write(&mm->mmap_sem); + { + vma->vm_mm = mm; + + /* Place the stack at the top of user memory. Later, we'll + * move this to an appropriate place. We don't use STACK_TOP + * because that can depend on attributes which aren't + * configured yet. */ + vma->vm_end = TASK_SIZE; + vma->vm_start = vma->vm_end - PAGE_SIZE; + + vma->vm_flags = VM_STACK_FLAGS; + vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; + if ((err = insert_vm_struct(mm, vma))) { + up_write(&mm->mmap_sem); + goto err; + } + + mm->stack_vm = mm->total_vm = 1; + } + up_write(&mm->mmap_sem); + + bprm->p = vma->vm_end - sizeof(void *); + + return 0; + +err: + if (vma) { + bprm->vma = NULL; + kmem_cache_free(vm_area_cachep, vma); + } + + if (mm) { + bprm->mm = NULL; + mmdrop(mm); + } + + return err; +} + +EXPORT_SYMBOL(bprm_mm_init); + /* * count() counts the number of strings in array ARGV. */ @@ -197,15 +270,16 @@ static int count(char __user * __user * } /* - * 'copy_strings()' copies argument/environment strings from user - * memory to free pages in kernel mem. These are in a format ready - * to be put directly into the top of new user memory. + * 'copy_strings()' copies argument/environment strings from the old + * processes's memory to the new process's stack. The call to get_user_pages() + * ensures the destination page is created and not swapped out. */ static int copy_strings(int argc, char __user * __user * argv, struct linux_binprm *bprm) { struct page *kmapped_page = NULL; char *kaddr = NULL; + unsigned long kpos = 0; int ret; while (argc-- > 0) { @@ -214,69 +288,68 @@ static int copy_strings(int argc, char _ unsigned long pos; if (get_user(str, argv+argc) || - !(len = strnlen_user(str, bprm->p))) { + !(len = strnlen_user(str, MAX_ARG_STRLEN))) { ret = -EFAULT; goto out; } - if (bprm->p < len) { + if (MAX_ARG_STRLEN < len) { ret = -E2BIG; goto out; } - bprm->p -= len; - /* XXX: add architecture specific overflow check here. */ + /* We're going to work our way backwords. */ pos = bprm->p; + str += len; + bprm->p -= len; while (len > 0) { - int i, new, err; int offset, bytes_to_copy; - struct page *page; offset = pos % PAGE_SIZE; - i = pos/PAGE_SIZE; - page = bprm->page[i]; - new = 0; - if (!page) { - page = alloc_page(GFP_HIGHUSER); - bprm->page[i] = page; - if (!page) { - ret = -ENOMEM; + if (offset == 0) + offset = PAGE_SIZE; + + bytes_to_copy = offset; + if (bytes_to_copy > len) + bytes_to_copy = len; + + offset -= bytes_to_copy; + pos -= bytes_to_copy; + str -= bytes_to_copy; + len -= bytes_to_copy; + + if (!kmapped_page || kpos != (pos & PAGE_MASK)) { + struct page *page; + + ret = get_user_pages(current, bprm->mm, pos, + 1, 1, 1, &page, NULL); + if (ret <= 0) { + /* We've exceed the stack rlimit. */ + ret = -E2BIG; goto out; } - new = 1; - } - if (page != kmapped_page) { - if (kmapped_page) + if (kmapped_page) { kunmap(kmapped_page); + put_page(kmapped_page); + } kmapped_page = page; kaddr = kmap(kmapped_page); + kpos = pos & PAGE_MASK; } - if (new && offset) - memset(kaddr, 0, offset); - bytes_to_copy = PAGE_SIZE - offset; - if (bytes_to_copy > len) { - bytes_to_copy = len; - if (new) - memset(kaddr+offset+len, 0, - PAGE_SIZE-offset-len); - } - err = copy_from_user(kaddr+offset, str, bytes_to_copy); - if (err) { + if (copy_from_user(kaddr+offset, str, bytes_to_copy)) { ret = -EFAULT; goto out; } - - pos += bytes_to_copy; - str += bytes_to_copy; - len -= bytes_to_copy; } } ret = 0; out: - if (kmapped_page) + if (kmapped_page) { kunmap(kmapped_page); + put_page(kmapped_page); + } return ret; } @@ -295,157 +368,79 @@ int copy_strings_kernel(int argc,char ** EXPORT_SYMBOL(copy_strings_kernel); -#ifdef CONFIG_MMU -/* - * This routine is used to map in a page into an address space: needed by - * execve() for the initial stack and environment pages. - * - * vma->vm_mm->mmap_sem is held for writing. - */ -void install_arg_page(struct vm_area_struct *vma, - struct page *page, unsigned long address) -{ - struct mm_struct *mm = vma->vm_mm; - pte_t * pte; - spinlock_t *ptl; - - if (unlikely(anon_vma_prepare(vma))) - goto out; - - flush_dcache_page(page); - pte = get_locked_pte(mm, address, &ptl); - if (!pte) - goto out; - if (!pte_none(*pte)) { - pte_unmap_unlock(pte, ptl); - goto out; - } - inc_mm_counter(mm, anon_rss); - lru_cache_add_active(page); - set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte( - page, vma->vm_page_prot)))); - page_add_new_anon_rmap(page, vma, address); - pte_unmap_unlock(pte, ptl); - - /* no need for flush_tlb */ - return; -out: - __free_page(page); - force_sig(SIGKILL, current); -} - #define EXTRA_STACK_VM_PAGES 20 /* random */ +/* Finalizes the stack vm_area_struct. The flags and permissions are updated, + * the stack is optionally relocated, and some extra space is added. + */ int setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack) { - unsigned long stack_base; - struct vm_area_struct *mpnt; + unsigned long ret; + unsigned long stack_base, stack_shift; struct mm_struct *mm = current->mm; - int i, ret; - long arg_size; -#ifdef CONFIG_STACK_GROWSUP - /* Move the argument and environment strings to the bottom of the - * stack space. - */ - int offset, j; - char *to, *from; - - /* Start by shifting all the pages down */ - i = 0; - for (j = 0; j < MAX_ARG_PAGES; j++) { - struct page *page = bprm->page[j]; - if (!page) - continue; - bprm->page[i++] = page; - } + BUG_ON(stack_top > TASK_SIZE); + BUG_ON(stack_top & ~PAGE_MASK); - /* Now move them within their pages */ - offset = bprm->p % PAGE_SIZE; - to = kmap(bprm->page[0]); - for (j = 1; j < i; j++) { - memmove(to, to + offset, PAGE_SIZE - offset); - from = kmap(bprm->page[j]); - memcpy(to + PAGE_SIZE - offset, from, offset); - kunmap(bprm->page[j - 1]); - to = from; - } - memmove(to, to + offset, PAGE_SIZE - offset); - kunmap(bprm->page[j - 1]); - - /* Limit stack size to 1GB */ - stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max; - if (stack_base > (1 << 30)) - stack_base = 1 << 30; - stack_base = PAGE_ALIGN(stack_top - stack_base); - - /* Adjust bprm->p to point to the end of the strings. */ - bprm->p = stack_base + PAGE_SIZE * i - offset; - - mm->arg_start = stack_base; - arg_size = i << PAGE_SHIFT; - - /* zero pages that were copied above */ - while (i < MAX_ARG_PAGES) - bprm->page[i++] = NULL; -#else - stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE); + stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE); stack_base = PAGE_ALIGN(stack_base); - bprm->p += stack_base; - mm->arg_start = bprm->p; - arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start); -#endif - arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE; + stack_shift = (bprm->p & PAGE_MASK) - stack_base; + BUG_ON(stack_shift < 0); + bprm->p -= stack_shift; + mm->arg_start = bprm->p; if (bprm->loader) - bprm->loader += stack_base; - bprm->exec += stack_base; - - mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); - if (!mpnt) - return -ENOMEM; - - memset(mpnt, 0, sizeof(*mpnt)); + bprm->loader -= stack_shift; + bprm->exec -= stack_shift; down_write(&mm->mmap_sem); { - mpnt->vm_mm = mm; -#ifdef CONFIG_STACK_GROWSUP - mpnt->vm_start = stack_base; - mpnt->vm_end = stack_base + arg_size; -#else - mpnt->vm_end = stack_top; - mpnt->vm_start = mpnt->vm_end - arg_size; -#endif + struct vm_area_struct *vma = bprm->vma; + struct vm_area_struct *prev = NULL; + unsigned long vm_flags = vma->vm_flags; + /* Adjust stack execute permissions; explicitly enable * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X * and leave alone (arch default) otherwise. */ if (unlikely(executable_stack == EXSTACK_ENABLE_X)) - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC; + vm_flags |= VM_EXEC; else if (executable_stack == EXSTACK_DISABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC; - else - mpnt->vm_flags = VM_STACK_FLAGS; - mpnt->vm_flags |= mm->def_flags; - mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7]; - if ((ret = insert_vm_struct(mm, mpnt))) { + vm_flags &= ~VM_EXEC; + vm_flags |= mm->def_flags; + + ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, + vm_flags); + if (ret) { up_write(&mm->mmap_sem); - kmem_cache_free(vm_area_cachep, mpnt); return ret; } - mm->stack_vm = mm->total_vm = vma_pages(mpnt); - } + BUG_ON(prev != vma); + + /* Move stack pages down in memory. */ + if (stack_shift) { + /* This should be safe even with overlap because we + * are shifting down. */ + ret = move_vma(vma, vma->vm_start, + vma->vm_end - vma->vm_start, + vma->vm_end - vma->vm_start, + vma->vm_start - stack_shift); + if (ret & ~PAGE_MASK) { + up_write(&mm->mmap_sem); + return ret; + } + } - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page *page = bprm->page[i]; - if (page) { - bprm->page[i] = NULL; - install_arg_page(mpnt, page, stack_base); + // Expand the stack. + vma = find_vma(mm, bprm->p); + BUG_ON(!vma || bprm->p < vma->vm_start); + if (expand_stack(vma, stack_base - + EXTRA_STACK_VM_PAGES * PAGE_SIZE)) { + up_write(&mm->mmap_sem); + return -EFAULT; } - stack_base += PAGE_SIZE; } up_write(&mm->mmap_sem); @@ -454,23 +449,6 @@ #endif EXPORT_SYMBOL(setup_arg_pages); -#define free_arg_pages(bprm) do { } while (0) - -#else - -static inline void free_arg_pages(struct linux_binprm *bprm) -{ - int i; - - for (i = 0; i < MAX_ARG_PAGES; i++) { - if (bprm->page[i]) - __free_page(bprm->page[i]); - bprm->page[i] = NULL; - } -} - -#endif /* CONFIG_MMU */ - struct file *open_exec(const char *name) { struct nameidata nd; @@ -986,8 +964,10 @@ void compute_creds(struct linux_binprm * EXPORT_SYMBOL(compute_creds); -void remove_arg_zero(struct linux_binprm *bprm) +int remove_arg_zero(struct linux_binprm *bprm) { + int ret = 0; + if (bprm->argc) { unsigned long offset; char * kaddr; @@ -1001,13 +981,23 @@ void remove_arg_zero(struct linux_binprm continue; offset = 0; kunmap_atomic(kaddr, KM_USER0); + put_page(page); inside: - page = bprm->page[bprm->p/PAGE_SIZE]; + ret = get_user_pages(current, bprm->mm, bprm->p, + 1, 0, 1, &page, NULL); + if (ret <= 0) { + ret = -EFAULT; + goto out; + } kaddr = kmap_atomic(page, KM_USER0); } kunmap_atomic(kaddr, KM_USER0); bprm->argc--; + ret = 0; } + +out: + return ret; } EXPORT_SYMBOL(remove_arg_zero); @@ -1034,7 +1024,7 @@ #ifdef __alpha__ fput(bprm->file); bprm->file = NULL; - loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); + loader = bprm->vma->vm_end - sizeof(void *); file = open_exec("/sbin/loader"); retval = PTR_ERR(file); @@ -1127,7 +1117,6 @@ int do_execve(char * filename, struct linux_binprm *bprm; struct file *file; int retval; - int i; retval = -ENOMEM; bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); @@ -1141,25 +1130,19 @@ int do_execve(char * filename, sched_exec(); - bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); - bprm->file = file; bprm->filename = filename; bprm->interp = filename; - bprm->mm = mm_alloc(); - retval = -ENOMEM; - if (!bprm->mm) - goto out_file; - retval = init_new_context(current, bprm->mm); - if (retval < 0) - goto out_mm; + retval = bprm_mm_init(bprm); + if (retval) + goto out_file; - bprm->argc = count(argv, bprm->p / sizeof(void *)); + bprm->argc = count(argv, MAX_ARG_STRINGS); if ((retval = bprm->argc) < 0) goto out_mm; - bprm->envc = count(envp, bprm->p / sizeof(void *)); + bprm->envc = count(envp, MAX_ARG_STRINGS); if ((retval = bprm->envc) < 0) goto out_mm; @@ -1186,8 +1169,6 @@ int do_execve(char * filename, retval = search_binary_handler(bprm,regs); if (retval >= 0) { - free_arg_pages(bprm); - /* execve success */ security_bprm_free(bprm); acct_update_integrals(current); @@ -1196,19 +1177,12 @@ int do_execve(char * filename, } out: - /* Something went wrong, return the inode and free the argument pages*/ - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page * page = bprm->page[i]; - if (page) - __free_page(page); - } - if (bprm->security) security_bprm_free(bprm); out_mm: if (bprm->mm) - mmdrop(bprm->mm); + mmput (bprm->mm); out_file: if (bprm->file) { diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index c1e82c5..3e11591 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -5,12 +5,9 @@ #include <linux/capability.h> struct pt_regs; -/* - * MAX_ARG_PAGES defines the number of pages allocated for arguments - * and envelope for the new program. 32 should suffice, this gives - * a maximum env+arg of 128kB w/4KB pages! - */ -#define MAX_ARG_PAGES 32 +/* FIXME: Find real limits, or none. */ +#define MAX_ARG_STRLEN (PAGE_SIZE * 32) +#define MAX_ARG_STRINGS 0x7FFFFFFF /* sizeof(linux_binprm->buf) */ #define BINPRM_BUF_SIZE 128 @@ -22,7 +19,7 @@ #ifdef __KERNEL__ */ struct linux_binprm{ char buf[BINPRM_BUF_SIZE]; - struct page *page[MAX_ARG_PAGES]; + struct vm_area_struct *vma; struct mm_struct *mm; unsigned long p; /* current top of mem */ int sh_bang; @@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_ extern int unregister_binfmt(struct linux_binfmt *); extern int prepare_binprm(struct linux_binprm *); -extern void remove_arg_zero(struct linux_binprm *); +extern int __must_check remove_arg_zero(struct linux_binprm *); extern int search_binary_handler(struct linux_binprm *,struct pt_regs *); extern int flush_old_exec(struct linux_binprm * bprm); @@ -82,6 +79,7 @@ #define EXSTACK_ENABLE_X 2 /* Enable ex extern int setup_arg_pages(struct linux_binprm * bprm, unsigned long stack_top, int executable_stack); +extern int bprm_mm_init(struct linux_binprm *bprm); extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm); extern void compute_creds(struct linux_binprm *binprm); extern int do_coredump(long signr, int exit_code, struct pt_regs * regs); diff --git a/include/linux/mm.h b/include/linux/mm.h index 2614662..bb1d4c9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -738,7 +738,6 @@ #endif extern int make_pages_present(unsigned long addr, unsigned long end); extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write); -void install_arg_page(struct vm_area_struct *, struct page *, unsigned long); int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, int len, int write, int force, struct page **pages, struct vm_area_struct **vmas); @@ -754,9 +753,15 @@ int FASTCALL(set_page_dirty(struct page int set_page_dirty_lock(struct page *page); int clear_page_dirty_for_io(struct page *page); +extern unsigned long move_vma(struct vm_area_struct *vma, + unsigned long old_addr, unsigned long old_len, + unsigned long new_len, unsigned long new_addr); extern unsigned long do_mremap(unsigned long addr, unsigned long old_len, unsigned long new_len, unsigned long flags, unsigned long new_addr); +extern int mprotect_fixup(struct vm_area_struct *vma, + struct vm_area_struct **pprev, unsigned long start, + unsigned long end, unsigned long newflags); /* * Prototype to add a shrinker callback for ageable caches. diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 42f2f11..d25c73e 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -1748,6 +1748,10 @@ int __audit_ipc_set_perm(unsigned long q int audit_bprm(struct linux_binprm *bprm) { + /* FIXME: Don't do anything for now until I figure out how to handle + * this. With the latest changes, kmalloc could well fail under good + * scenarios. */ +#if 0 struct audit_aux_data_execve *ax; struct audit_context *context = current->audit_context; unsigned long p, next; @@ -1775,6 +1779,7 @@ int audit_bprm(struct linux_binprm *bprm ax->d.type = AUDIT_EXECVE; ax->d.next = context->aux; context->aux = (void *)ax; +#endif return 0; } diff --git a/mm/mprotect.c b/mm/mprotect.c index 3b8f3c0..e8346c3 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -128,7 +128,7 @@ static void change_protection(struct vm_ flush_tlb_range(vma, start, end); } -static int +int mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, unsigned long start, unsigned long end, unsigned long newflags) { diff --git a/mm/mremap.c b/mm/mremap.c index 9c769fa..957b6af 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -155,7 +155,7 @@ static unsigned long move_page_tables(st return len + old_addr - old_end; /* how much done */ } -static unsigned long move_vma(struct vm_area_struct *vma, +unsigned long move_vma(struct vm_area_struct *vma, unsigned long old_addr, unsigned long old_len, unsigned long new_len, unsigned long new_addr) { ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Removing MAX_ARG_PAGES (request for comments/assistance) 2006-10-11 21:48 ` Ollie Wild @ 2006-10-24 17:48 ` Ollie Wild 2006-12-29 20:03 ` [patch] remove MAX_ARG_PAGES Ingo Molnar 2007-01-02 18:18 ` David Howells 0 siblings, 2 replies; 16+ messages in thread From: Ollie Wild @ 2006-10-24 17:48 UTC (permalink / raw) To: linux-kernel Cc: parisc-linux, Linus Torvalds, Arjan van de Ven, Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra It's been a couple weeks since I got any feedback on this. I just wanted to send out a gentle reminder to encourage people to take a look. I realize the kernel is in bugfix mode at the moment. If people would rather I resend after the 2.6.19 release, please let me know. Thanks, Ollie -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [patch] remove MAX_ARG_PAGES 2006-10-24 17:48 ` Ollie Wild @ 2006-12-29 20:03 ` Ingo Molnar 2006-12-29 20:49 ` Russell King ` (3 more replies) 2007-01-02 18:18 ` David Howells 1 sibling, 4 replies; 16+ messages in thread From: Ingo Molnar @ 2006-12-29 20:03 UTC (permalink / raw) To: Ollie Wild Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 2.6.20-rc2 and have included it in the -rt kernel. It's working great - i can now finally do a "ls -t patches/*.patch" in my patch repository - something i havent been able to do for years ;-) what is keeping this fix from going upstream? Ingo --------------> Subject: [patch] remove MAX_ARG_PAGES From: Ollie Wild <aaw@google.com> this patch removes the MAX_ARG_PAGES limit by copying between VMs. This makes process argv/env limited by the stack limit (and it's thus arbitrarily sizable). No more: -bash: /bin/ls: Argument list too long Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86_64/ia32/ia32_binfmt.c | 55 ----- fs/binfmt_elf.c | 12 - fs/binfmt_misc.c | 4 fs/binfmt_script.c | 4 fs/compat.c | 118 ++++-------- fs/exec.c | 382 +++++++++++++++++++---------------------- include/linux/binfmts.h | 14 - include/linux/mm.h | 7 kernel/auditsc.c | 5 mm/mprotect.c | 2 mm/mremap.c | 2 11 files changed, 250 insertions(+), 355 deletions(-) Index: linux/arch/x86_64/ia32/ia32_binfmt.c =================================================================== --- linux.orig/arch/x86_64/ia32/ia32_binfmt.c +++ linux/arch/x86_64/ia32/ia32_binfmt.c @@ -279,9 +279,6 @@ do { \ #define load_elf_binary load_elf32_binary #define ELF_PLAT_INIT(r, load_addr) elf32_init(r) -#define setup_arg_pages(bprm, stack_top, exec_stack) \ - ia32_setup_arg_pages(bprm, stack_top, exec_stack) -int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack); #undef start_thread #define start_thread(regs,new_rip,new_rsp) do { \ @@ -336,57 +333,7 @@ static void elf32_init(struct pt_regs *r int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack) { - unsigned long stack_base; - struct vm_area_struct *mpnt; - struct mm_struct *mm = current->mm; - int i, ret; - - stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE; - mm->arg_start = bprm->p + stack_base; - - bprm->p += stack_base; - if (bprm->loader) - bprm->loader += stack_base; - bprm->exec += stack_base; - - mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); - if (!mpnt) - return -ENOMEM; - - memset(mpnt, 0, sizeof(*mpnt)); - - down_write(&mm->mmap_sem); - { - mpnt->vm_mm = mm; - mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p; - mpnt->vm_end = stack_top; - if (executable_stack == EXSTACK_ENABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC; - else if (executable_stack == EXSTACK_DISABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC; - else - mpnt->vm_flags = VM_STACK_FLAGS; - mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? - PAGE_COPY_EXEC : PAGE_COPY; - if ((ret = insert_vm_struct(mm, mpnt))) { - up_write(&mm->mmap_sem); - kmem_cache_free(vm_area_cachep, mpnt); - return ret; - } - mm->stack_vm = mm->total_vm = vma_pages(mpnt); - } - - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page *page = bprm->page[i]; - if (page) { - bprm->page[i] = NULL; - install_arg_page(mpnt, page, stack_base); - } - stack_base += PAGE_SIZE; - } - up_write(&mm->mmap_sem); - - return 0; + return setup_arg_pages(bprm, stack_top, executable_stack); } EXPORT_SYMBOL(ia32_setup_arg_pages); Index: linux/fs/binfmt_elf.c =================================================================== --- linux.orig/fs/binfmt_elf.c +++ linux/fs/binfmt_elf.c @@ -253,8 +253,8 @@ create_elf_tables(struct linux_binprm *b size_t len; if (__put_user((elf_addr_t)p, argv++)) return -EFAULT; - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES); - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES) + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN); + if (!len || len > MAX_ARG_STRLEN) return 0; p += len; } @@ -265,8 +265,8 @@ create_elf_tables(struct linux_binprm *b size_t len; if (__put_user((elf_addr_t)p, envp++)) return -EFAULT; - len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES); - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES) + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN); + if (!len || len > MAX_ARG_STRLEN) return 0; p += len; } @@ -767,10 +767,6 @@ static int load_elf_binary(struct linux_ } /* OK, This is the point of no return */ - current->mm->start_data = 0; - current->mm->end_data = 0; - current->mm->end_code = 0; - current->mm->mmap = NULL; current->flags &= ~PF_FORKNOEXEC; current->mm->def_flags = def_flags; Index: linux/fs/binfmt_misc.c =================================================================== --- linux.orig/fs/binfmt_misc.c +++ linux/fs/binfmt_misc.c @@ -126,7 +126,9 @@ static int load_misc_binary(struct linux goto _ret; if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) { - remove_arg_zero(bprm); + retval = remove_arg_zero(bprm); + if (retval) + goto _ret; } if (fmt->flags & MISC_FMT_OPEN_BINARY) { Index: linux/fs/binfmt_script.c =================================================================== --- linux.orig/fs/binfmt_script.c +++ linux/fs/binfmt_script.c @@ -68,7 +68,9 @@ static int load_script(struct linux_binp * This is done in reverse order, because of how the * user environment and arguments are stored. */ - remove_arg_zero(bprm); + retval = remove_arg_zero(bprm); + if (retval) + return retval; retval = copy_strings_kernel(1, &bprm->interp, bprm); if (retval < 0) return retval; bprm->argc++; Index: linux/fs/compat.c =================================================================== --- linux.orig/fs/compat.c +++ linux/fs/compat.c @@ -1389,6 +1389,7 @@ static int compat_copy_strings(int argc, { struct page *kmapped_page = NULL; char *kaddr = NULL; + unsigned long kpos = 0; int ret; while (argc-- > 0) { @@ -1397,92 +1398,72 @@ static int compat_copy_strings(int argc, unsigned long pos; if (get_user(str, argv+argc) || - !(len = strnlen_user(compat_ptr(str), bprm->p))) { + !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) { ret = -EFAULT; goto out; } - if (bprm->p < len) { + if (MAX_ARG_STRLEN < len) { ret = -E2BIG; goto out; } - bprm->p -= len; - /* XXX: add architecture specific overflow check here. */ + /* We're going to work our way backwords. */ pos = bprm->p; + str += len; + bprm->p -= len; while (len > 0) { - int i, new, err; int offset, bytes_to_copy; - struct page *page; offset = pos % PAGE_SIZE; - i = pos/PAGE_SIZE; - page = bprm->page[i]; - new = 0; - if (!page) { - page = alloc_page(GFP_HIGHUSER); - bprm->page[i] = page; - if (!page) { - ret = -ENOMEM; + if (offset == 0) + offset = PAGE_SIZE; + + bytes_to_copy = offset; + if (bytes_to_copy > len) + bytes_to_copy = len; + + offset -= bytes_to_copy; + pos -= bytes_to_copy; + str -= bytes_to_copy; + len -= bytes_to_copy; + + if (!kmapped_page || kpos != (pos & PAGE_MASK)) { + struct page *page; + + ret = get_user_pages(current, bprm->mm, pos, + 1, 1, 1, &page, NULL); + if (ret <= 0) { + /* We've exceed the stack rlimit. */ + ret = -E2BIG; goto out; } - new = 1; - } - if (page != kmapped_page) { - if (kmapped_page) + if (kmapped_page) { kunmap(kmapped_page); + put_page(kmapped_page); + } kmapped_page = page; kaddr = kmap(kmapped_page); + kpos = pos & PAGE_MASK; } - if (new && offset) - memset(kaddr, 0, offset); - bytes_to_copy = PAGE_SIZE - offset; - if (bytes_to_copy > len) { - bytes_to_copy = len; - if (new) - memset(kaddr+offset+len, 0, - PAGE_SIZE-offset-len); - } - err = copy_from_user(kaddr+offset, compat_ptr(str), - bytes_to_copy); - if (err) { + if (copy_from_user(kaddr+offset, compat_ptr(str), + bytes_to_copy)) { ret = -EFAULT; goto out; } - - pos += bytes_to_copy; - str += bytes_to_copy; - len -= bytes_to_copy; } } ret = 0; out: - if (kmapped_page) + if (kmapped_page) { kunmap(kmapped_page); - return ret; -} - -#ifdef CONFIG_MMU - -#define free_arg_pages(bprm) do { } while (0) - -#else - -static inline void free_arg_pages(struct linux_binprm *bprm) -{ - int i; - - for (i = 0; i < MAX_ARG_PAGES; i++) { - if (bprm->page[i]) - __free_page(bprm->page[i]); - bprm->page[i] = NULL; + put_page(kmapped_page); } + return ret; } -#endif /* CONFIG_MMU */ - /* * compat_do_execve() is mostly a copy of do_execve(), with the exception * that it processes 32 bit argv and envp pointers. @@ -1495,7 +1476,6 @@ int compat_do_execve(char * filename, struct linux_binprm *bprm; struct file *file; int retval; - int i; retval = -ENOMEM; bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); @@ -1509,24 +1489,19 @@ int compat_do_execve(char * filename, sched_exec(); - bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); bprm->file = file; bprm->filename = filename; bprm->interp = filename; - bprm->mm = mm_alloc(); - retval = -ENOMEM; - if (!bprm->mm) - goto out_file; - retval = init_new_context(current, bprm->mm); - if (retval < 0) - goto out_mm; + retval = bprm_mm_init(bprm); + if (retval) + goto out_file; - bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t)); + bprm->argc = compat_count(argv, MAX_ARG_STRINGS); if ((retval = bprm->argc) < 0) goto out_mm; - bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t)); + bprm->envc = compat_count(envp, MAX_ARG_STRINGS); if ((retval = bprm->envc) < 0) goto out_mm; @@ -1551,10 +1526,8 @@ int compat_do_execve(char * filename, if (retval < 0) goto out; - retval = search_binary_handler(bprm, regs); + retval = search_binary_handler(bprm,regs); if (retval >= 0) { - free_arg_pages(bprm); - /* execve success */ security_bprm_free(bprm); acct_update_integrals(current); @@ -1563,19 +1536,12 @@ int compat_do_execve(char * filename, } out: - /* Something went wrong, return the inode and free the argument pages*/ - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page * page = bprm->page[i]; - if (page) - __free_page(page); - } - if (bprm->security) security_bprm_free(bprm); out_mm: if (bprm->mm) - mmdrop(bprm->mm); + mmput (bprm->mm); out_file: if (bprm->file) { Index: linux/fs/exec.c =================================================================== --- linux.orig/fs/exec.c +++ linux/fs/exec.c @@ -174,6 +174,79 @@ exit: goto out; } +#ifdef CONFIG_STACK_GROWSUP +#error I broke your build because I rearchitected the stack code, and I \ + don't have access to an architecture where CONFIG_STACK_GROWSUP is \ + set. Please fixe this or send me a machine which I can test this on. \ + \ + -- Ollie Wild <aaw@google.com> +#endif + +/* Create a new mm_struct and populate it with a temporary stack + * vm_area_struct. We don't have enough context at this point to set the + * stack flags, permissions, and offset, so we use temporary values. We'll + * update them later in setup_arg_pages(). */ +int bprm_mm_init(struct linux_binprm *bprm) +{ + int err; + struct mm_struct *mm = NULL; + struct vm_area_struct *vma = NULL; + + bprm->mm = mm = mm_alloc(); + err = -ENOMEM; + if (!mm) + goto err; + + if ((err = init_new_context(current, mm))) + goto err; + + bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL); + err = -ENOMEM; + if (!vma) + goto err; + + down_write(&mm->mmap_sem); + { + vma->vm_mm = mm; + + /* Place the stack at the top of user memory. Later, we'll + * move this to an appropriate place. We don't use STACK_TOP + * because that can depend on attributes which aren't + * configured yet. */ + vma->vm_end = TASK_SIZE; + vma->vm_start = vma->vm_end - PAGE_SIZE; + + vma->vm_flags = VM_STACK_FLAGS; + vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; + if ((err = insert_vm_struct(mm, vma))) { + up_write(&mm->mmap_sem); + goto err; + } + + mm->stack_vm = mm->total_vm = 1; + } + up_write(&mm->mmap_sem); + + bprm->p = vma->vm_end - sizeof(void *); + + return 0; + +err: + if (vma) { + bprm->vma = NULL; + kmem_cache_free(vm_area_cachep, vma); + } + + if (mm) { + bprm->mm = NULL; + mmdrop(mm); + } + + return err; +} + +EXPORT_SYMBOL(bprm_mm_init); + /* * count() counts the number of strings in array ARGV. */ @@ -199,15 +272,16 @@ static int count(char __user * __user * } /* - * 'copy_strings()' copies argument/environment strings from user - * memory to free pages in kernel mem. These are in a format ready - * to be put directly into the top of new user memory. + * 'copy_strings()' copies argument/environment strings from the old + * processes's memory to the new process's stack. The call to get_user_pages() + * ensures the destination page is created and not swapped out. */ static int copy_strings(int argc, char __user * __user * argv, struct linux_binprm *bprm) { struct page *kmapped_page = NULL; char *kaddr = NULL; + unsigned long kpos = 0; int ret; while (argc-- > 0) { @@ -216,69 +290,68 @@ static int copy_strings(int argc, char _ unsigned long pos; if (get_user(str, argv+argc) || - !(len = strnlen_user(str, bprm->p))) { + !(len = strnlen_user(str, MAX_ARG_STRLEN))) { ret = -EFAULT; goto out; } - if (bprm->p < len) { + if (MAX_ARG_STRLEN < len) { ret = -E2BIG; goto out; } - bprm->p -= len; - /* XXX: add architecture specific overflow check here. */ + /* We're going to work our way backwords. */ pos = bprm->p; + str += len; + bprm->p -= len; while (len > 0) { - int i, new, err; int offset, bytes_to_copy; - struct page *page; offset = pos % PAGE_SIZE; - i = pos/PAGE_SIZE; - page = bprm->page[i]; - new = 0; - if (!page) { - page = alloc_page(GFP_HIGHUSER); - bprm->page[i] = page; - if (!page) { - ret = -ENOMEM; + if (offset == 0) + offset = PAGE_SIZE; + + bytes_to_copy = offset; + if (bytes_to_copy > len) + bytes_to_copy = len; + + offset -= bytes_to_copy; + pos -= bytes_to_copy; + str -= bytes_to_copy; + len -= bytes_to_copy; + + if (!kmapped_page || kpos != (pos & PAGE_MASK)) { + struct page *page; + + ret = get_user_pages(current, bprm->mm, pos, + 1, 1, 1, &page, NULL); + if (ret <= 0) { + /* We've exceed the stack rlimit. */ + ret = -E2BIG; goto out; } - new = 1; - } - if (page != kmapped_page) { - if (kmapped_page) + if (kmapped_page) { kunmap(kmapped_page); + put_page(kmapped_page); + } kmapped_page = page; kaddr = kmap(kmapped_page); + kpos = pos & PAGE_MASK; } - if (new && offset) - memset(kaddr, 0, offset); - bytes_to_copy = PAGE_SIZE - offset; - if (bytes_to_copy > len) { - bytes_to_copy = len; - if (new) - memset(kaddr+offset+len, 0, - PAGE_SIZE-offset-len); - } - err = copy_from_user(kaddr+offset, str, bytes_to_copy); - if (err) { + if (copy_from_user(kaddr+offset, str, bytes_to_copy)) { ret = -EFAULT; goto out; } - - pos += bytes_to_copy; - str += bytes_to_copy; - len -= bytes_to_copy; } } ret = 0; out: - if (kmapped_page) + if (kmapped_page) { kunmap(kmapped_page); + put_page(kmapped_page); + } return ret; } @@ -297,157 +370,79 @@ int copy_strings_kernel(int argc,char ** EXPORT_SYMBOL(copy_strings_kernel); -#ifdef CONFIG_MMU -/* - * This routine is used to map in a page into an address space: needed by - * execve() for the initial stack and environment pages. - * - * vma->vm_mm->mmap_sem is held for writing. - */ -void install_arg_page(struct vm_area_struct *vma, - struct page *page, unsigned long address) -{ - struct mm_struct *mm = vma->vm_mm; - pte_t * pte; - spinlock_t *ptl; - - if (unlikely(anon_vma_prepare(vma))) - goto out; - - flush_dcache_page(page); - pte = get_locked_pte(mm, address, &ptl); - if (!pte) - goto out; - if (!pte_none(*pte)) { - pte_unmap_unlock(pte, ptl); - goto out; - } - inc_mm_counter(mm, anon_rss); - lru_cache_add_active(page); - set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte( - page, vma->vm_page_prot)))); - page_add_new_anon_rmap(page, vma, address); - pte_unmap_unlock(pte, ptl); - - /* no need for flush_tlb */ - return; -out: - __free_page(page); - force_sig(SIGKILL, current); -} - #define EXTRA_STACK_VM_PAGES 20 /* random */ +/* Finalizes the stack vm_area_struct. The flags and permissions are updated, + * the stack is optionally relocated, and some extra space is added. + */ int setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack) { - unsigned long stack_base; - struct vm_area_struct *mpnt; + unsigned long ret; + unsigned long stack_base, stack_shift; struct mm_struct *mm = current->mm; - int i, ret; - long arg_size; -#ifdef CONFIG_STACK_GROWSUP - /* Move the argument and environment strings to the bottom of the - * stack space. - */ - int offset, j; - char *to, *from; + BUG_ON(stack_top > TASK_SIZE); + BUG_ON(stack_top & ~PAGE_MASK); - /* Start by shifting all the pages down */ - i = 0; - for (j = 0; j < MAX_ARG_PAGES; j++) { - struct page *page = bprm->page[j]; - if (!page) - continue; - bprm->page[i++] = page; - } - - /* Now move them within their pages */ - offset = bprm->p % PAGE_SIZE; - to = kmap(bprm->page[0]); - for (j = 1; j < i; j++) { - memmove(to, to + offset, PAGE_SIZE - offset); - from = kmap(bprm->page[j]); - memcpy(to + PAGE_SIZE - offset, from, offset); - kunmap(bprm->page[j - 1]); - to = from; - } - memmove(to, to + offset, PAGE_SIZE - offset); - kunmap(bprm->page[j - 1]); - - /* Limit stack size to 1GB */ - stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max; - if (stack_base > (1 << 30)) - stack_base = 1 << 30; - stack_base = PAGE_ALIGN(stack_top - stack_base); - - /* Adjust bprm->p to point to the end of the strings. */ - bprm->p = stack_base + PAGE_SIZE * i - offset; - - mm->arg_start = stack_base; - arg_size = i << PAGE_SHIFT; - - /* zero pages that were copied above */ - while (i < MAX_ARG_PAGES) - bprm->page[i++] = NULL; -#else - stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE); + stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE); stack_base = PAGE_ALIGN(stack_base); - bprm->p += stack_base; - mm->arg_start = bprm->p; - arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start); -#endif - arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE; + stack_shift = (bprm->p & PAGE_MASK) - stack_base; + BUG_ON(stack_shift < 0); + bprm->p -= stack_shift; + mm->arg_start = bprm->p; if (bprm->loader) - bprm->loader += stack_base; - bprm->exec += stack_base; - - mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); - if (!mpnt) - return -ENOMEM; - - memset(mpnt, 0, sizeof(*mpnt)); + bprm->loader -= stack_shift; + bprm->exec -= stack_shift; down_write(&mm->mmap_sem); { - mpnt->vm_mm = mm; -#ifdef CONFIG_STACK_GROWSUP - mpnt->vm_start = stack_base; - mpnt->vm_end = stack_base + arg_size; -#else - mpnt->vm_end = stack_top; - mpnt->vm_start = mpnt->vm_end - arg_size; -#endif + struct vm_area_struct *vma = bprm->vma; + struct vm_area_struct *prev = NULL; + unsigned long vm_flags = vma->vm_flags; + /* Adjust stack execute permissions; explicitly enable * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X * and leave alone (arch default) otherwise. */ if (unlikely(executable_stack == EXSTACK_ENABLE_X)) - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC; + vm_flags |= VM_EXEC; else if (executable_stack == EXSTACK_DISABLE_X) - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC; - else - mpnt->vm_flags = VM_STACK_FLAGS; - mpnt->vm_flags |= mm->def_flags; - mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7]; - if ((ret = insert_vm_struct(mm, mpnt))) { + vm_flags &= ~VM_EXEC; + vm_flags |= mm->def_flags; + + ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, + vm_flags); + if (ret) { up_write(&mm->mmap_sem); - kmem_cache_free(vm_area_cachep, mpnt); return ret; } - mm->stack_vm = mm->total_vm = vma_pages(mpnt); - } + BUG_ON(prev != vma); + + /* Move stack pages down in memory. */ + if (stack_shift) { + /* This should be safe even with overlap because we + * are shifting down. */ + ret = move_vma(vma, vma->vm_start, + vma->vm_end - vma->vm_start, + vma->vm_end - vma->vm_start, + vma->vm_start - stack_shift); + if (ret & ~PAGE_MASK) { + up_write(&mm->mmap_sem); + return ret; + } + } - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page *page = bprm->page[i]; - if (page) { - bprm->page[i] = NULL; - install_arg_page(mpnt, page, stack_base); + // Expand the stack. + vma = find_vma(mm, bprm->p); + BUG_ON(!vma || bprm->p < vma->vm_start); + if (expand_stack(vma, stack_base - + EXTRA_STACK_VM_PAGES * PAGE_SIZE)) { + up_write(&mm->mmap_sem); + return -EFAULT; } - stack_base += PAGE_SIZE; } up_write(&mm->mmap_sem); @@ -456,23 +451,6 @@ int setup_arg_pages(struct linux_binprm EXPORT_SYMBOL(setup_arg_pages); -#define free_arg_pages(bprm) do { } while (0) - -#else - -static inline void free_arg_pages(struct linux_binprm *bprm) -{ - int i; - - for (i = 0; i < MAX_ARG_PAGES; i++) { - if (bprm->page[i]) - __free_page(bprm->page[i]); - bprm->page[i] = NULL; - } -} - -#endif /* CONFIG_MMU */ - struct file *open_exec(const char *name) { struct nameidata nd; @@ -993,8 +971,10 @@ void compute_creds(struct linux_binprm * EXPORT_SYMBOL(compute_creds); -void remove_arg_zero(struct linux_binprm *bprm) +int remove_arg_zero(struct linux_binprm *bprm) { + int ret = 0; + if (bprm->argc) { unsigned long offset; char * kaddr; @@ -1008,13 +988,23 @@ void remove_arg_zero(struct linux_binprm continue; offset = 0; kunmap_atomic(kaddr, KM_USER0); + put_page(page); inside: - page = bprm->page[bprm->p/PAGE_SIZE]; + ret = get_user_pages(current, bprm->mm, bprm->p, + 1, 0, 1, &page, NULL); + if (ret <= 0) { + ret = -EFAULT; + goto out; + } kaddr = kmap_atomic(page, KM_USER0); } kunmap_atomic(kaddr, KM_USER0); bprm->argc--; + ret = 0; } + +out: + return ret; } EXPORT_SYMBOL(remove_arg_zero); @@ -1041,7 +1031,7 @@ int search_binary_handler(struct linux_b fput(bprm->file); bprm->file = NULL; - loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); + loader = bprm->vma->vm_end - sizeof(void *); file = open_exec("/sbin/loader"); retval = PTR_ERR(file); @@ -1134,7 +1124,6 @@ int do_execve(char * filename, struct linux_binprm *bprm; struct file *file; int retval; - int i; retval = -ENOMEM; bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); @@ -1148,25 +1137,19 @@ int do_execve(char * filename, sched_exec(); - bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); - bprm->file = file; bprm->filename = filename; bprm->interp = filename; - bprm->mm = mm_alloc(); - retval = -ENOMEM; - if (!bprm->mm) - goto out_file; - retval = init_new_context(current, bprm->mm); - if (retval < 0) - goto out_mm; + retval = bprm_mm_init(bprm); + if (retval) + goto out_file; - bprm->argc = count(argv, bprm->p / sizeof(void *)); + bprm->argc = count(argv, MAX_ARG_STRINGS); if ((retval = bprm->argc) < 0) goto out_mm; - bprm->envc = count(envp, bprm->p / sizeof(void *)); + bprm->envc = count(envp, MAX_ARG_STRINGS); if ((retval = bprm->envc) < 0) goto out_mm; @@ -1193,8 +1176,6 @@ int do_execve(char * filename, retval = search_binary_handler(bprm,regs); if (retval >= 0) { - free_arg_pages(bprm); - /* execve success */ security_bprm_free(bprm); acct_update_integrals(current); @@ -1203,19 +1184,12 @@ int do_execve(char * filename, } out: - /* Something went wrong, return the inode and free the argument pages*/ - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { - struct page * page = bprm->page[i]; - if (page) - __free_page(page); - } - if (bprm->security) security_bprm_free(bprm); out_mm: if (bprm->mm) - mmdrop(bprm->mm); + mmput (bprm->mm); out_file: if (bprm->file) { Index: linux/include/linux/binfmts.h =================================================================== --- linux.orig/include/linux/binfmts.h +++ linux/include/linux/binfmts.h @@ -5,12 +5,9 @@ struct pt_regs; -/* - * MAX_ARG_PAGES defines the number of pages allocated for arguments - * and envelope for the new program. 32 should suffice, this gives - * a maximum env+arg of 128kB w/4KB pages! - */ -#define MAX_ARG_PAGES 32 +/* FIXME: Find real limits, or none. */ +#define MAX_ARG_STRLEN (PAGE_SIZE * 32) +#define MAX_ARG_STRINGS 0x7FFFFFFF /* sizeof(linux_binprm->buf) */ #define BINPRM_BUF_SIZE 128 @@ -22,7 +19,7 @@ struct pt_regs; */ struct linux_binprm{ char buf[BINPRM_BUF_SIZE]; - struct page *page[MAX_ARG_PAGES]; + struct vm_area_struct *vma; struct mm_struct *mm; unsigned long p; /* current top of mem */ int sh_bang; @@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_ extern int unregister_binfmt(struct linux_binfmt *); extern int prepare_binprm(struct linux_binprm *); -extern void remove_arg_zero(struct linux_binprm *); +extern int __must_check remove_arg_zero(struct linux_binprm *); extern int search_binary_handler(struct linux_binprm *,struct pt_regs *); extern int flush_old_exec(struct linux_binprm * bprm); @@ -82,6 +79,7 @@ extern int suid_dumpable; extern int setup_arg_pages(struct linux_binprm * bprm, unsigned long stack_top, int executable_stack); +extern int bprm_mm_init(struct linux_binprm *bprm); extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm); extern void compute_creds(struct linux_binprm *binprm); extern int do_coredump(long signr, int exit_code, struct pt_regs * regs); Index: linux/include/linux/mm.h =================================================================== --- linux.orig/include/linux/mm.h +++ linux/include/linux/mm.h @@ -775,7 +775,6 @@ static inline int handle_mm_fault(struct extern int make_pages_present(unsigned long addr, unsigned long end); extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write); -void install_arg_page(struct vm_area_struct *, struct page *, unsigned long); int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, int len, int write, int force, struct page **pages, struct vm_area_struct **vmas); @@ -791,9 +790,15 @@ int FASTCALL(set_page_dirty(struct page int set_page_dirty_lock(struct page *page); int clear_page_dirty_for_io(struct page *page); +extern unsigned long move_vma(struct vm_area_struct *vma, + unsigned long old_addr, unsigned long old_len, + unsigned long new_len, unsigned long new_addr); extern unsigned long do_mremap(unsigned long addr, unsigned long old_len, unsigned long new_len, unsigned long flags, unsigned long new_addr); +extern int mprotect_fixup(struct vm_area_struct *vma, + struct vm_area_struct **pprev, unsigned long start, + unsigned long end, unsigned long newflags); /* * Prototype to add a shrinker callback for ageable caches. Index: linux/kernel/auditsc.c =================================================================== --- linux.orig/kernel/auditsc.c +++ linux/kernel/auditsc.c @@ -1755,6 +1755,10 @@ int __audit_ipc_set_perm(unsigned long q int audit_bprm(struct linux_binprm *bprm) { + /* FIXME: Don't do anything for now until I figure out how to handle + * this. With the latest changes, kmalloc could well fail under good + * scenarios. */ +#if 0 struct audit_aux_data_execve *ax; struct audit_context *context = current->audit_context; unsigned long p, next; @@ -1782,6 +1786,7 @@ int audit_bprm(struct linux_binprm *bprm ax->d.type = AUDIT_EXECVE; ax->d.next = context->aux; context->aux = (void *)ax; +#endif return 0; } Index: linux/mm/mprotect.c =================================================================== --- linux.orig/mm/mprotect.c +++ linux/mm/mprotect.c @@ -128,7 +128,7 @@ static void change_protection(struct vm_ flush_tlb_range(vma, start, end); } -static int +int mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, unsigned long start, unsigned long end, unsigned long newflags) { Index: linux/mm/mremap.c =================================================================== --- linux.orig/mm/mremap.c +++ linux/mm/mremap.c @@ -155,7 +155,7 @@ static unsigned long move_page_tables(st return len + old_addr - old_end; /* how much done */ } -static unsigned long move_vma(struct vm_area_struct *vma, +unsigned long move_vma(struct vm_area_struct *vma, unsigned long old_addr, unsigned long old_len, unsigned long new_len, unsigned long new_addr) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] remove MAX_ARG_PAGES 2006-12-29 20:03 ` [patch] remove MAX_ARG_PAGES Ingo Molnar @ 2006-12-29 20:49 ` Russell King 2006-12-29 21:23 ` Linus Torvalds 2006-12-29 21:57 ` Ingo Molnar 2007-01-01 6:51 ` Ollie Wild ` (2 subsequent siblings) 3 siblings, 2 replies; 16+ messages in thread From: Russell King @ 2006-12-29 20:49 UTC (permalink / raw) To: Ingo Molnar Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote: > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - > i can now finally do a "ls -t patches/*.patch" in my patch repository - > something i havent been able to do for years ;-) How do the various autoconf stuff react to this? Eg, I notice the following in various configure scripts: checking the maximum length of command line arguments... 32768 Suggest you test (eg) a rebuild of libX11 to see how it reacts to this patch. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] remove MAX_ARG_PAGES 2006-12-29 20:49 ` Russell King @ 2006-12-29 21:23 ` Linus Torvalds 2006-12-29 22:20 ` Ingo Molnar 2006-12-29 21:57 ` Ingo Molnar 1 sibling, 1 reply; 16+ messages in thread From: Linus Torvalds @ 2006-12-29 21:23 UTC (permalink / raw) To: Russell King Cc: Ingo Molnar, Ollie Wild, linux-kernel, parisc-linux, Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra On Fri, 29 Dec 2006, Russell King wrote: > > Suggest you test (eg) a rebuild of libX11 to see how it reacts to > this patch. Also: please rebuild "xargs" and install first. Otherwise, a lot of build script etc that use "xargs" won't ever trigger the new limits (or lack thereof), because xargs will have been installed with some old limits. Perhaps more worrying is if compiling xargs under a new kernel then means that it won't work correctly under an old one. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] remove MAX_ARG_PAGES 2006-12-29 21:23 ` Linus Torvalds @ 2006-12-29 22:20 ` Ingo Molnar 0 siblings, 0 replies; 16+ messages in thread From: Ingo Molnar @ 2006-12-29 22:20 UTC (permalink / raw) To: Linus Torvalds Cc: Russell King, Ollie Wild, linux-kernel, parisc-linux, Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra, Ulrich Drepper [Cc:-ed Ulrich too] * Linus Torvalds <torvalds@osdl.org> wrote: > On Fri, 29 Dec 2006, Russell King wrote: > > > > Suggest you test (eg) a rebuild of libX11 to see how it reacts to > > this patch. > > Also: please rebuild "xargs" and install first. Otherwise, a lot of > build script etc that use "xargs" won't ever trigger the new limits > (or lack thereof), because xargs will have been installed with some > old limits. yeah, and i think the default chunking of xargs should still remain 128K. If it's fine for a script to get chunked input, and if the script has no security relevance (xargs is fundamentally unsafe if any portion of the VFS namespace it gets used is untrusted), then there's no problem for the xargs limit to stay at 128K. > Perhaps more worrying is if compiling xargs under a new kernel then > means that it won't work correctly under an old one. xargs has its limit hardcoded AFAICS, it's based on: #define ARG_MAX 131072 /* # bytes of args + environ for exec() */ i'd not change that just yet. The sysconf(3) manpage says it's generally unreliable: BUGS It is difficult to use ARG_MAX because it is not specified how much of the argument space for exec() is consumed by the user's environment variables. but ... as it is with every limit, it is always possible to write an application that hardcodes a larger limit and then doesnt work when running with the lower limit. Would that have been a correct argument against say raising the user stack limit from the historic 1MB? right now some of my (more stupid) scripts occasionally break if any random portion of my VFS namespace grows over the silly 128K limit. (and it rarely has the tendency to shrink, sadly) I think that is just as much of a legitimate problem as any naive newly written script not working on an older kernel on a huge VFS namespace. (in fact i could argue for it to be a more legitimate problem than other stupid scripts not being backwards compatible, not the least because it is a problem with /my/ scripts ;-) we could try something like adding an ARG_MAX rlimit, but i think that would be overdoing it ... we could also do a sysctl as a global limit - equally pointless because distros will likely tweak it up anyway, and in any case neither measure really prevents the writing of stupid scripts. Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] remove MAX_ARG_PAGES 2006-12-29 20:49 ` Russell King 2006-12-29 21:23 ` Linus Torvalds @ 2006-12-29 21:57 ` Ingo Molnar 1 sibling, 0 replies; 16+ messages in thread From: Ingo Molnar @ 2006-12-29 21:57 UTC (permalink / raw) To: Russell King Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra * Russell King <rmk+lkml@arm.linux.org.uk> wrote: > On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote: > > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to > > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - > > i can now finally do a "ls -t patches/*.patch" in my patch repository - > > something i havent been able to do for years ;-) > > How do the various autoconf stuff react to this? Eg, I notice the > following in various configure scripts: > > checking the maximum length of command line arguments... 32768 yes, that's how libtool works, it goes from 32K downwards to figure out a maximum. I dont see a problem there. you can find a few other variants at: http://www.google.com/codesearch?q=%22checking+the+maximum+length+of+command+line+arguments%22&hl=en&btnG=Search+Code worst-case the test-command would get a segfault from the default stack limit. (8MB on Fedora) Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] remove MAX_ARG_PAGES 2006-12-29 20:03 ` [patch] remove MAX_ARG_PAGES Ingo Molnar 2006-12-29 20:49 ` Russell King @ 2007-01-01 6:51 ` Ollie Wild 2007-01-01 6:58 ` Ollie Wild 2007-01-01 17:28 ` Pavel Machek 3 siblings, 0 replies; 16+ messages in thread From: Ollie Wild @ 2007-01-01 6:51 UTC (permalink / raw) To: Ingo Molnar Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen, linux-kernel, David Howells, linux-mm, Linus Torvalds, parisc-linux, Arjan van de Ven [-- Attachment #1.1: Type: text/plain, Size: 40586 bytes --] There are still a couple outstanding issues which need to be resolved before this is ready for inclusion in the mainline kernel. The main one is support for CONFIG_STACK_GROWSUP, which I think is just parisc. I've been meaning to look into this for a while, but I was out of commision for most of November so it got punted to the back burner. I'll try to revisit it soonish. If someone from the parisc-linux list wants to take a look, though, that's fine by me. The other is support for the various executable formats. I've tested elf and script pretty thoroughly, but I'm not sure how to go about testing most of the others -- does anyone use aout anymore? Maybe the solution is just to check it in and wait to see if someone complains. Ollie On 12/29/06, Ingo Molnar <mingo@elte.hu> wrote: > > > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - > i can now finally do a "ls -t patches/*.patch" in my patch repository - > something i havent been able to do for years ;-) > > what is keeping this fix from going upstream? > > Ingo > > --------------> > Subject: [patch] remove MAX_ARG_PAGES > From: Ollie Wild <aaw@google.com> > > this patch removes the MAX_ARG_PAGES limit by copying between VMs. This > makes process argv/env limited by the stack limit (and it's thus > arbitrarily sizable). No more: > > -bash: /bin/ls: Argument list too long > > Signed-off-by: Ingo Molnar <mingo@elte.hu> > --- > arch/x86_64/ia32/ia32_binfmt.c | 55 ----- > fs/binfmt_elf.c | 12 - > fs/binfmt_misc.c | 4 > fs/binfmt_script.c | 4 > fs/compat.c | 118 ++++-------- > fs/exec.c | 382 > +++++++++++++++++++---------------------- > include/linux/binfmts.h | 14 - > include/linux/mm.h | 7 > kernel/auditsc.c | 5 > mm/mprotect.c | 2 > mm/mremap.c | 2 > 11 files changed, 250 insertions(+), 355 deletions(-) > > Index: linux/arch/x86_64/ia32/ia32_binfmt.c > =================================================================== > --- linux.orig/arch/x86_64/ia32/ia32_binfmt.c > +++ linux/arch/x86_64/ia32/ia32_binfmt.c > @@ -279,9 +279,6 @@ do > { \ > #define load_elf_binary load_elf32_binary > > #define ELF_PLAT_INIT(r, load_addr) elf32_init(r) > -#define setup_arg_pages(bprm, stack_top, exec_stack) \ > - ia32_setup_arg_pages(bprm, stack_top, exec_stack) > -int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long > stack_top, int executable_stack); > > #undef start_thread > #define start_thread(regs,new_rip,new_rsp) do { \ > @@ -336,57 +333,7 @@ static void elf32_init(struct pt_regs *r > int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long > stack_top, > int executable_stack) > { > - unsigned long stack_base; > - struct vm_area_struct *mpnt; > - struct mm_struct *mm = current->mm; > - int i, ret; > - > - stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE; > - mm->arg_start = bprm->p + stack_base; > - > - bprm->p += stack_base; > - if (bprm->loader) > - bprm->loader += stack_base; > - bprm->exec += stack_base; > - > - mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); > - if (!mpnt) > - return -ENOMEM; > - > - memset(mpnt, 0, sizeof(*mpnt)); > - > - down_write(&mm->mmap_sem); > - { > - mpnt->vm_mm = mm; > - mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p; > - mpnt->vm_end = stack_top; > - if (executable_stack == EXSTACK_ENABLE_X) > - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC; > - else if (executable_stack == EXSTACK_DISABLE_X) > - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC; > - else > - mpnt->vm_flags = VM_STACK_FLAGS; > - mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? > - PAGE_COPY_EXEC : PAGE_COPY; > - if ((ret = insert_vm_struct(mm, mpnt))) { > - up_write(&mm->mmap_sem); > - kmem_cache_free(vm_area_cachep, mpnt); > - return ret; > - } > - mm->stack_vm = mm->total_vm = vma_pages(mpnt); > - } > - > - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { > - struct page *page = bprm->page[i]; > - if (page) { > - bprm->page[i] = NULL; > - install_arg_page(mpnt, page, stack_base); > - } > - stack_base += PAGE_SIZE; > - } > - up_write(&mm->mmap_sem); > - > - return 0; > + return setup_arg_pages(bprm, stack_top, executable_stack); > } > EXPORT_SYMBOL(ia32_setup_arg_pages); > > Index: linux/fs/binfmt_elf.c > =================================================================== > --- linux.orig/fs/binfmt_elf.c > +++ linux/fs/binfmt_elf.c > @@ -253,8 +253,8 @@ create_elf_tables(struct linux_binprm *b > size_t len; > if (__put_user((elf_addr_t)p, argv++)) > return -EFAULT; > - len = strnlen_user((void __user *)p, > PAGE_SIZE*MAX_ARG_PAGES); > - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES) > + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN); > + if (!len || len > MAX_ARG_STRLEN) > return 0; > p += len; > } > @@ -265,8 +265,8 @@ create_elf_tables(struct linux_binprm *b > size_t len; > if (__put_user((elf_addr_t)p, envp++)) > return -EFAULT; > - len = strnlen_user((void __user *)p, > PAGE_SIZE*MAX_ARG_PAGES); > - if (!len || len > PAGE_SIZE*MAX_ARG_PAGES) > + len = strnlen_user((void __user *)p, MAX_ARG_STRLEN); > + if (!len || len > MAX_ARG_STRLEN) > return 0; > p += len; > } > @@ -767,10 +767,6 @@ static int load_elf_binary(struct linux_ > } > > /* OK, This is the point of no return */ > - current->mm->start_data = 0; > - current->mm->end_data = 0; > - current->mm->end_code = 0; > - current->mm->mmap = NULL; > current->flags &= ~PF_FORKNOEXEC; > current->mm->def_flags = def_flags; > > Index: linux/fs/binfmt_misc.c > =================================================================== > --- linux.orig/fs/binfmt_misc.c > +++ linux/fs/binfmt_misc.c > @@ -126,7 +126,9 @@ static int load_misc_binary(struct linux > goto _ret; > > if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) { > - remove_arg_zero(bprm); > + retval = remove_arg_zero(bprm); > + if (retval) > + goto _ret; > } > > if (fmt->flags & MISC_FMT_OPEN_BINARY) { > Index: linux/fs/binfmt_script.c > =================================================================== > --- linux.orig/fs/binfmt_script.c > +++ linux/fs/binfmt_script.c > @@ -68,7 +68,9 @@ static int load_script(struct linux_binp > * This is done in reverse order, because of how the > * user environment and arguments are stored. > */ > - remove_arg_zero(bprm); > + retval = remove_arg_zero(bprm); > + if (retval) > + return retval; > retval = copy_strings_kernel(1, &bprm->interp, bprm); > if (retval < 0) return retval; > bprm->argc++; > Index: linux/fs/compat.c > =================================================================== > --- linux.orig/fs/compat.c > +++ linux/fs/compat.c > @@ -1389,6 +1389,7 @@ static int compat_copy_strings(int argc, > { > struct page *kmapped_page = NULL; > char *kaddr = NULL; > + unsigned long kpos = 0; > int ret; > > while (argc-- > 0) { > @@ -1397,92 +1398,72 @@ static int compat_copy_strings(int argc, > unsigned long pos; > > if (get_user(str, argv+argc) || > - !(len = strnlen_user(compat_ptr(str), bprm->p))) { > + !(len = strnlen_user(compat_ptr(str), > MAX_ARG_STRLEN))) { > ret = -EFAULT; > goto out; > } > > - if (bprm->p < len) { > + if (MAX_ARG_STRLEN < len) { > ret = -E2BIG; > goto out; > } > > - bprm->p -= len; > - /* XXX: add architecture specific overflow check here. */ > + /* We're going to work our way backwords. */ > pos = bprm->p; > + str += len; > + bprm->p -= len; > > while (len > 0) { > - int i, new, err; > int offset, bytes_to_copy; > - struct page *page; > > offset = pos % PAGE_SIZE; > - i = pos/PAGE_SIZE; > - page = bprm->page[i]; > - new = 0; > - if (!page) { > - page = alloc_page(GFP_HIGHUSER); > - bprm->page[i] = page; > - if (!page) { > - ret = -ENOMEM; > + if (offset == 0) > + offset = PAGE_SIZE; > + > + bytes_to_copy = offset; > + if (bytes_to_copy > len) > + bytes_to_copy = len; > + > + offset -= bytes_to_copy; > + pos -= bytes_to_copy; > + str -= bytes_to_copy; > + len -= bytes_to_copy; > + > + if (!kmapped_page || kpos != (pos & PAGE_MASK)) { > + struct page *page; > + > + ret = get_user_pages(current, bprm->mm, > pos, > + 1, 1, 1, &page, > NULL); > + if (ret <= 0) { > + /* We've exceed the stack rlimit. > */ > + ret = -E2BIG; > goto out; > } > - new = 1; > - } > > - if (page != kmapped_page) { > - if (kmapped_page) > + if (kmapped_page) { > kunmap(kmapped_page); > + put_page(kmapped_page); > + } > kmapped_page = page; > kaddr = kmap(kmapped_page); > + kpos = pos & PAGE_MASK; > } > - if (new && offset) > - memset(kaddr, 0, offset); > - bytes_to_copy = PAGE_SIZE - offset; > - if (bytes_to_copy > len) { > - bytes_to_copy = len; > - if (new) > - memset(kaddr+offset+len, 0, > - PAGE_SIZE-offset-len); > - } > - err = copy_from_user(kaddr+offset, > compat_ptr(str), > - bytes_to_copy); > - if (err) { > + if (copy_from_user(kaddr+offset, compat_ptr(str), > + bytes_to_copy)) { > ret = -EFAULT; > goto out; > } > - > - pos += bytes_to_copy; > - str += bytes_to_copy; > - len -= bytes_to_copy; > } > } > ret = 0; > out: > - if (kmapped_page) > + if (kmapped_page) { > kunmap(kmapped_page); > - return ret; > -} > - > -#ifdef CONFIG_MMU > - > -#define free_arg_pages(bprm) do { } while (0) > - > -#else > - > -static inline void free_arg_pages(struct linux_binprm *bprm) > -{ > - int i; > - > - for (i = 0; i < MAX_ARG_PAGES; i++) { > - if (bprm->page[i]) > - __free_page(bprm->page[i]); > - bprm->page[i] = NULL; > + put_page(kmapped_page); > } > + return ret; > } > > -#endif /* CONFIG_MMU */ > - > /* > * compat_do_execve() is mostly a copy of do_execve(), with the exception > * that it processes 32 bit argv and envp pointers. > @@ -1495,7 +1476,6 @@ int compat_do_execve(char * filename, > struct linux_binprm *bprm; > struct file *file; > int retval; > - int i; > > retval = -ENOMEM; > bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); > @@ -1509,24 +1489,19 @@ int compat_do_execve(char * filename, > > sched_exec(); > > - bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); > bprm->file = file; > bprm->filename = filename; > bprm->interp = filename; > - bprm->mm = mm_alloc(); > - retval = -ENOMEM; > - if (!bprm->mm) > - goto out_file; > > - retval = init_new_context(current, bprm->mm); > - if (retval < 0) > - goto out_mm; > + retval = bprm_mm_init(bprm); > + if (retval) > + goto out_file; > > - bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t)); > + bprm->argc = compat_count(argv, MAX_ARG_STRINGS); > if ((retval = bprm->argc) < 0) > goto out_mm; > > - bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t)); > + bprm->envc = compat_count(envp, MAX_ARG_STRINGS); > if ((retval = bprm->envc) < 0) > goto out_mm; > > @@ -1551,10 +1526,8 @@ int compat_do_execve(char * filename, > if (retval < 0) > goto out; > > - retval = search_binary_handler(bprm, regs); > + retval = search_binary_handler(bprm,regs); > if (retval >= 0) { > - free_arg_pages(bprm); > - > /* execve success */ > security_bprm_free(bprm); > acct_update_integrals(current); > @@ -1563,19 +1536,12 @@ int compat_do_execve(char * filename, > } > > out: > - /* Something went wrong, return the inode and free the argument > pages*/ > - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { > - struct page * page = bprm->page[i]; > - if (page) > - __free_page(page); > - } > - > if (bprm->security) > security_bprm_free(bprm); > > out_mm: > if (bprm->mm) > - mmdrop(bprm->mm); > + mmput (bprm->mm); > > out_file: > if (bprm->file) { > Index: linux/fs/exec.c > =================================================================== > --- linux.orig/fs/exec.c > +++ linux/fs/exec.c > @@ -174,6 +174,79 @@ exit: > goto out; > } > > +#ifdef CONFIG_STACK_GROWSUP > +#error I broke your build because I rearchitected the stack code, and I \ > + don't have access to an architecture where CONFIG_STACK_GROWSUP is > \ > + set. Please fixe this or send me a machine which I can test this > on. \ > + \ > + -- Ollie Wild <aaw@google.com> > +#endif > + > +/* Create a new mm_struct and populate it with a temporary stack > + * vm_area_struct. We don't have enough context at this point to set the > + * stack flags, permissions, and offset, so we use temporary > values. We'll > + * update them later in setup_arg_pages(). */ > +int bprm_mm_init(struct linux_binprm *bprm) > +{ > + int err; > + struct mm_struct *mm = NULL; > + struct vm_area_struct *vma = NULL; > + > + bprm->mm = mm = mm_alloc(); > + err = -ENOMEM; > + if (!mm) > + goto err; > + > + if ((err = init_new_context(current, mm))) > + goto err; > + > + bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL); > + err = -ENOMEM; > + if (!vma) > + goto err; > + > + down_write(&mm->mmap_sem); > + { > + vma->vm_mm = mm; > + > + /* Place the stack at the top of user memory. Later, > we'll > + * move this to an appropriate place. We don't use > STACK_TOP > + * because that can depend on attributes which aren't > + * configured yet. */ > + vma->vm_end = TASK_SIZE; > + vma->vm_start = vma->vm_end - PAGE_SIZE; > + > + vma->vm_flags = VM_STACK_FLAGS; > + vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; > + if ((err = insert_vm_struct(mm, vma))) { > + up_write(&mm->mmap_sem); > + goto err; > + } > + > + mm->stack_vm = mm->total_vm = 1; > + } > + up_write(&mm->mmap_sem); > + > + bprm->p = vma->vm_end - sizeof(void *); > + > + return 0; > + > +err: > + if (vma) { > + bprm->vma = NULL; > + kmem_cache_free(vm_area_cachep, vma); > + } > + > + if (mm) { > + bprm->mm = NULL; > + mmdrop(mm); > + } > + > + return err; > +} > + > +EXPORT_SYMBOL(bprm_mm_init); > + > /* > * count() counts the number of strings in array ARGV. > */ > @@ -199,15 +272,16 @@ static int count(char __user * __user * > } > > /* > - * 'copy_strings()' copies argument/environment strings from user > - * memory to free pages in kernel mem. These are in a format ready > - * to be put directly into the top of new user memory. > + * 'copy_strings()' copies argument/environment strings from the old > + * processes's memory to the new process's stack. The call to > get_user_pages() > + * ensures the destination page is created and not swapped out. > */ > static int copy_strings(int argc, char __user * __user * argv, > struct linux_binprm *bprm) > { > struct page *kmapped_page = NULL; > char *kaddr = NULL; > + unsigned long kpos = 0; > int ret; > > while (argc-- > 0) { > @@ -216,69 +290,68 @@ static int copy_strings(int argc, char _ > unsigned long pos; > > if (get_user(str, argv+argc) || > - !(len = strnlen_user(str, bprm->p))) { > + !(len = strnlen_user(str, > MAX_ARG_STRLEN))) { > ret = -EFAULT; > goto out; > } > > - if (bprm->p < len) { > + if (MAX_ARG_STRLEN < len) { > ret = -E2BIG; > goto out; > } > > - bprm->p -= len; > - /* XXX: add architecture specific overflow check here. */ > + /* We're going to work our way backwords. */ > pos = bprm->p; > + str += len; > + bprm->p -= len; > > while (len > 0) { > - int i, new, err; > int offset, bytes_to_copy; > - struct page *page; > > offset = pos % PAGE_SIZE; > - i = pos/PAGE_SIZE; > - page = bprm->page[i]; > - new = 0; > - if (!page) { > - page = alloc_page(GFP_HIGHUSER); > - bprm->page[i] = page; > - if (!page) { > - ret = -ENOMEM; > + if (offset == 0) > + offset = PAGE_SIZE; > + > + bytes_to_copy = offset; > + if (bytes_to_copy > len) > + bytes_to_copy = len; > + > + offset -= bytes_to_copy; > + pos -= bytes_to_copy; > + str -= bytes_to_copy; > + len -= bytes_to_copy; > + > + if (!kmapped_page || kpos != (pos & PAGE_MASK)) { > + struct page *page; > + > + ret = get_user_pages(current, bprm->mm, > pos, > + 1, 1, 1, &page, > NULL); > + if (ret <= 0) { > + /* We've exceed the stack rlimit. > */ > + ret = -E2BIG; > goto out; > } > - new = 1; > - } > > - if (page != kmapped_page) { > - if (kmapped_page) > + if (kmapped_page) { > kunmap(kmapped_page); > + put_page(kmapped_page); > + } > kmapped_page = page; > kaddr = kmap(kmapped_page); > + kpos = pos & PAGE_MASK; > } > - if (new && offset) > - memset(kaddr, 0, offset); > - bytes_to_copy = PAGE_SIZE - offset; > - if (bytes_to_copy > len) { > - bytes_to_copy = len; > - if (new) > - memset(kaddr+offset+len, 0, > - PAGE_SIZE-offset-len); > - } > - err = copy_from_user(kaddr+offset, str, > bytes_to_copy); > - if (err) { > + if (copy_from_user(kaddr+offset, str, > bytes_to_copy)) { > ret = -EFAULT; > goto out; > } > - > - pos += bytes_to_copy; > - str += bytes_to_copy; > - len -= bytes_to_copy; > } > } > ret = 0; > out: > - if (kmapped_page) > + if (kmapped_page) { > kunmap(kmapped_page); > + put_page(kmapped_page); > + } > return ret; > } > > @@ -297,157 +370,79 @@ int copy_strings_kernel(int argc,char ** > > EXPORT_SYMBOL(copy_strings_kernel); > > -#ifdef CONFIG_MMU > -/* > - * This routine is used to map in a page into an address space: needed by > - * execve() for the initial stack and environment pages. > - * > - * vma->vm_mm->mmap_sem is held for writing. > - */ > -void install_arg_page(struct vm_area_struct *vma, > - struct page *page, unsigned long address) > -{ > - struct mm_struct *mm = vma->vm_mm; > - pte_t * pte; > - spinlock_t *ptl; > - > - if (unlikely(anon_vma_prepare(vma))) > - goto out; > - > - flush_dcache_page(page); > - pte = get_locked_pte(mm, address, &ptl); > - if (!pte) > - goto out; > - if (!pte_none(*pte)) { > - pte_unmap_unlock(pte, ptl); > - goto out; > - } > - inc_mm_counter(mm, anon_rss); > - lru_cache_add_active(page); > - set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte( > - page, vma->vm_page_prot)))); > - page_add_new_anon_rmap(page, vma, address); > - pte_unmap_unlock(pte, ptl); > - > - /* no need for flush_tlb */ > - return; > -out: > - __free_page(page); > - force_sig(SIGKILL, current); > -} > - > #define EXTRA_STACK_VM_PAGES 20 /* random */ > > +/* Finalizes the stack vm_area_struct. The flags and permissions are > updated, > + * the stack is optionally relocated, and some extra space is added. > + */ > int setup_arg_pages(struct linux_binprm *bprm, > unsigned long stack_top, > int executable_stack) > { > - unsigned long stack_base; > - struct vm_area_struct *mpnt; > + unsigned long ret; > + unsigned long stack_base, stack_shift; > struct mm_struct *mm = current->mm; > - int i, ret; > - long arg_size; > > -#ifdef CONFIG_STACK_GROWSUP > - /* Move the argument and environment strings to the bottom of the > - * stack space. > - */ > - int offset, j; > - char *to, *from; > + BUG_ON(stack_top > TASK_SIZE); > + BUG_ON(stack_top & ~PAGE_MASK); > > - /* Start by shifting all the pages down */ > - i = 0; > - for (j = 0; j < MAX_ARG_PAGES; j++) { > - struct page *page = bprm->page[j]; > - if (!page) > - continue; > - bprm->page[i++] = page; > - } > - > - /* Now move them within their pages */ > - offset = bprm->p % PAGE_SIZE; > - to = kmap(bprm->page[0]); > - for (j = 1; j < i; j++) { > - memmove(to, to + offset, PAGE_SIZE - offset); > - from = kmap(bprm->page[j]); > - memcpy(to + PAGE_SIZE - offset, from, offset); > - kunmap(bprm->page[j - 1]); > - to = from; > - } > - memmove(to, to + offset, PAGE_SIZE - offset); > - kunmap(bprm->page[j - 1]); > - > - /* Limit stack size to 1GB */ > - stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max; > - if (stack_base > (1 << 30)) > - stack_base = 1 << 30; > - stack_base = PAGE_ALIGN(stack_top - stack_base); > - > - /* Adjust bprm->p to point to the end of the strings. */ > - bprm->p = stack_base + PAGE_SIZE * i - offset; > - > - mm->arg_start = stack_base; > - arg_size = i << PAGE_SHIFT; > - > - /* zero pages that were copied above */ > - while (i < MAX_ARG_PAGES) > - bprm->page[i++] = NULL; > -#else > - stack_base = arch_align_stack(stack_top - > MAX_ARG_PAGES*PAGE_SIZE); > + stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE); > stack_base = PAGE_ALIGN(stack_base); > - bprm->p += stack_base; > - mm->arg_start = bprm->p; > - arg_size = stack_top - (PAGE_MASK & (unsigned long) > mm->arg_start); > -#endif > > - arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE; > + stack_shift = (bprm->p & PAGE_MASK) - stack_base; > + BUG_ON(stack_shift < 0); > + bprm->p -= stack_shift; > + mm->arg_start = bprm->p; > > if (bprm->loader) > - bprm->loader += stack_base; > - bprm->exec += stack_base; > - > - mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); > - if (!mpnt) > - return -ENOMEM; > - > - memset(mpnt, 0, sizeof(*mpnt)); > + bprm->loader -= stack_shift; > + bprm->exec -= stack_shift; > > down_write(&mm->mmap_sem); > { > - mpnt->vm_mm = mm; > -#ifdef CONFIG_STACK_GROWSUP > - mpnt->vm_start = stack_base; > - mpnt->vm_end = stack_base + arg_size; > -#else > - mpnt->vm_end = stack_top; > - mpnt->vm_start = mpnt->vm_end - arg_size; > -#endif > + struct vm_area_struct *vma = bprm->vma; > + struct vm_area_struct *prev = NULL; > + unsigned long vm_flags = vma->vm_flags; > + > /* Adjust stack execute permissions; explicitly enable > * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X > * and leave alone (arch default) otherwise. */ > if (unlikely(executable_stack == EXSTACK_ENABLE_X)) > - mpnt->vm_flags = VM_STACK_FLAGS | VM_EXEC; > + vm_flags |= VM_EXEC; > else if (executable_stack == EXSTACK_DISABLE_X) > - mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC; > - else > - mpnt->vm_flags = VM_STACK_FLAGS; > - mpnt->vm_flags |= mm->def_flags; > - mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7]; > - if ((ret = insert_vm_struct(mm, mpnt))) { > + vm_flags &= ~VM_EXEC; > + vm_flags |= mm->def_flags; > + > + ret = mprotect_fixup(vma, &prev, vma->vm_start, > vma->vm_end, > + vm_flags); > + if (ret) { > up_write(&mm->mmap_sem); > - kmem_cache_free(vm_area_cachep, mpnt); > return ret; > } > - mm->stack_vm = mm->total_vm = vma_pages(mpnt); > - } > + BUG_ON(prev != vma); > + > + /* Move stack pages down in memory. */ > + if (stack_shift) { > + /* This should be safe even with overlap because > we > + * are shifting down. */ > + ret = move_vma(vma, vma->vm_start, > + vma->vm_end - vma->vm_start, > + vma->vm_end - vma->vm_start, > + vma->vm_start - stack_shift); > + if (ret & ~PAGE_MASK) { > + up_write(&mm->mmap_sem); > + return ret; > + } > + } > > - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { > - struct page *page = bprm->page[i]; > - if (page) { > - bprm->page[i] = NULL; > - install_arg_page(mpnt, page, stack_base); > + // Expand the stack. > + vma = find_vma(mm, bprm->p); > + BUG_ON(!vma || bprm->p < vma->vm_start); > + if (expand_stack(vma, stack_base - > + EXTRA_STACK_VM_PAGES * PAGE_SIZE)) > { > + up_write(&mm->mmap_sem); > + return -EFAULT; > } > - stack_base += PAGE_SIZE; > } > up_write(&mm->mmap_sem); > > @@ -456,23 +451,6 @@ int setup_arg_pages(struct linux_binprm > > EXPORT_SYMBOL(setup_arg_pages); > > -#define free_arg_pages(bprm) do { } while (0) > - > -#else > - > -static inline void free_arg_pages(struct linux_binprm *bprm) > -{ > - int i; > - > - for (i = 0; i < MAX_ARG_PAGES; i++) { > - if (bprm->page[i]) > - __free_page(bprm->page[i]); > - bprm->page[i] = NULL; > - } > -} > - > -#endif /* CONFIG_MMU */ > - > struct file *open_exec(const char *name) > { > struct nameidata nd; > @@ -993,8 +971,10 @@ void compute_creds(struct linux_binprm * > > EXPORT_SYMBOL(compute_creds); > > -void remove_arg_zero(struct linux_binprm *bprm) > +int remove_arg_zero(struct linux_binprm *bprm) > { > + int ret = 0; > + > if (bprm->argc) { > unsigned long offset; > char * kaddr; > @@ -1008,13 +988,23 @@ void remove_arg_zero(struct linux_binprm > continue; > offset = 0; > kunmap_atomic(kaddr, KM_USER0); > + put_page(page); > inside: > - page = bprm->page[bprm->p/PAGE_SIZE]; > + ret = get_user_pages(current, bprm->mm, bprm->p, > + 1, 0, 1, &page, NULL); > + if (ret <= 0) { > + ret = -EFAULT; > + goto out; > + } > kaddr = kmap_atomic(page, KM_USER0); > } > kunmap_atomic(kaddr, KM_USER0); > bprm->argc--; > + ret = 0; > } > + > +out: > + return ret; > } > > EXPORT_SYMBOL(remove_arg_zero); > @@ -1041,7 +1031,7 @@ int search_binary_handler(struct linux_b > fput(bprm->file); > bprm->file = NULL; > > - loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); > + loader = bprm->vma->vm_end - sizeof(void *); > > file = open_exec("/sbin/loader"); > retval = PTR_ERR(file); > @@ -1134,7 +1124,6 @@ int do_execve(char * filename, > struct linux_binprm *bprm; > struct file *file; > int retval; > - int i; > > retval = -ENOMEM; > bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); > @@ -1148,25 +1137,19 @@ int do_execve(char * filename, > > sched_exec(); > > - bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *); > - > bprm->file = file; > bprm->filename = filename; > bprm->interp = filename; > - bprm->mm = mm_alloc(); > - retval = -ENOMEM; > - if (!bprm->mm) > - goto out_file; > > - retval = init_new_context(current, bprm->mm); > - if (retval < 0) > - goto out_mm; > + retval = bprm_mm_init(bprm); > + if (retval) > + goto out_file; > > - bprm->argc = count(argv, bprm->p / sizeof(void *)); > + bprm->argc = count(argv, MAX_ARG_STRINGS); > if ((retval = bprm->argc) < 0) > goto out_mm; > > - bprm->envc = count(envp, bprm->p / sizeof(void *)); > + bprm->envc = count(envp, MAX_ARG_STRINGS); > if ((retval = bprm->envc) < 0) > goto out_mm; > > @@ -1193,8 +1176,6 @@ int do_execve(char * filename, > > retval = search_binary_handler(bprm,regs); > if (retval >= 0) { > - free_arg_pages(bprm); > - > /* execve success */ > security_bprm_free(bprm); > acct_update_integrals(current); > @@ -1203,19 +1184,12 @@ int do_execve(char * filename, > } > > out: > - /* Something went wrong, return the inode and free the argument > pages*/ > - for (i = 0 ; i < MAX_ARG_PAGES ; i++) { > - struct page * page = bprm->page[i]; > - if (page) > - __free_page(page); > - } > - > if (bprm->security) > security_bprm_free(bprm); > > out_mm: > if (bprm->mm) > - mmdrop(bprm->mm); > + mmput (bprm->mm); > > out_file: > if (bprm->file) { > Index: linux/include/linux/binfmts.h > =================================================================== > --- linux.orig/include/linux/binfmts.h > +++ linux/include/linux/binfmts.h > @@ -5,12 +5,9 @@ > > struct pt_regs; > > -/* > - * MAX_ARG_PAGES defines the number of pages allocated for arguments > - * and envelope for the new program. 32 should suffice, this gives > - * a maximum env+arg of 128kB w/4KB pages! > - */ > -#define MAX_ARG_PAGES 32 > +/* FIXME: Find real limits, or none. */ > +#define MAX_ARG_STRLEN (PAGE_SIZE * 32) > +#define MAX_ARG_STRINGS 0x7FFFFFFF > > /* sizeof(linux_binprm->buf) */ > #define BINPRM_BUF_SIZE 128 > @@ -22,7 +19,7 @@ struct pt_regs; > */ > struct linux_binprm{ > char buf[BINPRM_BUF_SIZE]; > - struct page *page[MAX_ARG_PAGES]; > + struct vm_area_struct *vma; > struct mm_struct *mm; > unsigned long p; /* current top of mem */ > int sh_bang; > @@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_ > extern int unregister_binfmt(struct linux_binfmt *); > > extern int prepare_binprm(struct linux_binprm *); > -extern void remove_arg_zero(struct linux_binprm *); > +extern int __must_check remove_arg_zero(struct linux_binprm *); > extern int search_binary_handler(struct linux_binprm *,struct pt_regs *); > extern int flush_old_exec(struct linux_binprm * bprm); > > @@ -82,6 +79,7 @@ extern int suid_dumpable; > extern int setup_arg_pages(struct linux_binprm * bprm, > unsigned long stack_top, > int executable_stack); > +extern int bprm_mm_init(struct linux_binprm *bprm); > extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm > *bprm); > extern void compute_creds(struct linux_binprm *binprm); > extern int do_coredump(long signr, int exit_code, struct pt_regs * regs); > Index: linux/include/linux/mm.h > =================================================================== > --- linux.orig/include/linux/mm.h > +++ linux/include/linux/mm.h > @@ -775,7 +775,6 @@ static inline int handle_mm_fault(struct > > extern int make_pages_present(unsigned long addr, unsigned long end); > extern int access_process_vm(struct task_struct *tsk, unsigned long addr, > void *buf, int len, int write); > -void install_arg_page(struct vm_area_struct *, struct page *, unsigned > long); > > int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned > long start, > int len, int write, int force, struct page **pages, struct > vm_area_struct **vmas); > @@ -791,9 +790,15 @@ int FASTCALL(set_page_dirty(struct page > int set_page_dirty_lock(struct page *page); > int clear_page_dirty_for_io(struct page *page); > > +extern unsigned long move_vma(struct vm_area_struct *vma, > + unsigned long old_addr, unsigned long old_len, > + unsigned long new_len, unsigned long new_addr); > extern unsigned long do_mremap(unsigned long addr, > unsigned long old_len, unsigned long > new_len, > unsigned long flags, unsigned long > new_addr); > +extern int mprotect_fixup(struct vm_area_struct *vma, > + struct vm_area_struct **pprev, unsigned long > start, > + unsigned long end, unsigned long newflags); > > /* > * Prototype to add a shrinker callback for ageable caches. > Index: linux/kernel/auditsc.c > =================================================================== > --- linux.orig/kernel/auditsc.c > +++ linux/kernel/auditsc.c > @@ -1755,6 +1755,10 @@ int __audit_ipc_set_perm(unsigned long q > > int audit_bprm(struct linux_binprm *bprm) > { > + /* FIXME: Don't do anything for now until I figure out how to > handle > + * this. With the latest changes, kmalloc could well fail under > good > + * scenarios. */ > +#if 0 > struct audit_aux_data_execve *ax; > struct audit_context *context = current->audit_context; > unsigned long p, next; > @@ -1782,6 +1786,7 @@ int audit_bprm(struct linux_binprm *bprm > ax->d.type = AUDIT_EXECVE; > ax->d.next = context->aux; > context->aux = (void *)ax; > +#endif > return 0; > } > > Index: linux/mm/mprotect.c > =================================================================== > --- linux.orig/mm/mprotect.c > +++ linux/mm/mprotect.c > @@ -128,7 +128,7 @@ static void change_protection(struct vm_ > flush_tlb_range(vma, start, end); > } > > -static int > +int > mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, > unsigned long start, unsigned long end, unsigned long newflags) > { > Index: linux/mm/mremap.c > =================================================================== > --- linux.orig/mm/mremap.c > +++ linux/mm/mremap.c > @@ -155,7 +155,7 @@ static unsigned long move_page_tables(st > return len + old_addr - old_end; /* how much done */ > } > > -static unsigned long move_vma(struct vm_area_struct *vma, > +unsigned long move_vma(struct vm_area_struct *vma, > unsigned long old_addr, unsigned long old_len, > unsigned long new_len, unsigned long new_addr) > { > [-- Attachment #1.2: Type: text/html, Size: 94922 bytes --] [-- Attachment #2: Type: text/plain, Size: 169 bytes --] _______________________________________________ parisc-linux mailing list parisc-linux@lists.parisc-linux.org http://lists.parisc-linux.org/mailman/listinfo/parisc-linux ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] remove MAX_ARG_PAGES 2006-12-29 20:03 ` [patch] remove MAX_ARG_PAGES Ingo Molnar 2006-12-29 20:49 ` Russell King 2007-01-01 6:51 ` Ollie Wild @ 2007-01-01 6:58 ` Ollie Wild 2007-01-01 17:28 ` Pavel Machek 3 siblings, 0 replies; 16+ messages in thread From: Ollie Wild @ 2007-01-01 6:58 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra On 12/29/06, Ingo Molnar <mingo@elte.hu> wrote: > > what is keeping this fix from going upstream? > There are still a couple outstanding issues which need to be resolved before this is ready for inclusion in the mainline kernel. The main one is support for CONFIG_STACK_GROWSUP, which I think is just parisc. I've been meaning to look into this for a while, but I was out of commision for most of November so it got punted to the back burner. I'll try to revisit it soonish. If someone from the parisc-linux list wants to take a look, though, that's fine by me. The other is support for the various executable formats. I've tested elf and script pretty thoroughly, but I'm not sure how to go about testing most of the others -- does anyone use aout anymore? Maybe the solution is just to check it in and wait to see if someone complains. Ollie -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] remove MAX_ARG_PAGES 2006-12-29 20:03 ` [patch] remove MAX_ARG_PAGES Ingo Molnar ` (2 preceding siblings ...) 2007-01-01 6:58 ` Ollie Wild @ 2007-01-01 17:28 ` Pavel Machek 3 siblings, 0 replies; 16+ messages in thread From: Pavel Machek @ 2007-01-01 17:28 UTC (permalink / raw) To: Ingo Molnar Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra Hi! > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - > i can now finally do a "ls -t patches/*.patch" in my patch repository - > something i havent been able to do for years ;-) > > what is keeping this fix from going upstream? +1 I like this. I've been running with MAX_ARG_PAGES raised to insane value, and I'd love to get rid of that hack. Pavel -- Thanks for all the (sleeping) penguins. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [patch] remove MAX_ARG_PAGES 2006-10-24 17:48 ` Ollie Wild 2006-12-29 20:03 ` [patch] remove MAX_ARG_PAGES Ingo Molnar @ 2007-01-02 18:18 ` David Howells 1 sibling, 0 replies; 16+ messages in thread From: David Howells @ 2007-01-02 18:18 UTC (permalink / raw) To: Ingo Molnar Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells, Peter Zijlstra Ingo Molnar <mingo@elte.hu> wrote: > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - > i can now finally do a "ls -t patches/*.patch" in my patch repository - > something i havent been able to do for years ;-) I get a NULL kernel pointer reference with this patch with FRV in MMU mode. It's whilst the kernel is attempting to execute init: (gdb) bt #0 0xc0063d54 in do_anonymous_page (mm=0xc05c1a04, vma=0xc05c884c, address=3221225457, page_table=0xdc00bffc, pmd=0xc05c6f00, write_access=0) at pgtable.h:525 #1 0xc006468c in __handle_mm_fault (mm=0xbffffff1, vma=0xc05c884c, address=3691036668, write_access=1) at memory.c:2404 #2 0xc0061c18 in get_user_pages (tsk=0xc034eba0, mm=0xc05c1a04, start=3221225457, len=1, write=1, force=0, pages=0xc0353b80, vmas=0x0) at memory.c:1064 #3 0xc0084c7c in copy_strings (argc=0, argv=0xc0388c3c, bprm=0xc0388b88) at exec.c:326 #4 0xc0084dac in copy_strings_kernel (argc=1, argv=0xc05c884c, bprm=0xbffffff1) at exec.c:365 #5 0xc0085f5c in do_execve (filename=0xc0387000 "/sbin/init", argv=0xc026e8c8, envp=0xc026e950, regs=0xc0353bf8) at exec.c:1158 #6 0xc000f55c in sys_execve (name=0x1 <Address 0x1 out of bounds>, argv=0xc026e8c8, envp=0xc026e950) at process.c:263 #7 0xc000e1e0 in __syscall_call () at arch/frv/kernel/entry.S:878 #8 0xc000e1e0 in __syscall_call () at arch/frv/kernel/entry.S:878 The problem appears to be that current->mm is NULL. It's a bit difficult to say for sure because gdb or gcc determined that the error is in pgtable.h, but doesn't seem to have determined which one. However, looking at the code, I think it must be this: static int do_anonymous_page(struct mm_struct *mm, ...) { ... update_mmu_cache(vma, address, entry); ... } But that uses current->mm, which is NULL, via this function: [include/asm-frv/pgtable.h] /* * preload information about a newly instantiated PTE into the SCR0/SCR1 PGE cache */ static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t pte) { unsigned long ampr; pgd_t *pge = pgd_offset(current->mm, address); pud_t *pue = pud_offset(pge, address); pmd_t *pme = pmd_offset(pue, address); ampr = pme->ste[0] & 0xffffff00; ampr |= xAMPRx_L | xAMPRx_SS_16Kb | xAMPRx_S | xAMPRx_C | xAMPRx_V; asm volatile("movgs %0,scr0\n" "movgs %0,scr1\n" "movgs %1,dampr4\n" "movgs %1,dampr5\n" : : "r"(address), "r"(ampr) ); } This really oughtn't to be called in this situation, I suspect. I could just skip the operation if current->mm is NULL, but I'm not sure that that's the right thing to do. Maybe I should be using current->active_mm instead. Everything works fine without the patch. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Removing MAX_ARG_PAGES (request for comments/assistance) 2006-10-11 0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild 2006-10-11 8:00 ` Arjan van de Ven 2006-10-11 13:14 ` Peter Zijlstra @ 2007-01-02 17:52 ` David Howells 2 siblings, 0 replies; 16+ messages in thread From: David Howells @ 2007-01-02 17:52 UTC (permalink / raw) To: Ollie Wild Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven, Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells Ollie Wild <aaw@google.com> wrote: > - I haven't tested this on a NOMMU architecture. Could someone please > validate this? There are a number of potential problems with NOMMU: (1) The argument data is copied twice (once into kernel memory and once out of kernel memory). (2) The permitted amount of argument data is governed by the stack size of the program to be exec'd. You should assume that NOMMU stacks cannot grow. (3) VMAs on NOMMU are a shared resource. However, we might be able to extend your idea to improve things. If we work out the stack size required earlier, we can allocate the VMA and the memory for the stack *before* we reach the point of no return. We can then fill in the stack and load up all the parameters *before* releasing the original executable. That would eliminate one of the copied mentioned in (1). Working out the stack size earlier may be difficult though, as we may need to load the interpreter header before we can do so. Overall, I don't think there should be too many problems with this for NOMMU. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2007-01-02 18:18 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2006-10-11 0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild 2006-10-11 8:00 ` Arjan van de Ven 2006-10-11 17:13 ` Ollie Wild 2006-10-11 13:14 ` Peter Zijlstra 2006-10-11 21:48 ` Ollie Wild 2006-10-24 17:48 ` Ollie Wild 2006-12-29 20:03 ` [patch] remove MAX_ARG_PAGES Ingo Molnar 2006-12-29 20:49 ` Russell King 2006-12-29 21:23 ` Linus Torvalds 2006-12-29 22:20 ` Ingo Molnar 2006-12-29 21:57 ` Ingo Molnar 2007-01-01 6:51 ` Ollie Wild 2007-01-01 6:58 ` Ollie Wild 2007-01-01 17:28 ` Pavel Machek 2007-01-02 18:18 ` David Howells 2007-01-02 17:52 ` Removing MAX_ARG_PAGES (request for comments/assistance) David Howells
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox