[-mm][PATCH 0/2] Memory rlimit fix crash on fork

* [-mm][PATCH 0/2] Memory rlimit fix crash on fork
@ 2008-08-11 10:07 Balbir Singh
  2008-08-11 10:07 ` [-mm][PATCH 1/2] mm owner fix race between swap and exit Balbir Singh
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Balbir Singh @ 2008-08-11 10:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel,
	nishimura, Pavel Emelianov, hugh, Balbir Singh, Andrew Morton,
	KAMEZAWA Hiroyuki


This patch fixes a crash that occurs when kernbench is run with memrlimit
set to 500M on my x86_64 box. The root cause of the failure is:

1. We don't set mm->mmap to NULL for the process for which fork() failed
2. mmput() dereferences vma (in unmap_vmas, vma->vm_mm).

This patch fixes the problem by

1. Initializing mm->mmap to NULL before dup_mmap() can fail
2. Making unmap_vmas() check whether mm->mmap is NULL (vma is NULL)
3. Not uncharging in exit_mmap() when do_fork() has failed
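
As an illustration of the root cause above, here is a minimal userspace sketch, not kernel code. It assumes the child mm starts out as a byte copy of the parent, so its mmap field initially aliases the parent's VMA list. All toy_* names and the limit parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; every name here is hypothetical. */
struct toy_vma { struct toy_vma *next; };
struct toy_mm  { struct toy_vma *mmap; unsigned long total_vm; };

/* Pretend address-space charge: fails when the request exceeds the limit. */
static int toy_charge(unsigned long pages, unsigned long limit)
{
	return pages > limit ? -1 : 0;
}

/*
 * Model of the ordering in dup_mmap().
 * reset_first == 0: charge before clearing child->mmap (old order).
 * reset_first == 1: clear child->mmap before charging (patched order).
 */
static int toy_dup_mmap(struct toy_mm *child, const struct toy_mm *parent,
			unsigned long limit, int reset_first)
{
	assert(child != NULL && parent != NULL);
	*child = *parent;		/* child->mmap aliases the parent's list */
	if (reset_first)
		child->mmap = NULL;
	if (toy_charge(parent->total_vm, limit))
		return -1;		/* failure path: caller tears the child down */
	child->mmap = NULL;
	return 0;
}

/* Returns 1 if tearing down the child would walk VMAs it does not own. */
static int toy_child_aliases_parent(const struct toy_mm *child,
				    const struct toy_mm *parent)
{
	return child->mmap != NULL && child->mmap == parent->mmap;
}
```

With the old order, a failed charge leaves the child pointing at the parent's list; with the patched order, the child's list is empty by the time the failure is reported.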

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 kernel/fork.c |   19 ++++++++++---------
 mm/memory.c   |    6 +++++-
 mm/mmap.c     |    6 +++++-
 3 files changed, 20 insertions(+), 11 deletions(-)

diff -puN mm/mmap.c~memrlimit-fix-crash-on-fork mm/mmap.c
--- linux-2.6.27-rc1/mm/mmap.c~memrlimit-fix-crash-on-fork	2008-08-11 14:45:07.000000000 +0530
+++ linux-2.6.27-rc1-balbir/mm/mmap.c	2008-08-11 14:57:45.000000000 +0530
@@ -2104,6 +2104,7 @@ void exit_mmap(struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	unsigned long nr_accounted = 0;
 	unsigned long end;
+	bool uncharge_as = true;
 
 	/* mm's last user has gone, and its about to be pulled down */
 	arch_exit_mmap(mm);
@@ -2118,6 +2119,8 @@ void exit_mmap(struct mm_struct *mm)
 		}
 	}
 	vma = mm->mmap;
+	if (!vma)
+		uncharge_as = false;
 	lru_add_drain();
 	flush_cache_mm(mm);
 	tlb = tlb_gather_mmu(mm, 1);
@@ -2125,7 +2128,8 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
-	memrlimit_cgroup_uncharge_as(mm, mm->total_vm);
+	if (uncharge_as)
+		memrlimit_cgroup_uncharge_as(mm, mm->total_vm);
 	free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
 	tlb_finish_mmu(tlb, 0, end);
 
diff -puN kernel/fork.c~memrlimit-fix-crash-on-fork kernel/fork.c
--- linux-2.6.27-rc1/kernel/fork.c~memrlimit-fix-crash-on-fork	2008-08-11 14:45:07.000000000 +0530
+++ linux-2.6.27-rc1-balbir/kernel/fork.c	2008-08-11 14:56:04.000000000 +0530
@@ -274,15 +274,6 @@ static int dup_mmap(struct mm_struct *mm
 	 */
 	down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING);
 
-	/*
-	 * Uncharging as a result of failure is done by mmput()
-	 * in dup_mm()
-	 */
-	if (memrlimit_cgroup_charge_as(oldmm, oldmm->total_vm)) {
-		retval = -ENOMEM;
-		goto out;
-	}
-
 	mm->locked_vm = 0;
 	mm->mmap = NULL;
 	mm->mmap_cache = NULL;
@@ -295,6 +286,16 @@ static int dup_mmap(struct mm_struct *mm
 	rb_parent = NULL;
 	pprev = &mm->mmap;
 
+	/*
+	 * Called after mm->mmap is set to NULL, so that the routines
+	 * following this function understand that fork failed (read
+	 * mmput).
+	 */
+	if (memrlimit_cgroup_charge_as(oldmm, oldmm->total_vm)) {
+		retval = -ENOMEM;
+		goto out;
+	}
+
 	for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
 		struct file *file;
 
diff -puN mm/memory.c~memrlimit-fix-crash-on-fork mm/memory.c
--- linux-2.6.27-rc1/mm/memory.c~memrlimit-fix-crash-on-fork	2008-08-11 14:57:48.000000000 +0530
+++ linux-2.6.27-rc1-balbir/mm/memory.c	2008-08-11 14:58:33.000000000 +0530
@@ -901,8 +901,12 @@ unsigned long unmap_vmas(struct mmu_gath
 	unsigned long start = start_addr;
 	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
 	int fullmm = (*tlbp)->fullmm;
-	struct mm_struct *mm = vma->vm_mm;
+	struct mm_struct *mm;
+
+	if (!vma)
+		return;
 
+	mm = vma->vm_mm;
 	mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
 		unsigned long end;
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [-mm][PATCH 1/2] mm owner fix race between swap and exit
  2008-08-11 10:07 [-mm][PATCH 0/2] Memory rlimit fix crash on fork Balbir Singh
@ 2008-08-11 10:07 ` Balbir Singh
  2008-08-12  0:31   ` Andrew Morton
  2008-08-11 10:07 ` [-mm][PATCH 2/2] Memory rlimit enhance mm_owner_changed callback to deal with exited owner Balbir Singh
  2008-08-13  0:14 ` [-mm][PATCH 0/2] Memory rlimit fix crash on fork Andrew Morton
  2 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2008-08-11 10:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel,
	nishimura, Pavel Emelianov, hugh, Balbir Singh, Andrew Morton,
	KAMEZAWA Hiroyuki


Reported-by: Hugh Dickins <hugh@veritas.com>

There's a race between mm->owner assignment and try_to_unuse(). The condition
occurs when try_to_unuse() runs in parallel with an exiting task.

The race can be visualized below. To quote Hugh
"I don't think your careful alternation of CPU0/1 events at the end matters:
the swapoff CPU simply dereferences mm->owner after that task has gone"

But the alternation does help in understanding the race better (at least for me :))

CPU0					CPU1
					try_to_unuse
task 1 starts exiting			look at mm = task1->mm
..					increment mm_users
task 1 exits
mm->owner needs to be updated, but
no new owner is found
(mm_users > 1, but no other task
has task->mm = task1->mm)
mm_update_next_owner() leaves

grace period
					user count drops, call mmput(mm)
task 1 freed
					dereferencing mm->owner fails

The fix is to notify the subsystem (via the mm_owner_changed callback) when
no new owner is found, by passing NULL as the new task.
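
The sequence above can be modelled in a few lines of single-threaded userspace C. This is only a sketch under stated assumptions: the toy_* types and functions are hypothetical, the "exited" flag stands in for the task having been freed, and the sketch invokes the callback before clearing the owner pointer so that the old owner is still available to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and mm_struct. */
struct toy_task { int exited; };
struct toy_mm   { struct toy_task *owner; int mm_users; };

/* Counts callbacks that were delivered with a NULL new owner. */
static int toy_null_notifications;

static void toy_owner_changed(struct toy_task *old_owner,
			      struct toy_task *new_owner)
{
	assert(old_owner != NULL);
	if (new_owner == NULL)
		toy_null_notifications++;
}

/*
 * Model of the owner update when the current owner exits.
 * candidate == NULL means no other task shares the mm.
 * with_fix selects whether the "no owner found" case clears mm->owner.
 */
static void toy_update_next_owner(struct toy_mm *mm,
				  struct toy_task *candidate, int with_fix)
{
	struct toy_task *old_owner = mm->owner;

	old_owner->exited = 1;
	if (candidate) {
		mm->owner = candidate;
		toy_owner_changed(old_owner, candidate);
		return;
	}
	if (with_fix) {
		toy_owner_changed(old_owner, NULL);
		mm->owner = NULL;
	}
	/* Without the fix, mm->owner keeps pointing at the exited task. */
}

/* Returns 1 if a later mmput() would follow a pointer to an exited task. */
static int toy_mmput_touches_stale_owner(const struct toy_mm *mm)
{
	return mm->owner != NULL && mm->owner->exited;
}
```

The swapoff side corresponds to the extra mm_users reference: it is the last caller to look at mm->owner, after the owning task is gone.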

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 kernel/cgroup.c |    5 +++--
 kernel/exit.c   |   10 ++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff -puN kernel/exit.c~mm-owner-fix-race-with-swap kernel/exit.c
--- linux-2.6.27-rc1/kernel/exit.c~mm-owner-fix-race-with-swap	2008-08-05 10:46:19.000000000 +0530
+++ linux-2.6.27-rc1-balbir/kernel/exit.c	2008-08-05 10:46:19.000000000 +0530
@@ -625,6 +625,16 @@ retry:
 	} while_each_thread(g, c);
 
 	read_unlock(&tasklist_lock);
+	/*
+	 * We found no owner and mm_users > 1, this implies that
+	 * we are most likely racing with swap (try_to_unuse())
+	 * Mark owner as NULL, so that subsystems can understand
+	 * the callback and take action
+	 */
+	down_write(&mm->mmap_sem);
+	mm->owner = NULL;
+	cgroup_mm_owner_callbacks(mm->owner, NULL);
+	up_write(&mm->mmap_sem);
 	return;
 
 assign_new_owner:
diff -L kernel/cgroup/.c -puN /dev/null /dev/null
diff -puN kernel/cgroup.c~mm-owner-fix-race-with-swap kernel/cgroup.c
--- linux-2.6.27-rc1/kernel/cgroup.c~mm-owner-fix-race-with-swap	2008-08-05 10:47:20.000000000 +0530
+++ linux-2.6.27-rc1-balbir/kernel/cgroup.c	2008-08-05 10:47:55.000000000 +0530
@@ -2740,14 +2740,15 @@ void cgroup_fork_callbacks(struct task_s
  */
 void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
 {
-	struct cgroup *oldcgrp, *newcgrp;
+	struct cgroup *oldcgrp, *newcgrp = NULL;
 
 	if (need_mm_owner_callback) {
 		int i;
 		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
 			oldcgrp = task_cgroup(old, ss->subsys_id);
-			newcgrp = task_cgroup(new, ss->subsys_id);
+			if (new)
+				newcgrp = task_cgroup(new, ss->subsys_id);
 			if (oldcgrp == newcgrp)
 				continue;
 			if (ss->mm_owner_changed)
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* [-mm][PATCH 2/2] Memory rlimit enhance mm_owner_changed callback to deal with exited owner
  2008-08-11 10:07 [-mm][PATCH 0/2] Memory rlimit fix crash on fork Balbir Singh
  2008-08-11 10:07 ` [-mm][PATCH 1/2] mm owner fix race between swap and exit Balbir Singh
@ 2008-08-11 10:07 ` Balbir Singh
  2008-08-12  0:39   ` Paul Menage
  2008-08-13  0:14 ` [-mm][PATCH 0/2] Memory rlimit fix crash on fork Andrew Morton
  2 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2008-08-11 10:07 UTC (permalink / raw)
  To: linux-mm
  Cc: Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel,
	nishimura, Pavel Emelianov, hugh, Balbir Singh, Andrew Morton,
	KAMEZAWA Hiroyuki


The mm_owner_changed callback can also be called with the new task set to
NULL (a race between try_to_unuse() and the mm->owner task exiting).
Surprisingly, the order of the cgroup arguments being passed was incorrect
(which shows that we never actually ran into the mm_owner_changed callback).
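
The intended behaviour of the callback (charge the new group if there is one, then uncharge the old one) can be sketched in userspace as follows. The toy_counter type and the toy_* functions are hypothetical and only loosely model a resource counter with a usage and a limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource counter: current usage and hard limit, in bytes. */
struct toy_counter { unsigned long usage; unsigned long limit; };

/* Returns 0 on success, -1 if the charge would exceed the limit. */
static int toy_charge(struct toy_counter *c, unsigned long bytes)
{
	if (c->usage + bytes > c->limit)
		return -1;
	c->usage += bytes;
	return 0;
}

static void toy_uncharge(struct toy_counter *c, unsigned long bytes)
{
	assert(c->usage >= bytes);
	c->usage -= bytes;
}

/*
 * Argument order is (old, new), matching the caller.
 * new_c may be NULL when the owner exited and no successor exists;
 * in that case only the old group is uncharged.
 */
static void toy_owner_changed(struct toy_counter *old_c,
			      struct toy_counter *new_c,
			      unsigned long bytes)
{
	if (new_c && toy_charge(new_c, bytes))
		return;		/* new group is full: leave the charge where it is */
	toy_uncharge(old_c, bytes);
}
```

Swapping the two counter arguments would charge the group being left and uncharge the group being joined, which is the kind of mistake the corrected parameter order avoids.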

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 mm/memrlimitcgroup.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff -puN mm/memrlimitcgroup.c~memrlimit-handle-mm-owner-notification-with-task-null mm/memrlimitcgroup.c
--- linux-2.6.27-rc1/mm/memrlimitcgroup.c~memrlimit-handle-mm-owner-notification-with-task-null	2008-08-05 10:56:56.000000000 +0530
+++ linux-2.6.27-rc1-balbir/mm/memrlimitcgroup.c	2008-08-05 11:24:04.000000000 +0530
@@ -73,6 +73,12 @@ void memrlimit_cgroup_uncharge_as(struct
 {
 	struct memrlimit_cgroup *memrcg;
 
+	/*
+	 * Uncharge happened as a part of the mm_owner_changed callback
+	 */
+	if (!mm->owner)
+		return;
+
 	memrcg = memrlimit_cgroup_from_task(mm->owner);
 	res_counter_uncharge(&memrcg->as_res, (nr_pages << PAGE_SHIFT));
 }
@@ -235,8 +241,8 @@ out:
  * This callback is called with mmap_sem held
  */
 static void memrlimit_cgroup_mm_owner_changed(struct cgroup_subsys *ss,
-						struct cgroup *cgrp,
 						struct cgroup *old_cgrp,
+						struct cgroup *cgrp,
 						struct task_struct *p)
 {
 	struct memrlimit_cgroup *memrcg, *old_memrcg;
@@ -246,7 +252,12 @@ static void memrlimit_cgroup_mm_owner_ch
 	memrcg = memrlimit_cgroup_from_cgrp(cgrp);
 	old_memrcg = memrlimit_cgroup_from_cgrp(old_cgrp);
 
-	if (res_counter_charge(&memrcg->as_res, (mm->total_vm << PAGE_SHIFT)))
+	/*
+	 * If we don't have a new cgroup, we just uncharge from the old one.
+	 * It means that the task is going away
+	 */
+	if (memrcg &&
+	    res_counter_charge(&memrcg->as_res, (mm->total_vm << PAGE_SHIFT)))
 		goto out;
 	res_counter_uncharge(&old_memrcg->as_res, (mm->total_vm << PAGE_SHIFT));
 out:
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [-mm][PATCH 1/2] mm owner fix race between swap and exit
  2008-08-11 10:07 ` [-mm][PATCH 1/2] mm owner fix race between swap and exit Balbir Singh
@ 2008-08-12  0:31   ` Andrew Morton
  2008-08-12  0:43     ` Paul Menage
  2008-08-12  4:06     ` Balbir Singh
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2008-08-12  0:31 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	nishimura, xemul, hugh, kamezawa.hiroyu

On Mon, 11 Aug 2008 15:37:33 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> There's a race between mm->owner assignment and try_to_unuse(). The condition
> occurs when try_to_unuse() runs in parallel with an exiting task.
> 
> The race can be visualized below. To quote Hugh
> "I don't think your careful alternation of CPU0/1 events at the end matters:
> the swapoff CPU simply dereferences mm->owner after that task has gone"
> 
> But the alternation does help in understanding the race better (at least for me :))
> 
> CPU0					CPU1
> 					try_to_unuse
> task 1 starts exiting			look at mm = task1->mm
> ..					increment mm_users
> task 1 exits
> mm->owner needs to be updated, but
> no new owner is found
> (mm_users > 1, but no other task
> has task->mm = task1->mm)
> mm_update_next_owner() leaves
> 
> grace period
> 					user count drops, call mmput(mm)
> task 1 freed
> 					dereferencing mm->owner fails
> 
> The fix is to notify the subsystem (via mm_owner_changed callback), if
> no new owner is found by specifying the new task as NULL.

This patch applies to mainline, 2.6.27-rc2 and even 2.6.26.

Against which kernel/patch is it actually applicable?

(If the answer was "all of the above" then please don't go embedding
mainline bugfixes in the middle of a -mm-only patch series!)

Thanks.


* Re: [-mm][PATCH 2/2] Memory rlimit enhance mm_owner_changed callback to deal with exited owner
  2008-08-11 10:07 ` [-mm][PATCH 2/2] Memory rlimit enhance mm_owner_changed callback to deal with exited owner Balbir Singh
@ 2008-08-12  0:39   ` Paul Menage
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Menage @ 2008-08-12  0:39 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, lizf, linux-kernel,
	nishimura, Pavel Emelianov, hugh, Andrew Morton,
	KAMEZAWA Hiroyuki

On Mon, Aug 11, 2008 at 3:07 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>
> mm_owner_changed callback can also be called with new task set to NULL.
> (race between try_to_unuse() and mm->owner exiting). Surprisingly the order
> of cgroup arguments being passed was incorrect (proves that we did not
> run into mm_owner_changed callback at all).
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>

Acked-by: Paul Menage <menage@google.com>


> ---
>
>  mm/memrlimitcgroup.c |   15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff -puN mm/memrlimitcgroup.c~memrlimit-handle-mm-owner-notification-with-task-null mm/memrlimitcgroup.c
> --- linux-2.6.27-rc1/mm/memrlimitcgroup.c~memrlimit-handle-mm-owner-notification-with-task-null 2008-08-05 10:56:56.000000000 +0530
> +++ linux-2.6.27-rc1-balbir/mm/memrlimitcgroup.c        2008-08-05 11:24:04.000000000 +0530
> @@ -73,6 +73,12 @@ void memrlimit_cgroup_uncharge_as(struct
>  {
>        struct memrlimit_cgroup *memrcg;
>
> +       /*
> +        * Uncharge happened as a part of the mm_owner_changed callback
> +        */
> +       if (!mm->owner)
> +               return;
> +
>        memrcg = memrlimit_cgroup_from_task(mm->owner);
>        res_counter_uncharge(&memrcg->as_res, (nr_pages << PAGE_SHIFT));
>  }
> @@ -235,8 +241,8 @@ out:
>  * This callback is called with mmap_sem held
>  */
>  static void memrlimit_cgroup_mm_owner_changed(struct cgroup_subsys *ss,
> -                                               struct cgroup *cgrp,
>                                                struct cgroup *old_cgrp,
> +                                               struct cgroup *cgrp,
>                                                struct task_struct *p)
>  {
>        struct memrlimit_cgroup *memrcg, *old_memrcg;
> @@ -246,7 +252,12 @@ static void memrlimit_cgroup_mm_owner_ch
>        memrcg = memrlimit_cgroup_from_cgrp(cgrp);
>        old_memrcg = memrlimit_cgroup_from_cgrp(old_cgrp);
>
> -       if (res_counter_charge(&memrcg->as_res, (mm->total_vm << PAGE_SHIFT)))
> +       /*
> +        * If we don't have a new cgroup, we just uncharge from the old one.
> +        * It means that the task is going away
> +        */
> +       if (memrcg &&
> +           res_counter_charge(&memrcg->as_res, (mm->total_vm << PAGE_SHIFT)))
>                goto out;
>        res_counter_uncharge(&old_memrcg->as_res, (mm->total_vm << PAGE_SHIFT));
>  out:
> _
>
> --
>        Warm Regards,
>        Balbir Singh
>        Linux Technology Center
>        IBM, ISTL
>


* Re: [-mm][PATCH 1/2] mm owner fix race between swap and exit
  2008-08-12  0:31   ` Andrew Morton
@ 2008-08-12  0:43     ` Paul Menage
  2008-08-12  4:06     ` Balbir Singh
  1 sibling, 0 replies; 12+ messages in thread
From: Paul Menage @ 2008-08-12  0:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Balbir Singh, linux-mm, skumar, yamamoto, lizf, linux-kernel,
	nishimura, xemul, hugh, kamezawa.hiroyu

On Mon, Aug 11, 2008 at 5:31 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>> The fix is to notify the subsystem (via mm_owner_changed callback), if
>> no new owner is found by specifying the new task as NULL.
>
> This patch applies to mainline, 2.6.27-rc2 and even 2.6.26.
>
> Against which kernel/patch is it actually applicable?
>
> (If the answer was "all of the above" then please don't go embedding
> mainline bugfixes in the middle of a -mm-only patch series!)

The main thing this fixes is the memrlimit controller, which is only
in -mm. But there's also a dereference of mm->owner in memcontrol.c -
and I think that needs to be fixed to handle a possible NULL mm->owner
too, since in the case of a swapoff racing with the last user of an mm
exiting, I suspect that the swapoff code could try to pull in a page
that gets charged to the mm after its owner has been set to NULL.

Paul


* Re: [-mm][PATCH 1/2] mm owner fix race between swap and exit
  2008-08-12  0:31   ` Andrew Morton
  2008-08-12  0:43     ` Paul Menage
@ 2008-08-12  4:06     ` Balbir Singh
  2008-08-12  4:56       ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2008-08-12  4:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	nishimura, xemul, hugh, kamezawa.hiroyu

Andrew Morton wrote:
> On Mon, 11 Aug 2008 15:37:33 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> There's a race between mm->owner assignment and try_to_unuse(). The condition
>> occurs when try_to_unuse() runs in parallel with an exiting task.
>>
>> The race can be visualized below. To quote Hugh
>> "I don't think your careful alternation of CPU0/1 events at the end matters:
>> the swapoff CPU simply dereferences mm->owner after that task has gone"
>>
>> But the alternation does help in understanding the race better (at least for me :))
>>
>> CPU0					CPU1
>> 					try_to_unuse
>> task 1 starts exiting		look at mm = task1->mm
>> ..					increment mm_users
>> task 1 exits
>> mm->owner needs to be updated, but
>> no new owner is found
>> (mm_users > 1, but no other task
>> has task->mm = task1->mm)
>> mm_update_next_owner() leaves
>>
>> grace period
>> 					user count drops, call mmput(mm)
>> task 1 freed
>> 					dereferencing mm->owner fails
>>
>> The fix is to notify the subsystem (via mm_owner_changed callback), if
>> no new owner is found by specifying the new task as NULL.
> 
> This patch applies to mainline, 2.6.27-rc2 and even 2.6.26.
> 
> Against which kernel/patch is it actually applicable?
> 
> (If the answer was "all of the above" then please don't go embedding
> mainline bugfixes in the middle of a -mm-only patch series!)

Andrew,

The answer is all, but the bug is not exposed *outside* of the memrlimit
controller, thus the push into -mm. I can redo and rework the patches for
mainline if required and pull it out of -mm.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [-mm][PATCH 1/2] mm owner fix race between swap and exit
  2008-08-12  4:06     ` Balbir Singh
@ 2008-08-12  4:56       ` Andrew Morton
  2008-08-12  5:04         ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2008-08-12  4:56 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	nishimura, xemul, hugh, kamezawa.hiroyu

On Tue, 12 Aug 2008 09:36:36 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> > This patch applies to mainline, 2.6.27-rc2 and even 2.6.26.
> > 
> > Against which kernel/patch is it actually applicable?
> > 
> > (If the answer was "all of the above" then please don't go embedding
> > mainline bugfixes in the middle of a -mm-only patch series!)
> 
> Andrew,
> 
> The answer is all, but the bug is not exposed *outside* of the memrlimit
> controller, thus the push into -mm. I can redo and rework the patches for
> mainline if required and pull it out of -mm.

OK, I'll move it into the general MM patchpile for 2.6.28.  It will precede
any memrlimit merge.


* Re: [-mm][PATCH 1/2] mm owner fix race between swap and exit
  2008-08-12  4:56       ` Andrew Morton
@ 2008-08-12  5:04         ` Balbir Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2008-08-12  5:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	nishimura, xemul, hugh, kamezawa.hiroyu

Andrew Morton wrote:
> 
> OK, I'll move it into the general MM patchpile for 2.6.28.  It will precede
> any memrlimit merge.

Thanks, sounds good.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [-mm][PATCH 0/2] Memory rlimit fix crash on fork
  2008-08-11 10:07 [-mm][PATCH 0/2] Memory rlimit fix crash on fork Balbir Singh
  2008-08-11 10:07 ` [-mm][PATCH 1/2] mm owner fix race between swap and exit Balbir Singh
  2008-08-11 10:07 ` [-mm][PATCH 2/2] Memory rlimit enhance mm_owner_changed callback to deal with exited owner Balbir Singh
@ 2008-08-13  0:14 ` Andrew Morton
  2008-08-13  1:24   ` Balbir Singh
  2 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2008-08-13  0:14 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	nishimura, xemul, hugh, kamezawa.hiroyu

On Mon, 11 Aug 2008 15:37:19 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> --- linux-2.6.27-rc1/mm/memory.c~memrlimit-fix-crash-on-fork	2008-08-11 14:57:48.000000000 +0530
> +++ linux-2.6.27-rc1-balbir/mm/memory.c	2008-08-11 14:58:33.000000000 +0530
> @@ -901,8 +901,12 @@ unsigned long unmap_vmas(struct mmu_gath

^^ returns a long.

>  	unsigned long start = start_addr;
>  	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
>  	int fullmm = (*tlbp)->fullmm;
> -	struct mm_struct *mm = vma->vm_mm;
> +	struct mm_struct *mm;
> +
> +	if (!vma)
> +		return;

^^ mm/memory.c:907: warning: 'return' with no value, in function returning non-void

How does this happen?

I'll drop the patch.  The above mystery change needs a comment, IMO.
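
For illustration only, here is a userspace sketch of an early exit from a function that returns unsigned long. Returning the unchanged start address for an empty list is an assumption of this sketch, not something stated in the thread, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified VMA: just an address range and a link. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	struct toy_vma *vm_next;
};

/*
 * Walks the list and returns the address up to which ranges were covered.
 * The early exit returns a value explicitly, so the compiler has nothing
 * to warn about in a function returning non-void.
 */
static unsigned long toy_unmap_vmas(struct toy_vma *vma,
				    unsigned long start_addr,
				    unsigned long end_addr)
{
	unsigned long start = start_addr;

	assert(start_addr <= end_addr);

	/* Empty list: nothing to unmap, report the unchanged start address. */
	if (!vma)
		return start;

	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
		start = vma->vm_end < end_addr ? vma->vm_end : end_addr;

	return start;
}
```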


* Re: [-mm][PATCH 0/2] Memory rlimit fix crash on fork
  2008-08-13  0:14 ` [-mm][PATCH 0/2] Memory rlimit fix crash on fork Andrew Morton
@ 2008-08-13  1:24   ` Balbir Singh
  2008-08-13  2:07     ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2008-08-13  1:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	nishimura, xemul, hugh, kamezawa.hiroyu

Andrew Morton wrote:
> On Mon, 11 Aug 2008 15:37:19 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> --- linux-2.6.27-rc1/mm/memory.c~memrlimit-fix-crash-on-fork	2008-08-11 14:57:48.000000000 +0530
>> +++ linux-2.6.27-rc1-balbir/mm/memory.c	2008-08-11 14:58:33.000000000 +0530
>> @@ -901,8 +901,12 @@ unsigned long unmap_vmas(struct mmu_gath
> 
> ^^ returns a long.
> 
>>  	unsigned long start = start_addr;
>>  	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
>>  	int fullmm = (*tlbp)->fullmm;
>> -	struct mm_struct *mm = vma->vm_mm;
>> +	struct mm_struct *mm;
>> +
>> +	if (!vma)
>> +		return;
> 
> ^^ mm/memory.c:907: warning: 'return' with no value, in function returning non-void
> 
> How does this happen?
> 
> I'll drop the patch.  The above mystery change needs a comment, IMO.

Oops.. I'll send the updated version. I'll comment it as well.

-- 
	Balbir


* Re: [-mm][PATCH 0/2] Memory rlimit fix crash on fork
  2008-08-13  1:24   ` Balbir Singh
@ 2008-08-13  2:07     ` Balbir Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2008-08-13  2:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	nishimura, xemul, hugh, kamezawa.hiroyu

* Balbir Singh <balbir@linux.vnet.ibm.com> [2008-08-13 06:54:08]:

> Andrew Morton wrote:
> > On Mon, 11 Aug 2008 15:37:19 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> >> --- linux-2.6.27-rc1/mm/memory.c~memrlimit-fix-crash-on-fork	2008-08-11 14:57:48.000000000 +0530
> >> +++ linux-2.6.27-rc1-balbir/mm/memory.c	2008-08-11 14:58:33.000000000 +0530
> >> @@ -901,8 +901,12 @@ unsigned long unmap_vmas(struct mmu_gath
> > 
> > ^^ returns a long.
> > 
> >>  	unsigned long start = start_addr;
> >>  	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
> >>  	int fullmm = (*tlbp)->fullmm;
> >> -	struct mm_struct *mm = vma->vm_mm;
> >> +	struct mm_struct *mm;
> >> +
> >> +	if (!vma)
> >> +		return;
> > 
> > ^^ mm/memory.c:907: warning: 'return' with no value, in function returning non-void
> > 
> > How does this happen?
> > 
> > I'll drop the patch.  The above mystery change needs a comment, IMO.
> 
> Oops.. I'll send the updated version. I'll comment it as well.
>

Andrew,

I double-checked the compiler warnings this time around and tested the
patch. I've changed the core logic to avoid calling into unmap_vmas()
and do an early exit instead. My understanding is that doing an early exit
should be OK, but I would like either you or Hugh or folks from linux-mm
to comment on it and explicitly mention whether it is OK to do so.


Changelog v2
------------

Remove changes from unmap_vmas(), don't call the remaining operations
in exit_mmap() if mm->mmap is NULL.

This patch fixes a crash that occurs when kernbench is run with memrlimit
set to 500M on my x86_64 box. The root cause of the failure is:

1. We don't set mm->mmap to NULL for the process for which fork() failed
2. mmput() dereferences vma (in unmap_vmas, vma->vm_mm).

This patch fixes the problem by

1. Initializing mm->mmap to NULL before dup_mmap() can fail
2. Checking early in exit_mmap() whether mm->mmap is NULL and returning if so
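
The v2 approach can be sketched in userspace as a teardown routine that bails out when the VMA list is empty. This is a rough model under stated assumptions: the toy_* names and the "charged" field are hypothetical, and total_vm is taken to be inherited from the parent even when the fork failed before any charge succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; "charged" models what was accounted for this mm. */
struct toy_vma { struct toy_vma *next; };
struct toy_mm {
	struct toy_vma *mmap;
	unsigned long total_vm;
	unsigned long charged;
};

/*
 * Model of the teardown path. Returns the number of VMAs visited.
 * If the mm was never set up (mmap == NULL), return before touching the
 * VMA list or the accounting, so nothing is unmapped or uncharged.
 */
static int toy_exit_mmap(struct toy_mm *mm)
{
	struct toy_vma *vma;
	int visited = 0;

	assert(mm != NULL);
	vma = mm->mmap;
	if (!vma)
		return 0;

	for ( ; vma; vma = vma->next)
		visited++;

	/* Only a fully set-up mm reaches the uncharge step. */
	assert(mm->charged >= mm->total_vm);
	mm->charged -= mm->total_vm;
	mm->mmap = NULL;
	return visited;
}
```

In the failed-fork case the model leaves "charged" at zero instead of subtracting a total_vm that was never charged.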

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 kernel/fork.c |   19 ++++++++++---------
 mm/mmap.c     |    9 +++++++++
 2 files changed, 19 insertions(+), 9 deletions(-)

diff -puN mm/mmap.c~memrlimit-fix-crash-on-fork mm/mmap.c
--- linux-2.6.27-rc1/mm/mmap.c~memrlimit-fix-crash-on-fork	2008-08-11 14:45:07.000000000 +0530
+++ linux-2.6.27-rc1-balbir/mm/mmap.c	2008-08-13 07:17:34.000000000 +0530
@@ -2118,6 +2118,15 @@ void exit_mmap(struct mm_struct *mm)
 		}
 	}
 	vma = mm->mmap;
+
+	/*
+	 * In the case that dup_mm() failed, mm->mmap is NULL and
+	 * we never really setup the mm. We don't have much to do,
+	 * we might as well return early
+	 */
+	if (!vma)
+		return;
+
 	lru_add_drain();
 	flush_cache_mm(mm);
 	tlb = tlb_gather_mmu(mm, 1);
diff -puN kernel/fork.c~memrlimit-fix-crash-on-fork kernel/fork.c
--- linux-2.6.27-rc1/kernel/fork.c~memrlimit-fix-crash-on-fork	2008-08-11 14:45:07.000000000 +0530
+++ linux-2.6.27-rc1-balbir/kernel/fork.c	2008-08-11 14:56:04.000000000 +0530
@@ -274,15 +274,6 @@ static int dup_mmap(struct mm_struct *mm
 	 */
 	down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING);
 
-	/*
-	 * Uncharging as a result of failure is done by mmput()
-	 * in dup_mm()
-	 */
-	if (memrlimit_cgroup_charge_as(oldmm, oldmm->total_vm)) {
-		retval = -ENOMEM;
-		goto out;
-	}
-
 	mm->locked_vm = 0;
 	mm->mmap = NULL;
 	mm->mmap_cache = NULL;
@@ -295,6 +286,16 @@ static int dup_mmap(struct mm_struct *mm
 	rb_parent = NULL;
 	pprev = &mm->mmap;
 
+	/*
+	 * Called after mm->mmap is set to NULL, so that the routines
+	 * following this function understand that fork failed (read
+	 * mmput).
+	 */
+	if (memrlimit_cgroup_charge_as(oldmm, oldmm->total_vm)) {
+		retval = -ENOMEM;
+		goto out;
+	}
+
 	for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
 		struct file *file;
 
diff -puN mm/memory.c~memrlimit-fix-crash-on-fork mm/memory.c
_
 
-- 
	Balbir


Thread overview: 12+ messages
2008-08-11 10:07 [-mm][PATCH 0/2] Memory rlimit fix crash on fork Balbir Singh
2008-08-11 10:07 ` [-mm][PATCH 1/2] mm owner fix race between swap and exit Balbir Singh
2008-08-12  0:31   ` Andrew Morton
2008-08-12  0:43     ` Paul Menage
2008-08-12  4:06     ` Balbir Singh
2008-08-12  4:56       ` Andrew Morton
2008-08-12  5:04         ` Balbir Singh
2008-08-11 10:07 ` [-mm][PATCH 2/2] Memory rlimit enhance mm_owner_changed callback to deal with exited owner Balbir Singh
2008-08-12  0:39   ` Paul Menage
2008-08-13  0:14 ` [-mm][PATCH 0/2] Memory rlimit fix crash on fork Andrew Morton
2008-08-13  1:24   ` Balbir Singh
2008-08-13  2:07     ` Balbir Singh
