* [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock
@ 2008-07-03 20:50 Hugh Dickins
2008-07-03 23:01 ` Andrew Morton
2008-07-04 1:49 ` Balbir Singh
0 siblings, 2 replies; 8+ messages in thread
From: Hugh Dickins @ 2008-07-03 20:50 UTC
To: Andrew Morton; +Cc: Balbir Singh, linux-kernel, linux-mm
"ps -f" hung after "killall make" of make -j20 kernel builds. It's
generally considered bad manners to down_write something you already
have down_read. exit_mm up_reads before calling mm_update_next_owner,
so I guess exec_mmap can safely do so too. (And with that repositioning
there's not much point in mm_need_new_owner allowing for NULL mm.)
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
Fix to memrlimit-cgroup-mm-owner-callback-changes-to-add-task-info.patch
quite independent of its recent sleeping-inside-spinlock fix; could even
be applied to 2.6.26, though no deadlock there. Gosh, I see those patches
have spawned "Reviewed-by" tags in my name: sorry, no, just "Bug-found-by".
fs/exec.c | 2 +-
kernel/exit.c | 2 --
2 files changed, 1 insertion(+), 3 deletions(-)
--- 2.6.26-rc8-mm1/fs/exec.c 2008-07-03 11:35:20.000000000 +0100
+++ linux/fs/exec.c 2008-07-03 20:27:20.000000000 +0100
@@ -738,11 +738,11 @@ static int exec_mmap(struct mm_struct *m
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
 	task_unlock(tsk);
-	mm_update_next_owner(old_mm);
 	arch_pick_mmap_layout(mm);
 	if (old_mm) {
 		up_read(&old_mm->mmap_sem);
 		BUG_ON(active_mm != old_mm);
+		mm_update_next_owner(old_mm);
 		mmput(old_mm);
 		return 0;
 	}
--- 2.6.26-rc8-mm1/kernel/exit.c 2008-07-03 11:35:37.000000000 +0100
+++ linux/kernel/exit.c 2008-07-03 20:28:35.000000000 +0100
@@ -588,8 +588,6 @@ mm_need_new_owner(struct mm_struct *mm,
 	 * If there are other users of the mm and the owner (us) is exiting
 	 * we need to find a new owner to take on the responsibility.
 	 */
-	if (!mm)
-		return 0;
 	if (atomic_read(&mm->mm_users) <= 1)
 		return 0;
 	if (mm->owner != p)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock
2008-07-03 20:50 [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock Hugh Dickins
@ 2008-07-03 23:01 ` Andrew Morton
2008-07-04 1:49 ` Balbir Singh
2008-07-04 1:49 ` Balbir Singh
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2008-07-03 23:01 UTC
To: Hugh Dickins; +Cc: balbir, linux-kernel, linux-mm
On Thu, 3 Jul 2008 21:50:31 +0100 (BST)
Hugh Dickins <hugh@veritas.com> wrote:
> "ps -f" hung after "killall make" of make -j20 kernel builds. It's
> generally considered bad manners to down_write something you already
> have down_read. exit_mm up_reads before calling mm_update_next_owner,
> so I guess exec_mmap can safely do so too. (And with that repositioning
> there's not much point in mm_need_new_owner allowing for NULL mm.)
>
thanks
> ---
> Fix to memrlimit-cgroup-mm-owner-callback-changes-to-add-task-info.patch
> quite independent of its recent sleeping-inside-spinlock fix; could even
> be applied to 2.6.26, though no deadlock there. Gosh, I see those patches
> have spawned "Reviewed-by" tags in my name: sorry, no, just "Bug-found-by".
I switched
memrlimit-add-memrlimit-controller-accounting-and-control-memrlimit-improve-fork-and-error-handling.patch
and
memrlimit-cgroup-mm-owner-callback-changes-to-add-task-info-memrlimit-fix-sleep-inside-sleeplock-in-mm_update_next_owner.patch
to Cc:you.
There doesn't seem to have been much discussion regarding your recent
objections to the memrlimit patches. But it caused me to put a big
black mark on them. Perhaps sending it all again would be helpful.
* Re: [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock
2008-07-03 23:01 ` Andrew Morton
@ 2008-07-04 1:49 ` Balbir Singh
2008-07-04 2:01 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Balbir Singh @ 2008-07-04 1:49 UTC
To: Andrew Morton; +Cc: Hugh Dickins, linux-kernel, linux-mm
Andrew Morton wrote:
> There doesn't seem to have been much discussion regarding your recent
> objections to the memrlimit patches. But it caused me to put a big
> black mark on them. Perhaps sending it all again would be helpful.
Black marks are not good, but there have been some silly issues found with them.
I have been addressing/answering concerns raised so far. Would you like me to
fold all patches and fixes and send them out for review again?
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock
2008-07-04 1:49 ` Balbir Singh
@ 2008-07-04 2:01 ` Andrew Morton
2008-07-04 3:20 ` Balbir Singh
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2008-07-04 2:01 UTC
To: balbir; +Cc: Hugh Dickins, linux-kernel, linux-mm
On Fri, 04 Jul 2008 07:19:45 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Andrew Morton wrote:
> > There doesn't seem to have been much discussion regarding your recent
> > objections to the memrlimit patches. But it caused me to put a big
> > black mark on them. Perhaps sending it all again would be helpful.
>
> Black marks are not good, but there have been some silly issues found with them.
> I have been addressing/answering concerns raised so far. Would you like me to
> fold all patches and fixes and send them out for review again?
>
>
I was referring to the below (which is where the conversation ended).
It questions the basis of the whole feature.
On Wed, 25 Jun 2008 06:31:05 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Hugh Dickins wrote:
>
> ...
>
> > (In passing, I'll add that I'm not a great fan of these memrlimits:
> > to me it's loony to be charging people for virtual address space,
> > it's _virtual_, and process A can have as much as it likes without
> > affecting process B in any way. You're following the lead of RLIMIT_AS,
> > but I've always thought RLIMIT_AS a lame attempt to move into the mmap
> > decade, after RLIMIT_DATA and RLIMIT_STACK no longer made sense.
> >
> > Taking Alan Cox's Committed_AS as a limited resource charged per mm makes
> > much more sense to me: but yes, it's not perfect, and it is a lot harder
> > to get its accounting right, and to maintain that down the line. Okay,
> > you've gone for the easier option of tracking total_vm, getting that
> > right is a more achievable target. And I accept that I may be too
> > pessimistic about it: total_vm may often enough give a rough
> > approximation to something else worth limiting.)
>
> You seem to have read my mind, my motivation for memrlimits is
>
> 1. Administrators to set a limit and be sure that a cgroup cannot consume more
> swap + RSS than the assigned virtual memory limit
> 2. It allows applications to fail gracefully or decide what parts to free up
> to get more memory or change their allocation pattern (a scientific application
> deciding what size of matrix to allocate for example).
>
* Re: [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock
2008-07-04 2:01 ` Andrew Morton
@ 2008-07-04 3:20 ` Balbir Singh
2008-07-04 4:27 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Balbir Singh @ 2008-07-04 3:20 UTC
To: Andrew Morton; +Cc: Hugh Dickins, linux-kernel, linux-mm
Andrew Morton wrote:
> On Fri, 04 Jul 2008 07:19:45 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>> Andrew Morton wrote:
>>> There doesn't seem to have been much discussion regarding your recent
>>> objections to the memrlimit patches. But it caused me to put a big
>>> black mark on them. Perhaps sending it all again would be helpful.
>> Black marks are not good, but there have been some silly issues found with them.
>> I have been addressing/answering concerns raised so far. Would you like me to
>> fold all patches and fixes and send them out for review again?
>>
>>
>
> I was referring to the below (which is where the conversation ended).
>
> It questions the basis of the whole feature.
>
In the email below, I referred to Hugh's comment on tracking total_vm as a more
achievable target and it gives a rough approximation of something worth
limiting. I agree with him on those points and mentioned my motivation for the
memrlimit patchset. We also look forward to enhancing memrlimit to control
mlock'ed pages (as it provides the generic infrastructure to control RLIMIT'ed
resources). Given Hugh's comment, I looked at it from the more positive side
rather than the pessimistic angle. I've had discussions along these lines with
Paul Menage and Kamezawa. We have discussed in the past that there are cases
where memrlimit is not useful (large VM allocations with sparse usage), but
there are also cases, mentioned below in the motivation for memrlimits, where
it is useful.
If there are suggestions to help improve the feature or provide similar
functionality without the noise, I am all ears.
>
> On Wed, 25 Jun 2008 06:31:05 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>> Hugh Dickins wrote:
>>
>> ...
>>
>>> (In passing, I'll add that I'm not a great fan of these memrlimits:
>>> to me it's loony to be charging people for virtual address space,
>>> it's _virtual_, and process A can have as much as it likes without
>>> affecting process B in any way. You're following the lead of RLIMIT_AS,
>>> but I've always thought RLIMIT_AS a lame attempt to move into the mmap
>>> decade, after RLIMIT_DATA and RLIMIT_STACK no longer made sense.
>>>
>>> Taking Alan Cox's Committed_AS as a limited resource charged per mm makes
>>> much more sense to me: but yes, it's not perfect, and it is a lot harder
>>> to get its accounting right, and to maintain that down the line. Okay,
>>> you've gone for the easier option of tracking total_vm, getting that
>>> right is a more achievable target. And I accept that I may be too
>>> pessimistic about it: total_vm may often enough give a rough
>>> approximation to something else worth limiting.)
>> You seem to have read my mind, my motivation for memrlimits is
>>
>> 1. Administrators to set a limit and be sure that a cgroup cannot consume more
>> swap + RSS than the assigned virtual memory limit
>> 2. It allows applications to fail gracefully or decide what parts to free up
>> to get more memory or change their allocation pattern (a scientific application
>> deciding what size of matrix to allocate for example).
>>
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock
2008-07-04 3:20 ` Balbir Singh
@ 2008-07-04 4:27 ` Andrew Morton
2008-07-04 7:07 ` Balbir Singh
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2008-07-04 4:27 UTC
To: balbir; +Cc: Hugh Dickins, linux-kernel, linux-mm
On Fri, 04 Jul 2008 08:50:47 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > I was referring to the below (which is where the conversation ended).
> >
> > It questions the basis of the whole feature.
> >
>
> In the email below, I referred to Hugh's comment on tracking total_vm as a more
> achievable target and it gives a rough approximation of something worth
> limiting. I agree with him on those points and mentioned my motivation for the
> memrlimit patchset. We also look forward to enhancing memrlimit to control
> mlock'ed pages (as it provides the generic infrastructure to control RLIMIT'ed
> resources). Given Hugh's comment, I looked at it from the more positive side
> rather than the pessimistic angle. I've had discussions along these lines with Paul
> Menage and Kamezawa. In the past we've discussed and there are cases where
> memrlimit is not useful (large VM allocations with sparse usage), but there are
> cases as mentioned below in the motivation for memrlimits as to why and where
> they are useful.
>
> If there are suggestions to help improve the feature or provide similar
> functionality without the noise; I am all ears
Well I've never reeeeeeealy understood what the whole feature is for.
+Advantages of providing this feature
+
+1. Control over virtual address space allows for a cgroup to fail gracefully
+ i.e., via a malloc or mmap failure as compared to OOM kill when no
+ pages can be reclaimed.
+2. It provides better control over how many pages can be swapped out when
+ the cgroup goes over its limit. A badly setup cgroup can cause excessive
+ swapping. Providing control over the address space allocations ensures
+ that the system administrator has control over the total swapping that
+ can take place.
umm, OK. I'm not sure _why_ someone would want to do that. Perhaps
some use-cases would help motivate us. Perhaps descriptions of
real-world operational problems which would be improved or solved were
this feature available to the operator.
* Re: [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock
2008-07-04 4:27 ` Andrew Morton
@ 2008-07-04 7:07 ` Balbir Singh
0 siblings, 0 replies; 8+ messages in thread
From: Balbir Singh @ 2008-07-04 7:07 UTC
To: Andrew Morton; +Cc: Hugh Dickins, linux-kernel, linux-mm
Andrew Morton wrote:
> On Fri, 04 Jul 2008 08:50:47 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>>> I was referring to the below (which is where the conversation ended).
>>>
>>> It questions the basis of the whole feature.
>>>
>> In the email below, I referred to Hugh's comment on tracking total_vm as a more
>> achievable target and it gives a rough approximation of something worth
>> limiting. I agree with him on those points and mentioned my motivation for the
>> memrlimit patchset. We also look forward to enhancing memrlimit to control
>> mlock'ed pages (as it provides the generic infrastructure to control RLIMIT'ed
>> resources). Given Hugh's comment, I looked at it from the more positive side
>> rather than the pessimistic angle. I've had discussions along these lines with Paul
>> Menage and Kamezawa. In the past we've discussed and there are cases where
>> memrlimit is not useful (large VM allocations with sparse usage), but there are
>> cases as mentioned below in the motivation for memrlimits as to why and where
>> they are useful.
>>
>> If there are suggestions to help improve the feature or provide similar
>> functionality without the noise; I am all ears
>
> Well I've never reeeeeeealy understood what the whole feature is for.
>
> +Advantages of providing this feature
> +
> +1. Control over virtual address space allows for a cgroup to fail gracefully
> + i.e., via a malloc or mmap failure as compared to OOM kill when no
> + pages can be reclaimed.
> +2. It provides better control over how many pages can be swapped out when
> + the cgroup goes over its limit. A badly setup cgroup can cause excessive
> + swapping. Providing control over the address space allocations ensures
> + that the system administrator has control over the total swapping that
> + can take place.
>
> umm, OK. I'm not sure _why_ someone would want to do that. Perhaps
> some use-cases would help motivate us. Perhaps descriptions of
> real-world operational problems which would be improved or solved were
> this feature available to the operator.
I can go over the use cases and some of the motivation:
0. Provide the basic infrastructure for rlimit control for cgroups (mlock comes
to mind right away).
1. Similar to the goals of overcommit accounting (although not that granular),
we would like to be able to decide, per cgroup node, how much to overcommit
the system by.
2. With the memory controller in place, a cgroup that exceeds its limit is sent
to the reclaimer. We swap out pages or OOM the heaviest task in the cgroup. The
swap controller will help, but we want a gentler way of saying "No more virtual
RSS+swap space is available, so I am failing this allocation". The application
can then decide if it can free up some memory now or if it has to fail.
As far as real examples are concerned, I was told by a user (in a private
discussion) that scientific jobs can sometimes cause havoc on shared systems.
They don't have control over how much virtual memory the set of jobs consumes.
They would ideally like to be able to provide feedback to the application
about the maximum RSS + swap that it can consume (case 1). With a memrlimit
address space controller in place, a failed allocation would tell the jobs to
use less memory (and potentially take longer) to finish, instead of causing
large amounts of swapping or OOM on the system.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
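
The "fail the allocation and let the application adapt" behaviour described
above can be approximated for a single process with RLIMIT_AS, which the
thread names as the model memrlimit follows. The following is a minimal
sketch, assuming Linux and anonymous mmap(); it is not the memrlimit cgroup
controller, and the helper names try_map_with_as_limit() and
shrink_until_fits() are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/*
 * Temporarily lower the RLIMIT_AS soft limit and try one anonymous mapping.
 * Returns 0 on success, the errno from mmap() on failure, or -1 if the
 * limit could not be read or set.
 */
static int try_map_with_as_limit(size_t limit_bytes, size_t request_bytes)
{
	struct rlimit old, lim;
	void *p;
	int result;

	if (getrlimit(RLIMIT_AS, &old) != 0)
		return -1;
	lim = old;
	lim.rlim_cur = limit_bytes;	/* soft limit only, so it can be restored */
	if (setrlimit(RLIMIT_AS, &lim) != 0)
		return -1;

	p = mmap(NULL, request_bytes, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		result = errno;		/* ENOMEM when the request exceeds the limit */
	} else {
		result = 0;
		munmap(p, request_bytes);
	}

	setrlimit(RLIMIT_AS, &old);	/* restore the original soft limit */
	return result;
}

/*
 * Application-side reaction: halve the request until it fits under the
 * limit. Returns the size that was accepted, or 0 if nothing fit.
 */
static size_t shrink_until_fits(size_t limit_bytes, size_t request_bytes)
{
	while (request_bytes > 0) {
		if (try_map_with_as_limit(limit_bytes, request_bytes) == 0)
			return request_bytes;
		request_bytes /= 2;
	}
	return 0;
}
```

A job that wanted a 1 GiB working set under a 512 MiB limit would see the
first request fail with ENOMEM and could fall back to a smaller size, which is
the kind of feedback the scientific-job example asks for, without any
reclaim or OOM kill being involved.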
* Re: [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock
2008-07-03 20:50 [PATCH 2.6.26-rc8-mm1] memrlimit: fix mmap_sem deadlock Hugh Dickins
2008-07-03 23:01 ` Andrew Morton
@ 2008-07-04 1:49 ` Balbir Singh
1 sibling, 0 replies; 8+ messages in thread
From: Balbir Singh @ 2008-07-04 1:49 UTC
To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel, linux-mm
Hugh Dickins wrote:
> "ps -f" hung after "killall make" of make -j20 kernel builds. It's
> generally considered bad manners to down_write something you already
> have down_read. exit_mm up_reads before calling mm_update_next_owner,
> so I guess exec_mmap can safely do so too. (And with that repositioning
> there's not much point in mm_need_new_owner allowing for NULL mm.)
>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
Thanks!
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL