* [PATCH][V7]make get_user_pages interruptible
@ 2008-12-03 5:17 Ying Han
2008-12-03 7:19 ` KOSAKI Motohiro
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ying Han @ 2008-12-03 5:17 UTC (permalink / raw)
To: linux-mm, Andrew Morton, KOSAKI Motohiro, Lee Schermerhorn,
Oleg Nesterov, Pekka Enberg, Paul Menage, Rohit Seth
From: Ying Han <yinghan@google.com>
make get_user_pages interruptible
The initial implementation of checking TIF_MEMDIE covers the cases of OOM
killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
return immediately. This patch includes:
1. add the case that the SIGKILL is sent by user processes. The process can
try to get_user_pages() unlimited memory even if a user process has sent a
SIGKILL to it(maybe a monitor find the process exceed its memory limit and
try to kill it). In the old implementation, the SIGKILL won't be handled
until the get_user_pages() returns.
2. change the return value to be ERESTARTSYS. It makes no sense to return
ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
Considering the general convention for a system call interrupted by a
signal is ERESTARTNOSYS, so the current return value is consistant to that.
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Ying Han <yinghan@google.com>
mm/memory.c | 9 +-
diff --git a/mm/memory.c b/mm/memory.c
index 164951c..ef1c3f0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1218,12 +1218,11 @@ int __get_user_pages(struct task_struct *tsk, struct m
struct page *page;
/*
- * If tsk is ooming, cut off its access to large memory
- * allocations. It has a pending SIGKILL, but it can't
- * be processed until returning to user space.
+ * If we have a pending SIGKILL, don't keep
+ * allocating memory.
*/
- if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
- return i ? i : -ENOMEM;
+ if (unlikely(fatal_signal_pending(current)))
+ return i ? i : -ERESTARTSYS;
if (write)
foll_flags |= FOLL_WRITE;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH][V7]make get_user_pages interruptible
2008-12-03 5:17 [PATCH][V7]make get_user_pages interruptible Ying Han
@ 2008-12-03 7:19 ` KOSAKI Motohiro
2008-12-03 8:21 ` Pekka Enberg
2008-12-03 15:03 ` Lee Schermerhorn
2008-12-03 20:01 ` [PATCH] mmotm: ignore sigkill in get_user_pages during munlock Lee Schermerhorn
2 siblings, 1 reply; 10+ messages in thread
From: KOSAKI Motohiro @ 2008-12-03 7:19 UTC (permalink / raw)
To: Ying Han
Cc: kosaki.motohiro, linux-mm, Andrew Morton, Lee Schermerhorn,
Oleg Nesterov, Pekka Enberg, Paul Menage, Rohit Seth
> From: Ying Han <yinghan@google.com>
>
> make get_user_pages interruptible
> The initial implementation of checking TIF_MEMDIE covers the cases of OOM
> killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
> return immediately. This patch includes:
>
> 1. add the case that the SIGKILL is sent by user processes. The process can
> try to get_user_pages() unlimited memory even if a user process has sent a
> SIGKILL to it(maybe a monitor find the process exceed its memory limit and
> try to kill it). In the old implementation, the SIGKILL won't be handled
> until the get_user_pages() returns.
>
> 2. change the return value to be ERESTARTSYS. It makes no sense to return
> ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
> Considering the general convention for a system call interrupted by a
> signal is ERESTARTNOSYS, so the current return value is consistant to that.
>
> Signed-off-by: Paul Menage <menage@google.com>
> Signed-off-by: Ying Han <yinghan@google.com>
looks good to me.
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Thnaks, Ying!
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH][V7]make get_user_pages interruptible
2008-12-03 7:19 ` KOSAKI Motohiro
@ 2008-12-03 8:21 ` Pekka Enberg
0 siblings, 0 replies; 10+ messages in thread
From: Pekka Enberg @ 2008-12-03 8:21 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Ying Han, linux-mm, Andrew Morton, Lee Schermerhorn,
Oleg Nesterov, Paul Menage, Rohit Seth
On Wed, 2008-12-03 at 16:19 +0900, KOSAKI Motohiro wrote:
> > From: Ying Han <yinghan@google.com>
> >
> > make get_user_pages interruptible
> > The initial implementation of checking TIF_MEMDIE covers the cases of OOM
> > killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
> > return immediately. This patch includes:
> >
> > 1. add the case that the SIGKILL is sent by user processes. The process can
> > try to get_user_pages() unlimited memory even if a user process has sent a
> > SIGKILL to it(maybe a monitor find the process exceed its memory limit and
> > try to kill it). In the old implementation, the SIGKILL won't be handled
> > until the get_user_pages() returns.
> >
> > 2. change the return value to be ERESTARTSYS. It makes no sense to return
> > ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
> > Considering the general convention for a system call interrupted by a
> > signal is ERESTARTNOSYS, so the current return value is consistant to that.
> >
> > Signed-off-by: Paul Menage <menage@google.com>
> > Signed-off-by: Ying Han <yinghan@google.com>
>
> looks good to me.
>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Looks good to me too.
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH][V7]make get_user_pages interruptible
2008-12-03 5:17 [PATCH][V7]make get_user_pages interruptible Ying Han
2008-12-03 7:19 ` KOSAKI Motohiro
@ 2008-12-03 15:03 ` Lee Schermerhorn
2008-12-03 20:25 ` Ying Han
2008-12-03 20:01 ` [PATCH] mmotm: ignore sigkill in get_user_pages during munlock Lee Schermerhorn
2 siblings, 1 reply; 10+ messages in thread
From: Lee Schermerhorn @ 2008-12-03 15:03 UTC (permalink / raw)
To: Ying Han
Cc: linux-mm, Andrew Morton, KOSAKI Motohiro, Oleg Nesterov,
Pekka Enberg, Paul Menage, Rohit Seth
On Tue, 2008-12-02 at 21:17 -0800, Ying Han wrote:
> From: Ying Han <yinghan@google.com>
>
> make get_user_pages interruptible
> The initial implementation of checking TIF_MEMDIE covers the cases of OOM
> killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
> return immediately. This patch includes:
>
> 1. add the case that the SIGKILL is sent by user processes. The process can
> try to get_user_pages() unlimited memory even if a user process has sent a
> SIGKILL to it(maybe a monitor find the process exceed its memory limit and
> try to kill it). In the old implementation, the SIGKILL won't be handled
> until the get_user_pages() returns.
>
> 2. change the return value to be ERESTARTSYS. It makes no sense to return
> ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
> Considering the general convention for a system call interrupted by a
> signal is ERESTARTNOSYS, so the current return value is consistant to that.
>
> Signed-off-by: Paul Menage <menage@google.com>
> Signed-off-by: Ying Han <yinghan@google.com>
>
<snip>
Couple of things:
* I tested your previous patch [that was "just too ugly to live
with" :)] overnight with my swap/unevictable-lru/mlocked-pages stress
test on both x86_64 and ia64. I replaced the two patches in mmotm
081201 with the "ugly one". Both systems ran for ~16:40 [hh:mm] without
error, before I stopped the tests.
* Your patch--bailing out of get_user_pages() when current has SIGKILL
pending--breaks munlock on exit when SIGKILL is pending. This results
in freeing of mlocked pages [not so bad, I guess] and possibly leaving,
e.g., shared library pages mlocked and unevictable after last VM_LOCKED
vma is removed. I noticed this because SIGKILL is how the test harness
kills off the running tests. I have a patch that fixes this. The
overnight runs included this patch. I'll post it after rebasing and a
quick retest [he says optimistically] on mmotm-081203.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] mmotm: ignore sigkill in get_user_pages during munlock
2008-12-03 5:17 [PATCH][V7]make get_user_pages interruptible Ying Han
2008-12-03 7:19 ` KOSAKI Motohiro
2008-12-03 15:03 ` Lee Schermerhorn
@ 2008-12-03 20:01 ` Lee Schermerhorn
2008-12-04 0:30 ` KOSAKI Motohiro
2 siblings, 1 reply; 10+ messages in thread
From: Lee Schermerhorn @ 2008-12-03 20:01 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Ying Han, KOSAKI Motohiro, Oleg Nesterov,
Pekka Enberg, Paul Menage, Rohit Seth
PATCH ignore sigkill in get_user_pages during munlock
Against: 2.6.28-rc7-mmotm-081203-0150
Fixes: make-get_user_pages-interruptible.patch
An unfortunate side effect of "make-get_user_pages-interruptible"
is that it prevents a SIGKILL'd task from munlock-ing pages that it
had mlocked, resulting in freeing of mlocked pages. Freeing of mlocked
pages, in itself, is not so bad. We just count them now--altho' I
had hoped to remove this stat and add PG_MLOCKED to the free pages
flags check.
However, consider pages in shared libraries mapped by more than one
task that a task mlocked--e.g., via mlockall(). If the task that
mlocked the pages exits via SIGKILL, these pages would be left mlocked
and unevictable.
Proposed fix:
Add another GUP flag to ignore sigkill when calling get_user_pages
from munlock()--similar to Kosaki Motohiro's 'IGNORE_VMA_PERMISSIONS
flag for the same purpose. We are not actually allocating memory in
this case, which "make-get_user_pages-interruptible" intends to avoid.
We're just munlocking pages that are already resident and mapped, and
we're reusing get_user_pages() to access those pages.
?? Maybe we should combine 'IGNORE_VMA_PERMISSIONS and '_IGNORE_SIGKILL
into a single flag: GUP_FLAGS_MUNLOCK ???
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
mm/internal.h | 1 +
mm/memory.c | 11 ++++++++---
mm/mlock.c | 9 +++++----
3 files changed, 14 insertions(+), 7 deletions(-)
Index: linux-2.6.28-rc7-mmotm-081203/mm/internal.h
===================================================================
--- linux-2.6.28-rc7-mmotm-081203.orig/mm/internal.h 2008-12-03 14:32:06.000000000 -0500
+++ linux-2.6.28-rc7-mmotm-081203/mm/internal.h 2008-12-03 14:32:08.000000000 -0500
@@ -276,6 +276,7 @@ static inline void mminit_validate_memmo
#define GUP_FLAGS_WRITE 0x1
#define GUP_FLAGS_FORCE 0x2
#define GUP_FLAGS_IGNORE_VMA_PERMISSIONS 0x4
+#define GUP_FLAGS_IGNORE_SIGKILL 0x8
int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int len, int flags,
Index: linux-2.6.28-rc7-mmotm-081203/mm/memory.c
===================================================================
--- linux-2.6.28-rc7-mmotm-081203.orig/mm/memory.c 2008-12-03 14:32:06.000000000 -0500
+++ linux-2.6.28-rc7-mmotm-081203/mm/memory.c 2008-12-03 14:33:46.000000000 -0500
@@ -1197,6 +1197,7 @@ int __get_user_pages(struct task_struct
int write = !!(flags & GUP_FLAGS_WRITE);
int force = !!(flags & GUP_FLAGS_FORCE);
int ignore = !!(flags & GUP_FLAGS_IGNORE_VMA_PERMISSIONS);
+ int ignore_sigkill = !!(flags & GUP_FLAGS_IGNORE_SIGKILL);
if (len <= 0)
return 0;
@@ -1275,10 +1276,14 @@ int __get_user_pages(struct task_struct
struct page *page;
/*
- * If we have a pending SIGKILL, don't keep
- * allocating memory.
+ * If we have a pending SIGKILL, don't keep faulting
+ * pages and potentially allocating memory, unless
+ * current is handling munlock--e.g., on exit. In
+ * that case, we are not allocating memory. Rather,
+ * we're only unlocking already resident/mapped pages.
*/
- if (unlikely(fatal_signal_pending(current)))
+ if (unlikely(!ignore_sigkill &&
+ fatal_signal_pending(current)))
return i ? i : -ERESTARTSYS;
if (write)
Index: linux-2.6.28-rc7-mmotm-081203/mm/mlock.c
===================================================================
--- linux-2.6.28-rc7-mmotm-081203.orig/mm/mlock.c 2008-12-03 14:32:06.000000000 -0500
+++ linux-2.6.28-rc7-mmotm-081203/mm/mlock.c 2008-12-03 14:32:08.000000000 -0500
@@ -173,12 +173,13 @@ static long __mlock_vma_pages_range(stru
(atomic_read(&mm->mm_users) != 0));
/*
- * mlock: don't page populate if page has PROT_NONE permission.
- * munlock: the pages always do munlock althrough
- * its has PROT_NONE permission.
+ * mlock: don't page populate if vma has PROT_NONE permission.
+ * munlock: always do munlock although the vma has PROT_NONE
+ * permission, or SIGKILL is pending.
*/
if (!mlock)
- gup_flags |= GUP_FLAGS_IGNORE_VMA_PERMISSIONS;
+ gup_flags |= GUP_FLAGS_IGNORE_VMA_PERMISSIONS |
+ GUP_FLAGS_IGNORE_SIGKILL;
if (vma->vm_flags & VM_WRITE)
gup_flags |= GUP_FLAGS_WRITE;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH][V7]make get_user_pages interruptible
2008-12-03 15:03 ` Lee Schermerhorn
@ 2008-12-03 20:25 ` Ying Han
2008-12-03 20:36 ` Lee Schermerhorn
0 siblings, 1 reply; 10+ messages in thread
From: Ying Han @ 2008-12-03 20:25 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: linux-mm, Andrew Morton, KOSAKI Motohiro, Oleg Nesterov,
Pekka Enberg, Paul Menage, Rohit Seth
On Wed, Dec 3, 2008 at 7:03 AM, Lee Schermerhorn
<Lee.Schermerhorn@hp.com> wrote:
> On Tue, 2008-12-02 at 21:17 -0800, Ying Han wrote:
>> From: Ying Han <yinghan@google.com>
>>
>> make get_user_pages interruptible
>> The initial implementation of checking TIF_MEMDIE covers the cases of OOM
>> killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
>> return immediately. This patch includes:
>>
>> 1. add the case that the SIGKILL is sent by user processes. The process can
>> try to get_user_pages() unlimited memory even if a user process has sent a
>> SIGKILL to it(maybe a monitor find the process exceed its memory limit and
>> try to kill it). In the old implementation, the SIGKILL won't be handled
>> until the get_user_pages() returns.
>>
>> 2. change the return value to be ERESTARTSYS. It makes no sense to return
>> ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
>> Considering the general convention for a system call interrupted by a
>> signal is ERESTARTNOSYS, so the current return value is consistant to that.
>>
>> Signed-off-by: Paul Menage <menage@google.com>
>> Signed-off-by: Ying Han <yinghan@google.com>
>>
> <snip>
>
> Couple of things:
>
> * I tested your previous patch [that was "just too ugly to live
> with" :)] overnight with my swap/unevictable-lru/mlocked-pages stress
> test on both x86_64 and ia64. I replaced the two patches in mmotm
> 081201 with the "ugly one". Both systems ran for ~16:40 [hh:mm] without
> error, before I stopped the tests.
thanks Lee and the "swap/unevictable-lru/mlocked-pages" tests is somewhere
i can access? just curious.
> * Your patch--bailing out of get_user_pages() when current has SIGKILL
> pending--breaks munlock on exit when SIGKILL is pending. This results
> in freeing of mlocked pages [not so bad, I guess] and possibly leaving,
> e.g., shared library pages mlocked and unevictable after last VM_LOCKED
> vma is removed. I noticed this because SIGKILL is how the test harness
> kills off the running tests. I have a patch that fixes this. The
> overnight runs included this patch. I'll post it after rebasing and a
> quick retest [he says optimistically] on mmotm-081203.
sorry not get exactly what you mean it breaks the munlock. :-)
> Lee
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH][V7]make get_user_pages interruptible
2008-12-03 20:25 ` Ying Han
@ 2008-12-03 20:36 ` Lee Schermerhorn
0 siblings, 0 replies; 10+ messages in thread
From: Lee Schermerhorn @ 2008-12-03 20:36 UTC (permalink / raw)
To: Ying Han
Cc: linux-mm, Andrew Morton, KOSAKI Motohiro, Oleg Nesterov,
Pekka Enberg, Paul Menage, Rohit Seth
On Wed, 2008-12-03 at 12:25 -0800, Ying Han wrote:
> On Wed, Dec 3, 2008 at 7:03 AM, Lee Schermerhorn
> <Lee.Schermerhorn@hp.com> wrote:
> > On Tue, 2008-12-02 at 21:17 -0800, Ying Han wrote:
> >> From: Ying Han <yinghan@google.com>
> >>
> >> make get_user_pages interruptible
> >> The initial implementation of checking TIF_MEMDIE covers the cases of OOM
> >> killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
> >> return immediately. This patch includes:
> >>
> >> 1. add the case that the SIGKILL is sent by user processes. The process can
> >> try to get_user_pages() unlimited memory even if a user process has sent a
> >> SIGKILL to it(maybe a monitor find the process exceed its memory limit and
> >> try to kill it). In the old implementation, the SIGKILL won't be handled
> >> until the get_user_pages() returns.
> >>
> >> 2. change the return value to be ERESTARTSYS. It makes no sense to return
> >> ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
> >> Considering the general convention for a system call interrupted by a
> >> signal is ERESTARTNOSYS, so the current return value is consistant to that.
> >>
> >> Signed-off-by: Paul Menage <menage@google.com>
> >> Signed-off-by: Ying Han <yinghan@google.com>
> >>
> > <snip>
> >
> > Couple of things:
> >
> > * I tested your previous patch [that was "just too ugly to live
> > with" :)] overnight with my swap/unevictable-lru/mlocked-pages stress
> > test on both x86_64 and ia64. I replaced the two patches in mmotm
> > 081201 with the "ugly one". Both systems ran for ~16:40 [hh:mm] without
> > error, before I stopped the tests.
> thanks Lee and the "swap/unevictable-lru/mlocked-pages" tests is somewhere
> i can access? just curious.
Take a look in http://free.linux.hp.com/~lts/Temp at the usex-vmstress
README. I'm going to update these "real soon now" with later versions
of the sub-tests. Note that the tarball contains binaries for
convenience. You can grab all of the programs' sources and build
yourself, but I created the x86_64 binary tarball for Rik and Kosaki-san
during the split-lru/unevictable-lru development. Even with the
binaries, there's a fair bit of setup described--sufficiently, I
hope--in the README.
I should add an ia64 tarball as well, as this test seems to sniff out
quite a few vm races and such.
>
> > * Your patch--bailing out of get_user_pages() when current has SIGKILL
> > pending--breaks munlock on exit when SIGKILL is pending. This results
> > in freeing of mlocked pages [not so bad, I guess] and possibly leaving,
> > e.g., shared library pages mlocked and unevictable after last VM_LOCKED
> > vma is removed. I noticed this because SIGKILL is how the test harness
> > kills off the running tests. I have a patch that fixes this. The
> > overnight runs included this patch. I'll post it after rebasing and a
> > quick retest [he says optimistically] on mmotm-081203.
> sorry not get exactly what you mean it breaks the munlock. :-)
See the "ignore sigkill in get_user_pages..." patch I just posted.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mmotm: ignore sigkill in get_user_pages during munlock
2008-12-03 20:01 ` [PATCH] mmotm: ignore sigkill in get_user_pages during munlock Lee Schermerhorn
@ 2008-12-04 0:30 ` KOSAKI Motohiro
2008-12-04 1:19 ` Ying Han
2008-12-04 1:49 ` Lee Schermerhorn
0 siblings, 2 replies; 10+ messages in thread
From: KOSAKI Motohiro @ 2008-12-04 0:30 UTC (permalink / raw)
To: Lee Schermerhorn, Ying Han
Cc: kosaki.motohiro, linux-mm, Andrew Morton, Oleg Nesterov,
Pekka Enberg, Paul Menage, Rohit Seth
> PATCH ignore sigkill in get_user_pages during munlock
>
> Against: 2.6.28-rc7-mmotm-081203-0150
>
> Fixes: make-get_user_pages-interruptible.patch
>
> An unfortunate side effect of "make-get_user_pages-interruptible"
> is that it prevents a SIGKILL'd task from munlock-ing pages that it
> had mlocked, resulting in freeing of mlocked pages. Freeing of mlocked
> pages, in itself, is not so bad. We just count them now--altho' I
> had hoped to remove this stat and add PG_MLOCKED to the free pages
> flags check.
>
> However, consider pages in shared libraries mapped by more than one
> task that a task mlocked--e.g., via mlockall(). If the task that
> mlocked the pages exits via SIGKILL, these pages would be left mlocked
> and unevictable.
Indeed!
Thank your for clarification!
Ying, I'd like to explain unevictable lru design for you a bit more.
__get_user_pages() also called exit(2) path.
do_exit()
exit_mm()
mmput()
exit_mmap()
munlock_vma_pages_all()
munlock_vma_pages_range()
__mlock_vma_pages_range()
__get_user_pages()
__mlock_vma_pages_range() process
(1) grab mlock related pages by __get_user_pages()
(2) isolate the page from lru
(3) the page move to evictable list if possible
if (1) is interupptible, the page left unevictable lru
although the page is not mlocked already.
this feature was introduced 2.6.28-rc1. So I should noticed
at last review, very sorry.
this patch is definitly needed.
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> Proposed fix:
>
> Add another GUP flag to ignore sigkill when calling get_user_pages
> from munlock()--similar to Kosaki Motohiro's 'IGNORE_VMA_PERMISSIONS
> flag for the same purpose. We are not actually allocating memory in
> this case, which "make-get_user_pages-interruptible" intends to avoid.
> We're just munlocking pages that are already resident and mapped, and
> we're reusing get_user_pages() to access those pages.
>
> ?? Maybe we should combine 'IGNORE_VMA_PERMISSIONS and '_IGNORE_SIGKILL
> into a single flag: GUP_FLAGS_MUNLOCK ???
In my personal feeling, I like current two flags :)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mmotm: ignore sigkill in get_user_pages during munlock
2008-12-04 0:30 ` KOSAKI Motohiro
@ 2008-12-04 1:19 ` Ying Han
2008-12-04 1:49 ` Lee Schermerhorn
1 sibling, 0 replies; 10+ messages in thread
From: Ying Han @ 2008-12-04 1:19 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Lee Schermerhorn, linux-mm, Andrew Morton, Oleg Nesterov,
Pekka Enberg, Paul Menage, Rohit Seth
On Wed, Dec 3, 2008 at 4:30 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> PATCH ignore sigkill in get_user_pages during munlock
>>
>> Against: 2.6.28-rc7-mmotm-081203-0150
>>
>> Fixes: make-get_user_pages-interruptible.patch
>>
>> An unfortunate side effect of "make-get_user_pages-interruptible"
>> is that it prevents a SIGKILL'd task from munlock-ing pages that it
>> had mlocked, resulting in freeing of mlocked pages. Freeing of mlocked
>> pages, in itself, is not so bad. We just count them now--altho' I
>> had hoped to remove this stat and add PG_MLOCKED to the free pages
>> flags check.
>>
>> However, consider pages in shared libraries mapped by more than one
>> task that a task mlocked--e.g., via mlockall(). If the task that
>> mlocked the pages exits via SIGKILL, these pages would be left mlocked
>> and unevictable.
>
> Indeed!
> Thank your for clarification!
>
> Ying, I'd like to explain unevictable lru design for you a bit more.
>
> __get_user_pages() also called exit(2) path.
>
> do_exit()
> exit_mm()
> mmput()
> exit_mmap()
> munlock_vma_pages_all()
> munlock_vma_pages_range()
> __mlock_vma_pages_range()
> __get_user_pages()
>
> __mlock_vma_pages_range() process
> (1) grab mlock related pages by __get_user_pages()
> (2) isolate the page from lru
> (3) the page move to evictable list if possible
>
> if (1) is interupptible, the page left unevictable lru
> although the page is not mlocked already.
Thanks KOSAKI, that is clear now. and thanks Lee for the patch. :-)
--Ying
>
> this feature was introduced 2.6.28-rc1. So I should noticed
> at last review, very sorry.
>
>
> this patch is definitly needed.
>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
>>
>> Proposed fix:
>>
>> Add another GUP flag to ignore sigkill when calling get_user_pages
>> from munlock()--similar to Kosaki Motohiro's 'IGNORE_VMA_PERMISSIONS
>> flag for the same purpose. We are not actually allocating memory in
>> this case, which "make-get_user_pages-interruptible" intends to avoid.
>> We're just munlocking pages that are already resident and mapped, and
>> we're reusing get_user_pages() to access those pages.
>>
>> ?? Maybe we should combine 'IGNORE_VMA_PERMISSIONS and '_IGNORE_SIGKILL
>> into a single flag: GUP_FLAGS_MUNLOCK ???
>
> In my personal feeling, I like current two flags :)
>
>
>
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mmotm: ignore sigkill in get_user_pages during munlock
2008-12-04 0:30 ` KOSAKI Motohiro
2008-12-04 1:19 ` Ying Han
@ 2008-12-04 1:49 ` Lee Schermerhorn
1 sibling, 0 replies; 10+ messages in thread
From: Lee Schermerhorn @ 2008-12-04 1:49 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Ying Han, linux-mm, Andrew Morton, Oleg Nesterov, Pekka Enberg,
Paul Menage, Rohit Seth
On Thu, 2008-12-04 at 09:30 +0900, KOSAKI Motohiro wrote:
> > PATCH ignore sigkill in get_user_pages during munlock
> >
> > Against: 2.6.28-rc7-mmotm-081203-0150
> >
> > Fixes: make-get_user_pages-interruptible.patch
> >
> > An unfortunate side effect of "make-get_user_pages-interruptible"
> > is that it prevents a SIGKILL'd task from munlock-ing pages that it
> > had mlocked, resulting in freeing of mlocked pages. Freeing of mlocked
> > pages, in itself, is not so bad. We just count them now--altho' I
> > had hoped to remove this stat and add PG_MLOCKED to the free pages
> > flags check.
> >
> > However, consider pages in shared libraries mapped by more than one
> > task that a task mlocked--e.g., via mlockall(). If the task that
> > mlocked the pages exits via SIGKILL, these pages would be left mlocked
> > and unevictable.
>
> Indeed!
> Thank your for clarification!
>
> Ying, I'd like to explain unevictable lru design for you a bit more.
>
> __get_user_pages() also called exit(2) path.
>
> do_exit()
> exit_mm()
> mmput()
> exit_mmap()
> munlock_vma_pages_all()
> munlock_vma_pages_range()
> __mlock_vma_pages_range()
> __get_user_pages()
>
> __mlock_vma_pages_range() process
> (1) grab mlock related pages by __get_user_pages()
> (2) isolate the page from lru
> (3) the page move to evictable list if possible
>
> if (1) is interupptible, the page left unevictable lru
> although the page is not mlocked already.
>
>
> this feature was introduced 2.6.28-rc1. So I should noticed
> at last review, very sorry.
>
>
> this patch is definitly needed.
>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> >
> > Proposed fix:
> >
> > Add another GUP flag to ignore sigkill when calling get_user_pages
> > from munlock()--similar to Kosaki Motohiro's 'IGNORE_VMA_PERMISSIONS
> > flag for the same purpose. We are not actually allocating memory in
> > this case, which "make-get_user_pages-interruptible" intends to avoid.
> > We're just munlocking pages that are already resident and mapped, and
> > we're reusing get_user_pages() to access those pages.
> >
> > ?? Maybe we should combine 'IGNORE_VMA_PERMISSIONS and '_IGNORE_SIGKILL
> > into a single flag: GUP_FLAGS_MUNLOCK ???
>
> In my personal feeling, I like current two flags :)
I tend to agree with you, that the two different flags makes it clearer
what's happening. And, we may find more reasons to ignore SIGKILL in
get_user_pages() as we test more. I only brought up the possibility of
combining the flags as it does add code, and both flags are currently
only used by munlock.
Thanks for reviewing,
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-12-04 1:49 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-03 5:17 [PATCH][V7]make get_user_pages interruptible Ying Han
2008-12-03 7:19 ` KOSAKI Motohiro
2008-12-03 8:21 ` Pekka Enberg
2008-12-03 15:03 ` Lee Schermerhorn
2008-12-03 20:25 ` Ying Han
2008-12-03 20:36 ` Lee Schermerhorn
2008-12-03 20:01 ` [PATCH] mmotm: ignore sigkill in get_user_pages during munlock Lee Schermerhorn
2008-12-04 0:30 ` KOSAKI Motohiro
2008-12-04 1:19 ` Ying Han
2008-12-04 1:49 ` Lee Schermerhorn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox