* [PATCH][V6]make get_user_pages interruptible
@ 2008-12-02 19:30 Ying Han
2008-12-02 21:26 ` Pekka Enberg
2008-12-03 2:24 ` KOSAKI Motohiro
0 siblings, 2 replies; 6+ messages in thread
From: Ying Han @ 2008-12-02 19:30 UTC (permalink / raw)
To: linux-mm, Andrew Morton, Lee Schermerhorn, Oleg Nesterov,
Pekka Enberg, Paul Menage, Rohit Seth
changelog
[v6] replace the sigkill_pending() with fatal_signal_pending()
add the check for cases current != tsk
From: Ying Han <yinghan@google.com>
make get_user_pages interruptible
The initial implementation of checking TIF_MEMDIE covers the cases of OOM
killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
return immediately. This patch includes:
1. add the case that the SIGKILL is sent by user processes. The process can
try to get_user_pages() unlimited memory even if a user process has sent a
SIGKILL to it(maybe a monitor find the process exceed its memory limit and
try to kill it). In the old implementation, the SIGKILL won't be handled
until the get_user_pages() returns.
2. change the return value to be ERESTARTSYS. It makes no sense to return
ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
Considering the general convention for a system call interrupted by a
signal is ERESTARTNOSYS, so the current return value is consistant to that.
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Ying Han <yinghan@google.com>
mm/memory.c | 13 ++-
diff --git a/mm/memory.c b/mm/memory.c
index 164951c..049a4f1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1218,12 +1218,15 @@ int __get_user_pages(struct task_struct *tsk, struct m
struct page *page;
/*
- * If tsk is ooming, cut off its access to large memory
- * allocations. It has a pending SIGKILL, but it can't
- * be processed until returning to user space.
+ * If we have a pending SIGKILL, don't keep
+ * allocating memory. We check both current
+ * and tsk to cover the cases where current
+ * is allocating pages on behalf of tsk.
*/
- if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
- return i ? i : -ENOMEM;
+ if (unlikely(fatal_signal_pending(current) ||
+ ((current != tsk) &&
+ fatal_signal_pending(tsk))))
+ return i ? i : -ERESTARTSYS;
if (write)
foll_flags |= FOLL_WRITE;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH][V6]make get_user_pages interruptible
2008-12-02 19:30 [PATCH][V6]make get_user_pages interruptible Ying Han
@ 2008-12-02 21:26 ` Pekka Enberg
2008-12-02 22:00 ` Ying Han
2008-12-03 2:24 ` KOSAKI Motohiro
1 sibling, 1 reply; 6+ messages in thread
From: Pekka Enberg @ 2008-12-02 21:26 UTC (permalink / raw)
To: Ying Han
Cc: linux-mm, Andrew Morton, Lee Schermerhorn, Oleg Nesterov,
Paul Menage, Rohit Seth
Ying Han wrote:
> changelog
> [v6] replace the sigkill_pending() with fatal_signal_pending()
> add the check for cases current != tsk
>
> From: Ying Han <yinghan@google.com>
>
> make get_user_pages interruptible
> The initial implementation of checking TIF_MEMDIE covers the cases of OOM
> killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
> return immediately. This patch includes:
>
> 1. add the case that the SIGKILL is sent by user processes. The process can
> try to get_user_pages() unlimited memory even if a user process has sent a
> SIGKILL to it(maybe a monitor find the process exceed its memory limit and
> try to kill it). In the old implementation, the SIGKILL won't be handled
> until the get_user_pages() returns.
>
> 2. change the return value to be ERESTARTSYS. It makes no sense to return
> ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
> Considering the general convention for a system call interrupted by a
> signal is ERESTARTNOSYS, so the current return value is consistant to that.
>
> Signed-off-by: Paul Menage <menage@google.com>
> Signed-off-by: Ying Han <yinghan@google.com>
>
> mm/memory.c | 13 ++-
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 164951c..049a4f1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1218,12 +1218,15 @@ int __get_user_pages(struct task_struct *tsk, struct m
> struct page *page;
>
> /*
> - * If tsk is ooming, cut off its access to large memory
> - * allocations. It has a pending SIGKILL, but it can't
> - * be processed until returning to user space.
> + * If we have a pending SIGKILL, don't keep
> + * allocating memory. We check both current
> + * and tsk to cover the cases where current
> + * is allocating pages on behalf of tsk.
> */
> - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
> - return i ? i : -ENOMEM;
> + if (unlikely(fatal_signal_pending(current) ||
> + ((current != tsk) &&
Hmm, do we really need that extra check for current != tsk? If it's a
must, then you probably want to do something like:
if (unlikely(fatal_signal_pending(current))
return i ? i : -ERESTARTSYS;
if (unlikely(current!= tsk && fatal_signal_pending(tsk))
return i ? i : - ERESTARTSYS;
The current form seems just too ugly to live with.
> + fatal_signal_pending(tsk))))
> + return i ? i : -ERESTARTSYS;
>
> if (write)
> foll_flags |= FOLL_WRITE;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH][V6]make get_user_pages interruptible
2008-12-02 21:26 ` Pekka Enberg
@ 2008-12-02 22:00 ` Ying Han
0 siblings, 0 replies; 6+ messages in thread
From: Ying Han @ 2008-12-02 22:00 UTC (permalink / raw)
To: Pekka Enberg
Cc: linux-mm, Andrew Morton, Lee Schermerhorn, Oleg Nesterov,
Paul Menage, Rohit Seth
On Tue, Dec 2, 2008 at 1:26 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> Ying Han wrote:
>>
>> changelog
>> [v6] replace the sigkill_pending() with fatal_signal_pending()
>> add the check for cases current != tsk
>>
>> From: Ying Han <yinghan@google.com>
>>
>> make get_user_pages interruptible
>> The initial implementation of checking TIF_MEMDIE covers the cases of OOM
>> killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
>> return immediately. This patch includes:
>>
>> 1. add the case that the SIGKILL is sent by user processes. The process
>> can
>> try to get_user_pages() unlimited memory even if a user process has sent a
>> SIGKILL to it(maybe a monitor find the process exceed its memory limit and
>> try to kill it). In the old implementation, the SIGKILL won't be handled
>> until the get_user_pages() returns.
>>
>> 2. change the return value to be ERESTARTSYS. It makes no sense to return
>> ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
>> Considering the general convention for a system call interrupted by a
>> signal is ERESTARTNOSYS, so the current return value is consistant to
>> that.
>>
>> Signed-off-by: Paul Menage <menage@google.com>
>> Signed-off-by: Ying Han <yinghan@google.com>
>>
>> mm/memory.c | 13 ++-
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 164951c..049a4f1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1218,12 +1218,15 @@ int __get_user_pages(struct task_struct *tsk,
>> struct m
>> struct page *page;
>>
>> /*
>> - * If tsk is ooming, cut off its access to large
>> memory
>> - * allocations. It has a pending SIGKILL, but it
>> can't
>> - * be processed until returning to user space.
>> + * If we have a pending SIGKILL, don't keep
>> + * allocating memory. We check both current
>> + * and tsk to cover the cases where current
>> + * is allocating pages on behalf of tsk.
>> */
>> - if (unlikely(test_tsk_thread_flag(tsk,
>> TIF_MEMDIE)))
>> - return i ? i : -ENOMEM;
>> + if (unlikely(fatal_signal_pending(current) ||
>> + ((current != tsk) &&
>
> Hmm, do we really need that extra check for current != tsk? If it's a must,
> then you probably want to do something like:
>
> if (unlikely(fatal_signal_pending(current))
> return i ? i : -ERESTARTSYS;
> if (unlikely(current!= tsk && fatal_signal_pending(tsk))
> return i ? i : - ERESTARTSYS;
>
> The current form seems just too ugly to live with.
>
>> + fatal_signal_pending(tsk))))
>> + return i ? i : -ERESTARTSYS;
>>
>> if (write)
>> foll_flags |= FOLL_WRITE;
>
>
hmm, i was thinking since in most cases we have current==tsk, and we
don't want to do the doublecheck for fatal_signal_pending. Thanks
Pekka and i will post the fix as you suggested.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][V6]make get_user_pages interruptible
2008-12-02 19:30 [PATCH][V6]make get_user_pages interruptible Ying Han
2008-12-02 21:26 ` Pekka Enberg
@ 2008-12-03 2:24 ` KOSAKI Motohiro
2008-12-03 3:57 ` Ying Han
1 sibling, 1 reply; 6+ messages in thread
From: KOSAKI Motohiro @ 2008-12-03 2:24 UTC (permalink / raw)
To: Ying Han
Cc: kosaki.motohiro, linux-mm, Andrew Morton, Lee Schermerhorn,
Oleg Nesterov, Pekka Enberg, Paul Menage, Rohit Seth
Hi!
Sorry for too late review.
In general, I like this patch. but ...
> changelog
> [v6] replace the sigkill_pending() with fatal_signal_pending()
> add the check for cases current != tsk
>
> From: Ying Han <yinghan@google.com>
>
> make get_user_pages interruptible
> The initial implementation of checking TIF_MEMDIE covers the cases of OOM
> killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
> return immediately. This patch includes:
>
> 1. add the case that the SIGKILL is sent by user processes. The process can
> try to get_user_pages() unlimited memory even if a user process has sent a
> SIGKILL to it(maybe a monitor find the process exceed its memory limit and
> try to kill it). In the old implementation, the SIGKILL won't be handled
> until the get_user_pages() returns.
>
> 2. change the return value to be ERESTARTSYS. It makes no sense to return
> ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
> Considering the general convention for a system call interrupted by a
> signal is ERESTARTNOSYS, so the current return value is consistant to that.
this description explain why fatal_signal_pending(current) is needed.
but doesn't explain why fatal_signal_pending(tsk) is needed.
more unfortunately, this patch break kernel compatibility.
To read /proc file invoke calling get_user_page().
however, "man 2 read" doesn't describe ERESTARTSYS.
IOW, this patch can break /proc reading user application.
May I ask why fatal_signal_pending(tsk) is needed ?
at least, you need to cc to linux-api@vger.kernel.org IMHO.
Am I talking about pointless?
> Signed-off-by: Paul Menage <menage@google.com>
> Signed-off-by: Ying Han <yinghan@google.com>
>
> mm/memory.c | 13 ++-
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 164951c..049a4f1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1218,12 +1218,15 @@ int __get_user_pages(struct task_struct *tsk, struct m
> struct page *page;
>
> /*
> - * If tsk is ooming, cut off its access to large memory
> - * allocations. It has a pending SIGKILL, but it can't
> - * be processed until returning to user space.
> + * If we have a pending SIGKILL, don't keep
> + * allocating memory. We check both current
> + * and tsk to cover the cases where current
> + * is allocating pages on behalf of tsk.
> */
> - if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
> - return i ? i : -ENOMEM;
> + if (unlikely(fatal_signal_pending(current) ||
> + ((current != tsk) &&
> + fatal_signal_pending(tsk))))
> + return i ? i : -ERESTARTSYS;
>
> if (write)
> foll_flags |= FOLL_WRITE;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][V6]make get_user_pages interruptible
2008-12-03 2:24 ` KOSAKI Motohiro
@ 2008-12-03 3:57 ` Ying Han
2008-12-03 4:17 ` KOSAKI Motohiro
0 siblings, 1 reply; 6+ messages in thread
From: Ying Han @ 2008-12-03 3:57 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: linux-mm, Andrew Morton, Lee Schermerhorn, Oleg Nesterov,
Pekka Enberg, Paul Menage, Rohit Seth
[-- Attachment #1: Type: text/plain, Size: 4116 bytes --]
On Tue, Dec 2, 2008 at 6:24 PM, KOSAKI Motohiro <
kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi!
>
> Sorry for too late review.
> In general, I like this patch. but ...
>
>
>> changelog
>> [v6] replace the sigkill_pending() with fatal_signal_pending()
>> add the check for cases current != tsk
>>
>> From: Ying Han <yinghan@google.com>
>>
>> make get_user_pages interruptible
>> The initial implementation of checking TIF_MEMDIE covers the cases of OOM
>> killing. If the process has been OOM killed, the TIF_MEMDIE is set and it
>> return immediately. This patch includes:
>>
>> 1. add the case that the SIGKILL is sent by user processes. The process
can
>> try to get_user_pages() unlimited memory even if a user process has sent
a
>> SIGKILL to it(maybe a monitor find the process exceed its memory limit
and
>> try to kill it). In the old implementation, the SIGKILL won't be handled
>> until the get_user_pages() returns.
>>
>> 2. change the return value to be ERESTARTSYS. It makes no sense to return
>> ENOMEM if the get_user_pages returned by getting a SIGKILL signal.
>> Considering the general convention for a system call interrupted by a
>> signal is ERESTARTNOSYS, so the current return value is consistant to
that.
>
> this description explain why fatal_signal_pending(current) is needed.
> but doesn't explain why fatal_signal_pending(tsk) is needed.
There were couple of discussions about adding the fatal_signal_pending(tsk)
in the previous
versions of this patch, and the reason i added on is to cover the case when
the current!=tsk
and the caller calls get_user_pages() on behalf of tsk, and we want to
interrupt in this case
as well. if that sounds a reasonable, i will added in the patch description.
>
> more unfortunately, this patch break kernel compatibility.
> To read /proc file invoke calling get_user_page().
> however, "man 2 read" doesn't describe ERESTARTSYS.
yeah, that seems to be right..
>
> IOW, this patch can break /proc reading user application.
>
> May I ask why fatal_signal_pending(tsk) is needed ?
> at least, you need to cc to linux-api@vger.kernel.org IMHO.
all the problems seems to be caused by the fatal_signal_pending(tsk),
i can either make the change like
if (fatal_signal_pending(tsk))
return i ? i : EINTR
or remove the check for fatal_signal_pending(tsk) which is mainly used in
the case you mentioned above. Afterward, the intial point of the patch is to
avoid proccess hanging in the mlock (for example) under memory
pressure while it has SIGKILL pending. Now sounds to me the second option is
better. any comments?
--Ying
>
> Am I talking about pointless?
thanks for comments. :-)
>
>
>
>> Signed-off-by: Paul Menage <menage@google.com>
>> Signed-off-by: Ying Han <yinghan@google.com>
>>
>> mm/memory.c | 13 ++-
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 164951c..049a4f1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1218,12 +1218,15 @@ int __get_user_pages(struct task_struct *tsk,
struct m
>> struct page *page;
>>
>> /*
>> - * If tsk is ooming, cut off its access to large
memory
>> - * allocations. It has a pending SIGKILL, but it
can't
>> - * be processed until returning to user space.
>> + * If we have a pending SIGKILL, don't keep
>> + * allocating memory. We check both current
>> + * and tsk to cover the cases where current
>> + * is allocating pages on behalf of tsk.
>> */
>> - if (unlikely(test_tsk_thread_flag(tsk,
TIF_MEMDIE)))
>> - return i ? i : -ENOMEM;
>> + if (unlikely(fatal_signal_pending(current) ||
>> + ((current != tsk) &&
>> + fatal_signal_pending(tsk))))
>> + return i ? i : -ERESTARTSYS;
>>
>> if (write)
>> foll_flags |= FOLL_WRITE;
>
>
>
>
>
[-- Attachment #2: Type: text/html, Size: 6283 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][V6]make get_user_pages interruptible
2008-12-03 3:57 ` Ying Han
@ 2008-12-03 4:17 ` KOSAKI Motohiro
0 siblings, 0 replies; 6+ messages in thread
From: KOSAKI Motohiro @ 2008-12-03 4:17 UTC (permalink / raw)
To: Ying Han
Cc: kosaki.motohiro, linux-mm, Andrew Morton, Lee Schermerhorn,
Oleg Nesterov, Pekka Enberg, Paul Menage, Rohit Seth
> > more unfortunately, this patch break kernel compatibility.
> > To read /proc file invoke calling get_user_page().
> > however, "man 2 read" doesn't describe ERESTARTSYS.
> yeah, that seems to be right..
> >
> > IOW, this patch can break /proc reading user application.
> >
> > May I ask why fatal_signal_pending(tsk) is needed ?
> > at least, you need to cc to linux-api@vger.kernel.org IMHO.
> all the problems seems to be caused by the fatal_signal_pending(tsk),
> i can either make the change like
> if (fatal_signal_pending(tsk))
> return i ? i : EINTR
>
> or remove the check for fatal_signal_pending(tsk) which is mainly used in
> the case you mentioned above. Afterward, the intial point of the patch is to
> avoid proccess hanging in the mlock (for example) under memory
> pressure while it has SIGKILL pending. Now sounds to me the second option is
> better. any comments?
it seems both reasonable.
in my personal feeling, I like simple removing than EINTR.
> > Am I talking about pointless?
> thanks for comments. :-)
Could you please cc to me at posting v7.
maybe, I can ack.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-12-03 4:17 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-02 19:30 [PATCH][V6]make get_user_pages interruptible Ying Han
2008-12-02 21:26 ` Pekka Enberg
2008-12-02 22:00 ` Ying Han
2008-12-03 2:24 ` KOSAKI Motohiro
2008-12-03 3:57 ` Ying Han
2008-12-03 4:17 ` KOSAKI Motohiro
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox