From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ying Han
To: KOSAKI Motohiro
Cc: linux-mm@kvack.org, Andrew Morton, Lee Schermerhorn, Oleg Nesterov, Pekka Enberg, Paul Menage, Rohit Seth
Subject: Re: [PATCH][V6]make get_user_pages interruptible
Date: Tue, 2 Dec 2008 19:57:41 -0800
Message-ID: <604427e00812021957m44549252k21e1b617ba9e78c3@mail.gmail.com>
In-Reply-To: <20081203111440.1D35.KOSAKI.MOTOHIRO@jp.fujitsu.com>
References: <604427e00812021130t1aad58a8j7474258ae33e15a4@mail.gmail.com> <20081203111440.1D35.KOSAKI.MOTOHIRO@jp.fujitsu.com>
Sender: owner-linux-mm@kvack.org

On Tue, Dec 2, 2008 at 6:24 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi!
>
> Sorry for the late review.
> In general, I like this patch, but ...
>
>
>> changelog
>> [v6] replace the sigkill_pending() with fatal_signal_pending()
>>       add the check for cases current != tsk
>>
>> From: Ying Han <yinghan@google.com>
>>
>> make get_user_pages interruptible
>> The initial implementation of checking TIF_MEMDIE covers the case of OOM
>> killing. If the process has been OOM killed, TIF_MEMDIE is set and it
>> returns immediately. This patch includes:
>>
>> 1. Add the case where the SIGKILL is sent by a user process. The process can
>> try to get_user_pages() unlimited memory even if a user process has sent a
>> SIGKILL to it (maybe a monitor finds that the process exceeds its memory
>> limit and tries to kill it). In the old implementation, the SIGKILL won't be
>> handled until get_user_pages() returns.
>>
>> 2. Change the return value to ERESTARTSYS. It makes no sense to return
>> ENOMEM if get_user_pages() returned because of a SIGKILL signal.
>> The general convention for a system call interrupted by a
>> signal is ERESTARTSYS, so the current return value is consistent with that.
>
> This description explains why fatal_signal_pending(current) is needed,
> but doesn't explain why fatal_signal_pending(tsk) is needed.
There were a couple of discussions about adding fatal_signal_pending(tsk) in the previous
versions of this patch. I added it to cover the case where current != tsk and the caller
calls get_user_pages() on behalf of tsk; we want to interrupt in that case as well.
If that sounds reasonable, I will add it to the patch description.
>
> More unfortunately, this patch breaks kernel compatibility.
> Reading a /proc file ends up calling get_user_pages().
> However, "man 2 read" doesn't describe ERESTARTSYS.
Yeah, that seems to be right.
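
To illustrate the compatibility concern: user-space code usually handles only the error values that read(2) documents. The following is a minimal sketch, not taken from any real application; read_retry() is a hypothetical helper name. It retries on the documented EINTR and would treat any other errno value as a hard failure.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): read up to len bytes from fd,
 * retrying when the call is interrupted by a signal before any data was
 * transferred (errno == EINTR, as documented in read(2)).
 *
 * Any other error, including a value the caller has never heard of,
 * is passed back to the caller as a failure.
 */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
	ssize_t n;

	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);

	return n;
}
```

A caller written this way has no branch for an undocumented error code, which is why the choice of return value matters for /proc readers.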
>
> IOW, this patch can break user applications that read /proc.
>
> May I ask why fatal_signal_pending(tsk) is needed?
> At the least, you need to cc linux-api@vger.kernel.org IMHO.
All the problems seem to be caused by fatal_signal_pending(tsk).
I can either make a change like

if (fatal_signal_pending(tsk))
   return i ? i : -EINTR;

or remove the check for fatal_signal_pending(tsk), which mainly matters in the
case you mentioned above. After all, the initial point of the patch is to keep a
process from hanging in mlock() (for example) under memory pressure while it has
a SIGKILL pending. The second option now sounds better to me. Any comments?
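
The two options can be compared with a small user-space model. This is only a sketch under stated assumptions: struct task_model, ERESTARTSYS_MODEL, and both function names are hypothetical stand-ins, and option 1 reflects one reading of the proposal (keep the check on current as in the patch, and return -EINTR when only tsk is dying). A return value of 0 means "keep going"; anything else is what the loop would return, following the patch's `i ? i : -ERR` convention of reporting pages already pinned before reporting an error.

```c
#include <errno.h>
#include <stdbool.h>

/* ERESTARTSYS is kernel-internal and absent from user-space <errno.h>;
 * 512 is its value in the kernel's include/linux/errno.h. */
#define ERESTARTSYS_MODEL 512

/* Hypothetical stand-in for task_struct: only the state this sketch needs. */
struct task_model {
	bool fatal_signal_pending;
};

/* Option 1 (assumed reading): check both tasks, -EINTR if only tsk is dying. */
static long gup_check_option1(const struct task_model *cur,
			      const struct task_model *tsk, long i)
{
	if (cur->fatal_signal_pending)
		return i ? i : -ERESTARTSYS_MODEL;
	if (cur != tsk && tsk->fatal_signal_pending)
		return i ? i : -EINTR;
	return 0; /* no fatal signal: continue pinning pages */
}

/* Option 2: drop the tsk check entirely and look only at current. */
static long gup_check_option2(const struct task_model *cur, long i)
{
	if (cur->fatal_signal_pending)
		return i ? i : -ERESTARTSYS_MODEL;
	return 0; /* no fatal signal: continue pinning pages */
}
```

With option 2, a caller working on behalf of a dying tsk is never interrupted, so a /proc reader cannot see a new error code from this path.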

--Ying
>
> Am I making a pointless argument?
Thanks for the comments. :-)
>
>
>
>> Signed-off-by:        Paul Menage <menage@google.com>
>> Signed-off-by:        Ying Han <yinghan@google.com>
>>
>> mm/memory.c                   |   13 ++-
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 164951c..049a4f1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1218,12 +1218,15 @@ int __get_user_pages(struct task_struct *tsk, struct m
>>                       struct page *page;
>>
>>                       /*
>> -                      * If tsk is ooming, cut off its access to large memory
>> -                      * allocations. It has a pending SIGKILL, but it can't
>> -                      * be processed until returning to user space.
>> +                      * If we have a pending SIGKILL, don't keep
>> +                      * allocating memory. We check both current
>> +                      * and tsk to cover the cases where current
>> +                      * is allocating pages on behalf of tsk.
>>                        */
>> -                     if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
>> -                             return i ? i : -ENOMEM;
>> +                     if (unlikely(fatal_signal_pending(current) ||
>> +                             ((current != tsk) &&
>> +                             fatal_signal_pending(tsk))))
>> +                             return i ? i : -ERESTARTSYS;
>>
>>                       if (write)
>>                               foll_flags |= FOLL_WRITE;
>
>
>
>
>
