On Tue, Dec 2, 2008 at 6:24 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi!
>
> Sorry for too late review.
> In general, I like this patch. but ...
>
>
>> changelog
>> [v6] replace the sigkill_pending() with fatal_signal_pending()
>>       add the check for cases current != tsk
>>
>> From: Ying Han <yinghan@google.com>
>>
>> make get_user_pages interruptible
>> The initial implementation of checking TIF_MEMDIE covers the case of OOM
>> killing. If the process has been OOM killed, TIF_MEMDIE is set and
>> get_user_pages() returns immediately. This patch includes:
>>
>> 1. add the case where SIGKILL is sent by a user process. The process can
>> try to get_user_pages() unlimited memory even if a user process has sent a
>> SIGKILL to it (maybe a monitor finds that the process exceeds its memory
>> limit and tries to kill it). In the old implementation, the SIGKILL won't be
>> handled until get_user_pages() returns.
>>
>> 2. change the return value to ERESTARTSYS. It makes no sense to return
>> ENOMEM if get_user_pages() returned because of a SIGKILL signal.
>> The general convention for a system call interrupted by a signal is
>> ERESTARTSYS, so the new return value is consistent with that.
>
> This description explains why fatal_signal_pending(current) is needed,
> but doesn't explain why fatal_signal_pending(tsk) is needed.
There were a couple of discussions about adding fatal_signal_pending(tsk) in previous
versions of this patch. I added it to cover the case where current != tsk and the caller
calls get_user_pages() on behalf of tsk; we want to interrupt in that case as well. If that
sounds reasonable, I will add it to the patch description.
>
> More unfortunately, this patch breaks kernel compatibility.
> Reading a /proc file calls get_user_pages().
> However, "man 2 read" doesn't describe ERESTARTSYS.
Yeah, that seems to be right.
>
> IOW, this patch can break user applications that read /proc.
>
> May I ask why fatal_signal_pending(tsk) is needed ?
> at least, you need to cc to linux-api@vger.kernel.org IMHO.
All the problems seem to be caused by the fatal_signal_pending(tsk) check.
I can either change it to

if (fatal_signal_pending(tsk))
        return i ? i : -EINTR;

or remove the fatal_signal_pending(tsk) check, which is mainly used in the case you
mentioned above. After all, the initial point of the patch is to avoid a process hanging in
mlock() (for example) under memory pressure while it has a SIGKILL pending. The second option
now sounds better to me. Any comments?
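
To make the second option concrete, here is a minimal userspace sketch of the loop
behaviour I have in mind. It is not the kernel code: struct task_model,
model_fatal_signal_pending(), model_get_user_pages() and the kill_at parameter are all
hypothetical names made up for illustration, and MODEL_ERESTARTSYS just mirrors the
kernel-internal value 512, which userspace <errno.h> does not define. Only the calling
task is checked, and the pages already obtained are reported if there are any.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel-internal ERESTARTSYS value (512); not in userspace errno.h. */
#define MODEL_ERESTARTSYS 512

/* Hypothetical stand-in for a task; it only models a pending SIGKILL. */
struct task_model {
	bool sigkill_pending;
};

/* Stand-in for fatal_signal_pending() in this sketch. */
static bool model_fatal_signal_pending(const struct task_model *t)
{
	return t->sigkill_pending;
}

/*
 * Models the per-page loop. cur plays the role of current.
 * kill_at is the page index at which a SIGKILL becomes pending
 * (negative means never). Returns the number of pages obtained,
 * or -MODEL_ERESTARTSYS if interrupted before the first page.
 */
static int model_get_user_pages(struct task_model *cur, int nr_pages, int kill_at)
{
	int i;

	assert(nr_pages >= 0);

	for (i = 0; i < nr_pages; i++) {
		if (i == kill_at)
			cur->sigkill_pending = true;

		/* Option 2: check only the calling task, not tsk. */
		if (model_fatal_signal_pending(cur))
			return i ? i : -MODEL_ERESTARTSYS;

		/* The real code would fault in and pin page i here. */
	}
	return i;
}
```

With no signal pending the model returns nr_pages; with a SIGKILL already pending it
returns the error before touching any page; and if the signal arrives partway through,
the caller sees the short count instead of an error.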

--Ying
>
> Am I talking about pointless?
Thanks for the comments. :-)
>
>
>
>> Signed-off-by:        Paul Menage <menage@google.com>
>> Signed-off-by:        Ying Han <yinghan@google.com>
>>
>> mm/memory.c                   |   13 ++-
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 164951c..049a4f1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1218,12 +1218,15 @@ int __get_user_pages(struct task_struct *tsk, struct m
>>                       struct page *page;
>>
>>                       /*
>> -                      * If tsk is ooming, cut off its access to large memory
>> -                      * allocations. It has a pending SIGKILL, but it can't
>> -                      * be processed until returning to user space.
>> +                      * If we have a pending SIGKILL, don't keep
>> +                      * allocating memory. We check both current
>> +                      * and tsk to cover the cases where current
>> +                      * is allocating pages on behalf of tsk.
>>                        */
>> -                     if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
>> -                             return i ? i : -ENOMEM;
>> +                     if (unlikely(fatal_signal_pending(current) ||
>> +                             ((current != tsk) &&
>> +                             fatal_signal_pending(tsk))))
>> +                             return i ? i : -ERESTARTSYS;
>>
>>                       if (write)
>>                               foll_flags |= FOLL_WRITE;
>