From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198])
	by kanga.kvack.org (Postfix) with ESMTP id BBEB26B0007
	for ; Fri, 9 Mar 2018 02:11:52 -0500 (EST)
Received: by mail-pf0-f198.google.com with SMTP id u188so1216172pfb.6
	for ; Thu, 08 Mar 2018 23:11:52 -0800 (PST)
Received: from smtp.codeaurora.org (smtp.codeaurora.org. [198.145.29.96])
	by mx.google.com with ESMTPS id n6si322711pgc.801.2018.03.08.23.11.50
	for
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 08 Mar 2018 23:11:51 -0800 (PST)
Subject: Re: [PATCH] mm: oom: Fix race condition between oom_badness and do_exit of task
References: <1520427454-22813-1-git-send-email-gkohli@codeaurora.org> <22ebd655-ece4-37e5-5a98-e9750cb20665@codeaurora.org>
From: "Kohli, Gaurav"
Message-ID: <14ba6c44-d444-bd0a-0bac-0c6851b19344@codeaurora.org>
Date: Fri, 9 Mar 2018 12:41:44 +0530
MIME-Version: 1.0
In-Reply-To:
Content-Type: multipart/alternative;
	boundary="------------64960A97E4E99F864BB6B291"
Content-Language: en-US
Sender: owner-linux-mm@kvack.org
List-ID:
To: Tetsuo Handa , David Rientjes
Cc: akpm@linux-foundation.org, mhocko@suse.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org

This is a multi-part message in MIME format.
--------------64960A97E4E99F864BB6B291
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

On 3/8/2018 7:35 PM, Tetsuo Handa wrote:
> On 2018/03/08 13:51, Kohli, Gaurav wrote:
>> On 3/8/2018 2:26 AM, David Rientjes wrote:
>>
>>> On Wed, 7 Mar 2018, Gaurav Kohli wrote:
>>>
>>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>>> index 6fd9773..5f4cc4b 100644
>>>> --- a/mm/oom_kill.c
>>>> +++ b/mm/oom_kill.c
>>>> @@ -114,9 +114,11 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
>>>>  	for_each_thread(p, t) {
>>>>  		task_lock(t);
>>>> +		get_task_struct(t);
>>>>  		if (likely(t->mm))
>>>>  			goto found;
>>>>  		task_unlock(t);
>>>> +		put_task_struct(t);
>>>>  	}
>>>>  	t = NULL;
>>>>  found:
>>> We hold rcu_read_lock() here, so perhaps only do get_task_struct() before
>>> doing rcu_read_unlock() and we have a non-NULL t?
>> rcu_read_lock() will not help here, because t changes on every iteration
>> of the loop below:
>>
>>  for_each_thread(p, t) {
>>  	task_lock(t);
>> +	get_task_struct(t);
>>  	if (likely(t->mm))
>>  		goto found;
>>  	task_unlock(t);
>> +	put_task_struct(t);
>>  }
>>
>> So we can only increment the usage counter here, on the task currently
>> being examined.
> static int proc_single_show(struct seq_file *m, void *v)
> {
> 	struct inode *inode = m->private;
> 	struct pid_namespace *ns;
> 	struct pid *pid;
> 	struct task_struct *task;
> 	int ret;
>
> 	ns = inode->i_sb->s_fs_info;
> 	pid = proc_pid(inode);
> 	task = get_pid_task(pid, PIDTYPE_PID); /* get_task_struct() is called upon success. */
> 	if (!task)
> 		return -ESRCH;
>
> 	ret = PROC_I(inode)->op.proc_show(m, ns, pid, task);
>
> 	put_task_struct(task);
> 	return ret;
> }
>
> static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
> 			  struct pid *pid, struct task_struct *task)
> {
> 	unsigned long totalpages = totalram_pages + total_swap_pages;
> 	unsigned long points = 0;
>
> 	points = oom_badness(task, NULL, NULL, totalpages) *
> 			     1000 / totalpages; /* task->usage > 0 due to proc_single_show() */
> 	seq_printf(m, "%lu\n", points);
>
> 	return 0;
> }
>
> struct task_struct *find_lock_task_mm(struct task_struct *p) /* p->usage > 0 */
> {
> 	struct task_struct *t;
>
> 	rcu_read_lock();
>
> 	for_each_thread(p, t) {
> 		task_lock(t);
> 		if (likely(t->mm))
> 			goto found;
> 		task_unlock(t);
> 	}
> 	t = NULL;
> found:
> 	rcu_read_unlock();
>
> 	return t; /* t->usage > 0 even if t != p because t->mm != NULL */
> }
>
> t->alloc_lock is still held when leaving find_lock_task_mm(), which means
> that t->mm != NULL at that point. But once task_unlock(t) is called, nothing
> prevents t from setting t->mm = NULL at exit_mm() from do_exit() and calling
> exit_creds() from __put_task_struct(t). The race window seems difficult to
> trigger. Maybe something preempted the caller, because oom_badness() is no
> longer protected by rcu_read_lock() after leaving find_lock_task_mm() when
> called from proc_oom_score().

Hi Tetsuo,

Yes, it is not easy to reproduce; we have seen it only twice so far, and I
agree with your analysis. But David is already fixing this in a different
way, and that approach looks better to me:

https://patchwork.kernel.org/patch/10265641/

But if that code needs to be kept, the only option I can think of for now is
to bump up the task reference.

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

--------------64960A97E4E99F864BB6B291
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

    

    
On 3/8/2018 7:35 PM, Tetsuo Handa wrote:
On 2018/03/08 13:51, Kohli, Gaurav wrote:
On 3/8/2018 2:26 AM, David Rientjes wrote:

On Wed, 7 Mar 2018, Gaurav Kohli wrote:

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 6fd9773..5f4cc4b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -114,9 +114,11 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
 	for_each_thread(p, t) {
 		task_lock(t);
+		get_task_struct(t);
 		if (likely(t->mm))
 			goto found;
 		task_unlock(t);
+		put_task_struct(t);
 	}
 	t = NULL;
 found:
We hold rcu_read_lock() here, so perhaps only do get_task_struct() before
doing rcu_read_unlock() and we have a non-NULL t?
rcu_read_lock() will not help here, because t changes on every iteration
of the loop below:

 for_each_thread(p, t) {
 	task_lock(t);
+	get_task_struct(t);
 	if (likely(t->mm))
 		goto found;
 	task_unlock(t);
+	put_task_struct(t);
 }

So we can only increment the usage counter here, on the task currently
being examined.
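
The two placements discussed above (a reference taken on every thread visited, as in the diff, versus a reference taken only on the thread that is returned) can be compared in a small userspace model. This is only a sketch, not kernel code: struct task, get_task(), put_task(), find_pin_each(), find_pin_found(), release_found() and demo() are hypothetical names made up for the illustration, the lock is a plain flag, and it is assumed here that the caller drops the extra reference once it is done with the returned thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the few task_struct fields involved. */
struct task {
	int usage;	/* models task->usage */
	bool has_mm;	/* models t->mm != NULL */
	bool locked;	/* models task_lock()/task_unlock() */
};

static void task_lock(struct task *t)   { assert(!t->locked); t->locked = true; }
static void task_unlock(struct task *t) { assert(t->locked); t->locked = false; }
static void get_task(struct task *t)    { t->usage++; }
static void put_task(struct task *t)    { assert(t->usage > 0); t->usage--; }

/* Variant 1: mirrors the posted diff, pin and unpin every thread visited. */
static struct task *find_pin_each(struct task *threads, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct task *t = &threads[i];

		task_lock(t);
		get_task(t);
		if (t->has_mm)
			return t;	/* returned locked and pinned */
		task_unlock(t);
		put_task(t);
	}
	return NULL;
}

/* Variant 2: mirrors the review suggestion, pin only the thread returned. */
static struct task *find_pin_found(struct task *threads, size_t n)
{
	struct task *found = NULL;

	for (size_t i = 0; i < n; i++) {
		struct task *t = &threads[i];

		task_lock(t);
		if (t->has_mm) {
			found = t;
			break;
		}
		task_unlock(t);
	}
	if (found)
		get_task(found);	/* where rcu_read_unlock() would follow */
	return found;
}

/* Caller side (assumption of this sketch): unlock, then drop the pin. */
static void release_found(struct task *t)
{
	task_unlock(t);
	put_task(t);
}

/* Runs one variant on three threads and checks the resulting counts. */
static bool demo(struct task *(*find)(struct task *, size_t))
{
	struct task threads[3] = {
		{ .usage = 1, .has_mm = false },
		{ .usage = 1, .has_mm = true },
		{ .usage = 1, .has_mm = true },
	};
	struct task *t = find(threads, 3);
	bool ok = t == &threads[1] && t->locked && t->usage == 2 &&
		  threads[0].usage == 1 && !threads[0].locked &&
		  threads[2].usage == 1 && !threads[2].locked;

	if (t)
		release_found(t);
	return ok && threads[1].usage == 1 && !threads[1].locked;
}
```

In this model both variants end in the same state: only the returned thread carries an extra reference, and skipped threads are back at their original count. The model is single-threaded, so it says nothing about whether either placement is sufficient against a concurrent exit.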
static int proc_single_show(struct seq_file *m, void *v)
{
	struct inode *inode = m->private;
	struct pid_namespace *ns;
	struct pid *pid;
	struct task_struct *task;
	int ret;

	ns = inode->i_sb->s_fs_info;
	pid = proc_pid(inode);
	task = get_pid_task(pid, PIDTYPE_PID); /* get_task_struct() is called upon success. */
	if (!task)
		return -ESRCH;

	ret = PROC_I(inode)->op.proc_show(m, ns, pid, task);

	put_task_struct(task);
	return ret;
}

static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
			  struct pid *pid, struct task_struct *task)
{
	unsigned long totalpages = totalram_pages + total_swap_pages;
	unsigned long points = 0;

	points = oom_badness(task, NULL, NULL, totalpages) *
			     1000 / totalpages; /* task->usage > 0 due to proc_single_show() */
	seq_printf(m, "%lu\n", points);

	return 0;
}

struct task_struct *find_lock_task_mm(struct task_struct *p) /* p->usage > 0 */
{
	struct task_struct *t;

	rcu_read_lock();

	for_each_thread(p, t) {
		task_lock(t);
		if (likely(t->mm))
			goto found;
		task_unlock(t);
	}
	t = NULL;
found:
	rcu_read_unlock();

	return t; /* t->usage > 0 even if t != p because t->mm != NULL */
}

t->alloc_lock is still held when leaving find_lock_task_mm(), which means
that t->mm != NULL at that point. But once task_unlock(t) is called, nothing
prevents t from setting t->mm = NULL at exit_mm() from do_exit() and calling
exit_creds() from __put_task_struct(t). The race window seems difficult to
trigger. Maybe something preempted the caller, because oom_badness() is no
longer protected by rcu_read_lock() after leaving find_lock_task_mm() when
called from proc_oom_score().
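
The ordering described above can be replayed deterministically in userspace. The following is a minimal sketch, not kernel code: struct obj, obj_get(), obj_put() and alive_when_reader_uses() are hypothetical names, the released flag stands in for what __put_task_struct() would do, and the model assumes a thread t != p whose only reference belongs to the exiting side.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted object; not the kernel's task_struct. */
struct obj {
	int usage;	/* reference count */
	bool released;	/* set where the kernel would tear the task down */
};

static void obj_get(struct obj *o)
{
	assert(!o->released);
	o->usage++;
}

static void obj_put(struct obj *o)
{
	assert(o->usage > 0);
	if (--o->usage == 0)
		o->released = true;	/* stands in for __put_task_struct() */
}

/*
 * Replays the interleaving in a fixed order:
 *   1. the reader finds o and optionally pins it,
 *   2. the exiting side drops its reference,
 *   3. the reader goes on to use o.
 * Returns true if o is still alive at step 3.
 */
static bool alive_when_reader_uses(bool reader_pins)
{
	struct obj o = { .usage = 1, .released = false };
	bool alive;

	if (reader_pins)
		obj_get(&o);		/* step 1 */
	obj_put(&o);			/* step 2 */
	alive = !o.released;		/* step 3 */
	if (reader_pins)
		obj_put(&o);		/* reader is done, drop the pin */
	return alive;
}
```

Without the pin the object is already released by the time the reader uses it; with the pin the release is deferred until the reader's own put.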
Hi Tetsuo,

Yes, it is not easy to reproduce; we have seen it only twice so far, and I
agree with your analysis. But David is already fixing this in a different
way, and that approach looks better to me:

https://patchwork.kernel.org/patch/10265641/

But if that code needs to be kept, the only option I can think of for now is
to bump up the task reference.
    
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
--------------64960A97E4E99F864BB6B291--