From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm,oom: Teach lockdep about oom_lock.
Date: Sat, 9 Mar 2019 15:02:22 +0900
Message-ID: <dd3c9f12-84e9-7cf8-1d24-02a9cfbcd509@i-love.sakura.ne.jp>
In-Reply-To: <20190308151327.GU5232@dhcp22.suse.cz>
On 2019/03/08 20:58, Michal Hocko wrote:
> OK, that makes sense to me. I cannot judge the implementation because I
> am not really familiar with lockdep machinery. Could you explain how it
> doesn't trigger for all other allocations?
This is the same reason why fs_reclaim_acquire()/fs_reclaim_release() does not
trigger for all other allocations: any allocation request which might involve
__GFP_FS reclaim passes through "struct lockdep_map __fs_reclaim_map", and
lockdep records that dependency.
>
> Also why it is not sufficient to add the lockdep annotation prior to the
> trylock in __alloc_pages_may_oom?
This is the same reason why fs_reclaim_acquire()/fs_reclaim_release() is called
from prepare_alloc_pages(). If an allocation request which might involve
__GFP_FS reclaim succeeds before actually calling __perform_reclaim(), it never
passes through "struct lockdep_map __fs_reclaim_map", and lockdep gets no chance
to check for a possible deadlock. Likewise, if an allocation request which might
call __alloc_pages_may_oom() succeeds before actually calling
__alloc_pages_may_oom(), it never passes through oom_lock.dep_map, and lockdep
gets no chance to check for a possible deadlock. Annotating in
prepare_alloc_pages() records the dependency up front (see the call flow
sketched below).
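Roughly, the relevant call flow is (a sketch, not the exact code):

  __alloc_pages_nodemask()
    prepare_alloc_pages()
      fs_reclaim_acquire() / fs_reclaim_release()    <= records "might do __GFP_FS reclaim"
      oom_reclaim_acquire() / oom_reclaim_release()  <= records "might wait for oom_lock" (this patch)
    get_page_from_freelist()                         <= fast path; may succeed without reclaim
    __alloc_pages_slowpath()
      __perform_reclaim()                            <= only reached when the fast path fails
      __alloc_pages_may_oom()                        <= only reached when reclaim fails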
Strictly speaking, there is the

  if (tsk_is_oom_victim(current) &&
      (alloc_flags == ALLOC_OOM ||
       (gfp_mask & __GFP_NOMEMALLOC)))
          goto nopage;

case where failing to hold oom_lock at __alloc_pages_may_oom() does not cause a
problem. But I don't think we should duplicate the tsk_is_oom_victim() check in
prepare_alloc_pages() just to cover that exception.
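For illustration, the kind of dangerous usage this annotation is meant to catch
would look like the following. This is a made-up OOM notifier, not code from the
tree:

  /* Hypothetical example only. */
  static int buggy_oom_notify(struct notifier_block *nb, unsigned long unused,
                              void *parm)
  {
          /*
           * OOM notifiers are invoked from out_of_memory() with oom_lock held.
           * A __GFP_DIRECT_RECLAIM allocation here may itself end up waiting
           * for the OOM killer (i.e. for oom_lock), and with this patch
           * lockdep can report that circular dependency.
           */
          void *p = kmalloc(PAGE_SIZE, GFP_KERNEL);

          kfree(p);
          return NOTIFY_OK;
  }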
> It would be also great to pull it out of the code flow and hide it
> behind a helper static inline. Something like
> lockdep_track_oom_alloc_reentrant or a like.
OK. Here is v2 patch.
From ec8d0accf15b4566c065ca8c63a4e1185f0a0c78 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 9 Mar 2019 09:55:08 +0900
Subject: [PATCH v2] mm,oom: Teach lockdep about oom_lock.
Since a thread which has succeeded in holding oom_lock must not perform
blocking memory allocations, teach lockdep that blocking memory allocations
might wait for oom_lock, at as early a location as possible, and teach lockdep
to treat oom_lock as if it were acquired by mutex_lock() rather than by
mutex_trylock().
Also, since the OOM killer is disabled until the OOM reaper or exit_mmap()
sets MMF_OOM_SKIP, teach lockdep to consider that oom_lock is held when
__oom_reap_task_mm() is called.
This patch should not cause lockdep splats unless somebody is doing dangerous
things (e.g. blocking allocations from OOM notifiers or from the OOM reaper).
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
include/linux/oom.h | 16 ++++++++++++++++
mm/oom_kill.c | 9 ++++++++-
mm/page_alloc.c | 5 +++++
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index d079920..8544c23 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -56,6 +56,22 @@ struct oom_control {
extern struct mutex oom_lock;
+static inline void oom_reclaim_acquire(gfp_t gfp_mask, unsigned int order)
+{
+ if ((gfp_mask & __GFP_DIRECT_RECLAIM) && !(gfp_mask & __GFP_NORETRY) &&
+ (!(gfp_mask & __GFP_RETRY_MAYFAIL) ||
+ order <= PAGE_ALLOC_COSTLY_ORDER))
+ mutex_acquire(&oom_lock.dep_map, 0, 0, _THIS_IP_);
+}
+
+static inline void oom_reclaim_release(gfp_t gfp_mask, unsigned int order)
+{
+ if ((gfp_mask & __GFP_DIRECT_RECLAIM) && !(gfp_mask & __GFP_NORETRY) &&
+ (!(gfp_mask & __GFP_RETRY_MAYFAIL) ||
+ order <= PAGE_ALLOC_COSTLY_ORDER))
+ mutex_release(&oom_lock.dep_map, 1, _THIS_IP_);
+}
+
static inline void set_current_oom_origin(void)
{
current->signal->oom_flag_origin = true;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3a24848..11be7da 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -513,6 +513,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
*/
set_bit(MMF_UNSTABLE, &mm->flags);
+ oom_reclaim_acquire(GFP_KERNEL, 0);
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
continue;
@@ -544,6 +545,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
tlb_finish_mmu(&tlb, range.start, range.end);
}
}
+ oom_reclaim_release(GFP_KERNEL, 0);
return ret;
}
@@ -1120,8 +1122,13 @@ void pagefault_out_of_memory(void)
if (mem_cgroup_oom_synchronize(true))
return;
- if (!mutex_trylock(&oom_lock))
+ if (!mutex_trylock(&oom_lock)) {
+ oom_reclaim_acquire(GFP_KERNEL, 0);
+ oom_reclaim_release(GFP_KERNEL, 0);
return;
+ }
+ oom_reclaim_release(GFP_KERNEL, 0);
+ oom_reclaim_acquire(GFP_KERNEL, 0);
out_of_memory(&oc);
mutex_unlock(&oom_lock);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d0fa5b..e8853a19 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3793,6 +3793,8 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
schedule_timeout_uninterruptible(1);
return NULL;
}
+ oom_reclaim_release(gfp_mask, order);
+ oom_reclaim_acquire(gfp_mask, order);
/*
* Go through the zonelist yet one more time, keep very high watermark
@@ -4651,6 +4653,9 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
fs_reclaim_acquire(gfp_mask);
fs_reclaim_release(gfp_mask);
+ oom_reclaim_acquire(gfp_mask, order);
+ oom_reclaim_release(gfp_mask, order);
+
might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
if (should_fail_alloc_page(gfp_mask, order))
--
1.8.3.1