[RFC 4/4] mm, page_alloc: fix premature OOM when racing with cpuset mems update

From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Vlastimil Babka <vbabka@suse.cz>
Subject: [RFC 4/4] mm, page_alloc: fix premature OOM when racing with cpuset mems update
Date: Tue, 17 Jan 2017 23:16:10 +0100
Message-ID: <20170117221610.22505-5-vbabka@suse.cz>
In-Reply-To: <20170117221610.22505-1-vbabka@suse.cz>

Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode triggers
the OOM killer in a few seconds, despite lots of free memory. The test attempts
to repeatedly fault in memory in one process in a cpuset, while changing the
allowed nodes of the cpuset between 0 and 1 in another process.

The problem comes from insufficient protection against cpuset changes, which
can cause get_page_from_freelist() to consider all zones as non-eligible due to
nodemask and/or current->mems_allowed. This was masked in the past by
sufficient retries, but since commit 682a3385e773 ("mm, page_alloc: inline the
fast path of the zonelist iterator") we determine the preferred_zoneref only
once, and don't iterate the whole zonelist in further attempts.

A previous patch fixed this problem for current->mems_allowed. However, cpuset
changes also update the policy nodemasks. The fix has two parts: we have to
repeat the preferred_zoneref search when the seqcount shows a cpuset update,
and we have to check the seqcount before considering OOM.
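
To illustrate the two parts of the fix outside the kernel, here is a minimal
userspace sketch. It is not the kernel code: every sketch_* name is
hypothetical, the zonelist is reduced to two entries (node 0, then node 1),
and the "racer" callback stands in for a concurrent cpuset update that
rewrites the policy nodemask and bumps the sequence counter. The nopage
handling is simplified.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state a cpuset update can change. */
struct sketch_ctx {
	unsigned int seq;       /* bumped on every simulated cpuset update */
	unsigned long nodemask; /* bitmask of nodes the policy allows */
};

enum sketch_result { SKETCH_ALLOCATED, SKETCH_OOM };

/* Simplified zonelist: one zone per node, node 0 first, then node 1. */
#define SKETCH_NR_ZONES 2
static const int sketch_zonelist[SKETCH_NR_ZONES] = { 0, 1 };

/* Index of the first zone at or after 'start' whose node is in the mask. */
static int sketch_first_zone(unsigned long nodemask, int start)
{
	for (int i = start; i < SKETCH_NR_ZONES; i++)
		if (nodemask & (1UL << sketch_zonelist[i]))
			return i;
	return -1;
}

/* Simulated concurrent update: switch the allowed nodes from {1} to {0}. */
static void sketch_flip_to_node0(struct sketch_ctx *ctx)
{
	ctx->nodemask = 1UL << 0;
	ctx->seq++;
}

/*
 * Toy slow path. With 'recheck' false it behaves like the unfixed code:
 * a stale preferred zone plus a changed nodemask ends in OOM. With
 * 'recheck' true it redoes the preferred-zone search after an update
 * and consults the sequence counter before giving up.
 */
static enum sketch_result sketch_slowpath(struct sketch_ctx *ctx, bool recheck,
					  void (*racer)(struct sketch_ctx *))
{
	unsigned int cookie;
	int preferred;

retry_cpuset:
	cookie = ctx->seq;                               /* read-side begin */
	preferred = sketch_first_zone(ctx->nodemask, 0); /* part 1: redo search */
	if (preferred < 0)
		return SKETCH_OOM;                       /* simplified "nopage" */

	if (racer) {            /* inject the racing update exactly once */
		racer(ctx);
		racer = NULL;
	}

	/* Iteration starts at the preferred zone, never before it. */
	if (sketch_first_zone(ctx->nodemask, preferred) >= 0)
		return SKETCH_ALLOCATED;

	if (recheck && ctx->seq != cookie)               /* part 2: check first */
		goto retry_cpuset;

	return SKETCH_OOM;
}
```

Starting with nodemask {1}, the preferred zone is the second zonelist entry.
After the simulated update to {0}, no zone at or after that entry is eligible,
so sketch_slowpath(&ctx, false, sketch_flip_to_node0) reports OOM, while the
same call with recheck set to true retries and succeeds on node 0.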

Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Fixes: 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bbc3f015f796..4db451270b08 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3534,6 +3534,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	no_progress_loops = 0;
 	compact_priority = DEF_COMPACT_PRIORITY;
 	cpuset_mems_cookie = read_mems_allowed_begin();
+	ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
+					ac->high_zoneidx, ac->nodemask);
+	if (!ac->preferred_zoneref->zone)
+		goto nopage;
+
 
 	/*
 	 * The fast path uses conservative alloc_flags to succeed only until
@@ -3694,6 +3699,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 				&compaction_retries))
 		goto retry;
 
+	if (read_mems_allowed_retry(cpuset_mems_cookie))
+		goto retry_cpuset;
+
 	/* Reclaim has failed us, start killing things */
 	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
 	if (page)
@@ -3789,6 +3797,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (likely(page))
 		goto out;
 
+no_zone:
 	/*
 	 * Runtime PM, block IO and its error handling path can deadlock
 	 * because I/O on the device might not complete.
@@ -3802,13 +3811,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	 * Also recalculate the starting point for the zonelist iterator or
 	 * we could end up iterating over non-eligible zones endlessly.
 	 */
-	if (unlikely(ac.nodemask != nodemask)) {
-no_zone:
+	if (unlikely(ac.nodemask != nodemask))
 		ac.nodemask = nodemask;
-		ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
-						ac.high_zoneidx, ac.nodemask);
-		/* If we have NULL preferred zone, slowpath wll handle that */
-	}
 
 	page = __alloc_pages_slowpath(alloc_mask, order, &ac);
 
-- 
2.11.0
