From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Greg Thelen <gthelen@google.com>
Cc: kosaki.motohiro@jp.fujitsu.com,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@csn.ul.ie>,
	linux-mm@kvack.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>
Subject: [PATCH] vmscan: Fix do_try_to_free_pages() return value when priority==0 reclaim failure
Date: Tue,  1 Jun 2010 12:29:41 +0900 (JST)
Message-ID: <20100601122140.2436.A69D9226@jp.fujitsu.com>
In-Reply-To: <xr93sk57yl9o.fsf@ninji.mtv.corp.google.com>

CC to memcg folks.

> I agree with the direction of this patch, but I am seeing a hang when
> testing with mmotm-2010-05-21-16-05.  The following test hangs, unless I
> remove this patch from mmotm:
>   mount -t cgroup none /cgroups -o memory
>   mkdir /cgroups/cg1
>   echo $$ > /cgroups/cg1/tasks
>   dd bs=1024 count=1024 if=/dev/null of=/data/foo
>   echo $$ > /cgroups/tasks
>   echo 1 > /cgroups/cg1/memory.force_empty
> 
> I think the hang is caused by the following portion of
> mem_cgroup_force_empty():
> 	while (nr_retries && mem->res.usage > 0) {
> 		int progress;
> 
> 		if (signal_pending(current)) {
> 			ret = -EINTR;
> 			goto out;
> 		}
> 		progress = try_to_free_mem_cgroup_pages(mem, GFP_KERNEL,
> 						false, get_swappiness(mem));
> 		if (!progress) {
> 			nr_retries--;
> 			/* maybe some writeback is necessary */
> 			congestion_wait(BLK_RW_ASYNC, HZ/10);
> 		}
> 
> 	}
> 
> With this patch applied, it is possible that when do_try_to_free_pages()
> calls shrink_zones() for priority 0 that shrink_zones() may return 1
> indicating progress, even though no pages may have been reclaimed.
> Because this is a cgroup operation, scanning_global_lru() is false and
> the following portion of do_try_to_free_pages() fails to set ret=0.
> > 	if (ret && scanning_global_lru(sc))
> >  		ret = sc->nr_reclaimed;
> This leaves ret=1 indicating that do_try_to_free_pages() reclaimed 1
> page even though it did not reclaim any pages.  Therefore
> mem_cgroup_force_empty() erroneously believes that
> try_to_free_mem_cgroup_pages() is making progress (one page at a time),
> so there is an endless loop.

Good catch!

Yes, your analysis is correct. Thank you for both the testing and
the analysis.
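
The retry loop described in the quoted analysis can be illustrated with a
small user-space model. This is only a sketch, not kernel code:
struct model_cgroup, model_reclaim() and model_force_empty() are
hypothetical names, and the max_iters cap exists only so that the model
terminates where the real loop would keep spinning.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a memory cgroup under force_empty. */
struct model_cgroup {
	unsigned long usage;        /* pages still charged to the group */
	int reported_progress;      /* value the reclaim call returns */
	unsigned long really_freed; /* pages actually freed per call */
};

/* Stand-in for try_to_free_mem_cgroup_pages(): frees really_freed pages
 * but returns reported_progress, which may disagree with reality. */
static int model_reclaim(struct model_cgroup *cg)
{
	unsigned long freed = cg->really_freed < cg->usage ?
			      cg->really_freed : cg->usage;

	cg->usage -= freed;
	return cg->reported_progress;
}

/* Mirrors the shape of the retry loop in mem_cgroup_force_empty().
 * Returns true if the loop ended on its own, false if it hit the
 * artificial max_iters cap (an endless loop in the real code). */
static bool model_force_empty(struct model_cgroup *cg, int nr_retries,
			      long max_iters)
{
	long iters = 0;

	while (nr_retries && cg->usage > 0) {
		if (iters++ >= max_iters)
			return false;
		/* retries are only consumed when no progress is reported */
		if (!model_reclaim(cg))
			nr_retries--;
	}
	return true;
}
```

With reported_progress == 1 and really_freed == 0, nr_retries is never
decremented and usage never drops, so the model only stops at the cap.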

Unfortunately, this logic needs a further fix, because it had already
been broken by an earlier regression. My point is that if reclaim fails
at priority==0, "ret = sc->nr_reclaimed" makes no sense at all.

The fix is below. What do you think?



From 49a395b21fe1b2f864112e71d027ffcafbdc9fc0 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Tue, 1 Jun 2010 11:29:50 +0900
Subject: [PATCH] vmscan: Fix do_try_to_free_pages() return value when priority==0 reclaim failure

Greg Thelen reported that Johannes's recent stack diet patch makes the
kernel hang. His test is the following.

  mount -t cgroup none /cgroups -o memory
  mkdir /cgroups/cg1
  echo $$ > /cgroups/cg1/tasks
  dd bs=1024 count=1024 if=/dev/null of=/data/foo
  echo $$ > /cgroups/tasks
  echo 1 > /cgroups/cg1/memory.force_empty

Actually, this logic, which tries hard to avoid OOM, has been broken
since the following two-year-old patch.

	commit a41f24ea9fd6169b147c53c2392e2887cc1d9247
	Author: Nishanth Aravamudan <nacc@us.ibm.com>
	Date:   Tue Apr 29 00:58:25 2008 -0700

	    page allocator: smarter retry of costly-order allocations

The original intention was "return success if the system has shrinkable
zones even though priority==0 reclaim failed". But the above patch
changed this to "return nr_reclaimed if .....". That overlooked that
nr_reclaimed may be 0 when priority==0 reclaim fails.

Johannes's patch made this worse. Originally, a priority==0 reclaim
failure on memcg returned 0, but his patch changed it to return 1,
which totally confused memcg.

This patch fixes it completely.

Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c |   29 ++++++++++++++++-------------
 1 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 915dceb..a204209 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1724,13 +1724,13 @@ static void shrink_zone(int priority, struct zone *zone,
  * If a zone is deemed to be full of pinned pages then just give it a light
  * scan then give up on it.
  */
-static int shrink_zones(int priority, struct zonelist *zonelist,
+static bool shrink_zones(int priority, struct zonelist *zonelist,
 					struct scan_control *sc)
 {
 	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
 	struct zoneref *z;
 	struct zone *zone;
-	int progress = 0;
+	bool all_unreclaimable = true;
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
 					sc->nodemask) {
@@ -1757,9 +1757,9 @@ static int shrink_zones(int priority, struct zonelist *zonelist,
 		}
 
 		shrink_zone(priority, zone, sc);
-		progress = 1;
+		all_unreclaimable = false;
 	}
-	return progress;
+	return all_unreclaimable;
 }
 
 /*
@@ -1782,7 +1782,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 					struct scan_control *sc)
 {
 	int priority;
-	unsigned long ret = 0;
+	bool all_unreclaimable; 
 	unsigned long total_scanned = 0;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long lru_pages = 0;
@@ -1813,7 +1813,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		sc->nr_scanned = 0;
 		if (!priority)
 			disable_swap_token();
-		ret = shrink_zones(priority, zonelist, sc);
+		all_unreclaimable = shrink_zones(priority, zonelist, sc);
 		/*
 		 * Don't shrink slabs when reclaiming memory from
 		 * over limit cgroups
@@ -1826,10 +1826,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 			}
 		}
 		total_scanned += sc->nr_scanned;
-		if (sc->nr_reclaimed >= sc->nr_to_reclaim) {
-			ret = sc->nr_reclaimed;
+		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
 			goto out;
-		}
 
 		/*
 		 * Try to write back as many pages as we just scanned.  This
@@ -1849,9 +1847,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		    priority < DEF_PRIORITY - 2)
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
-	/* top priority shrink_zones still had more to do? don't OOM, then */
-	if (ret && scanning_global_lru(sc))
-		ret = sc->nr_reclaimed;
+
 out:
 	/*
 	 * Now that we've scanned all the zones at this priority level, note
@@ -1877,7 +1873,14 @@ out:
 	delayacct_freepages_end();
 	put_mems_allowed();
 
-	return ret;
+	if (sc->nr_reclaimed)
+		return sc->nr_reclaimed;
+
+	/* top priority shrink_zones still had more to do? don't OOM, then */
+	if (scanning_global_lru(sc) && !all_unreclaimable)
+		return 1;
+
+	return 0;
 }
 
 unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
-- 
1.6.5.2
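
The return-value rule at the end of do_try_to_free_pages() in the patch
above can be restated as a pure function. This is a user-space sketch
for reasoning about the cases, not kernel code: model_reclaim_result()
and its parameters are hypothetical names standing in for
sc->nr_reclaimed, scanning_global_lru(sc) and the all_unreclaimable flag.

```c
#include <stdbool.h>

/* Hypothetical model of the value do_try_to_free_pages() returns
 * after the patch, given the state at the end of the priority loop. */
static unsigned long model_reclaim_result(unsigned long nr_reclaimed,
					  bool global_lru,
					  bool all_unreclaimable)
{
	/* Real progress always wins: report the pages actually freed. */
	if (nr_reclaimed)
		return nr_reclaimed;

	/* Global reclaim with zones that still had work to do: return a
	 * nonzero value so the caller does not go to OOM. */
	if (global_lru && !all_unreclaimable)
		return 1;

	/* memcg reclaim, or every zone unreclaimable: report failure. */
	return 0;
}
```

For memcg reclaim (global_lru false) with nothing freed, the model
returns 0, so a caller that retries only on zero progress consumes a
retry instead of looping.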




