linux-mm.kvack.org archive mirror
From: Greg Thelen <gthelen@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mel Gorman <mel@csn.ul.ie>,
	linux-mm@kvack.org
Subject: Re: [patch 3/5] vmscan: remove all_unreclaimable scan control
Date: Mon, 31 May 2010 11:32:51 -0700
Message-ID: <xr93sk57yl9o.fsf@ninji.mtv.corp.google.com>
In-Reply-To: <20100430224316.056084208@cmpxchg.org> (Johannes Weiner's message of "Sat, 1 May 2010 01:05:31 +0200")

Johannes Weiner <hannes@cmpxchg.org> writes:
> This scan control is abused to communicate a return value from
> shrink_zones().  Write this idiomatically and remove the knob.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/vmscan.c |   14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -70,8 +70,6 @@ struct scan_control {
>  
>  	int swappiness;
>  
> -	int all_unreclaimable;
> -
>  	int order;
>  
>  	int lumpy_reclaim;
> @@ -1701,14 +1699,14 @@ static void shrink_zone(int priority, st
>   * If a zone is deemed to be full of pinned pages then just give it a light
>   * scan then give up on it.
>   */
> -static void shrink_zones(int priority, struct zonelist *zonelist,
> +static int shrink_zones(int priority, struct zonelist *zonelist,
>  					struct scan_control *sc)
>  {
>  	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
>  	struct zoneref *z;
>  	struct zone *zone;
> +	int progress = 0;
>  
> -	sc->all_unreclaimable = 1;
>  	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
>  					sc->nodemask) {
>  		if (!populated_zone(zone))
> @@ -1724,19 +1722,19 @@ static void shrink_zones(int priority, s
>  
>  			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
>  				continue;	/* Let kswapd poll it */
> -			sc->all_unreclaimable = 0;
>  		} else {
>  			/*
>  			 * Ignore cpuset limitation here. We just want to reduce
>  			 * # of used pages by us regardless of memory shortage.
>  			 */
> -			sc->all_unreclaimable = 0;
>  			mem_cgroup_note_reclaim_priority(sc->mem_cgroup,
>  							priority);
>  		}
>  
>  		shrink_zone(priority, zone, sc);
> +		progress = 1;
>  	}
> +	return progress;
>  }
>  
>  /*
> @@ -1789,7 +1787,7 @@ static unsigned long do_try_to_free_page
>  		sc->nr_scanned = 0;
>  		if (!priority)
>  			disable_swap_token();
> -		shrink_zones(priority, zonelist, sc);
> +		ret = shrink_zones(priority, zonelist, sc);
>  		/*
>  		 * Don't shrink slabs when reclaiming memory from
>  		 * over limit cgroups
> @@ -1826,7 +1824,7 @@ static unsigned long do_try_to_free_page
>  			congestion_wait(BLK_RW_ASYNC, HZ/10);
>  	}
>  	/* top priority shrink_zones still had more to do? don't OOM, then */
> -	if (!sc->all_unreclaimable && scanning_global_lru(sc))
> +	if (ret && scanning_global_lru(sc))
>  		ret = sc->nr_reclaimed;
>  out:
>  	/*

I agree with the direction of this patch, but I am seeing a hang when
testing with mmotm-2010-05-21-16-05.  The following test hangs, unless I
remove this patch from mmotm:
  mount -t cgroup none /cgroups -o memory
  mkdir /cgroups/cg1
  echo $$ > /cgroups/cg1/tasks
  dd bs=1024 count=1024 if=/dev/null of=/data/foo
  echo $$ > /cgroups/tasks
  echo 1 > /cgroups/cg1/memory.force_empty

I think the hang is caused by the following portion of
mem_cgroup_force_empty():
	while (nr_retries && mem->res.usage > 0) {
		int progress;

		if (signal_pending(current)) {
			ret = -EINTR;
			goto out;
		}
		progress = try_to_free_mem_cgroup_pages(mem, GFP_KERNEL,
						false, get_swappiness(mem));
		if (!progress) {
			nr_retries--;
			/* maybe some writeback is necessary */
			congestion_wait(BLK_RW_ASYNC, HZ/10);
		}

	}

With this patch applied, when do_try_to_free_pages() calls
shrink_zones() at priority 0, shrink_zones() may return 1, indicating
progress, even though no pages were reclaimed.  Because this is a
cgroup operation, scanning_global_lru() is false, so the following
portion of do_try_to_free_pages() does not replace ret with
sc->nr_reclaimed (which is 0):
> 	if (ret && scanning_global_lru(sc))
>  		ret = sc->nr_reclaimed;
This leaves ret=1 indicating that do_try_to_free_pages() reclaimed 1
page even though it did not reclaim any pages.  Therefore
mem_cgroup_force_empty() erroneously believes that
try_to_free_mem_cgroup_pages() is making progress (one page at a time),
so there is an endless loop.

If I apply the following fix, then your patch does not hang and the
system appears to operate correctly.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 915dceb..772913c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1850,7 +1850,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
                        congestion_wait(BLK_RW_ASYNC, HZ/10);
        }
        /* top priority shrink_zones still had more to do? don't OOM, then */
-       if (ret && scanning_global_lru(sc))
+       if (ret)
                ret = sc->nr_reclaimed;
 out:
        /*
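
To make the reasoning above concrete, here is a toy userspace model of
the two pieces involved.  It is a sketch, not kernel code: the function
names, the check_global flag, and the max_iters cap are hypothetical
stand-ins I made up for illustration, and the model assumes a cgroup
with one page that reclaim can never free.

```c
#include <stdbool.h>

/*
 * Toy model, NOT kernel code.  All names are hypothetical.
 *
 * Models the tail of do_try_to_free_pages(): 'ret' is what
 * shrink_zones() returned at the last priority, 'nr_reclaimed' is the
 * number of pages actually freed.  With check_global set, the
 * "ret && scanning_global_lru(sc)" test is modelled; with it clear,
 * the plain "ret" test is modelled.
 */
static unsigned long model_reclaim_result(unsigned long ret,
					  unsigned long nr_reclaimed,
					  bool global_lru, bool check_global)
{
	if (ret && (!check_global || global_lru))
		ret = nr_reclaimed;
	return ret;
}

/*
 * Models the retry loop in mem_cgroup_force_empty() for a cgroup whose
 * single page is never reclaimed.  Returns the number of iterations
 * run, or -1 once max_iters is reached (standing in for "hangs").
 */
static int model_force_empty(bool check_global, int nr_retries,
			     int max_iters)
{
	unsigned long usage = 1;	/* assumed: one unreclaimable page */
	int iters = 0;

	while (nr_retries && usage > 0) {
		unsigned long progress;

		if (iters++ == max_iters)
			return -1;
		/* cgroup reclaim: shrink_zones() said 1, nothing was freed */
		progress = model_reclaim_result(1, 0, false, check_global);
		if (!progress)
			nr_retries--;
	}
	return iters;
}
```

Calling model_force_empty(true, 5, 1000) never decrements nr_retries
and runs into the iteration cap, while model_force_empty(false, 5, 1000)
gives up after the retries are used up.  The model says nothing about
whether the plain "ret" test is right for the other callers.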

I have not done thorough testing, so this may introduce other problems.
Is there a reason not to return nr_reclaimed when operating on a cgroup?
This may affect mem_cgroup_hierarchical_reclaim().

--
Greg


