linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	David Rientjes <rientjes@google.com>,
	Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 2/6] mm, kswapd: don't reset kswapd_order prematurely
Date: Fri, 28 Jul 2017 11:16:22 +0100	[thread overview]
Message-ID: <20170728101622.uoirf7ryvtoddq7b@techsingularity.net> (raw)
In-Reply-To: <20170727160701.9245-3-vbabka@suse.cz>

On Thu, Jul 27, 2017 at 06:06:57PM +0200, Vlastimil Babka wrote:
> This patch deals with a corner case found when testing kcompactd with a very
> simple testcase that first fragments memory (by creating a large shmem file and
> then punching hole in every even page) and then uses artificial order-9
> GFP_NOWAIT allocations in a loop. This is freshly after virtme-run boot in KVM
> and no other activity.
> 
> What happens is that kswapd always reclaims too little to get over
> compact_gap() in kswapd_shrink_node(), so it doesn't set sc->order to 0, thus
> "goto kswapd_try_sleep" in kswapd() doesn't happen. In the next iteration of
> kswapd() loop, alloc_order and reclaim_order is read again from
> pgdat->kswapd_order, which the previous iteration has reset to 0 and there was
> no other kswapd wakeup meanwhile (the workload inserts short sleeps between
> allocations). With the working order 0, node appears balanced and
> wakeup_kcompactd() does nothing.
> 

The risk with a change like this is that there is an introduction of
kswapd-stuck-at-100%-cpu reclaiming for high order pages. Consider for
example this part

> -static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> +/*
> + * Return true if kswapd fully slept because pgdat was balanced and there was
> + * no premature wakeup.
> + */
> +static bool kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
>  				unsigned int classzone_idx)
>  {
>  	long remaining = 0;
>  	DEFINE_WAIT(wait);
> +	bool ret = false;
>  
>  	if (freezing(current) || kthread_should_stop())
> -		return;
> +		return false;
>  
>  	prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
>  

...

> @@ -3493,23 +3491,32 @@ static int kswapd(void *p)
>  	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
>  	set_freezable();
>  
> -	pgdat->kswapd_order = 0;
> +	pgdat->kswapd_order = alloc_order = reclaim_order = 0;
>  	pgdat->kswapd_classzone_idx = MAX_NR_ZONES;
>  	for ( ; ; ) {
>  		bool ret;
>  
> -		alloc_order = reclaim_order = pgdat->kswapd_order;
> +		alloc_order = reclaim_order = max(alloc_order, pgdat->kswapd_order);
>  		classzone_idx = kswapd_classzone_idx(pgdat, classzone_idx);
>  
>  kswapd_try_sleep:
> -		kswapd_try_to_sleep(pgdat, alloc_order, reclaim_order,
> -					classzone_idx);
> -
> -		/* Read the new order and classzone_idx */
> -		alloc_order = reclaim_order = pgdat->kswapd_order;
> -		classzone_idx = kswapd_classzone_idx(pgdat, 0);
> -		pgdat->kswapd_order = 0;
> -		pgdat->kswapd_classzone_idx = MAX_NR_ZONES;
> +		if (kswapd_try_to_sleep(pgdat, alloc_order, reclaim_order,
> +							classzone_idx)) {
> +
> +			/* Read the new order and classzone_idx */
> +			alloc_order = reclaim_order = pgdat->kswapd_order;
> +			classzone_idx = kswapd_classzone_idx(pgdat, 0);
> +			pgdat->kswapd_order = 0;
> +			pgdat->kswapd_classzone_idx = MAX_NR_ZONES;
> +		} else {
> +			/*
> +			 * We failed to sleep, so continue on the current order
> +			 * and classzone_idx, unless they increased.
> +			 */
> +			alloc_order = max(alloc_order, pgdat->kswapd_order);
> +			reclaim_order = max(reclaim_order, pgdat->kswapd_order) ;
> +			classzone_idx = kswapd_classzone_idx(pgdat, classzone_idx);
> +		}
>  
>  		ret = try_to_freeze();
>  		if (kthread_should_stop())

kswapd_try_to_sleep returns true only if it fully slept. Now, consider
a case where kswapd is woken for order-9, fails and there are streaming
allocators that are keeping kswapd awake between the low/high watermark.
Even though all subsequent wakeups are for potentially for order-0, the
false branch above keeps kswapd at order-9.

You should be very wary of keeping kswapd awake for high-order allocations
and somehow defer to either kcompactd or push it into direct reclaim.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-07-28 10:16 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-27 16:06 [RFC PATCH 0/6] proactive kcompactd Vlastimil Babka
2017-07-27 16:06 ` [PATCH 1/6] mm, kswapd: refactor kswapd_try_to_sleep() Vlastimil Babka
2017-07-28  9:38   ` Mel Gorman
2017-07-27 16:06 ` [PATCH 2/6] mm, kswapd: don't reset kswapd_order prematurely Vlastimil Babka
2017-07-28 10:16   ` Mel Gorman [this message]
2017-07-27 16:06 ` [PATCH 3/6] mm, kswapd: reset kswapd's order to 0 when it fails to reclaim enough Vlastimil Babka
2017-07-27 16:06 ` [PATCH 4/6] mm, kswapd: wake up kcompactd when kswapd had too many failures Vlastimil Babka
2017-07-28 10:41   ` Mel Gorman
2017-07-27 16:07 ` [RFC PATCH 5/6] mm, compaction: stop when number of free pages goes below watermark Vlastimil Babka
2017-07-28 10:43   ` Mel Gorman
2017-07-27 16:07 ` [RFC PATCH 6/6] mm: make kcompactd more proactive Vlastimil Babka
2017-07-28 10:58   ` Mel Gorman
2017-08-09 20:58 ` [RFC PATCH 0/6] proactive kcompactd David Rientjes
2017-08-21 14:10   ` Johannes Weiner
2017-08-21 21:40     ` Rik van Riel
2017-08-22 20:57     ` David Rientjes
2017-08-23  5:36     ` Joonsoo Kim
2017-08-23  8:12       ` Vlastimil Babka
2017-08-24  6:24         ` Joonsoo Kim
2017-08-24 11:30           ` Vlastimil Babka
2017-08-24 23:51             ` Joonsoo Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170728101622.uoirf7ryvtoddq7b@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=aarcange@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox