* [PATCH] vmscan: get_scan_count selects anon pages conservatively
From: Shaohua Li @ 2015-03-04 20:03 UTC
To: linux-mm
Cc: Kernel-team, Andrew Morton, Mel Gorman, Rik van Riel, Johannes Weiner
kswapd is per-node based. Sometimes there is an imbalance between
nodes: node A is full of clean file pages (easy to reclaim), while node
B is full of anon pages (hard to reclaim). Under memory pressure,
kswapd will be woken up for both nodes. The kswapd of node B will try
to swap, while we would prefer to reclaim pages from node A first. The
real issue here is that we don't have a mechanism to prevent memory
allocation from a hard-to-reclaim node (node B here) while an
easy-to-reclaim node (node A) still has memory to reclaim.
The swap can happen even with swappiness 0. Below is a simple script
to trigger it. CPUs 1 and 8 are in different nodes, each of which has
72G of memory:
truncate -s 70G img
taskset -c 8 dd if=img of=/dev/null bs=4k
taskset -c 1 usemem 70G
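(usemem is a memory-allocation stress helper; if it's not handy, a
minimal C stand-in that allocates and touches anonymous memory looks
roughly like this. The plain GiB-count argument is an assumption of
this sketch; the real tool accepts sizes like 70G:)

/*
 * Minimal stand-in for usemem: allocate N GiB of anonymous memory,
 * fault in every page, then hold it to keep the node under anon
 * pressure.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	size_t size = (argc > 1 ? strtoul(argv[1], NULL, 0) : 70UL) << 30;
	char *p;
	size_t i;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	for (i = 0; i < size; i += 4096)
		p[i] = 1;	/* fault in each page as anon memory */
	pause();		/* keep the memory resident */
	return 0;
}

Run it as, e.g., "taskset -c 1 ./usemem-lite 70".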
The swap is even easier to trigger because we have a protection
mechanism for the situation where file plus free pages drop below the
high watermark. This logic makes sense, but it could be more
conservative.
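For reference, the protection check in question looks roughly like
this in kernels of this era (a paraphrase of the mm/vmscan.c code
around the hunk below, not verbatim):

	/*
	 * If file + free pages are below the high watermark, file
	 * reclaim alone cannot restore the zone, so force anon
	 * scanning even at swappiness 0.
	 */
	if (global_reclaim(sc)) {
		unsigned long zonefile;
		unsigned long zonefree;

		zonefree = zone_page_state(zone, NR_FREE_PAGES);
		zonefile = zone_page_state(zone, NR_ACTIVE_FILE) +
			   zone_page_state(zone, NR_INACTIVE_FILE);

		if (unlikely(zonefile + zonefree <= high_wmark_pages(zone))) {
			scan_balance = SCAN_ANON;
			goto out;
		}
	}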
This patch doesn't try to fix the kswapd imbalance issue above, but
makes get_scan_count more conservative about selecting anon pages. The
protection mechanism is designed for the situation where file pages
are being rotated frequently. In that situation page reclaim is
already in trouble, e.g. the priority has been lowered (DEF_PRIORITY
is 12, so "sc->priority < DEF_PRIORITY - 2" only holds after at least
three reclaim passes have failed to make enough progress). So let's
only apply the protection mechanism in that situation. In practice,
this fixes the swap issue in the test above.
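To make the priority condition concrete, reclaim runs a loop of
roughly this shape (a sketch, not verbatim kernel code):

	/* priority starts at DEF_PRIORITY (12) and drops on failure */
	for (sc->priority = DEF_PRIORITY; sc->priority >= 0; sc->priority--) {
		/* each pass scans roughly lru_size >> sc->priority pages */
		shrink_zones(zonelist, sc);
		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;	/* reclaimed enough, stop early */
	}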
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8eadd..31b03e6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1990,7 +1990,7 @@ static void get_scan_count(struct lruvec *lruvec, int swappiness,
* thrashing file LRU becomes infinitely more attractive than
* anon pages. Try to detect this based on file LRU size.
*/
- if (global_reclaim(sc)) {
+ if (global_reclaim(sc) && sc->priority < DEF_PRIORITY - 2) {
unsigned long zonefile;
unsigned long zonefree;
--
1.8.1
* Re: [PATCH] vmscan: get_scan_count selects anon pages conservatively
From: Rik van Riel @ 2015-03-04 20:17 UTC
To: Shaohua Li, linux-mm
Cc: Kernel-team, Andrew Morton, Mel Gorman, Johannes Weiner
On 03/04/2015 03:03 PM, Shaohua Li wrote:
> kswapd is per-node based. Sometimes there is an imbalance between
> nodes: node A is full of clean file pages (easy to reclaim), while node
> B is full of anon pages (hard to reclaim). Under memory pressure,
> kswapd will be woken up for both nodes. The kswapd of node B will try
> to swap, while we would prefer to reclaim pages from node A first. The
> real issue here is that we don't have a mechanism to prevent memory
> allocation from a hard-to-reclaim node (node B here) while an
> easy-to-reclaim node (node A) still has memory to reclaim.
>
> The swap can happen even with swappiness 0. Below is a simple script
> to trigger it. CPUs 1 and 8 are in different nodes, each of which has
> 72G of memory:
> truncate -s 70G img
> taskset -c 8 dd if=img of=/dev/null bs=4k
> taskset -c 1 usemem 70G
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5e8eadd..31b03e6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1990,7 +1990,7 @@ static void get_scan_count(struct lruvec *lruvec, int swappiness,
> * thrashing file LRU becomes infinitely more attractive than
> * anon pages. Try to detect this based on file LRU size.
> */
> - if (global_reclaim(sc)) {
> + if (global_reclaim(sc) && sc->priority < DEF_PRIORITY - 2) {
> unsigned long zonefile;
> unsigned long zonefree;
What kernel does this apply to?
Current upstream does not seem to have the
"sc->priority < DEF_PRIORITY - 2" check, unless
I somehow managed to mess up "git clone" on several
systems.
* Re: [PATCH] vmscan: get_scan_count selects anon pages conservatively
From: Rik van Riel @ 2015-03-04 20:52 UTC
To: Shaohua Li, linux-mm
Cc: Kernel-team, Andrew Morton, Mel Gorman, Johannes Weiner
On 03/04/2015 03:03 PM, Shaohua Li wrote:
> kswapd is per-node based. Sometimes there is an imbalance between
> nodes: node A is full of clean file pages (easy to reclaim), while node
> B is full of anon pages (hard to reclaim). Under memory pressure,
> kswapd will be woken up for both nodes. The kswapd of node B will try
> to swap, while we would prefer to reclaim pages from node A first. The
> real issue here is that we don't have a mechanism to prevent memory
> allocation from a hard-to-reclaim node (node B here) while an
> easy-to-reclaim node (node A) still has memory to reclaim.
>
> The swap can happen even with swappiness 0. Below is a simple script
> to trigger it. CPUs 1 and 8 are in different nodes, each of which has
> 72G of memory:
> truncate -s 70G img
> taskset -c 8 dd if=img of=/dev/null bs=4k
> taskset -c 1 usemem 70G
>
> The swap is even easier to trigger because we have a protection
> mechanism for the situation where file plus free pages drop below the
> high watermark. This logic makes sense, but it could be more
> conservative.
>
> This patch doesn't try to fix the kswapd imbalance issue above, but
> makes get_scan_count more conservative about selecting anon pages. The
> protection mechanism is designed for the situation where file pages
> are being rotated frequently. In that situation page reclaim is
> already in trouble, e.g. the priority has been lowered (DEF_PRIORITY
> is 12, so "sc->priority < DEF_PRIORITY - 2" only holds after at least
> three reclaim passes have failed to make enough progress). So let's
> only apply the protection mechanism in that situation. In practice,
> this fixes the swap issue in the test above.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Shaohua Li <shli@fb.com>
Doh, never mind my earlier comment. I must be too tired
to look at stuff right...
I see how your patch helps avoid the problem, but I am
worried about potential side effects. I suspect it could
lead to page cache thrashing when all zones are low on
page cache memory.
Would it make sense to explicitly check that we are low
on page cache pages in all zones on the scan list, before
forcing anon only scanning, when we get into this function?
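Something like the following, purely as an illustration (hypothetical
code, not from any tree; all_zones_low_on_file is a made-up name):

	/* hypothetical: true if no zone on the list has spare file pages */
	static bool all_zones_low_on_file(struct zonelist *zonelist)
	{
		struct zoneref *z;
		struct zone *zone;

		for_each_zone_zonelist(zone, z, zonelist, MAX_NR_ZONES - 1) {
			unsigned long file, free;

			file = zone_page_state(zone, NR_ACTIVE_FILE) +
			       zone_page_state(zone, NR_INACTIVE_FILE);
			free = zone_page_state(zone, NR_FREE_PAGES);

			if (file + free > high_wmark_pages(zone))
				return false;	/* this zone still has cache */
		}
		return true;
	}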
* Re: [PATCH] vmscan: get_scan_count selects anon pages conservatively
From: Shaohua Li @ 2015-03-06 21:13 UTC
To: Rik van Riel
Cc: linux-mm, Kernel-team, Andrew Morton, Mel Gorman, Johannes Weiner
On Wed, Mar 04, 2015 at 03:52:55PM -0500, Rik van Riel wrote:
> On 03/04/2015 03:03 PM, Shaohua Li wrote:
> > kswapd is per-node based. Sometimes there is an imbalance between
> > nodes: node A is full of clean file pages (easy to reclaim), while node
> > B is full of anon pages (hard to reclaim). Under memory pressure,
> > kswapd will be woken up for both nodes. The kswapd of node B will try
> > to swap, while we would prefer to reclaim pages from node A first. The
> > real issue here is that we don't have a mechanism to prevent memory
> > allocation from a hard-to-reclaim node (node B here) while an
> > easy-to-reclaim node (node A) still has memory to reclaim.
> >
> > The swap can happen even with swappiness 0. Below is a simple script
> > to trigger it. CPUs 1 and 8 are in different nodes, each of which has
> > 72G of memory:
> > truncate -s 70G img
> > taskset -c 8 dd if=img of=/dev/null bs=4k
> > taskset -c 1 usemem 70G
> >
> > The swap is even easier to trigger because we have a protection
> > mechanism for the situation where file plus free pages drop below the
> > high watermark. This logic makes sense, but it could be more
> > conservative.
> >
> > This patch doesn't try to fix the kswapd imbalance issue above, but
> > makes get_scan_count more conservative about selecting anon pages. The
> > protection mechanism is designed for the situation where file pages
> > are being rotated frequently. In that situation page reclaim is
> > already in trouble, e.g. the priority has been lowered (DEF_PRIORITY
> > is 12, so "sc->priority < DEF_PRIORITY - 2" only holds after at least
> > three reclaim passes have failed to make enough progress). So let's
> > only apply the protection mechanism in that situation. In practice,
> > this fixes the swap issue in the test above.
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Shaohua Li <shli@fb.com>
>
> Doh, never mind my earlier comment. I must be too tired
> to look at stuff right...
>
> I see how your patch helps avoid the problem, but I am
> worried about potential side effects. I suspect it could
> lead to page cache thrashing when all zones are low on
> page cache memory.
>
> Would it make sense to explicitly check that we are low
> on page cache pages in all zones on the scan list, before
> forcing anon only scanning, when we get into this function?
OK, we still need to check the priority to make sure kswapd doesn't
get stuck on zones without enough file pages. How about this one?