[PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs

linux-mm.kvack.org archive mirror

* [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs
@ 2019-01-28 19:35 Rik van Riel
  2019-01-28 19:45 ` Roman Gushchin
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Rik van Riel @ 2019-01-28 19:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, kernel-team, Johannes Weiner, Chris Mason,
	Roman Gushchin, Andrew Morton, Michal Hocko

There are a few issues with the way the number of slab objects to
scan is calculated in do_shrink_slab.  First, for zero-seek slabs,
we could leave the last object around forever. That could result
in pinning a dying cgroup into memory, instead of reclaiming it.
The fix for that is trivial.

Secondly, small slabs receive much more pressure, relative to their
size, than larger slabs, due to "rounding up" the minimum number of
scanned objects to batch_size.

We can keep the pressure on all slabs equal relative to their size
by accumulating the scan pressure on small slabs over time, resulting
in sometimes scanning an object, instead of always scanning several.

This results in lower system CPU use, and a lower major fault rate,
as actively used entries from smaller caches get reclaimed less
aggressively, and need to be reloaded/recreated less often.

Fixes: 4b85afbdacd2 ("mm: zero-seek shrinkers")
Fixes: 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects")
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Chris Mason <clm@fb.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: kernel-team@fb.com
Tested-by: Chris Mason <clm@fb.com>
---
 include/linux/shrinker.h |  1 +
 mm/vmscan.c              | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 9443cafd1969..7a9a1a0f935c 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -65,6 +65,7 @@ struct shrinker {
 
 	long batch;	/* reclaim batch size, 0 = default */
 	int seeks;	/* seeks to recreate an obj */
+	int small_scan;	/* accumulate pressure on slabs with few objects */
 	unsigned flags;
 
 	/* These are for internal use */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a714c4f800e9..0e375bd7a8b6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -488,18 +488,28 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 		 * them aggressively under memory pressure to keep
 		 * them from causing refetches in the IO caches.
 		 */
-		delta = freeable / 2;
+		delta = (freeable + 1)/ 2;
 	}
 
 	/*
 	 * Make sure we apply some minimal pressure on default priority
-	 * even on small cgroups. Stale objects are not only consuming memory
+	 * even on small cgroups, by accumulating pressure across multiple
+	 * slab shrinker runs. Stale objects are not only consuming memory
 	 * by themselves, but can also hold a reference to a dying cgroup,
 	 * preventing it from being reclaimed. A dying cgroup with all
 	 * corresponding structures like per-cpu stats and kmem caches
 	 * can be really big, so it may lead to a significant waste of memory.
 	 */
-	delta = max_t(unsigned long long, delta, min(freeable, batch_size));
+	if (!delta) {
+		shrinker->small_scan += freeable;
+
+		delta = shrinker->small_scan >> priority;
+		shrinker->small_scan -= delta << priority;
+
+		delta *= 4;
+		do_div(delta, shrinker->seeks);
+
+	}
 
 	total_scan += delta;
 	if (total_scan < 0) {
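
For illustration only (not part of the patch): a rough user-space sketch of the arithmetic the commit message describes. It assumes the non-zero-seek path computes (freeable >> priority) * 4 / seeks, as the new branch above does, and a default batch size of 128; all helper names are hypothetical.

```c
#define SHRINK_BATCH 128	/* default batch size; an assumption here */

/* Proportional pressure for the seeks != 0 case (assumed formula). */
static unsigned long long proportional_delta(unsigned long freeable,
					     int priority, int seeks)
{
	unsigned long long delta = freeable >> priority;

	delta *= 4;
	return delta / seeks;	/* do_div() in the kernel */
}

/* Before the patch: round up to min(freeable, batch_size) on every pass. */
static unsigned long long old_delta(unsigned long freeable, int priority,
				    int seeks)
{
	unsigned long long delta = proportional_delta(freeable, priority, seeks);
	unsigned long long min_scan =
		freeable < SHRINK_BATCH ? freeable : SHRINK_BATCH;

	return delta > min_scan ? delta : min_scan;
}

/* Zero-seek rounding before and after the patch. */
static unsigned long zero_seek_old(unsigned long freeable)
{
	return freeable / 2;
}

static unsigned long zero_seek_new(unsigned long freeable)
{
	return (freeable + 1) / 2;
}
```

Under these assumptions, with freeable = 100, priority = 12 and seeks = 2, proportional_delta() is 0 while old_delta() is 100, so the whole small cache is scanned on every pass. For the zero-seek case, zero_seek_old(1) is 0, so a single remaining object is never scanned, whereas zero_seek_new(1) is 1.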



* Re: [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs
  2019-01-28 19:35 [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs Rik van Riel
@ 2019-01-28 19:45 ` Roman Gushchin
  2019-01-28 19:54 ` Andrew Morton
  2019-01-28 21:31 ` Johannes Weiner
  2 siblings, 0 replies; 7+ messages in thread
From: Roman Gushchin @ 2019-01-28 19:45 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, Kernel Team, Johannes Weiner,
	Chris Mason, Andrew Morton, Michal Hocko

On Mon, Jan 28, 2019 at 02:35:35PM -0500, Rik van Riel wrote:
> There are a few issues with the way the number of slab objects to
> scan is calculated in do_shrink_slab.  First, for zero-seek slabs,
> we could leave the last object around forever. That could result
> in pinning a dying cgroup into memory, instead of reclaiming it.
> The fix for that is trivial.
> 
> Secondly, small slabs receive much more pressure, relative to their
> size, than larger slabs, due to "rounding up" the minimum number of
> scanned objects to batch_size.
> 
> We can keep the pressure on all slabs equal relative to their size
> by accumulating the scan pressure on small slabs over time, resulting
> in sometimes scanning an object, instead of always scanning several.
> 
> This results in lower system CPU use, and a lower major fault rate,
> as actively used entries from smaller caches get reclaimed less
> aggressively, and need to be reloaded/recreated less often.
> 
> Fixes: 4b85afbdacd2 ("mm: zero-seek shrinkers")
> Fixes: 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects")
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Chris Mason <clm@fb.com>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: kernel-team@fb.com
> Tested-by: Chris Mason <clm@fb.com>

Hi, Rik!

There are a couple of formatting issues (see below), but other than that
the patch looks very good to me. Thanks!

Acked-by: Roman Gushchin <guro@fb.com>

> ---
>  include/linux/shrinker.h |  1 +
>  mm/vmscan.c              | 16 +++++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 9443cafd1969..7a9a1a0f935c 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -65,6 +65,7 @@ struct shrinker {
>  
>  	long batch;	/* reclaim batch size, 0 = default */
>  	int seeks;	/* seeks to recreate an obj */
> +	int small_scan;	/* accumulate pressure on slabs with few objects */
>  	unsigned flags;
>  
>  	/* These are for internal use */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a714c4f800e9..0e375bd7a8b6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -488,18 +488,28 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  		 * them aggressively under memory pressure to keep
>  		 * them from causing refetches in the IO caches.
>  		 */
> -		delta = freeable / 2;
> +		delta = (freeable + 1)/ 2;
                                      ^
                                      A space is missing here.
>  	}
>  
>  	/*
>  	 * Make sure we apply some minimal pressure on default priority
> -	 * even on small cgroups. Stale objects are not only consuming memory
> +	 * even on small cgroups, by accumulating pressure across multiple
> +	 * slab shrinker runs. Stale objects are not only consuming memory
>  	 * by themselves, but can also hold a reference to a dying cgroup,
>  	 * preventing it from being reclaimed. A dying cgroup with all
>  	 * corresponding structures like per-cpu stats and kmem caches
>  	 * can be really big, so it may lead to a significant waste of memory.
>  	 */
> -	delta = max_t(unsigned long long, delta, min(freeable, batch_size));
> +	if (!delta) {
> +		shrinker->small_scan += freeable;
> +
> +		delta = shrinker->small_scan >> priority;
> +		shrinker->small_scan -= delta << priority;
> +
> +		delta *= 4;
> +		do_div(delta, shrinker->seeks);
> +

This empty line can be removed, I believe.

> +	}
>  
>  	total_scan += delta;
>  	if (total_scan < 0) {
> 



* Re: [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs
  2019-01-28 19:35 [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs Rik van Riel
  2019-01-28 19:45 ` Roman Gushchin
@ 2019-01-28 19:54 ` Andrew Morton
  2019-01-28 20:03   ` Rik van Riel
  2019-01-28 21:31 ` Johannes Weiner
  2 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2019-01-28 19:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, kernel-team, Johannes Weiner,
	Chris Mason, Roman Gushchin, Michal Hocko

On Mon, 28 Jan 2019 14:35:35 -0500 Rik van Riel <riel@surriel.com> wrote:

> There are a few issues with the way the number of slab objects to
> scan is calculated in do_shrink_slab.  First, for zero-seek slabs,
> we could leave the last object around forever. That could result
> in pinning a dying cgroup into memory, instead of reclaiming it.
> The fix for that is trivial.
> 
> Secondly, small slabs receive much more pressure, relative to their
> size, than larger slabs, due to "rounding up" the minimum number of
> scanned objects to batch_size.
> 
> We can keep the pressure on all slabs equal relative to their size
> by accumulating the scan pressure on small slabs over time, resulting
> in sometimes scanning an object, instead of always scanning several.
> 
> This results in lower system CPU use, and a lower major fault rate,
> as actively used entries from smaller caches get reclaimed less
> aggressively, and need to be reloaded/recreated less often.
> 
> Fixes: 4b85afbdacd2 ("mm: zero-seek shrinkers")
> Fixes: 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects")
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Chris Mason <clm@fb.com>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: kernel-team@fb.com
> Tested-by: Chris Mason <clm@fb.com>

I added your Signed-off-by:

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -488,18 +488,28 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  		 * them aggressively under memory pressure to keep
>  		 * them from causing refetches in the IO caches.
>  		 */
> -		delta = freeable / 2;
> +		delta = (freeable + 1)/ 2;
>  	}
>  
>  	/*
>  	 * Make sure we apply some minimal pressure on default priority
> -	 * even on small cgroups. Stale objects are not only consuming memory
> +	 * even on small cgroups, by accumulating pressure across multiple
> +	 * slab shrinker runs. Stale objects are not only consuming memory
>  	 * by themselves, but can also hold a reference to a dying cgroup,
>  	 * preventing it from being reclaimed. A dying cgroup with all
>  	 * corresponding structures like per-cpu stats and kmem caches
>  	 * can be really big, so it may lead to a significant waste of memory.
>  	 */
> -	delta = max_t(unsigned long long, delta, min(freeable, batch_size));
> +	if (!delta) {
> +		shrinker->small_scan += freeable;
> +
> +		delta = shrinker->small_scan >> priority;
> +		shrinker->small_scan -= delta << priority;
> +
> +		delta *= 4;
> +		do_div(delta, shrinker->seeks);

What prevents shrinker->small_scan from over- or underflowing over time?

> +	}
>  
>  	total_scan += delta;
>  	if (total_scan < 0) {

I'll add this:

whitespace fixes, per Roman

--- a/mm/vmscan.c~mmslabvmscan-accumulate-gradual-pressure-on-small-slabs-fix
+++ a/mm/vmscan.c
@@ -488,7 +488,7 @@ static unsigned long do_shrink_slab(stru
 		 * them aggressively under memory pressure to keep
 		 * them from causing refetches in the IO caches.
 		 */
-		delta = (freeable + 1)/ 2;
+		delta = (freeable + 1) / 2;
 	}
 
 	/*
@@ -508,7 +508,6 @@ static unsigned long do_shrink_slab(stru
 
 		delta *= 4;
 		do_div(delta, shrinker->seeks);
-
 	}
 
 	total_scan += delta;
_



* Re: [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs
  2019-01-28 19:54 ` Andrew Morton
@ 2019-01-28 20:03   ` Rik van Riel
  2019-01-28 20:10     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2019-01-28 20:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, kernel-team, Johannes Weiner,
	Chris Mason, Roman Gushchin, Michal Hocko


On Mon, 2019-01-28 at 11:54 -0800, Andrew Morton wrote:
> On Mon, 28 Jan 2019 14:35:35 -0500 Rik van Riel <riel@surriel.com>
> wrote:
> 
> >  	/*
> >  	 * Make sure we apply some minimal pressure on default priority
> > -	 * even on small cgroups. Stale objects are not only consuming memory
> > +	 * even on small cgroups, by accumulating pressure across multiple
> > +	 * slab shrinker runs. Stale objects are not only consuming memory
> >  	 * by themselves, but can also hold a reference to a dying cgroup,
> >  	 * preventing it from being reclaimed. A dying cgroup with all
> >  	 * corresponding structures like per-cpu stats and kmem caches
> >  	 * can be really big, so it may lead to a significant waste of memory.
> >  	 */
> > -	delta = max_t(unsigned long long, delta, min(freeable, batch_size));
> > +	if (!delta) {
> > +		shrinker->small_scan += freeable;
> > +
> > +		delta = shrinker->small_scan >> priority;
> > +		shrinker->small_scan -= delta << priority;
> > +
> > +		delta *= 4;
> > +		do_div(delta, shrinker->seeks);
> 
> What prevents shrinker->small_scan from over- or underflowing over
> time?

We only go into this code path if
delta >> DEF_PRIORITY is zero.

That is, freeable is smaller than 4096.

> I'll add this:

> whitespace fixes, per Roman

Awesome, thank you!

-- 
All Rights Reversed.



* Re: [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs
  2019-01-28 20:03   ` Rik van Riel
@ 2019-01-28 20:10     ` Andrew Morton
  2019-01-28 20:34       ` Rik van Riel
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2019-01-28 20:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, kernel-team, Johannes Weiner,
	Chris Mason, Roman Gushchin, Michal Hocko

On Mon, 28 Jan 2019 15:03:28 -0500 Rik van Riel <riel@surriel.com> wrote:

> On Mon, 2019-01-28 at 11:54 -0800, Andrew Morton wrote:
> > On Mon, 28 Jan 2019 14:35:35 -0500 Rik van Riel <riel@surriel.com>
> > wrote:
> > 
> > >  	/*
> > >  	 * Make sure we apply some minimal pressure on default priority
> > > -	 * even on small cgroups. Stale objects are not only consuming memory
> > > +	 * even on small cgroups, by accumulating pressure across multiple
> > > +	 * slab shrinker runs. Stale objects are not only consuming memory
> > >  	 * by themselves, but can also hold a reference to a dying cgroup,
> > >  	 * preventing it from being reclaimed. A dying cgroup with all
> > >  	 * corresponding structures like per-cpu stats and kmem caches
> > >  	 * can be really big, so it may lead to a significant waste of memory.
> > >  	 */
> > > -	delta = max_t(unsigned long long, delta, min(freeable, batch_size));
> > > +	if (!delta) {
> > > +		shrinker->small_scan += freeable;
> > > +
> > > +		delta = shrinker->small_scan >> priority;
> > > +		shrinker->small_scan -= delta << priority;
> > > +
> > > +		delta *= 4;
> > > +		do_div(delta, shrinker->seeks);
> > 
> > What prevents shrinker->small_scan from over- or underflowing over
> > time?
> 
> We only go into this code path if
> delta >> DEF_PRIORITY is zero.
> 
> That is, freeable is smaller than 4096.
> 

I'm still not understanding.  If `freeable' always has a value of (say)
1, we'll eventually overflow shrinker->small_scan?  Or at least, it's
unobvious why this cannot happen.



* Re: [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs
  2019-01-28 20:10     ` Andrew Morton
@ 2019-01-28 20:34       ` Rik van Riel
  0 siblings, 0 replies; 7+ messages in thread
From: Rik van Riel @ 2019-01-28 20:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, kernel-team, Johannes Weiner,
	Chris Mason, Roman Gushchin, Michal Hocko


On Mon, 2019-01-28 at 12:10 -0800, Andrew Morton wrote:
> On Mon, 28 Jan 2019 15:03:28 -0500 Rik van Riel <riel@surriel.com>
> wrote:
> 
> > On Mon, 2019-01-28 at 11:54 -0800, Andrew Morton wrote:
> > > > On Mon, 28 Jan 2019 14:35:35 -0500 Rik van Riel <riel@surriel.com>
> > > > wrote:
> > > 
> > > > memory.
> > > >  	 */
> > > > -	delta = max_t(unsigned long long, delta, min(freeable, batch_size));
> > > > +	if (!delta) {
> > > > +		shrinker->small_scan += freeable;
> > > > +
> > > > +		delta = shrinker->small_scan >> priority;
> > > > +		shrinker->small_scan -= delta << priority;

When delta is a non-zero number, we subtract (delta << priority)
from shrinker->small_scan.

That should happen every time delta >= (1<<priority), which is
4096 for DEF_PRIORITY.
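
As a rough illustration of the point above (not code from the patch), the branch can be modelled in user space. The struct and function names below are hypothetical, and do_div() is replaced by plain division. After the subtraction, small_scan equals its previous value modulo (1 << priority), so the carry left behind by a pass stays below 1 << priority.

```c
/* Hypothetical user-space stand-in for the fields the branch touches. */
struct shrinker_model {
	int seeks;
	int small_scan;
};

/* Mirrors the body of the new "if (!delta)" branch. */
static unsigned long long small_slab_delta(struct shrinker_model *s,
					   unsigned long freeable, int priority)
{
	unsigned long long delta;

	s->small_scan += (int)freeable;

	delta = (unsigned long long)(s->small_scan >> priority);
	s->small_scan -= (int)(delta << priority);

	delta *= 4;
	return delta / s->seeks;	/* do_div() in the kernel */
}

/*
 * Run `passes` passes with a constant freeable count. Returns the total
 * number of objects asked to be scanned and stores the largest carry
 * left in small_scan after any pass in *max_carry.
 */
static unsigned long long simulate(unsigned long freeable, int priority,
				   int seeks, long passes, int *max_carry)
{
	struct shrinker_model s = { .seeks = seeks, .small_scan = 0 };
	unsigned long long scanned = 0;

	*max_carry = 0;
	for (long i = 0; i < passes; i++) {
		scanned += small_slab_delta(&s, freeable, priority);
		if (s.small_scan > *max_carry)
			*max_carry = s.small_scan;
	}
	return scanned;
}
```

In this model, simulate(1, 12, 2, 3 * 4096, &carry) returns 6 and leaves carry at 4095: with freeable stuck at 1, two objects are scanned once every 4096 passes and the accumulator is reset each time it reaches 4096.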

> > > > +
> > > > +		delta *= 4;
> > > > +		do_div(delta, shrinker->seeks);
> > > 
> > > What prevents shrinker->small_scan from over- or underflowing
> > > over
> > > time?
> > 
> > We only go into this code path if
> > delta >> DEF_PRIORITY is zero.
> > 
> > That is, freeable is smaller than 4096.
> > 
> 
> I'm still not understanding.  If `freeable' always has a value of
> (say)
> 1, we'll eventually overflow shrinker->small_scan?  Or at least, it's
> unobvious why this cannot happen.
-- 
All Rights Reversed.



* Re: [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs
  2019-01-28 19:35 [PATCH] mm,slab,vmscan: accumulate gradual pressure on small slabs Rik van Riel
  2019-01-28 19:45 ` Roman Gushchin
  2019-01-28 19:54 ` Andrew Morton
@ 2019-01-28 21:31 ` Johannes Weiner
  2 siblings, 0 replies; 7+ messages in thread
From: Johannes Weiner @ 2019-01-28 21:31 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, linux-mm, kernel-team, Chris Mason, Roman Gushchin,
	Andrew Morton, Michal Hocko

On Mon, Jan 28, 2019 at 02:35:35PM -0500, Rik van Riel wrote:
> There are a few issues with the way the number of slab objects to
> scan is calculated in do_shrink_slab.  First, for zero-seek slabs,
> we could leave the last object around forever. That could result
> in pinning a dying cgroup into memory, instead of reclaiming it.
> The fix for that is trivial.
> 
> Secondly, small slabs receive much more pressure, relative to their
> size, than larger slabs, due to "rounding up" the minimum number of
> scanned objects to batch_size.
> 
> We can keep the pressure on all slabs equal relative to their size
> by accumulating the scan pressure on small slabs over time, resulting
> in sometimes scanning an object, instead of always scanning several.
> 
> This results in lower system CPU use, and a lower major fault rate,
> as actively used entries from smaller caches get reclaimed less
> aggressively, and need to be reloaded/recreated less often.
> 
> Fixes: 4b85afbdacd2 ("mm: zero-seek shrinkers")
> Fixes: 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects")
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Chris Mason <clm@fb.com>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: kernel-team@fb.com
> Tested-by: Chris Mason <clm@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

