linux-mm.kvack.org archive mirror
* [PATCH V4] Allow compaction of unevictable pages
@ 2015-03-12 15:22 Eric B Munson
  2015-03-12 15:26 ` Eric B Munson
  2015-03-12 19:30 ` Michal Hocko
  0 siblings, 2 replies; 4+ messages in thread
From: Eric B Munson @ 2015-03-12 15:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric B Munson, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration.  The mlock
description does not promise that all page faults will be avoided, only
major ones, so this protection is not necessary.  This extra protection
can cause problems for applications that are using mlock to avoid
swapping pages out, but require order > 0 allocations to continue to
succeed in a fragmented environment.  This patch adds a sysctl entry
that will be used to allow root to enable compaction of unevictable
pages.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data.  These maps are
created locked and read only.  Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool.  When the
compact_unevictable sysctl is 0, I cannot allocate hugepages after
fragmenting memory.  When the value is set to 1, allocations succeed.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
Changes from V3:
Instead of removing the ISOLATE_UNEVICTABLE mode and checks, allow the
sysadmin to control if compaction of unevictable pages is allowable.
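
The test program described in the changelog is not included here.  A
minimal sketch of its building blocks might look like the following.  The
helper names (fill_random_file, map_readonly, unmap_every_other) and the
choice of MAP_PRIVATE are assumptions for illustration, not the actual
test code.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_LEN (1UL << 20)	/* 1MB per file, as in the changelog */

/* Create (or truncate) the file at path and fill it with len random bytes. */
static int fill_random_file(const char *path, size_t len)
{
	unsigned char buf[4096];
	size_t done = 0;
	int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

	if (fd < 0)
		return -1;
	while (done < len) {
		size_t chunk = len - done < sizeof(buf) ? len - done : sizeof(buf);
		size_t i;

		for (i = 0; i < chunk; i++)
			buf[i] = (unsigned char)rand();
		if (write(fd, buf, chunk) != (ssize_t)chunk) {
			close(fd);
			return -1;
		}
		done += chunk;
	}
	return close(fd);
}

/* Map a file read only; if locked is nonzero, ask for a locked mapping. */
static void *map_readonly(const char *path, size_t len, int locked)
{
	int fd = open(path, O_RDONLY);
	void *p;

	if (fd < 0)
		return MAP_FAILED;
	p = mmap(NULL, len, PROT_READ,
		 MAP_PRIVATE | (locked ? MAP_LOCKED : 0), fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	return p;
}

/*
 * Unmap every other mapping so that holes are left between the remaining
 * regions.  Returns the number of mappings that were unmapped.
 */
static size_t unmap_every_other(void **maps, size_t n, size_t len)
{
	size_t i, freed = 0;

	for (i = 0; i < n; i += 2) {
		if (maps[i] != MAP_FAILED && munmap(maps[i], len) == 0) {
			maps[i] = MAP_FAILED;
			freed++;
		}
	}
	return freed;
}
```

A driver would create many MAP_LEN sized files, map each one with
locked = 1, call unmap_every_other(), and then (as root) write a larger
value to /proc/sys/vm/nr_hugepages, comparing the resulting pool size with
compact_unevictable set to 0 and to 1.  Locked mappings of this size will
usually require a raised RLIMIT_MEMLOCK or CAP_IPC_LOCK.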

 include/linux/compaction.h |    1 +
 kernel/sysctl.c            |    7 +++++++
 mm/compaction.c            |    3 +++
 3 files changed, 11 insertions(+)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a014559..9dd7e7c 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -34,6 +34,7 @@ extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 extern int sysctl_extfrag_threshold;
 extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);
+extern int sysctl_compact_unevictable;
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 88ea2d6..cc1a678 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1313,6 +1313,13 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &min_extfrag_threshold,
 		.extra2		= &max_extfrag_threshold,
 	},
+	{
+		.procname	= "compact_unevictable",
+		.data		= &sysctl_compact_unevictable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 
 #endif /* CONFIG_COMPACTION */
 	{
diff --git a/mm/compaction.c b/mm/compaction.c
index 8c0d945..b2c1e4e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1046,6 +1046,8 @@ typedef enum {
 	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
 } isolate_migrate_t;
 
+int sysctl_compact_unevictable;
+
 /*
  * Isolate all pages that can be migrated from the first suitable block,
  * starting at the block pointed to by the migrate scanner pfn within
@@ -1057,6 +1059,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	unsigned long low_pfn, end_pfn;
 	struct page *page;
 	const isolate_mode_t isolate_mode =
+		(sysctl_compact_unevictable ? ISOLATE_UNEVICTABLE: 0) |
 		(cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
 
 	/*
-- 
1.7.9.5
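
For readers who want to see the effect of the mm/compaction.c hunk in
isolation, the flag composition can be modelled in userspace roughly as
below.  The numeric flag values, the enum, and build_isolate_mode() are
stand-ins chosen for illustration; the kernel headers define the real
ones.

```c
/* Illustrative flag values only; the kernel headers define the real ones. */
#define ISOLATE_ASYNC_MIGRATE	0x4u
#define ISOLATE_UNEVICTABLE	0x8u

/* Stand-in for the kernel's migrate mode enum. */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

/* Userspace stand-in for the sysctl variable added by the patch. */
static int sysctl_compact_unevictable;

/*
 * Mirror the expression in isolate_migratepages(): unevictable pages are
 * only eligible for isolation when the sysctl is nonzero, independent of
 * whether the compaction run is asynchronous.
 */
static unsigned int build_isolate_mode(enum migrate_mode mode)
{
	return (sysctl_compact_unevictable ? ISOLATE_UNEVICTABLE : 0) |
	       (mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
}
```

With the sysctl left at its default of 0 the result is the same as before
the patch, which is why the default preserves existing behaviour.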

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH V4] Allow compaction of unevictable pages
  2015-03-12 15:22 [PATCH V4] Allow compaction of unevictable pages Eric B Munson
@ 2015-03-12 15:26 ` Eric B Munson
  2015-03-12 19:30 ` Michal Hocko
  1 sibling, 0 replies; 4+ messages in thread
From: Eric B Munson @ 2015-03-12 15:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Thomas Gleixner, Christoph Lameter,
	Peter Zijlstra, Mel Gorman, David Rientjes, Rik van Riel,
	linux-mm, linux-kernel


On Thu, 12 Mar 2015, Eric B Munson wrote:

> Currently, pages which are marked as unevictable are protected from
> compaction, but not from other types of migration.  The mlock
> description does not promise that all page faults will be avoided, only
> major ones, so this protection is not necessary.  This extra protection
> can cause problems for applications that are using mlock to avoid
> swapping pages out, but require order > 0 allocations to continue to
> succeed in a fragmented environment.  This patch adds a sysctl entry
> that will be used to allow root to enable compaction of unevictable
> pages.
> 
> To illustrate this problem I wrote a quick test program that mmaps a
> large number of 1MB files filled with random data.  These maps are
> created locked and read only.  Then every other mmap is unmapped and I
> attempt to allocate huge pages to the static huge page pool.  When the
> compact_unevictable sysctl is 0, I cannot allocate hugepages after
> fragmenting memory.  When the value is set to 1, allocations succeed.
> 
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes from V3:
> Instead of removing the ISOLATE_UNEVICTABLE mode and checks, allow the
> sysadmin to control if compaction of unevictable pages is allowable.
> 
>  include/linux/compaction.h |    1 +
>  kernel/sysctl.c            |    7 +++++++
>  mm/compaction.c            |    3 +++
>  3 files changed, 11 insertions(+)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index a014559..9dd7e7c 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -34,6 +34,7 @@ extern int sysctl_compaction_handler(struct ctl_table *table, int write,
>  extern int sysctl_extfrag_threshold;
>  extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
>  			void __user *buffer, size_t *length, loff_t *ppos);
> +extern int sysctl_compact_unevictable;
>  
>  extern int fragmentation_index(struct zone *zone, unsigned int order);
>  extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 88ea2d6..cc1a678 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1313,6 +1313,13 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= &min_extfrag_threshold,
>  		.extra2		= &max_extfrag_threshold,
>  	},
> +	{
> +		.procname	= "compact_unevictable",
> +		.data		= &sysctl_compact_unevictable,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
>  
>  #endif /* CONFIG_COMPACTION */
>  	{
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8c0d945..b2c1e4e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1046,6 +1046,8 @@ typedef enum {
>  	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
>  } isolate_migrate_t;
>  
> +int sysctl_compact_unevictable;
> +
>  /*
>   * Isolate all pages that can be migrated from the first suitable block,
>   * starting at the block pointed to by the migrate scanner pfn within
> @@ -1057,6 +1059,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	unsigned long low_pfn, end_pfn;
>  	struct page *page;
>  	const isolate_mode_t isolate_mode =
> +		(sysctl_compact_unevictable ? ISOLATE_UNEVICTABLE: 0) |

Sorry, missed the space following the :, if this idea is acceptable, I
will send a patch with the correct whitespace.

>  		(cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
>  
>  	/*
> -- 
> 1.7.9.5
> 


* Re: [PATCH V4] Allow compaction of unevictable pages
  2015-03-12 15:22 [PATCH V4] Allow compaction of unevictable pages Eric B Munson
  2015-03-12 15:26 ` Eric B Munson
@ 2015-03-12 19:30 ` Michal Hocko
  2015-03-12 19:45   ` Eric B Munson
  1 sibling, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2015-03-12 19:30 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Andrew Morton, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Thu 12-03-15 11:22:56, Eric B Munson wrote:
> Currently, pages which are marked as unevictable are protected from
> compaction, but not from other types of migration.  The mlock
> description does not promise that all page faults will be avoided, only
> major ones, so this protection is not necessary.  This extra protection
> can cause problems for applications that are using mlock to avoid
> swapping pages out, but require order > 0 allocations to continue to
> succeed in a fragmented environment.  This patch adds a sysctl entry
> that will be used to allow root to enable compaction of unevictable
> pages.

It would be appropriate to add a justification for the sysctl, because
it is not obvious from the above description. Using mlock to prevent
swapout is not sufficient to justify it. It is the real time extension
mentioned by Peter in the previous version which makes it worth a new
user visible knob.

I would also argue that the knob should be enabled by default because
the real time extension requires additional changes anyway (rt-kernel
at least) while general usage doesn't need such a strong requirement.

You also should provide a knob description to
Documentation/sysctl/vm.txt

> To illustrate this problem I wrote a quick test program that mmaps a
> large number of 1MB files filled with random data.  These maps are
> created locked and read only.  Then every other mmap is unmapped and I
> attempt to allocate huge pages to the static huge page pool.  When the
> compact_unevictable sysctl is 0, I cannot allocate hugepages after
> fragmenting memory.  When the value is set to 1, allocations succeed.
> 
> Signed-off-by: Eric B Munson <emunson@akamai.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

After the above things are fixed
Acked-by: Michal Hocko <mhocko@suse.cz>

One minor suggestion below

> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 88ea2d6..cc1a678 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1313,6 +1313,13 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= &min_extfrag_threshold,
>  		.extra2		= &max_extfrag_threshold,
>  	},
> +	{
> +		.procname	= "compact_unevictable",
> +		.data		= &sysctl_compact_unevictable,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,

You can use .extra1 = &zero and .extra2 = &one to reduce the value
space.
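
One note on the suggestion above: in the kernel, the bounds given in
.extra1 and .extra2 are enforced by proc_dointvec_minmax, so the handler
would need to change from proc_dointvec as well.  The behaviour being asked
for can be modelled in userspace roughly as follows.  The struct and
function names here are hypothetical and are not kernel API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a bounded integer sysctl entry. */
struct bounded_int_entry {
	int *data;		/* backing variable, like .data */
	const int *min;		/* lower bound, like .extra1 (NULL = none) */
	const int *max;		/* upper bound, like .extra2 (NULL = none) */
};

/*
 * Store val only if it lies within [min, max].  An out-of-range write is
 * rejected with -EINVAL and leaves the stored value untouched.
 */
static int bounded_int_write(const struct bounded_int_entry *e, int val)
{
	if ((e->min && val < *e->min) || (e->max && val > *e->max))
		return -EINVAL;
	*e->data = val;
	return 0;
}
```

With min pointing at 0 and max pointing at 1, writes of 0 and 1 are
accepted and anything else is refused, which is the reduced value space
the knob is meant to have.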

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH V4] Allow compaction of unevictable pages
  2015-03-12 19:30 ` Michal Hocko
@ 2015-03-12 19:45   ` Eric B Munson
  0 siblings, 0 replies; 4+ messages in thread
From: Eric B Munson @ 2015-03-12 19:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Thomas Gleixner,
	Christoph Lameter, Peter Zijlstra, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/12/2015 03:30 PM, Michal Hocko wrote:
> On Thu 12-03-15 11:22:56, Eric B Munson wrote:
>> Currently, pages which are marked as unevictable are protected
>> from compaction, but not from other types of migration.  The
>> mlock description does not promise that all page faults will be
>> avoided, only major ones, so this protection is not necessary.
>> This extra protection can cause problems for applications that
>> are using mlock to avoid swapping pages out, but require order >
>> 0 allocations to continue to succeed in a fragmented environment.
>> This patch adds a sysctl entry that will be used to allow root to
>> enable compaction of unevictable pages.
> 
> It would be appropriate to add a justification for the sysctl,
> because it is not obvious from the above description. Using mlock
> to prevent swapout is not sufficient to justify it. It is
> the real time extension mentioned by Peter in the previous version
> which makes it worth a new user visible knob.
> 
> I would also argue that the knob should be enabled by default
> because the real time extension requires additional changes
> anyway (rt-kernel at least) while general usage doesn't need such a
> strong requirement.

Thanks for the review.  I will incorporate your suggestions into a V5.
I agree that many users will want to set this to 1, but keeping the
default at 0 maintains the behavior of the kernel today.  I'd like to
have the real time folks say that they are okay with a default of 1
before I make that change.

> 
> You also should provide a knob description to 
> Documentation/sysctl/vm.txt

Will do.

> 
>> To illustrate this problem I wrote a quick test program that
>> mmaps a large number of 1MB files filled with random data.  These
>> maps are created locked and read only.  Then every other mmap is
>> unmapped and I attempt to allocate huge pages to the static huge
>> page pool.  When the compact_unevictable sysctl is 0, I cannot
>> allocate hugepages after fragmenting memory.  When the value is
>> set to 1, allocations succeed.
>> 
>> Signed-off-by: Eric B Munson <emunson@akamai.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
> 
> After the above things are fixed
> Acked-by: Michal Hocko <mhocko@suse.cz>
> 
> One minor suggestion below
> 
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index 88ea2d6..cc1a678 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1313,6 +1313,13 @@ static struct ctl_table vm_table[] = {
>>  		.extra1		= &min_extfrag_threshold,
>>  		.extra2		= &max_extfrag_threshold,
>>  	},
>> +	{
>> +		.procname	= "compact_unevictable",
>> +		.data		= &sysctl_compact_unevictable,
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec,
> 
> You can use .extra1 = &zero and .extra2 = &one to reduce the value 
> space.
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIcBAEBAgAGBQJVAezjAAoJELbVsDOpoOa9MHwP/j4nvm3dFm00qjOX4RKxsalz
cjhQxuozKnRU9H+OPS3dXXoqDGdpLaLu6CsUsu8FGiJj3zLgUNxea+quJSnSmYVz
8fO5VqhgA3alu7R7zSF3MtOjLzyOoP5/+jDNiNUDLL8sUzg/3hKXLUgBO9R1VU4Q
yD0Yuhw5veNLOvF57xhMCk/quCydIvZV9kAJyTr+fgoY4b8wLyp+QAcqi2lGMCBj
4W9lXtO1abG+gu/m5zAhXLX7MS+ZRQtA070G+kmkY7Z95DtKePGitNjLN7+X9EI6
F1073D+GtiEOJhC+xNOc6Xzwpfl4vRghg4jj6aTkSSrb+sY5/byuKg06p8rMRfef
pJrqjprbBNqiAP95z7X9H6FWty31kx6ZVXtM8CA9/XDqabJGgGs0qmDwPVf264+M
8ySZy5wPRE85yNUKElpvDnx7+t1gka8vDy3bVO+zPsJV3ZqSwhgAiYTTL6u2f/Qe
QwMXWgu4PaAeq0Wltrd/OtA6Fu9H9A91rkk8t69ctPkTjZYCgN0UDGzaa0WpH9SB
H2mz2B3+AE2sdpuoBoVQ62SU4/7PiBIT/ILRuzgQnnNELZFStjstRZPrnVSAYRKI
E5ArRQHfMwbortIiz9KH8SoibTyS0QZiXuKua6LXPwGTnvYqsSN8Jz1XoytV3I5G
MRhLUI7k4dgaVHPTVUYb
=qBj6
-----END PGP SIGNATURE-----
