linux-mm.kvack.org archive mirror
* [PATCH] tmpfs not interleaving properly
@ 2012-05-23 13:28 Nathan Zimmer
  2012-05-23 20:03 ` Rik van Riel
  2012-05-23 22:20 ` Andrew Morton
  0 siblings, 2 replies; 5+ messages in thread
From: Nathan Zimmer @ 2012-05-23 13:28 UTC
  To: Hugh Dickins, Nick Piggin, Christoph Lameter, Lee Schermerhorn, akpm
  Cc: linux-kernel, linux-mm, stable


When tmpfs uses an interleave memory policy, allocation for each file always starts at node 0.
When there are many small files, the lower nodes fill up disproportionately.
My proposed solution is to start each file at a randomly chosen node.

Cc: Christoph Lameter <cl@linux.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nathan T Zimmer <nzimmer@sgi.com>


diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 79ab255..38eda26 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -17,6 +17,7 @@ struct shmem_inode_info {
 		char		*symlink;	/* unswappable short symlink */
 	};
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
+	int			node_offset;	/* bias for interleaved nodes */
 	struct list_head	swaplist;	/* chain of maybes on swap */
 	struct list_head	xattr_list;	/* list of shmem_xattr */
 	struct inode		vfs_inode;
diff --git a/mm/shmem.c b/mm/shmem.c
index f99ff3e..58ef512 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -819,7 +819,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
-	pvma.vm_pgoff = index;
+	pvma.vm_pgoff = index + info->node_offset;
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
 
@@ -1153,6 +1153,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 			inode->i_fop = &shmem_file_operations;
 			mpol_shared_policy_init(&info->policy,
 						 shmem_get_sbmpol(sbinfo));
+			info->node_offset = node_random(&node_online_map);
 			break;
 		case S_IFDIR:
 			inc_nlink(inode);
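
The imbalance described above can be reproduced with a small user-space model. This is only a sketch, not kernel code: it assumes nodes are numbered densely from 0 and that page `index` of a file lands on node `(index + node_offset) % nr_nodes`. The names `interleave_node` and `simulate_tmpfs` are hypothetical, and a per-file offset of `f % nr_nodes` stands in for the random choice made by `node_random()` in the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified interleave placement: page `index` of a file goes to node
 * (index + node_offset) % nr_nodes.  The kernel walks a nodemask instead;
 * a dense 0..nr_nodes-1 numbering is an assumption of this model.
 */
unsigned int interleave_node(unsigned long index, unsigned int node_offset,
			     unsigned int nr_nodes)
{
	assert(nr_nodes > 0);
	return (unsigned int)((index + node_offset) % nr_nodes);
}

/*
 * Count pages per node for nr_files files of pages_per_file pages each.
 * spread == 0: every file starts at node 0 (behaviour before the patch).
 * spread != 0: file f starts at node f % nr_nodes, a deterministic
 * stand-in for a randomly chosen starting node.
 */
void simulate_tmpfs(unsigned int nr_files, unsigned int pages_per_file,
		    unsigned int nr_nodes, int spread,
		    unsigned long *pages_per_node)
{
	unsigned int f;
	unsigned long idx;

	assert(nr_nodes > 0);
	memset(pages_per_node, 0, nr_nodes * sizeof(*pages_per_node));

	for (f = 0; f < nr_files; f++) {
		unsigned int node_offset = spread ? f % nr_nodes : 0;

		for (idx = 0; idx < pages_per_file; idx++)
			pages_per_node[interleave_node(idx, node_offset,
						       nr_nodes)]++;
	}
}
```

In this model, 1000 one-page files on 4 nodes put all 1000 pages on node 0 when `spread` is 0, and 250 pages on each node when it is nonzero.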


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] tmpfs not interleaving properly
  2012-05-23 13:28 [PATCH] tmpfs not interleaving properly Nathan Zimmer
@ 2012-05-23 20:03 ` Rik van Riel
  2012-05-23 22:20 ` Andrew Morton
  1 sibling, 0 replies; 5+ messages in thread
From: Rik van Riel @ 2012-05-23 20:03 UTC
  To: Nathan Zimmer
  Cc: Hugh Dickins, Nick Piggin, Christoph Lameter, Lee Schermerhorn,
	akpm, linux-kernel, linux-mm, stable

On 05/23/2012 09:28 AM, Nathan Zimmer wrote:
>
> When tmpfs uses an interleave memory policy, allocation for each file always starts at node 0.
> When there are many small files, the lower nodes fill up disproportionately.
> My proposed solution is to start each file at a randomly chosen node.
>
> Cc: Christoph Lameter<cl@linux.com>
> Cc: Nick Piggin<npiggin@gmail.com>
> Cc: Hugh Dickins<hughd@google.com>
> Cc: Lee Schermerhorn<lee.schermerhorn@hp.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Nathan T Zimmer<nzimmer@sgi.com>

Acked-by: Rik van Riel <riel@redhat.com>


* Re: [PATCH] tmpfs not interleaving properly
  2012-05-23 13:28 [PATCH] tmpfs not interleaving properly Nathan Zimmer
  2012-05-23 20:03 ` Rik van Riel
@ 2012-05-23 22:20 ` Andrew Morton
  2012-05-25 20:46   ` Nathan Zimmer
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2012-05-23 22:20 UTC
  To: Nathan Zimmer
  Cc: Hugh Dickins, Nick Piggin, Christoph Lameter, Lee Schermerhorn,
	linux-kernel, linux-mm, stable

On Wed, 23 May 2012 13:28:21 +0000
Nathan Zimmer <nzimmer@sgi.com> wrote:

> 
> When tmpfs uses an interleave memory policy, allocation for each file always starts at node 0.
> When there are many small files, the lower nodes fill up disproportionately.
> My proposed solution is to start each file at a randomly chosen node.
> 
> ...
>
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -17,6 +17,7 @@ struct shmem_inode_info {
>  		char		*symlink;	/* unswappable short symlink */
>  	};
>  	struct shared_policy	policy;		/* NUMA memory alloc policy */
> +	int			node_offset;	/* bias for interleaved nodes */
>  	struct list_head	swaplist;	/* chain of maybes on swap */
>  	struct list_head	xattr_list;	/* list of shmem_xattr */
>  	struct inode		vfs_inode;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index f99ff3e..58ef512 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -819,7 +819,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
>  
>  	/* Create a pseudo vma that just contains the policy */
>  	pvma.vm_start = 0;
> -	pvma.vm_pgoff = index;
> +	pvma.vm_pgoff = index + info->node_offset;
>  	pvma.vm_ops = NULL;
>  	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
>  
> @@ -1153,6 +1153,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
>  			inode->i_fop = &shmem_file_operations;
>  			mpol_shared_policy_init(&info->policy,
>  						 shmem_get_sbmpol(sbinfo));
> +			info->node_offset = node_random(&node_online_map);
>  			break;
>  		case S_IFDIR:
>  			inc_nlink(inode);

The patch seems a bit arbitrary and hacky.  It would have helped if you
had fully described how it works, and why this implementation was
chosen.

- Why alter (actually, lie about!) the offset-into-file?  Could we
  have similarly perturbed the address arg to alloc_page_vma() to do
  the spreading?

- The patch is dependent upon MPOL_INTERLEAVE being in effect, isn't
  it?  How do we guarantee that it is in force here?

- We look up the policy via mpol_shared_policy_lookup() using the
  unperturbed index.  Why?  Should we be using index+info->node_offset
  there?



* Re: [PATCH] tmpfs not interleaving properly
  2012-05-23 22:20 ` Andrew Morton
@ 2012-05-25 20:46   ` Nathan Zimmer
  0 siblings, 0 replies; 5+ messages in thread
From: Nathan Zimmer @ 2012-05-25 20:46 UTC
  To: Andrew Morton
  Cc: Nathan Zimmer, Hugh Dickins, Nick Piggin, Christoph Lameter,
	Lee Schermerhorn, linux-kernel, linux-mm, stable

On Wed, May 23, 2012 at 03:20:11PM -0700, Andrew Morton wrote:
> On Wed, 23 May 2012 13:28:21 +0000
> Nathan Zimmer <nzimmer@sgi.com> wrote:
> 
> > 
> > When tmpfs uses an interleave memory policy, allocation for each file always starts at node 0.
> > When there are many small files, the lower nodes fill up disproportionately.
> > My proposed solution is to start each file at a randomly chosen node.
> > 
> > ...
> >
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -17,6 +17,7 @@ struct shmem_inode_info {
> >  		char		*symlink;	/* unswappable short symlink */
> >  	};
> >  	struct shared_policy	policy;		/* NUMA memory alloc policy */
> > +	int			node_offset;	/* bias for interleaved nodes */
> >  	struct list_head	swaplist;	/* chain of maybes on swap */
> >  	struct list_head	xattr_list;	/* list of shmem_xattr */
> >  	struct inode		vfs_inode;
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index f99ff3e..58ef512 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -819,7 +819,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> >  
> >  	/* Create a pseudo vma that just contains the policy */
> >  	pvma.vm_start = 0;
> > -	pvma.vm_pgoff = index;
> > +	pvma.vm_pgoff = index + info->node_offset;
> >  	pvma.vm_ops = NULL;
> >  	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
> >  
> > @@ -1153,6 +1153,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
> >  			inode->i_fop = &shmem_file_operations;
> >  			mpol_shared_policy_init(&info->policy,
> >  						 shmem_get_sbmpol(sbinfo));
> > +			info->node_offset = node_random(&node_online_map);
> >  			break;
> >  		case S_IFDIR:
> >  			inc_nlink(inode);
> 
> The patch seems a bit arbitrary and hacky.  It would have helped if you
> had fully described how it works, and why this implementation was
> chosen.
> 
The patch attempts to spread out node usage by starting files at nodes other
than 0.  node_offset is set to a random node when the inode is allocated.

> - Why alter (actually, lie about!) the offset-into-file?  Could we
>   have similarly perturbed the address arg to alloc_page_vma() to do
>   the spreading?
> 
Using the address arg would be better.  It also makes clear that we should
still be using the index for looking up the memory policy.
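
To illustrate why the address arg can carry the bias, here is a user-space sketch. It assumes the interleave node is derived from `vm_pgoff` plus the page distance of the address from `vm_start`, which is my reading of the mempolicy code of this period and is not confirmed in this thread. `struct pseudo_vma`, `model_interleave_nid`, `node_via_pgoff`, and `node_via_addr` are hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Stand-in for the two vm_area_struct fields this model needs. */
struct pseudo_vma {
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

/*
 * Model of the interleave computation: file offset of the vma plus the
 * page distance of addr from vm_start, reduced modulo the node count.
 */
unsigned int model_interleave_nid(const struct pseudo_vma *vma,
				  unsigned long addr, unsigned int nr_nodes)
{
	unsigned long off;

	assert(nr_nodes > 0);
	off = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
	return (unsigned int)(off % nr_nodes);
}

/* Posted patch: bias vm_pgoff and pass address 0. */
unsigned int node_via_pgoff(unsigned long index, unsigned int node_offset,
			    unsigned int nr_nodes)
{
	struct pseudo_vma pvma = { .vm_start = 0,
				   .vm_pgoff = index + node_offset };

	return model_interleave_nid(&pvma, 0, nr_nodes);
}

/* Alternative: keep vm_pgoff equal to index, put the bias in the address. */
unsigned int node_via_addr(unsigned long index, unsigned int node_offset,
			   unsigned int nr_nodes)
{
	struct pseudo_vma pvma = { .vm_start = 0, .vm_pgoff = index };
	unsigned long addr = (unsigned long)node_offset << PAGE_SHIFT;

	return model_interleave_nid(&pvma, addr, nr_nodes);
}
```

Under that assumption both variants select the same node, but only the second leaves `vm_pgoff` equal to the real file index, so the policy lookup and the pseudo vma agree.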

> - The patch is dependent upon MPOL_INTERLEAVE being in effect, isn't
>   it?  How do we guarantee that it is in force here?
> 
The node_offset is only used when MPOL_INTERLEAVE is in effect.  However,
node_offset is set unconditionally.  It would be quite easy to generate
the offset only when the policy is set to interleave.
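
A minimal sketch of that conditional, again as a user-space model and not the kernel change itself. The enum `model_mpol_mode` and the function `pick_node_offset` are hypothetical, and `rand()` stands in for `node_random()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical policy modes, named after the kernel's MPOL_* constants. */
enum model_mpol_mode {
	MODEL_MPOL_DEFAULT,
	MODEL_MPOL_PREFERRED,
	MODEL_MPOL_BIND,
	MODEL_MPOL_INTERLEAVE
};

/*
 * Return a starting-node bias for a new file.  Only an interleave policy
 * gets a random bias; every other mode keeps offset 0, so placement under
 * those modes is unchanged.
 */
unsigned int pick_node_offset(enum model_mpol_mode mode, unsigned int nr_nodes)
{
	assert(nr_nodes > 0);

	if (mode != MODEL_MPOL_INTERLEAVE)
		return 0;

	/* rand() is a stand-in for the kernel's node_random(). */
	return (unsigned int)rand() % nr_nodes;
}
```

The offset would be chosen once, at inode creation, so a file keeps the same starting node for its lifetime.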

> - We look up the policy via mpol_shared_policy_lookup() using the
>   unperturbed index.  Why?  Should we be using index+info->node_offset
>   there?
> 
This concern should be obviated by using the address arg instead of 'altering'
the vm_pgoff.


* [PATCH] tmpfs not interleaving properly
@ 2012-05-16 20:00 Nathan Zimmer
  0 siblings, 0 replies; 5+ messages in thread
From: Nathan Zimmer @ 2012-05-16 20:00 UTC
  To: Hugh Dickins, Nick Piggin, Christoph Lameter, Lee Schermerhorn
  Cc: linux-kernel, linux-mm, stable

When tmpfs uses an interleave memory policy, allocation for each file always starts at node 0.
When there are many small files, the lower nodes fill up disproportionately.
My proposed solution is to start each file at a randomly chosen node.

Cc: Christoph Lameter <cl@linux.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nathan T Zimmer <nzimmer@sgi.com>


diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 79ab255..38eda26 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -17,6 +17,7 @@ struct shmem_inode_info {
 		char		*symlink;	/* unswappable short symlink */
 	};
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
+	int			node_offset;	/* bias for interleaved nodes */
 	struct list_head	swaplist;	/* chain of maybes on swap */
 	struct list_head	xattr_list;	/* list of shmem_xattr */
 	struct inode		vfs_inode;
diff --git a/mm/shmem.c b/mm/shmem.c
index f99ff3e..58ef512 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -819,7 +819,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
 
 	/* Create a pseudo vma that just contains the policy */
 	pvma.vm_start = 0;
-	pvma.vm_pgoff = index;
+	pvma.vm_pgoff = index + info->node_offset;
 	pvma.vm_ops = NULL;
 	pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
 
@@ -1153,6 +1153,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 			inode->i_fop = &shmem_file_operations;
 			mpol_shared_policy_init(&info->policy,
 						 shmem_get_sbmpol(sbinfo));
+			info->node_offset = node_random(&node_online_map);
 			break;
 		case S_IFDIR:
 			inc_nlink(inode);


