linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hugh@veritas.com>
To: Steve Longerbeam <stevel@mvista.com>
Cc: Ray Bryant <raybry@sgi.com>, Andi Kleen <ak@muc.de>,
	Hirokazu Takahashi <taka@valinux.co.jp>,
	Dave Hansen <haveblue@us.ibm.com>,
	Marcello Tosatti <marcelo.tosatti@cyclades.com>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, andrew morton <akpm@osdl.org>
Subject: Re: page migration patchset
Date: Wed, 5 Jan 2005 20:55:05 +0000 (GMT)	[thread overview]
Message-ID: <Pine.LNX.4.44.0501052008160.8705-100000@localhost.localdomain> (raw)
In-Reply-To: <41DC34EF.7010507@mvista.com>

Hi Steve,

On Wed, 5 Jan 2005, Steve Longerbeam wrote:
> 
> Note that Andrew had to drop my patch from 2.6.10, because
> the 4-level page tables feature was re-implemented using a
> different interface, which broke my patch. So Andrew asked me
> to re-do the patch for inclusion in 2.6.11. That gives us ~2 months
> to work on integrating the page migration and NUMA mempolicy
> filemap patches.

Something I found odd about your patch was that you had filemap.c
doing NUMA policy one way (via mapping->policy), but left shmem.c
doing NUMA policy another way (via info->policy).  I was preparing
a patch against 2.6.10-rc3-mm1 to clean that up, but got diverted.

Or was I missing a significant distinction?

I seem also to have concluded that destroy_inode ought always to
do the mpol_free_shared_policy itself rather than leaving it to the
filesystem's ->destroy_inode; but offhand can't remember my reasoning
(just to match alloc_inode doing its _init, or a more vital reason?).
Does that make sense to you?

Below is the patch I was working on then (like you I don't have NUMA,
so it was only build tested): would you like to factor it into yours,
or would you prefer me to come along and add it to -mm after yours
has gone in?

I did think your patch would be better split into two (if not more):
the (straightforward) implementation of mapping->policy, and then
the (more complex) page migration business on top of that.

There is more cleanup I'd like to do (or even better, let someone
else do!) in that area: not originating in your patch, but I loathe
the way the vma interface demands construction of a temporary struct
vm_area_struct (pvma) on the stack to get things done - to me that
just indicates the interface is wrong.  The user interface must of
course stay, but should be better handled internally.  Separate job.

(And I still don't know what should be done about NUMA policy versus
swap: it has not been anyone's priority, but swapin_readahead's NUMA
belief that swap is laid out linearly following vmas is quite wrong.
Should page migration be used instead?  Should swap be divided into
per-node extents?  Does swap readahead really serve a useful purpose,
or could we just delete that code?  Should NUMA policy on a file be
determining NUMA policy on private swap copies of that file?  Feel
free to ignore these questions, they're really not on your track;
but I can't glance at that code without wondering, and someone
reading this mail might have better ideas.)

Hugh

--- 2.6.10-rc3-mm1/fs/inode.c	2004-12-14 11:15:38.000000000 +0000
+++ linux/fs/inode.c	2004-12-14 12:02:01.751655096 +0000
@@ -178,12 +178,11 @@ void destroy_inode(struct inode *inode) 
 	if (inode_has_buffers(inode))
 		BUG();
 	security_inode_free(inode);
+	mpol_free_shared_policy(&inode->i_mapping->policy);
 	if (inode->i_sb->s_op->destroy_inode)
 		inode->i_sb->s_op->destroy_inode(inode);
-	else {
-		mpol_free_shared_policy(&inode->i_mapping->policy);
+	else
 		kmem_cache_free(inode_cachep, (inode));
-	}
 }
 EXPORT_SYMBOL(destroy_inode);
 
--- 2.6.10-rc3-mm1/include/linux/mempolicy.h	2004-12-14 11:15:39.000000000 +0000
+++ linux/include/linux/mempolicy.h	2004-12-14 12:02:01.784650080 +0000
@@ -156,6 +156,9 @@ struct page *alloc_page_shared_policy(un
 extern void numa_default_policy(void);
 extern void numa_policy_init(void);
 
+int generic_file_set_policy(struct vm_area_struct *, struct mempolicy *);
+struct mempolicy *generic_file_get_policy(struct vm_area_struct *, unsigned long);
+
 #else
 
 struct mempolicy {};
--- 2.6.10-rc3-mm1/include/linux/mm.h	2004-12-14 11:15:39.000000000 +0000
+++ linux/include/linux/mm.h	2004-12-14 12:02:01.834642480 +0000
@@ -555,15 +555,10 @@ extern void show_free_areas(void);
 #ifdef CONFIG_SHMEM
 struct page *shmem_nopage(struct vm_area_struct *vma,
 			unsigned long address, int *type);
-int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *new);
-struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
-					unsigned long addr);
 int shmem_lock(struct file *file, int lock, struct user_struct *user);
 #else
 #define shmem_nopage filemap_nopage
 #define shmem_lock(a, b, c) 	({0;})	/* always in memory, no need to lock */
-#define shmem_set_policy(a, b)	(0)
-#define shmem_get_policy(a, b)	(NULL)
 #endif
 struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags);
 
--- 2.6.10-rc3-mm1/include/linux/shmem_fs.h	2004-10-18 22:56:50.000000000 +0100
+++ linux/include/linux/shmem_fs.h	2004-12-14 12:02:01.835642328 +0000
@@ -14,7 +14,6 @@ struct shmem_inode_info {
 	unsigned long		alloced;	/* data pages alloced to file */
 	unsigned long		swapped;	/* subtotal assigned to swap */
 	unsigned long		next_index;	/* highest alloced index + 1 */
-	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct page		*i_indirect;	/* top indirect blocks page */
 	swp_entry_t		i_direct[SHMEM_NR_DIRECT]; /* first blocks */
 	struct list_head	swaplist;	/* chain of maybes on swap */
--- 2.6.10-rc3-mm1/ipc/shm.c	2004-12-14 11:15:39.000000000 +0000
+++ linux/ipc/shm.c	2004-12-14 12:02:01.885634728 +0000
@@ -168,8 +168,8 @@ static struct vm_operations_struct shm_v
 	.close	= shm_close,	/* callback for when the vm-area is released */
 	.nopage	= shmem_nopage,
 #ifdef CONFIG_NUMA
-	.set_policy = shmem_set_policy,
-	.get_policy = shmem_get_policy,
+	.set_policy = generic_file_set_policy,
+	.get_policy = generic_file_get_policy,
 #endif
 };
 
--- 2.6.10-rc3-mm1/mm/shmem.c	2004-12-14 11:15:40.000000000 +0000
+++ linux/mm/shmem.c	2004-12-14 12:02:01.930627888 +0000
@@ -879,10 +879,10 @@ static struct page *shmem_swapin_async(s
 	return page;
 }
 
-struct page *shmem_swapin(struct shmem_inode_info *info, swp_entry_t entry,
-			  unsigned long idx)
+struct page *shmem_swapin(struct address_space *mapping,
+			swp_entry_t entry, unsigned long idx)
 {
-	struct shared_policy *p = &info->policy;
+	struct shared_policy *p = &mapping->policy;
 	int i, num;
 	struct page *page;
 	unsigned long offset;
@@ -898,27 +898,13 @@ struct page *shmem_swapin(struct shmem_i
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 	return shmem_swapin_async(p, entry, idx);
 }
-
-static struct page *
-shmem_alloc_page(unsigned long gfp, struct shmem_inode_info *info,
-		 unsigned long idx)
-{
-	return alloc_page_shared_policy(gfp, &info->policy, idx);
-}
 #else
-static inline struct page *
-shmem_swapin(struct shmem_inode_info *info,swp_entry_t entry,unsigned long idx)
+static inline struct page *shmem_swapin(struct address_space *mapping,
+			swp_entry_t entry, unsigned long idx)
 {
 	swapin_readahead(entry, 0, NULL);
 	return read_swap_cache_async(entry, NULL, 0);
 }
-
-static inline struct page *
-shmem_alloc_page(unsigned long gfp,struct shmem_inode_info *info,
-				 unsigned long idx)
-{
-	return alloc_page(gfp);
-}
 #endif
 
 /*
@@ -980,7 +966,7 @@ repeat:
 				inc_page_state(pgmajfault);
 				*type = VM_FAULT_MAJOR;
 			}
-			swappage = shmem_swapin(info, swap, idx);
+			swappage = shmem_swapin(mapping, swap, idx);
 			if (!swappage) {
 				spin_lock(&info->lock);
 				entry = shmem_swp_alloc(info, idx, sgp);
@@ -1092,9 +1078,7 @@ repeat:
 
 		if (!filepage) {
 			spin_unlock(&info->lock);
-			filepage = shmem_alloc_page(mapping_gfp_mask(mapping),
-						    info,
-						    idx);
+			filepage = page_cache_alloc(mapping, idx);
 			if (!filepage) {
 				shmem_unacct_blocks(info->flags, 1);
 				shmem_free_blocks(inode, 1);
@@ -1206,24 +1190,6 @@ static int shmem_populate(struct vm_area
 	return 0;
 }
 
-#ifdef CONFIG_NUMA
-int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
-{
-	struct inode *i = vma->vm_file->f_dentry->d_inode;
-	return mpol_set_shared_policy(&SHMEM_I(i)->policy, vma, new);
-}
-
-struct mempolicy *
-shmem_get_policy(struct vm_area_struct *vma, unsigned long addr)
-{
-	struct inode *i = vma->vm_file->f_dentry->d_inode;
-	unsigned long idx;
-
-	idx = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
-	return mpol_shared_policy_lookup(&SHMEM_I(i)->policy, idx);
-}
-#endif
-
 int shmem_lock(struct file *file, int lock, struct user_struct *user)
 {
 	struct inode *inode = file->f_dentry->d_inode;
@@ -1293,7 +1259,6 @@ shmem_get_inode(struct super_block *sb, 
 		case S_IFREG:
 			inode->i_op = &shmem_inode_operations;
 			inode->i_fop = &shmem_file_operations;
-			mpol_shared_policy_init(&info->policy);
 			break;
 		case S_IFDIR:
 			inode->i_nlink++;
@@ -1303,11 +1268,6 @@ shmem_get_inode(struct super_block *sb, 
 			inode->i_fop = &simple_dir_operations;
 			break;
 		case S_IFLNK:
-			/*
-			 * Must not load anything in the rbtree,
-			 * mpol_free_shared_policy will not be called.
-			 */
-			mpol_shared_policy_init(&info->policy);
 			break;
 		}
 	} else if (sbinfo) {
@@ -2026,10 +1986,6 @@ static struct inode *shmem_alloc_inode(s
 
 static void shmem_destroy_inode(struct inode *inode)
 {
-	if ((inode->i_mode & S_IFMT) == S_IFREG) {
-		/* only struct inode is valid if it's an inline symlink */
-		mpol_free_shared_policy(&SHMEM_I(inode)->policy);
-	}
 	kmem_cache_free(shmem_inode_cachep, SHMEM_I(inode));
 }
 
@@ -2135,8 +2091,8 @@ static struct vm_operations_struct shmem
 	.nopage		= shmem_nopage,
 	.populate	= shmem_populate,
 #ifdef CONFIG_NUMA
-	.set_policy     = shmem_set_policy,
-	.get_policy     = shmem_get_policy,
+	.set_policy     = generic_file_set_policy,
+	.get_policy     = generic_file_get_policy,
 #endif
 };
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  parent reply	other threads:[~2005-01-05 20:55 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-01-05  0:32 Ray Bryant
2005-01-05  2:07 ` Andi Kleen
2005-01-05  3:20   ` Ray Bryant
2005-01-05 18:41     ` Steve Longerbeam
2005-01-05 19:23       ` Ray Bryant
2005-01-05 23:00         ` Steve Longerbeam
2005-01-05 23:16           ` Ray Bryant
2005-01-05 20:55       ` Hugh Dickins [this message]
     [not found]         ` <41DC7EAD.8010407@mvista.com>
2005-01-06 14:43           ` Andi Kleen
2005-01-06 16:00             ` Ray Bryant
2005-01-06 17:50               ` Christoph Lameter
2005-01-06 19:29                 ` Andi Kleen
2005-01-06 22:30             ` William Lee Irwin III
2005-01-06 23:08               ` Andrew Morton
2005-01-06 23:15                 ` William Lee Irwin III
2005-01-06 23:21               ` Ray Bryant
2005-01-06 23:35                 ` William Lee Irwin III
2005-01-06 23:53               ` Anton Blanchard
2005-01-07  0:06                 ` William Lee Irwin III
2005-01-07  0:31                 ` Andi Kleen
2005-01-06 23:43             ` Steve Longerbeam
2005-01-06 23:58               ` William Lee Irwin III
2005-01-11 15:38       ` Ray Bryant
2005-01-11 19:00         ` Steve Longerbeam
2005-01-11 19:30           ` Ray Bryant
2005-01-11 20:59             ` Steve Longerbeam
2005-01-12 12:35         ` Robin Holt
2005-01-12 18:12           ` Hugh Dickins
2005-01-12 18:45             ` Ray Bryant
2005-01-12 18:53             ` Andrew Morton
2005-01-06 20:59 Ray Bryant
2005-01-06 23:04 ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.44.0501052008160.8705-100000@localhost.localdomain \
    --to=hugh@veritas.com \
    --cc=ak@muc.de \
    --cc=akpm@osdl.org \
    --cc=haveblue@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=marcelo.tosatti@cyclades.com \
    --cc=raybry@sgi.com \
    --cc=stevel@mvista.com \
    --cc=taka@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox