linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, nish.aravamudan@gmail.com,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	clameter@sgi.com, ak@suse.de
Subject: [PATCH/RFC 8/8] Mapped File Policy: fix show_numa_maps()
Date: Thu, 24 May 2007 13:29:22 -0400	[thread overview]
Message-ID: <20070524172922.13933.8558.sendpatchset@localhost> (raw)
In-Reply-To: <20070524172821.13933.80093.sendpatchset@localhost>

Mapped file Policy 8/8 - fix show_numa_maps()

Against 2.6.22-rc2-mm1

This patch updates the procfs numa_maps display to handle multiple
shared policy ranges on a single vma.  numa_maps() still uses the
procfs task maps infrastructure, but provides wrappers around the
maps seq_file ops to handle shared policy "submaps", if any.

Also, this patch fixes a problem with numa_maps for shared mappings:
Before this [mapped file policy] patch series, numa_maps could show
you different results for shared mappings depending on which task you
examined.  A task which has installed shared policies on sub-ranges
of the shared region will show the policies on the sub-ranges, as the
vmas for that task were split when the policies were installed.  
Another task that shares the region, but didn't install any policies,
will show a single [default?] policy for the entire region as it is
mapped by a single vma in such a task.  By displaying the policies 
directly from the shared policy structure, we now see the same info
from each task that maps the segment.

The patch expands the proc_maps_private struct [#ifdef CONFIG_NUMA]
to track the existence of and progress through a submap for the
"current" vma.  For vmas with shared policy submaps, a new 
function--get_numa_submap()--in mm/mempolicy.c allocates and
populates an array of the policy ranges in the shared policy.
To facilitate this, the shared policy struct tracks the number
of ranges [sp_nodes] in the tree.

The nm_* numa_map seq_file wrappers pass the range to be displayed
to show_numa_map() via the saddr and eaddr members added to the
proc_maps_private struct.  The patch modifies show_numa_map() to
use these members, where appropriate, instead of vm_start, vm_end.

As before, once the internal page size buffer is full, seq_read()
suspends the display, drops the mmap_sem and exits the read.
During this time the vma list can change.  However, even within a
single seq_read(), the shared_policy "submap" can be changed by
other mappers.  We could only prevent this by holding the shared
policy spin_lock or otherwise holding off other mappers.  It doesn't
seem worth the effort, as the numa_map is only a snap_shot in any
case.  So, this patch makes a best effort [at least as good as
unpatched task map code, I think] to perform a single scan over the
address space, displaying the policies and page state/location
for policy ranges "snapped" under spin lock into the "submap"
array mentioned above.


Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>

 fs/proc/task_mmu.c            |  191 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/mempolicy.h     |    5 +
 include/linux/mm.h            |    6 +
 include/linux/proc_fs.h       |   12 ++
 include/linux/shared_policy.h |    3 
 mm/mempolicy.c                |   56 +++++++++++-
 6 files changed, 263 insertions(+), 10 deletions(-)

Index: Linux/include/linux/proc_fs.h
===================================================================
--- Linux.orig/include/linux/proc_fs.h	2007-05-23 10:57:07.000000000 -0400
+++ Linux/include/linux/proc_fs.h	2007-05-23 11:34:51.000000000 -0400
@@ -281,12 +281,24 @@ static inline struct proc_dir_entry *PDE
 	return PROC_I(inode)->pde;
 }
 
+struct mpol_range {
+	unsigned long saddr;
+	unsigned long eaddr;
+};
+
 struct proc_maps_private {
 	struct pid *pid;
 	struct task_struct *task;
 #ifdef CONFIG_MMU
 	struct vm_area_struct *tail_vma;
 #endif
+
+#ifdef CONFIG_NUMA
+	struct vm_area_struct *vma;	/* preserved over seq_reads */
+	unsigned long saddr;
+	unsigned long eaddr;		/* preserved over seq_reads */
+	struct mpol_range *range, *ranges; /* preserved ... */
+#endif
 };
 
 #endif /* _LINUX_PROC_FS_H */
Index: Linux/include/linux/shared_policy.h
===================================================================
--- Linux.orig/include/linux/shared_policy.h	2007-05-23 11:34:40.000000000 -0400
+++ Linux/include/linux/shared_policy.h	2007-05-23 11:34:51.000000000 -0400
@@ -25,7 +25,8 @@ struct sp_node {
 
 struct shared_policy {
 	struct rb_root root;
-	spinlock_t lock;
+	spinlock_t     lock;
+ 	int            nr_sp_nodes;	/* for numa_maps */
 };
 
 extern struct shared_policy *mpol_shared_policy_new(int, nodemask_t *);
Index: Linux/include/linux/mm.h
===================================================================
--- Linux.orig/include/linux/mm.h	2007-05-23 11:34:48.000000000 -0400
+++ Linux/include/linux/mm.h	2007-05-23 11:34:51.000000000 -0400
@@ -1070,6 +1070,12 @@ static inline pgoff_t vma_addr_to_pgoff(
 		(vma->vm_pgoff >> (shift - PAGE_SHIFT));
 }
 
+static inline pgoff_t vma_pgoff_to_addr(struct vm_area_struct *vma,
+		pgoff_t pgoff)
+{
+	return ((pgoff - vma->vm_pgoff) << PAGE_SHIFT) + vma->vm_start;
+}
+
 int generic_file_set_policy(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end, struct mempolicy *new);
 struct mempolicy *generic_file_get_policy(struct vm_area_struct *vma,
Index: Linux/include/linux/mempolicy.h
===================================================================
--- Linux.orig/include/linux/mempolicy.h	2007-05-23 11:34:46.000000000 -0400
+++ Linux/include/linux/mempolicy.h	2007-05-23 11:34:51.000000000 -0400
@@ -149,6 +149,11 @@ int do_migrate_pages(struct mm_struct *m
 
 extern void *cpuset_being_rebound;	/* Trigger mpol_copy vma rebind */
 
+struct seq_file;
+extern int show_numa_map(struct seq_file *, void *);
+struct mpol_range;
+extern struct mpol_range *get_numa_submap(struct vm_area_struct *);
+
 #else
 
 struct mempolicy {};
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c	2007-05-23 11:34:50.000000000 -0400
+++ Linux/mm/mempolicy.c	2007-05-23 11:34:51.000000000 -0400
@@ -1494,6 +1494,7 @@ static void sp_insert(struct shared_poli
 	}
 	rb_link_node(&new->nd, parent, p);
 	rb_insert_color(&new->nd, &sp->root);
+	++sp->nr_sp_nodes;
 	PDprintk("inserting %lx-%lx: %d\n", new->start, new->end,
 		 new->policy ? new->policy->policy : 0);
 }
@@ -1523,6 +1524,7 @@ static void sp_delete(struct shared_poli
 	rb_erase(&n->nd, &sp->root);
 	mpol_free(n->policy);
 	kmem_cache_free(sn_cache, n);
+	--sp->nr_sp_nodes;
 }
 
 struct sp_node *
@@ -1600,6 +1602,7 @@ struct shared_policy *mpol_shared_policy
 		return NULL;
 	sp->root = RB_ROOT;
 	spin_lock_init(&sp->lock);
+	sp->nr_sp_nodes = 0;
 
 	if (policy != MPOL_DEFAULT) {
 		struct mempolicy *newpol;
@@ -1932,9 +1935,9 @@ int show_numa_map(struct seq_file *m, vo
 		return 0;
 
 	mpol_to_str(buffer, sizeof(buffer),
-			    get_vma_policy(priv->task, vma, vma->vm_start));
+			    get_vma_policy(priv->task, vma, priv->saddr));
 
-	seq_printf(m, "%08lx %s", vma->vm_start, buffer);
+	seq_printf(m, "%08lx %s", priv->saddr, buffer);
 
 	if (file) {
 		seq_printf(m, " file=");
@@ -1947,10 +1950,10 @@ int show_numa_map(struct seq_file *m, vo
 	}
 
 	if (is_vm_hugetlb_page(vma)) {
-		check_huge_range(vma, vma->vm_start, vma->vm_end, md);
+		check_huge_range(vma, priv->saddr, priv->eaddr, md);
 		seq_printf(m, " huge");
 	} else {
-		check_pgd_range(vma, vma->vm_start, vma->vm_end,
+		check_pgd_range(vma, priv->saddr, priv->eaddr,
 				&node_online_map, MPOL_MF_STATS, md);
 	}
 
@@ -1990,3 +1993,48 @@ out:
 	return 0;
 }
 
+/*
+ * alloc/populate array of shared policy ranges for show_numa_map()
+ */
+struct mpol_range *get_numa_submap(struct vm_area_struct *vma)
+{
+	struct shared_policy *sp;
+	struct mpol_range *ranges, *range;
+	struct rb_node *rbn;
+	int nranges;
+
+	BUG_ON(!vma->vm_file);
+	sp = mapping_shared_policy(vma->vm_file->f_mapping);
+	if (!sp)
+		return NULL;
+
+	nranges = sp->nr_sp_nodes;
+	if (!nranges)
+		return NULL;
+
+	ranges = kzalloc((nranges + 1) * sizeof(*ranges), GFP_KERNEL);
+	if (!ranges)
+		return NULL;	/* pretend there are none */
+
+	range = ranges;
+	spin_lock(&sp->lock);
+	/*
+	 * # of ranges could have changes since we checked, but that is
+	 * unlikely, so this is close enough [as long as it's safe].
+	 */
+	rbn = rb_first(&sp->root);
+	/*
+	 * count nodes to ensure we leave one empty range struct
+	 * in case node added between check and alloc
+	 */
+	while (rbn && nranges--) {
+		struct sp_node *spn = rb_entry(rbn, struct sp_node, nd);
+		range->saddr = vma_pgoff_to_addr(vma, spn->start);
+		range->eaddr = vma_pgoff_to_addr(vma, spn->end);
+		++range;
+		rbn = rb_next(rbn);
+	}
+
+	spin_unlock(&sp->lock);
+	return ranges;
+}
Index: Linux/fs/proc/task_mmu.c
===================================================================
--- Linux.orig/fs/proc/task_mmu.c	2007-05-23 10:57:02.000000000 -0400
+++ Linux/fs/proc/task_mmu.c	2007-05-23 11:34:51.000000000 -0400
@@ -498,7 +498,188 @@ const struct file_operations proc_clear_
 #endif
 
 #ifdef CONFIG_NUMA
-extern int show_numa_map(struct seq_file *m, void *v);
+/*
+ * numa_maps uses procfs task maps file operations, with wrappers
+ * to handle mpol submaps--policy ranges within a vma
+ */
+
+/*
+ * start processing a new vma for show_numa_maps
+ */
+static void nm_vma_start(struct proc_maps_private *priv,
+			struct vm_area_struct *vma)
+{
+	if (!vma)
+		return;
+	priv->vma = vma;	/* saved across read()s */
+
+	priv->saddr = vma->vm_start;
+	if (!(vma->vm_flags & VM_SHARED) || !vma->vm_file ||
+		!vma->vm_file->f_mapping->spolicy) {
+		/*
+		 * usual case:  no submap
+		 */
+		priv->eaddr = vma->vm_end;
+		return;
+	}
+
+	priv->range = priv->ranges = get_numa_submap(vma);
+	if (!priv->range) {
+		priv->eaddr = vma->vm_end;	/* empty shared policy */
+		return;
+	}
+
+	/*
+	 * restart suspended submap where we left off
+	 */
+	while (priv->range->eaddr && priv->range->eaddr < priv->eaddr)
+		++priv->range;
+
+	if (!priv->range->eaddr)
+		priv->eaddr = vma->vm_end;
+	else if (priv->saddr < priv->range->saddr)
+		priv->eaddr = priv->range->saddr; /* show gap [default pol] */
+	else
+		priv->eaddr = priv->range->eaddr; /* show range */
+}
+
+/*
+ * done with numa_maps vma:  reset so we start a new
+ * vma on next seq_read.
+ */
+static void nm_vma_stop(struct proc_maps_private *priv)
+{
+	if (priv->ranges)
+		kfree(priv->ranges);
+	priv->ranges = priv->range = NULL;
+	priv->vma = NULL;
+}
+
+/*
+ * Advance to next vma in mm or next subrange in vma.
+ * mmap_sem held during a single seq_read(), but shared
+ * policy ranges can be modified at any time by other
+ * mappers.  We just continue to display the ranges we
+ * found when we started the vma.
+ */
+static void *nm_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct proc_maps_private *priv = m->private;
+	struct vm_area_struct *vma = v;
+
+	if (!priv->range || priv->eaddr >= vma->vm_end) {
+		/*
+		 * usual case:  no submap or end of vma
+		 * re: '>=' -- in case we got here from nm_start()
+		 * and vma @ pos truncated to < priv->eaddr
+		 */
+		nm_vma_stop(priv);
+		vma = m_next(m, v, pos);
+		nm_vma_start(priv, vma);
+		return vma;
+	}
+
+	/*
+	 * Advance to next range in submap
+	 */
+	priv->saddr = priv->eaddr;
+	if (priv->eaddr == priv->range->saddr) {
+		/*
+		 * just processed a gap in the submap
+		 */
+		priv->eaddr = min(priv->range->eaddr, vma->vm_end);
+		return vma;	/* show the range */
+	}
+
+	++priv->range;
+	if (!priv->range->eaddr)
+		priv->eaddr = vma->vm_end;	/* past end of ranges */
+	else if (priv->saddr < priv->range->saddr)
+		priv->eaddr = priv->range->saddr; /* gap in submap */
+	else
+		priv->eaddr = min(priv->range->eaddr, vma->vm_end);
+
+	return vma;
+}
+
+/*
+ * [Re]start scan for new seq_read().
+ * N.B., much could have changes in mm, as we dropped the mmap_sem
+ * between reads().  Need to call m_start() to find vma at pos.
+ */
+static void *nm_start(struct seq_file *m, loff_t *pos)
+{
+	struct proc_maps_private *priv = m->private;
+	struct vm_area_struct *vma;
+
+	if (!priv->range) {
+		/*
+		 * usual case:  1st after open, or finished prev vma
+		 */
+		vma = m_start(m, pos);
+		nm_vma_start(priv, vma);
+		return vma;
+	}
+
+	/*
+	 * Continue with submap of "current" vma.  However, vma could have
+	 * been unmapped, split, truncated, ... between read()s.
+	 * Reset "last_addr" to simulate seek;  find vma by 'pos'.
+	 */
+	m->version = 0;
+	--(*pos);		/* seq_read() incremented it */
+	vma = m_start(m, pos);
+	if (vma != priv->vma)
+		goto new_vma;
+	/*
+	 * Same vma, but could have different ranges or could be entirely
+	 * different vma.
+	 */
+	if (vma->vm_start > priv->eaddr)
+		goto new_vma;	/* starts past last range displayed */
+	if (priv->eaddr < vma->vm_end) {
+		/*
+		 * vma at pos still covers eaddr--where we left off.  Submap
+		 * could have changed, but we'll keep reporting ranges we found
+		 * earlier up to vm_end.
+		 * We hope it is very unlikely that submap changed.
+		 */
+		return nm_next(m, vma, pos);
+	}
+
+	/*
+	 * Already reported past end of vma; find next vma past eaddr
+	 */
+	while (vma && vma->vm_end < priv->eaddr)
+		vma = m_next(m, vma, pos);
+
+new_vma:
+	/*
+	 * new vma at pos;  continue from ~ last eaddr
+	 */
+	nm_vma_stop(priv);
+	nm_vma_start(priv, vma);
+	return vma;
+}
+
+/*
+ * Suspend display of numa_map--e.g., buffer full?
+ */
+static void nm_stop(struct seq_file *m, void *v)
+{
+	struct proc_maps_private *priv = m->private;
+	struct vm_area_struct *vma = v;
+
+	if (!vma || priv->eaddr >= vma->vm_end) {
+		nm_vma_stop(priv);
+	}
+	/*
+	 * leave state in priv for nm_start(); but drop the
+	 * mmap_sem and unref the mm
+	 */
+	m_stop(m, v);
+}
+
 
 static int show_numa_map_checked(struct seq_file *m, void *v)
 {
@@ -512,10 +693,10 @@ static int show_numa_map_checked(struct 
 }
 
 static struct seq_operations proc_pid_numa_maps_op = {
-        .start  = m_start,
-        .next   = m_next,
-        .stop   = m_stop,
-        .show   = show_numa_map_checked
+	.start  = nm_start,
+	.next   = nm_next,
+	.stop   = nm_stop,
+	.show   = show_numa_map_checked
 };
 
 static int numa_maps_open(struct inode *inode, struct file *file)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2007-05-24 17:29 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-24 17:28 [PATCH/RFC 0/8] Mapped File Policy Overview Lee Schermerhorn
2007-05-24 17:28 ` [PATCH/RFC 1/8] Mapped File Policy: move shared policy to inode/mapping Lee Schermerhorn
2007-05-24 17:28 ` [PATCH/RFC 2/8] Mapped File Policy: allocate shared policies as needed Lee Schermerhorn
2007-05-24 17:28 ` [PATCH/RFC 3/8] Mapped File Policy: let vma policy ops handle sub-vma policies Lee Schermerhorn
2007-05-24 17:28 ` [PATCH/RFC 4/8] Mapped File Policy: add generic file set/get policy vm ops Lee Schermerhorn
2007-05-24 17:28 ` [PATCH/RFC 5/8] Mapped File Policy: Factor alloc_page_pol routine Lee Schermerhorn
2007-05-24 17:29 ` [PATCH/RFC 6/8] Mapped File Policy: use file policy for page cache allocations Lee Schermerhorn
2007-05-24 17:29 ` [PATCH/RFC 7/8] Mapped File Policy: fix migration of private mappings Lee Schermerhorn
2007-05-24 17:29 ` Lee Schermerhorn [this message]
2007-05-24 19:24 ` [PATCH/RFC 0/8] Mapped File Policy Overview Christoph Lameter
2007-05-24 20:46   ` Lee Schermerhorn
2007-05-24 20:41 ` Andi Kleen
2007-05-24 21:05   ` Lee Schermerhorn
2007-05-24 21:17     ` Christoph Lameter
2007-05-25 14:55       ` Lee Schermerhorn
2007-05-25 15:25         ` Christoph Lameter
2007-05-25 16:06           ` Lee Schermerhorn
2007-05-25 16:24             ` Christoph Lameter
2007-05-25 17:37               ` Lee Schermerhorn
2007-05-25 19:10                 ` Christoph Lameter
2007-05-25 21:12                   ` Lee Schermerhorn
2007-05-25 21:43                     ` Christoph Lameter
2007-05-25 21:01                 ` Andi Kleen
2007-05-25 21:41                   ` Lee Schermerhorn
2007-05-25 21:46                     ` Christoph Lameter
2007-05-29 13:57                       ` Lee Schermerhorn
2007-05-25 21:03           ` Andi Kleen
2007-05-25 21:14             ` Lee Schermerhorn
2007-05-25 22:44               ` Andi Kleen
2007-05-29 14:17                 ` Lee Schermerhorn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070524172922.13933.8558.sendpatchset@localhost \
    --to=lee.schermerhorn@hp.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=linux-mm@kvack.org \
    --cc=nish.aravamudan@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox