From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, nish.aravamudan@gmail.com,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
clameter@sgi.com, ak@suse.de
Subject: [PATCH/RFC 8/8] Mapped File Policy: fix show_numa_maps()
Date: Thu, 24 May 2007 13:29:22 -0400
Message-ID: <20070524172922.13933.8558.sendpatchset@localhost>
In-Reply-To: <20070524172821.13933.80093.sendpatchset@localhost>
Mapped File Policy 8/8 - fix show_numa_maps()
Against 2.6.22-rc2-mm1
This patch updates the procfs numa_maps display to handle multiple
shared policy ranges on a single vma. numa_maps still uses the
procfs task maps infrastructure, but wraps the maps seq_file ops
to handle shared policy "submaps", if any.
Also, this patch fixes a problem with numa_maps for shared mappings:
Before this [mapped file policy] patch series, numa_maps could show
you different results for shared mappings depending on which task you
examined. A task which has installed shared policies on sub-ranges
of the shared region will show the policies on the sub-ranges, as the
vmas for that task were split when the policies were installed.
Another task that shares the region, but didn't install any policies,
will show a single [default?] policy for the entire region as it is
mapped by a single vma in such a task. By displaying the policies
directly from the shared policy structure, we now see the same info
from each task that maps the segment.
The patch expands the proc_maps_private struct [#ifdef CONFIG_NUMA]
to track the existence of and progress through a submap for the
"current" vma. For vmas with shared policy submaps, a new
function--get_numa_submap()--in mm/mempolicy.c allocates and
populates an array of the policy ranges in the shared policy.
To facilitate this, the shared policy struct tracks the number
of ranges [nr_sp_nodes] in the tree.
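
To make the offset-to-address step concrete, here is a small user-space sketch of how a file page offset held in the shared policy could be mapped back to an address in one task's mapping. The names toy_vma, toy_pgoff_to_addr and TOY_PAGE_SHIFT are hypothetical stand-ins, not kernel API, and 4 KiB pages are assumed. The arithmetic mirrors vma_pgoff_to_addr() from this patch; the assert is an addition of the sketch only.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two vma fields the conversion needs. */
struct toy_vma {
	unsigned long vm_start; /* first virtual address of the mapping */
	unsigned long vm_pgoff; /* file offset of vm_start, in pages */
};

/* Translate a file page offset into a virtual address in the mapping. */
static unsigned long toy_pgoff_to_addr(const struct toy_vma *vma,
				       unsigned long pgoff)
{
	/* sketch-only precondition: offset must not precede the mapping */
	assert(pgoff >= vma->vm_pgoff);
	return ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT) + vma->vm_start;
}
```

For a mapping that starts at 0x10000000 with vm_pgoff 4, page offset 6 lands two pages into the mapping, at 0x10002000.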
The nm_* numa_map seq_file wrappers pass the range to be displayed
to show_numa_map() via the saddr and eaddr members added to the
proc_maps_private struct. The patch modifies show_numa_map() to
use these members, where appropriate, instead of vm_start, vm_end.
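
The saddr/eaddr walk can be illustrated with a simplified user-space sketch. It is not the patch's nm_next(), which advances one step per call and has to survive suspended reads; instead it computes, in one pass, every [saddr, eaddr) segment that would get its own output line. toy_range and toy_segments are hypothetical names, and the input array is assumed to be sorted, non-overlapping and terminated by an entry with eaddr == 0.

```c
#include <assert.h>
#include <stddef.h>

struct toy_range {
	unsigned long saddr;
	unsigned long eaddr; /* 0 marks the end of the array */
};

/*
 * Split [vm_start, vm_end) into gaps (default policy) and policy ranges.
 * Returns the number of segments written to out (at most max).
 */
static size_t toy_segments(unsigned long vm_start, unsigned long vm_end,
			   const struct toy_range *ranges,
			   struct toy_range *out, size_t max)
{
	unsigned long saddr = vm_start;
	size_t n = 0;

	assert(vm_start <= vm_end);
	while (saddr < vm_end && n < max) {
		unsigned long eaddr;

		/* skip ranges that end at or before the current position */
		while (ranges->eaddr && ranges->eaddr <= saddr)
			++ranges;

		if (!ranges->eaddr || ranges->saddr >= vm_end)
			eaddr = vm_end;        /* past the last range */
		else if (saddr < ranges->saddr)
			eaddr = ranges->saddr; /* gap before the next range */
		else                           /* inside a range; clamp to vma */
			eaddr = ranges->eaddr < vm_end ? ranges->eaddr : vm_end;

		out[n].saddr = saddr;
		out[n].eaddr = eaddr;
		++n;
		saddr = eaddr;
	}
	return n;
}
```

With a vma covering 0x1000-0x9000 and policy ranges 0x3000-0x5000 and 0x7000-0x8000, this yields five segments: gap, range, gap, range, and a trailing gap up to vm_end.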
As before, once the internal page-sized buffer is full, seq_read()
suspends the display, drops the mmap_sem and exits the read.
During this time the vma list can change. However, even within a
single seq_read(), the shared_policy "submap" can be changed by
other mappers. We could prevent this only by holding the shared
policy spin_lock or otherwise holding off other mappers. It doesn't
seem worth the effort, as the numa_map is only a snapshot in any
case. So, this patch makes a best effort [at least as good as
unpatched task map code, I think] to perform a single scan over the
address space, displaying the policies and page state/location
for policy ranges "snapped" under spin lock into the "submap"
array mentioned above.
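
The "snap under lock" idea can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: a pthread mutex and a linked list stand in for the spinlock and rb-tree, and toy_policy, toy_node and toy_snapshot are hypothetical names. Unlike get_numa_submap(), which reads the count without the lock, the sketch samples it under the lock to stay free of data races. What it shares with the patch is that the allocation happens unlocked and the copy is bounded by the sampled count, so the extra zeroed element always remains as a terminator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct toy_range {
	unsigned long saddr;
	unsigned long eaddr; /* 0 marks the end of a snapshot array */
};

struct toy_node {
	struct toy_range range;
	struct toy_node *next;
};

/* Hypothetical stand-in for the shared policy: locked list plus count. */
struct toy_policy {
	pthread_mutex_t lock;
	struct toy_node *head;
	int nr_nodes;
};

/* Returns a zero-terminated copy of the ranges, or NULL if none. */
static struct toy_range *toy_snapshot(struct toy_policy *sp)
{
	struct toy_range *ranges, *range;
	struct toy_node *node;
	int nranges;

	assert(sp != NULL);
	pthread_mutex_lock(&sp->lock);
	nranges = sp->nr_nodes; /* may be stale once the lock is dropped */
	pthread_mutex_unlock(&sp->lock);
	if (nranges <= 0)
		return NULL;

	/* allocate unlocked; one extra zeroed slot acts as terminator */
	ranges = calloc((size_t)nranges + 1, sizeof(*ranges));
	if (!ranges)
		return NULL;

	range = ranges;
	pthread_mutex_lock(&sp->lock);
	/* copy at most nranges entries, even if the list has grown */
	for (node = sp->head; node && nranges--; node = node->next)
		*range++ = node->range;
	pthread_mutex_unlock(&sp->lock);

	return ranges;
}
```

If a node is added between sampling the count and copying, the snapshot is simply truncated to the sampled count; the caller owns the array and frees it when done with the vma.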
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
fs/proc/task_mmu.c | 191 ++++++++++++++++++++++++++++++++++++++++--
include/linux/mempolicy.h | 5 +
include/linux/mm.h | 6 +
include/linux/proc_fs.h | 12 ++
include/linux/shared_policy.h | 3
mm/mempolicy.c | 56 +++++++++++-
6 files changed, 263 insertions(+), 10 deletions(-)
Index: Linux/include/linux/proc_fs.h
===================================================================
--- Linux.orig/include/linux/proc_fs.h 2007-05-23 10:57:07.000000000 -0400
+++ Linux/include/linux/proc_fs.h 2007-05-23 11:34:51.000000000 -0400
@@ -281,12 +281,24 @@ static inline struct proc_dir_entry *PDE
return PROC_I(inode)->pde;
}
+struct mpol_range {
+ unsigned long saddr;
+ unsigned long eaddr;
+};
+
struct proc_maps_private {
struct pid *pid;
struct task_struct *task;
#ifdef CONFIG_MMU
struct vm_area_struct *tail_vma;
#endif
+
+#ifdef CONFIG_NUMA
+ struct vm_area_struct *vma; /* preserved over seq_reads */
+ unsigned long saddr;
+ unsigned long eaddr; /* preserved over seq_reads */
+ struct mpol_range *range, *ranges; /* preserved ... */
+#endif
};
#endif /* _LINUX_PROC_FS_H */
Index: Linux/include/linux/shared_policy.h
===================================================================
--- Linux.orig/include/linux/shared_policy.h 2007-05-23 11:34:40.000000000 -0400
+++ Linux/include/linux/shared_policy.h 2007-05-23 11:34:51.000000000 -0400
@@ -25,7 +25,8 @@ struct sp_node {
struct shared_policy {
struct rb_root root;
- spinlock_t lock;
+ spinlock_t lock;
+ int nr_sp_nodes; /* for numa_maps */
};
extern struct shared_policy *mpol_shared_policy_new(int, nodemask_t *);
Index: Linux/include/linux/mm.h
===================================================================
--- Linux.orig/include/linux/mm.h 2007-05-23 11:34:48.000000000 -0400
+++ Linux/include/linux/mm.h 2007-05-23 11:34:51.000000000 -0400
@@ -1070,6 +1070,12 @@ static inline pgoff_t vma_addr_to_pgoff(
(vma->vm_pgoff >> (shift - PAGE_SHIFT));
}
+static inline unsigned long vma_pgoff_to_addr(struct vm_area_struct *vma,
+ pgoff_t pgoff)
+{
+ return ((pgoff - vma->vm_pgoff) << PAGE_SHIFT) + vma->vm_start;
+}
+
int generic_file_set_policy(struct vm_area_struct *vma,
unsigned long start, unsigned long end, struct mempolicy *new);
struct mempolicy *generic_file_get_policy(struct vm_area_struct *vma,
Index: Linux/include/linux/mempolicy.h
===================================================================
--- Linux.orig/include/linux/mempolicy.h 2007-05-23 11:34:46.000000000 -0400
+++ Linux/include/linux/mempolicy.h 2007-05-23 11:34:51.000000000 -0400
@@ -149,6 +149,11 @@ int do_migrate_pages(struct mm_struct *m
extern void *cpuset_being_rebound; /* Trigger mpol_copy vma rebind */
+struct seq_file;
+extern int show_numa_map(struct seq_file *, void *);
+struct mpol_range;
+extern struct mpol_range *get_numa_submap(struct vm_area_struct *);
+
#else
struct mempolicy {};
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2007-05-23 11:34:50.000000000 -0400
+++ Linux/mm/mempolicy.c 2007-05-23 11:34:51.000000000 -0400
@@ -1494,6 +1494,7 @@ static void sp_insert(struct shared_poli
}
rb_link_node(&new->nd, parent, p);
rb_insert_color(&new->nd, &sp->root);
+ ++sp->nr_sp_nodes;
PDprintk("inserting %lx-%lx: %d\n", new->start, new->end,
new->policy ? new->policy->policy : 0);
}
@@ -1523,6 +1524,7 @@ static void sp_delete(struct shared_poli
rb_erase(&n->nd, &sp->root);
mpol_free(n->policy);
kmem_cache_free(sn_cache, n);
+ --sp->nr_sp_nodes;
}
struct sp_node *
@@ -1600,6 +1602,7 @@ struct shared_policy *mpol_shared_policy
return NULL;
sp->root = RB_ROOT;
spin_lock_init(&sp->lock);
+ sp->nr_sp_nodes = 0;
if (policy != MPOL_DEFAULT) {
struct mempolicy *newpol;
@@ -1932,9 +1935,9 @@ int show_numa_map(struct seq_file *m, vo
return 0;
mpol_to_str(buffer, sizeof(buffer),
- get_vma_policy(priv->task, vma, vma->vm_start));
+ get_vma_policy(priv->task, vma, priv->saddr));
- seq_printf(m, "%08lx %s", vma->vm_start, buffer);
+ seq_printf(m, "%08lx %s", priv->saddr, buffer);
if (file) {
seq_printf(m, " file=");
@@ -1947,10 +1950,10 @@ int show_numa_map(struct seq_file *m, vo
}
if (is_vm_hugetlb_page(vma)) {
- check_huge_range(vma, vma->vm_start, vma->vm_end, md);
+ check_huge_range(vma, priv->saddr, priv->eaddr, md);
seq_printf(m, " huge");
} else {
- check_pgd_range(vma, vma->vm_start, vma->vm_end,
+ check_pgd_range(vma, priv->saddr, priv->eaddr,
&node_online_map, MPOL_MF_STATS, md);
}
@@ -1990,3 +1993,48 @@ out:
return 0;
}
+/*
+ * alloc/populate array of shared policy ranges for show_numa_map()
+ */
+struct mpol_range *get_numa_submap(struct vm_area_struct *vma)
+{
+ struct shared_policy *sp;
+ struct mpol_range *ranges, *range;
+ struct rb_node *rbn;
+ int nranges;
+
+ BUG_ON(!vma->vm_file);
+ sp = mapping_shared_policy(vma->vm_file->f_mapping);
+ if (!sp)
+ return NULL;
+
+ nranges = sp->nr_sp_nodes;
+ if (!nranges)
+ return NULL;
+
+ ranges = kzalloc((nranges + 1) * sizeof(*ranges), GFP_KERNEL);
+ if (!ranges)
+ return NULL; /* pretend there are none */
+
+ range = ranges;
+ spin_lock(&sp->lock);
+ /*
+ * # of ranges could have changed since we checked, but that is
+ * unlikely, so this is close enough [as long as it's safe].
+ */
+ rbn = rb_first(&sp->root);
+ /*
+ * count nodes to ensure we leave one empty range struct
+ * in case node added between check and alloc
+ */
+ while (rbn && nranges--) {
+ struct sp_node *spn = rb_entry(rbn, struct sp_node, nd);
+ range->saddr = vma_pgoff_to_addr(vma, spn->start);
+ range->eaddr = vma_pgoff_to_addr(vma, spn->end);
+ ++range;
+ rbn = rb_next(rbn);
+ }
+
+ spin_unlock(&sp->lock);
+ return ranges;
+}
Index: Linux/fs/proc/task_mmu.c
===================================================================
--- Linux.orig/fs/proc/task_mmu.c 2007-05-23 10:57:02.000000000 -0400
+++ Linux/fs/proc/task_mmu.c 2007-05-23 11:34:51.000000000 -0400
@@ -498,7 +498,188 @@ const struct file_operations proc_clear_
#endif
#ifdef CONFIG_NUMA
-extern int show_numa_map(struct seq_file *m, void *v);
+/*
+ * numa_maps uses procfs task maps file operations, with wrappers
+ * to handle mpol submaps--policy ranges within a vma
+ */
+
+/*
+ * start processing a new vma for show_numa_maps
+ */
+static void nm_vma_start(struct proc_maps_private *priv,
+ struct vm_area_struct *vma)
+{
+ if (!vma)
+ return;
+ priv->vma = vma; /* saved across read()s */
+
+ priv->saddr = vma->vm_start;
+ if (!(vma->vm_flags & VM_SHARED) || !vma->vm_file ||
+ !vma->vm_file->f_mapping->spolicy) {
+ /*
+ * usual case: no submap
+ */
+ priv->eaddr = vma->vm_end;
+ return;
+ }
+
+ priv->range = priv->ranges = get_numa_submap(vma);
+ if (!priv->range) {
+ priv->eaddr = vma->vm_end; /* empty shared policy */
+ return;
+ }
+
+ /*
+ * restart suspended submap where we left off
+ */
+ while (priv->range->eaddr && priv->range->eaddr < priv->eaddr)
+ ++priv->range;
+
+ if (!priv->range->eaddr)
+ priv->eaddr = vma->vm_end;
+ else if (priv->saddr < priv->range->saddr)
+ priv->eaddr = priv->range->saddr; /* show gap [default pol] */
+ else
+ priv->eaddr = priv->range->eaddr; /* show range */
+}
+
+/*
+ * done with numa_maps vma: reset so we start a new
+ * vma on next seq_read.
+ */
+static void nm_vma_stop(struct proc_maps_private *priv)
+{
+ if (priv->ranges)
+ kfree(priv->ranges);
+ priv->ranges = priv->range = NULL;
+ priv->vma = NULL;
+}
+
+/*
+ * Advance to next vma in mm or next subrange in vma.
+ * mmap_sem held during a single seq_read(), but shared
+ * policy ranges can be modified at any time by other
+ * mappers. We just continue to display the ranges we
+ * found when we started the vma.
+ */
+static void *nm_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct proc_maps_private *priv = m->private;
+ struct vm_area_struct *vma = v;
+
+ if (!priv->range || priv->eaddr >= vma->vm_end) {
+ /*
+ * usual case: no submap or end of vma
+ * re: '>=' -- in case we got here from nm_start()
+ * and vma @ pos truncated to < priv->eaddr
+ */
+ nm_vma_stop(priv);
+ vma = m_next(m, v, pos);
+ nm_vma_start(priv, vma);
+ return vma;
+ }
+
+ /*
+ * Advance to next range in submap
+ */
+ priv->saddr = priv->eaddr;
+ if (priv->eaddr == priv->range->saddr) {
+ /*
+ * just processed a gap in the submap
+ */
+ priv->eaddr = min(priv->range->eaddr, vma->vm_end);
+ return vma; /* show the range */
+ }
+
+ ++priv->range;
+ if (!priv->range->eaddr)
+ priv->eaddr = vma->vm_end; /* past end of ranges */
+ else if (priv->saddr < priv->range->saddr)
+ priv->eaddr = priv->range->saddr; /* gap in submap */
+ else
+ priv->eaddr = min(priv->range->eaddr, vma->vm_end);
+
+ return vma;
+}
+
+/*
+ * [Re]start scan for new seq_read().
+ * N.B., much could have changed in mm, as we dropped the mmap_sem
+ * between read()s. Need to call m_start() to find vma at pos.
+ */
+static void *nm_start(struct seq_file *m, loff_t *pos)
+{
+ struct proc_maps_private *priv = m->private;
+ struct vm_area_struct *vma;
+
+ if (!priv->range) {
+ /*
+ * usual case: 1st after open, or finished prev vma
+ */
+ vma = m_start(m, pos);
+ nm_vma_start(priv, vma);
+ return vma;
+ }
+
+ /*
+ * Continue with submap of "current" vma. However, vma could have
+ * been unmapped, split, truncated, ... between read()s.
+ * Reset "last_addr" to simulate seek; find vma by 'pos'.
+ */
+ m->version = 0;
+ --(*pos); /* seq_read() incremented it */
+ vma = m_start(m, pos);
+ if (vma != priv->vma)
+ goto new_vma;
+ /*
+ * Same vma pointer, but it could have different ranges or could be
+ * an entirely different vma.
+ */
+ if (vma->vm_start > priv->eaddr)
+ goto new_vma; /* starts past last range displayed */
+ if (priv->eaddr < vma->vm_end) {
+ /*
+ * vma at pos still covers eaddr--where we left off. Submap
+ * could have changed, but we'll keep reporting ranges we found
+ * earlier up to vm_end.
+ * We hope it is very unlikely that submap changed.
+ */
+ return nm_next(m, vma, pos);
+ }
+
+ /*
+ * Already reported past end of vma; find next vma past eaddr
+ */
+ while (vma && vma->vm_end < priv->eaddr)
+ vma = m_next(m, vma, pos);
+
+new_vma:
+ /*
+ * new vma at pos; continue from ~ last eaddr
+ */
+ nm_vma_stop(priv);
+ nm_vma_start(priv, vma);
+ return vma;
+}
+
+/*
+ * Suspend display of numa_map--e.g., buffer full?
+ */
+static void nm_stop(struct seq_file *m, void *v)
+{
+ struct proc_maps_private *priv = m->private;
+ struct vm_area_struct *vma = v;
+
+ if (!vma || priv->eaddr >= vma->vm_end) {
+ nm_vma_stop(priv);
+ }
+ /*
+ * leave state in priv for nm_start(); but drop the
+ * mmap_sem and unref the mm
+ */
+ m_stop(m, v);
+}
+
static int show_numa_map_checked(struct seq_file *m, void *v)
{
@@ -512,10 +693,10 @@ static int show_numa_map_checked(struct
}
static struct seq_operations proc_pid_numa_maps_op = {
- .start = m_start,
- .next = m_next,
- .stop = m_stop,
- .show = show_numa_map_checked
+ .start = nm_start,
+ .next = nm_next,
+ .stop = nm_stop,
+ .show = show_numa_map_checked
};
static int numa_maps_open(struct inode *inode, struct file *file)