From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>,
"Paul E. McKenney" <paulmck@us.ibm.com>,
linux-mm@kvack.org, akpm@linux-foundation.org, nacc@us.ibm.com
Subject: Re: [PATCH/RFC 0/11] Shared Policy Overview
Date: Thu, 28 Jun 2007 09:42:17 -0400 [thread overview]
Message-ID: <1183038137.5697.16.camel@localhost> (raw)
In-Reply-To: <200706280001.16383.ak@suse.de>
On Thu, 2007-06-28 at 00:01 +0200, Andi Kleen wrote:
> > The zonelist from MPOL_BIND is passed to __alloc_pages. As a result the
> > RCU lock must be held over the call into the page allocator with reclaim
> > etc etc. Note that the zonelist is part of the policy structure.
>
> Yes I realized this at some point too. RCU doesn't work here because
> __alloc_pages can sleep. Have to use the reference counts even though
> it adds atomic operations.
>
> > I think one prerequisite to memory policy uses like this is work out how a
> > memory policy can be handled by the page allocator in such a way that
> >
> > 1. The use is lightweight and does not impact performance.
>
> The current mempolicies are all lightweight and zero cost in the main
> allocator path.
>
> The only outlier is still cpusets which does strange stuff, but you
> can't blame mempolicies for that.
Andi, Christoph:
Here is a proposed approach for reference counting based on my factoring
of alloc_page_vma() into get_vma_policy() and alloc_page_pol(). I've
created a patch that would slot into my shared policy series after
#6--the factoring mentioned above. I've tried to avoid taking a
reference count in the common cases of default system policy and the
current task's mempolicy. I think it's safe and, I hope, less costly to
do the tests and avoid the ref than to go ahead and acquire the cache
line for write.
I'm not sure that the check for current task's policy is necessary in
get_file_policy() because it is always called in the context of the
current task--i.e., task == current. I'm not even sure that we need the
task argument to get_file_policy. I included it to match the call
to get_vma_policy(). Could [should?] probably be removed.
Note, I've updated my series slightly since last post, to avoid using a
pseudo-vma in shmem_alloc_page(); adding some additional
documentation, ... You'll see this in the patch below. I can send the
revised patch #6 if you like. I still need to think about
shmem_swapin_async() and read_swap_cache_async(). With some work, I
think I can avoid the pseudo vma there as well. Later, tho'.
Here's the patch-untested. thoughts on this approach?
Shared Mapped File Policy "6.1/11" fix policy reference counts
This patch acquires a reference count on vma policy and on task policy
when acquired from a task with a different policy--e.g., from
show_numa_map()--and frees that reference after allocating a page or
after converting the policy to a displayable string in show_numa_map().
Avoid the taking the reference count on the system default policy or the
current task's task policy. Note that if show_numa_map() is called from
the context of a relative of the target task with the same task mempolicy,
we won't take an extra reference either. This is safe, because the policy
remains referenced by the calling task during the mpol_to_str() processing.
Call __mpol_free() [a.k.a. the "slow path"] directly from alloc_page_pol()
and show_numa_map(), where we know we have non-NULL policy, if policy is
not the system default policy and not the current task's policy. By
calling __mpol_free() directly in these two places, we avoid the extra
check for null policy in mpol_free() [admittedly a "cheap check"].
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
mm/mempolicy.c | 35 ++++++++++++++++++++++++++++-------
mm/shmem.c | 1 -
2 files changed, 28 insertions(+), 8 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2007-06-28 09:14:39.000000000 -0400
+++ Linux/mm/mempolicy.c 2007-06-28 09:34:20.000000000 -0400
@@ -1124,25 +1124,32 @@ asmlinkage long compat_sys_mbind(compat_
* @task - fall back to this task's policy if no vma policy at @addr
* @vma - vma struct containing @addr and possible policy
* @addr - virtual address in @vma for which to get policy
+ * Note: policy returned with an extra reference if the VMA has a non-NULL,
+ * non-DEFAULT policy or the policy is the task policy for a task other
+ * than "current".
*/
static struct mempolicy * get_vma_policy(struct task_struct *task,
struct vm_area_struct *vma, unsigned long addr)
{
struct mempolicy *pol = task->mempolicy;
+ int shared_pol = 0;
if (vma) {
/*
* use get_policy op, if any, for shared mappings
*/
if ((vma->vm_flags & (VM_SHARED|VM_NONLINEAR)) == VM_SHARED &&
- vma->vm_ops && vma->vm_ops->get_policy)
+ vma->vm_ops && vma->vm_ops->get_policy) {
pol = vma->vm_ops->get_policy(vma, addr);
- else if (vma->vm_policy &&
+ shared_pol = 1; /* if non-NULL, that is */
+ } else if (vma->vm_policy &&
vma->vm_policy->policy != MPOL_DEFAULT)
pol = vma->vm_policy;
}
if (!pol)
pol = &default_policy;
+ else if (!shared_pol && pol != current->mempolicy)
+ mpol_get(pol);
return pol;
}
@@ -1158,11 +1165,17 @@ struct mempolicy *get_file_policy(struct
{
struct shared_policy *sp = x->spolicy;
struct mempolicy *pol = task->mempolicy;
+ int shared_pol = 0;
- if (sp)
+ if (sp) {
pol = mpol_shared_policy_lookup(sp, pgoff);
+ shared_pol = 1; /* if non-NULL, that is */
+ }
+
if (!pol)
pol = &default_policy;
+ else if (!shared_pol && pol != current->mempolicy)
+ mpol_get(pol);
return pol;
}
@@ -1290,18 +1303,23 @@ static struct page *alloc_page_interleav
/*
* alloc_page_pol() -- allocate a page based on policy,offset.
* @gfp - gfp mask [flags + zone] for allocation
- * @pol - policy to use for allocation
+ * @pol - policy to use for allocation; must mpol_free()
* @pgoff - page offset for interleaving -- used only if interleave policy
*/
struct page *alloc_page_pol(gfp_t gfp, struct mempolicy *pol, pgoff_t pgoff)
{
+ struct page *page;
+
if (unlikely(pol->policy == MPOL_INTERLEAVE)) {
unsigned nid;
nid = offset_il_node(pol, pgoff);
return alloc_page_interleave(gfp, 0, nid);
}
- return __alloc_pages(gfp, 0, zonelist_policy(gfp, pol));
+ page = __alloc_pages(gfp, 0, zonelist_policy(gfp, pol));
+ if (pol != &default_policy && pol != current->mempolicy)
+ __mpol_free(pol);
+ return page;
}
EXPORT_SYMBOL(alloc_page_pol);
@@ -2018,6 +2036,7 @@ int show_numa_map(struct seq_file *m, vo
struct numa_maps *md;
struct file *file = vma->vm_file;
struct mm_struct *mm = vma->vm_mm;
+ struct mempolicy *pol;
int n;
char buffer[50];
@@ -2028,8 +2047,10 @@ int show_numa_map(struct seq_file *m, vo
if (!md)
return 0;
- mpol_to_str(buffer, sizeof(buffer),
- get_vma_policy(priv->task, vma, priv->saddr));
+ pol = get_vma_policy(priv->task, vma, priv->saddr);
+ mpol_to_str(buffer, sizeof(buffer), pol);
+ if (pol != &default_policy && pol != current->mempolicy)
+ __mpol_free(pol);
seq_printf(m, "%08lx %s", priv->saddr, buffer);
Index: Linux/mm/shmem.c
===================================================================
--- Linux.orig/mm/shmem.c 2007-06-28 09:14:57.000000000 -0400
+++ Linux/mm/shmem.c 2007-06-28 09:27:33.000000000 -0400
@@ -1005,7 +1005,6 @@ shmem_alloc_page(gfp_t gfp, struct share
pol = mpol_shared_policy_lookup(sp, idx);
page = alloc_page_pol(gfp | __GFP_ZERO, pol, idx);
- mpol_free(pol);
return page;
}
#else
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-06-28 13:42 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-25 19:52 Lee Schermerhorn
2007-06-25 19:52 ` [PATCH/RFC 1/11] Shared Policy: move shared policy to inode/mapping Lee Schermerhorn
2007-06-25 19:52 ` [PATCH/RFC 2/11] Shared Policy: allocate shared policies as needed Lee Schermerhorn
2007-06-25 19:52 ` [PATCH/RFC 3/11] Shared Policy: let vma policy ops handle sub-vma policies Lee Schermerhorn
2007-06-25 19:52 ` [PATCH/RFC 4/11] Shared Policy: fix show_numa_maps() Lee Schermerhorn
2007-06-25 19:52 ` [PATCH/RFC 5/11] Shared Policy: Add hugepage shmem policy vm_ops Lee Schermerhorn
2007-06-25 19:53 ` [PATCH/RFC 6/11] Shared Policy: Factor alloc_page_pol routine Lee Schermerhorn
2007-06-25 19:53 ` [PATCH/RFC 7/11] Shared Policy: use shared policy for page cache allocations Lee Schermerhorn
2007-06-25 19:53 ` [PATCH/RFC 8/11] Shared Policy: fix migration of private mappings Lee Schermerhorn
2007-06-25 19:53 ` [PATCH/RFC 9/11] Shared Policy: mapped file policy persistence model Lee Schermerhorn
2007-06-25 19:53 ` [PATCH/RFC 10/11] Shared Policy: per cpuset shared file policy control Lee Schermerhorn
2007-06-25 21:10 ` Paul Jackson
2007-06-27 17:33 ` Lee Schermerhorn
2007-06-27 19:52 ` Paul Jackson
2007-06-27 20:22 ` Lee Schermerhorn
2007-06-27 20:36 ` Paul Jackson
2007-06-25 19:53 ` [PATCH/RFC 11/11] Shared Policy: add generic file set/get policy vm ops Lee Schermerhorn
2007-06-26 22:17 ` [PATCH/RFC 0/11] Shared Policy Overview Christoph Lameter
2007-06-27 13:43 ` Lee Schermerhorn
2007-06-26 22:21 ` Christoph Lameter
2007-06-26 22:42 ` Andi Kleen
2007-06-27 3:25 ` Christoph Lameter
2007-06-27 20:14 ` Lee Schermerhorn
2007-06-27 18:14 ` Lee Schermerhorn
2007-06-27 21:37 ` Christoph Lameter
2007-06-27 22:01 ` Andi Kleen
2007-06-27 22:08 ` Christoph Lameter
2007-06-27 23:46 ` Paul E. McKenney
2007-06-28 0:14 ` Andi Kleen
2007-06-29 21:47 ` Lee Schermerhorn
2007-06-28 13:42 ` Lee Schermerhorn [this message]
2007-06-28 22:02 ` Andi Kleen
2007-06-29 17:14 ` Lee Schermerhorn
2007-06-29 17:42 ` Andi Kleen
2007-06-30 18:34 ` [PATCH/RFC] Fix Mempolicy Ref Counts - was " Lee Schermerhorn
2007-07-03 18:09 ` Christoph Lameter
2007-06-29 1:39 ` Christoph Lameter
2007-06-29 9:01 ` Andi Kleen
2007-06-29 14:05 ` Christoph Lameter
2007-06-29 17:41 ` Lee Schermerhorn
2007-06-29 20:15 ` Christoph Lameter
2007-06-29 13:22 ` Lee Schermerhorn
2007-06-29 14:18 ` Christoph Lameter
2007-06-27 23:36 ` Lee Schermerhorn
2007-06-29 1:41 ` Christoph Lameter
2007-06-29 13:30 ` Lee Schermerhorn
2007-06-29 14:20 ` Andi Kleen
2007-06-29 21:40 ` Lee Schermerhorn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1183038137.5697.16.camel@localhost \
--to=lee.schermerhorn@hp.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=linux-mm@kvack.org \
--cc=nacc@us.ibm.com \
--cc=paulmck@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox