From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>,
linux-mm@kvack.org, Eric Whitney <eric.whitney@hp.com>,
David Rientjes <rientjes@google.com>, Paul Jackson <pj@sgi.com>
Subject: Re: [NUMA] Fix memory policy refcounting
Date: Tue, 06 Nov 2007 15:08:11 -0500
Message-ID: <1194379691.5317.101.camel@localhost>
In-Reply-To: <Pine.LNX.4.64.0711061139230.30127@schroedinger.engr.sgi.com>
On Tue, 2007-11-06 at 11:43 -0800, Christoph Lameter wrote:
> On Tue, 6 Nov 2007, Lee Schermerhorn wrote:
>
> > We always seem to rathole on that subject. I just hoped to head that
> > off...
>
> Well fix this and the rathole will be gone.
I'll hold you to that! :-)
>
> > > What do you mean by in use? If a vma can potentially use a shared policy
> > > in a rbtree then it is in use right?
> >
> > Not really--not for shared policies. Again, another task is allowed to
> > remove or replace the shared policies at any time, regardless of the
> > number of tasks attached to the segment. We can't differentiate
> > between simple attachment and current use. We need the lookup-time
> > ref/unref to know that the policy is actually in use. We can still
> > replace it in the tree while it's "in use". This will remove the tree's
> > reference on the policy, but the policy won't be freed until the task
> > holding the extra ref drops it.
>
> Still unclear as to why we need lookup time ref/unref. A task can replace
> the shared policy at any time; you just need to update the refcounts. If
> you have a pointer to the policy in the vma then it's possible to do so.
A pointer in the vma won't work. Different tasks could apply policies
on different ranges and shared policy semantics dictate that all tasks
see the same policy for a particular offset in the region--modulo
set/get races. The only way we could keep a pointer in the vma would be
to split the vmas in every task that has the shared region attached
whenever any task changes the policy of a range of the region, so that
all tasks have the same set of vmas all pointing to the same set of
policies in the tree. I don't think we can be changing other tasks'
address space externally like this. And it still wouldn't work, I
think, for shared policy semantics--again, except maybe with some sort
of rcu mechanism. More below on what constitutes actual "use".
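
To illustrate the point about offsets versus vmas, here is a rough userspace
sketch, not kernel code. All names (sketch_range, sketch_vma,
sketch_policy_at, SKETCH_PAGE_SIZE) are made up for illustration, and a flat
array stands in for the rbtree. Two tasks with different vma layouts resolve
their own addresses to the same region offset, and the single per-region
table gives both the same policy:

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size for this sketch */

/* One policy range in the shared region: page offsets [start, end). */
struct sketch_range {
    unsigned long start, end;
    int policy_id;
};

/* A task-local mapping of part of the region (hypothetical, simplified). */
struct sketch_vma {
    unsigned long vm_start;   /* task-local virtual address */
    unsigned long pgoff;      /* region page offset mapped at vm_start */
};

/* Translate a task-local address into a page offset within the region. */
static unsigned long sketch_vma_pgoff(const struct sketch_vma *v,
                                      unsigned long addr)
{
    return v->pgoff + (addr - v->vm_start) / SKETCH_PAGE_SIZE;
}

/* Look up the policy by region offset only; no vma is consulted. */
static int sketch_policy_at(const struct sketch_range *tbl, size_t n,
                            unsigned long pgoff, int default_id)
{
    for (size_t i = 0; i < n; i++)
        if (pgoff >= tbl[i].start && pgoff < tbl[i].end)
            return tbl[i].policy_id;
    return default_id;
}

/* Returns 1 if both tasks see the same policy for region page 5. */
static int sketch_shared_view_demo(void)
{
    const struct sketch_range tbl[] = { { 0, 4, 1 }, { 4, 8, 2 } };
    /* Task A maps the whole region with one vma. */
    const struct sketch_vma task_a = { 0x10000000UL, 0 };
    /* Task B has a separate vma that begins at region page 4. */
    const struct sketch_vma task_b = { 0x20004000UL, 4 };

    unsigned long off_a =
        sketch_vma_pgoff(&task_a, 0x10000000UL + 5 * SKETCH_PAGE_SIZE);
    unsigned long off_b =
        sketch_vma_pgoff(&task_b, 0x20004000UL + 1 * SKETCH_PAGE_SIZE);

    return off_a == off_b &&
           sketch_policy_at(tbl, 2, off_a, 0) == 2 &&
           sketch_policy_at(tbl, 2, off_b, 0) == 2;
}
```

Because the table is keyed by region offset, a change made by either task
shows up for both without touching the other task's vmas.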
>
> > I suppose we could stick any replaced mempolicy on a list associated
> > with the segment and keep them there until all tasks detach from the
> > shared segment. Not too much of a memory leak, as long as a task
>
> Well you have the refcount on the policy? Why keep the mempolicy around?
A non-zero ref count is what keeps the policy around. It implies that
some structure has a pointer to the policy, or some task is actively
examining the policy and will drop the reference when finished with it.
[The latter is what's NOT happening now for shared policy.]
>
> > > AFAICT: If you take a reference on the shared policy for each
> > > vma then you can tell from the references that the policy is in use.
> >
> > See above. A vma reference does not constitute use for a shared policy.
>
> Why not? What does constitute "use" of a shared policy? A page that has
> used the policy?
Currently, when you look up the policy [based on offset] in the rbtree
under spin_lock, the lookup function does an mpol_get() before dropping
the lock. Now, you can use the policy to allocate a page or to report
via get_mempolicy(MPOL_F_ADDR) or show_numa_maps()/mpol_to_str(). When
you're finished with the policy, you mpol_free() to release the
reference. While you're holding this ref, another task that has the
shared region attached can replace/delete the policy, removing it from
the rbtree and dropping the rbtree's reference via mpol_free(). Now,
the only reference to the policy is any reference held by a task that
has looked it up, but not yet mpol_free()ed it. When the last task
holding such a reference releases it, we'll free it back to the kmem
cache.
This is the type of use that I can't infer from vma counts or even vma
pointer refs. I should be able to replace the vma pointer/ref at any
time when the shared policy changes, and mpol_free() the policy for each
such vma pointer/ref. That leaves no ref to hold the policy should it
be in use [as discussed above].
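
The lifetime described above can be modeled roughly in userspace as follows.
This is only a sketch under assumptions, not the mempolicy code: the sketch_*
names are hypothetical, a single pointer stands in for the rbtree, a C11
atomic_flag stands in for the spin lock, and sketch_put() plays the role that
mpol_free() plays above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mempolicy. */
struct sketch_policy {
    atomic_int refcnt;
    int mode;
};

/* Hypothetical stand-in for the shared region's policy tree. */
struct sketch_region {
    atomic_flag lock;            /* models the spin lock */
    struct sketch_policy *pol;   /* models one rbtree entry */
};

static int sketch_frees;         /* counts frees, for the demo only */

static void sketch_lock(struct sketch_region *r)
{
    while (atomic_flag_test_and_set(&r->lock))
        ;                        /* spin */
}

static void sketch_unlock(struct sketch_region *r)
{
    atomic_flag_clear(&r->lock);
}

/* New policy starts with one reference, which the tree will own. */
static struct sketch_policy *sketch_alloc(int mode)
{
    struct sketch_policy *p = malloc(sizeof(*p));
    assert(p != NULL);
    atomic_init(&p->refcnt, 1);
    p->mode = mode;
    return p;
}

/* Drop one reference; free the policy when the last one goes away. */
static void sketch_put(struct sketch_policy *p)
{
    if (p && atomic_fetch_sub(&p->refcnt, 1) == 1) {
        free(p);
        sketch_frees++;
    }
}

/* Lookup takes an extra reference before dropping the lock. */
static struct sketch_policy *sketch_lookup(struct sketch_region *r)
{
    sketch_lock(r);
    struct sketch_policy *p = r->pol;
    if (p)
        atomic_fetch_add(&p->refcnt, 1);
    sketch_unlock(r);
    return p;
}

/* Replace the tree's policy and drop the tree's reference on the old one. */
static void sketch_replace(struct sketch_region *r, struct sketch_policy *newp)
{
    sketch_lock(r);
    struct sketch_policy *old = r->pol;
    r->pol = newp;
    sketch_unlock(r);
    sketch_put(old);
}

/* Returns 1 if the old policy outlives its replacement while still held. */
static int sketch_demo(void)
{
    struct sketch_region r = { .lock = ATOMIC_FLAG_INIT, .pol = sketch_alloc(1) };
    sketch_frees = 0;

    struct sketch_policy *held = sketch_lookup(&r);  /* task A: in use */
    sketch_replace(&r, sketch_alloc(2));             /* task B: replaces it */
    int frees_while_held = sketch_frees;             /* old policy not freed yet */
    int mode_while_held = held->mode;                /* still safe to read */
    sketch_put(held);                                /* task A done: freed now */
    int frees_after_put = sketch_frees;
    sketch_replace(&r, NULL);                        /* tear down the new policy */

    return frees_while_held == 0 && mode_while_held == 1 &&
           frees_after_put == 1 && sketch_frees == 2;
}
```

In this model the old policy is freed only by the sketch_put() from the task
that looked it up, which is the kind of "use" that a per-vma reference would
not capture.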
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .