From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>,
linux-mm@kvack.org, akpm@linux-foundation.org, nacc@us.ibm.com,
ak@suse.de
Subject: Re: [PATCH/RFC 0/11] Shared Policy Overview
Date: Wed, 27 Jun 2007 19:36:47 -0400
Message-ID: <1182987407.7199.61.camel@localhost>
In-Reply-To: <Pine.LNX.4.64.0706271427400.31227@schroedinger.engr.sgi.com>
On Wed, 2007-06-27 at 14:37 -0700, Christoph Lameter wrote:
> On Wed, 27 Jun 2007, Lee Schermerhorn wrote:
>
> > Well, I DO need to ask Dr. RCU [Paul McK.] to take a look at the patch,
> > but this is how I understand RCU to work...
>
> RCU is not in doubt here.
>
> > > Just by looking at the description: It
> > > cannot work. Any allocator use of a memory policy must use rcu locks
> > > otherwise the memory policy can vanish from under us while allocating a
> > > page.
> >
> > The only place we need to worry about is "get_file_policy()", and--that
> > is the only place one can attempt to lookup a shared policy w/o holding
> > the [user virtual] address space locked [mmap_sem] which pins the shared
> > mapping of the file, so the i_mmap_writable count can't go to zero, so
> > we can't attempt to free the policy. And even then, it's only an issue
> > for file descriptor accessed page cache allocs. Lookups called from the
> > fault path do have the user vas locked during the fault, so the policy
> > can't go away. But, because __page_cache_alloc() calls
> > get_file_policy() to lookup the policy at the faulting page offset, it
> > uses RCU on the read side, anyway. I should probably write up the
> > entire locking picture for this, huh?
>
> The zonelist from MPOL_BIND is passed to __alloc_pages. As a result the
> RCU lock must be held over the call into the page allocator with reclaim
> etc etc. Note that the zonelist is part of the policy structure.
OK, I see your issue now. Policies that are looked up in a shared
policy are automatically reference counted on lookup. But, as I've seen
discussed in the other policy reference counting thread, I'm not
decrementing the count. I think this will be easy to add into my
factored "alloc_page_pol"--the mpol_free(), that is. However, it will
require that we actually take a reference on all the other policies when
we acquire them for allocation, so that we can free the reference when
the allocation completes. Something you'd like to avoid, but I don't
see how we can for non-atomic allocations. Might be able to special
case the system default policy and not reference count that, as it can
never go away--for now, anyway...
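
To make the get/put pairing concrete, here is a rough userspace sketch in C11. It is not the patch: struct policy, policy_get(), policy_put(), lookup_policy() and alloc_page_pol_sketch() are hypothetical names made up for illustration, and the "allocation" is just a read of the policy. It only shows the shape: take a reference at lookup, drop it when the allocation returns, and skip the counting for a default policy that can never go away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory policy; NOT the kernel's struct mempolicy. */
struct policy {
	atomic_int refcnt;
	bool is_default;	/* system default: never freed, so never counted */
	int node;		/* illustrative payload: node to allocate from */
};

static struct policy default_policy = { .refcnt = 1, .is_default = true, .node = 0 };

/* Create a policy holding one reference for its owner (e.g. the shared object). */
static struct policy *policy_new(int node)
{
	struct policy *p = malloc(sizeof(*p));

	assert(p != NULL);	/* sketch only: abort on allocation failure */
	atomic_init(&p->refcnt, 1);
	p->is_default = false;
	p->node = node;
	return p;
}

static void policy_get(struct policy *p)
{
	if (!p->is_default)
		atomic_fetch_add(&p->refcnt, 1);
}

/* Drop one reference; returns true if this call freed the policy. */
static bool policy_put(struct policy *p)
{
	if (p->is_default)
		return false;
	if (atomic_fetch_sub(&p->refcnt, 1) == 1) {
		free(p);
		return true;
	}
	return false;
}

/* Lookup returns a referenced policy, falling back to the default. */
static struct policy *lookup_policy(struct policy *shared)
{
	struct policy *p = shared ? shared : &default_policy;

	policy_get(p);
	return p;
}

/* Consumes the reference taken by lookup_policy(). */
static int alloc_page_pol_sketch(struct policy *pol)
{
	int node = pol->node;	/* a real allocation could sleep in reclaim here */

	policy_put(pol);	/* the allocation is done, so the reference can go */
	return node;
}
```

The caller pairs every lookup_policy() with exactly one alloc_page_pol_sketch(); the owner's own reference is dropped separately when the shared object is torn down.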
>
> > > If we can make this work then RCU should be used for all policies so that
> > > we can get rid of the requirement that policies can only be modified from
> > > the task context that created it.
> >
> > Yeah, I think that's possible...
>
> Great if you can make that work.
I was only considering the replacement of the pointer. The indefinite
sleep in the allocation is a killer, tho'.
>
> I just looked at the shmem implementation. Without RCU you must increment
> a refcount in the policy structure. That is done on every
> single allocation. Which will create yet another bouncing cacheline if you
> do concurrent allocations from the same shmem segment. Performance did not
> seem to have been such a concern for shmem policies since this was a one
> off. Again this is a hack that you are trying to generalize. There is
> trouble all over the place if you do that.
As I mentioned, the increment is already there and always was. Just no
decrement.

And I don't think that reference counting a shared object is a hack.
It's standard procedure. If it weren't for the possibility of sleeping
indefinitely in allocation/reclaim [and reclaim delays are REALLY
indefinite!], you could use a deferred free, like RCU. But, the only
time you know that the allocation is finished is when you return from
the alloc call, so you need to release the reference there.
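
A rough userspace sketch of that lifetime rule, again in C11 and again with made-up names (struct pol, struct shared_slot, slot_get(), slot_replace(), pol_put() are all hypothetical, and the tiny spinlock is only a stand-in for whatever lock serializes lookup against replacement). The point it tries to show: a writer can replace the policy pointer while an allocation is in flight, and the old policy stays valid until the allocating side drops its reference on return.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct pol {
	atomic_int refcnt;
	int node;		/* illustrative payload */
};

struct shared_slot {
	atomic_flag lock;	/* tiny spinlock guarding 'cur' */
	struct pol *cur;	/* the slot itself holds one reference */
};

static atomic_int pols_freed;	/* bookkeeping for the example only */

static void slot_lock(struct shared_slot *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;	/* spin */
}

static void slot_unlock(struct shared_slot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

static struct pol *pol_new(int node)
{
	struct pol *p = malloc(sizeof(*p));

	assert(p != NULL);	/* sketch only: abort on allocation failure */
	atomic_init(&p->refcnt, 1);
	p->node = node;
	return p;
}

static void pol_put(struct pol *p)
{
	int old = atomic_fetch_sub(&p->refcnt, 1);

	assert(old > 0);
	if (old == 1) {
		free(p);
		atomic_fetch_add(&pols_freed, 1);
	}
}

/* The slot takes over the caller's reference to 'initial'. */
static void slot_init(struct shared_slot *s, struct pol *initial)
{
	atomic_flag_clear(&s->lock);
	s->cur = initial;
}

/* Reader: look up the current policy and take a reference under the lock. */
static struct pol *slot_get(struct shared_slot *s)
{
	struct pol *p;

	slot_lock(s);
	p = s->cur;
	atomic_fetch_add(&p->refcnt, 1);
	slot_unlock(s);
	return p;
}

/* Writer: install a new policy and drop the slot's reference to the old one. */
static void slot_replace(struct shared_slot *s, struct pol *newp)
{
	struct pol *old;

	slot_lock(s);
	old = s->cur;
	s->cur = newp;
	slot_unlock(s);
	pol_put(old);	/* freed only if no allocation still holds it */
}
```

Between slot_get() and the matching pol_put() the holder may sleep for as long as it likes; nothing here depends on a grace period elapsing, which is the difference from a deferred free.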

As far as bouncing cache lines during an allocation: for shared object
policy, either this [bouncing] dies out when all pages of the object are
finally allocated--i.e., it's start-up overhead, or we're constantly
recycling pages because they don't all fit in memory. In the latter
case, the cache line bounce will be small compared to the reclaim and
rereading of the page from the file system or swap [shmem case].

Again, we may be able to special case the system default policy, and
task policy is private to a task/thread, so I don't think that's too
much of a problem, right?
>
> I think one prerequisite to memory policy uses like this is work out how a
> memory policy can be handled by the page allocator in such a way that
>
> 1. The use is lightweight and does not impact performance.
I agree that use of memory policies should not cause a net decrease in
performance. However, nothing is for free. It's a tradeoff. If you
don't need policies or if they hurt worse than they help, don't use
them. No performance impact. If locality matters and policies help
more than they cost, use them.
>
> 2. The policy that is passed to the allocators is context independent.
> I.e. it needs to be independent of the cpuset context and the process
> context. That would allow f.e. to store a policy and then apply it to
> readahead. AFAIK this means that the policy struct needs to contain
> the memory policy plus the cpuset and the current node.
Maybe. Or maybe something different. Laudable goals, anyway. Let's
discuss in the NUMA BOF.
Lee