From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>, Gleb Natapov <glebn@voltaire.com>,
linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] Document Linux Memory Policy
Date: Fri, 01 Jun 2007 17:05:43 -0400
Message-ID: <1180731944.5278.146.camel@localhost>
In-Reply-To: <Pine.LNX.4.64.0706011242250.3598@schroedinger.engr.sgi.com>

On Fri, 2007-06-01 at 12:48 -0700, Christoph Lameter wrote:
> On Fri, 1 Jun 2007, Lee Schermerhorn wrote:
>
> > > Same here and I wish we had a clean memory region based implementation.
> > > But that is just what your patches do *not* provide. Instead they are file
> > > based. They should be memory region based.
> > >
> > > Would you please come up with such a solution?
> >
> > Christoph:
> >
> > I don't understand what you mean by "memory region based".
>
> Memory policies are controlling allocations for regions of memory of a
> process. They are not file based policies (they may have been on Tru64).

By "regions of memory of a process" do you mean VMAs? These are not
shared between processes, so installing a policy in a VMA of one task
will not affect pages faulted in by other cooperating tasks of the
application.

Actually, in Tru64, policies were attached to the "memory
object"--separate from the inode, but still shared by all mappings of
the file in separate tasks. [Bob Picco's design, IIRC.] Doesn't matter
where you attach the policies. You need to share them between tasks and
they need to control allocations of pages for the mapping--pages that
happen to live in the page cache.

>
> > So, for a shared memory mapped file, the inode+address_space--i.e., the
> > in-memory incarnation of the file--is as close to a "memory region" as
>
> Not at all. Consider a mmapped memory region by a database. The database
> is running on nodes 5-8 and has specified an interleave policy for the
> data.

If the memory region is a shared mmap'd file and the database consists
of multiple tasks, you can't do this today if you don't want to prefault
in the entire file--especially if you want to keep your task policy
default/local so that task heap and stack pages stay local.

Maybe you're thinking of a multithreaded task? In that case you're
right: you don't need shared policy. You've only got one address space
mapping the file. And one page table... Somewhat problematic on NUMA
systems, as you've pointed out in the context of Nick's page cache
replication patch/rfc. That's one reason to use separate tasks sharing
files and shmem on a NUMA system.

>
> Now another process starts on node 1 and it also mapped to mmap the same
> file used by the database. It specifies allocation on node 1 and then
> terminates.
>
> Now the database will attempt to satisfy its big memory needs from node 1?
>
> This scheme is not working.
Red Herring. The same scenario can occur with shmem today. And don't
try to play the "shmem is different" card. For this scenario, they're
the same. If "node 1 task" can mmap your file and specify a different
policy, it can attach your shmem segment and specify a different policy,
with the same result.

And, why would the task on node 1 do that? In this scenario, these are
not cooperating tasks; or it's an application bug. You want to penalize
well behaved, cooperating tasks that are part of a single application,
sharing application private files because you can come up with scenarios
based on non-cooperating or buggy tasks to which you've allowed access
to your application's files?

As it stands today, and as we've been discussing with Gleb, a multitask
application cannot map a file shared and place different ranges on
different nodes reliably without prefaulting in all of the pages. Gleb
was even willing to install the identical policies from each
task--something I don't think he should have to do--but even this would
not achieve his desired results. This is a much more serious shortcoming
than the scenario you describe above. We CAN prevent your scenario.
Just don't give non-cooperating tasks access to files whose
policy/location you care about. Same as for shmem.
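
To make the case above concrete, here is a minimal sketch of what such
a task would try: split a MAP_SHARED mapping into one page-aligned
range per node and bind each range with mbind(2). The helper names
(range_start, bind_range_to_node, place_ranges) and the
SKETCH_MPOL_BIND macro are made up for illustration and come from no
patch in this thread. The raw system call is used so the sketch does
not depend on libnuma, and it assumes node numbers small enough to fit
in one unsigned long.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* MPOL_BIND as defined in <linux/mempolicy.h>; repeated here (under a
 * hypothetical name) so the sketch builds without the NUMA headers. */
#define SKETCH_MPOL_BIND 2

/* Byte offset at which part i of nparts starts, rounded down to a page
 * boundary. Part i covers [range_start(i), range_start(i + 1)). */
static size_t range_start(size_t len, size_t page, unsigned nparts, unsigned i)
{
    size_t pages = len / page;
    return (pages * i / nparts) * page;
}

/* Bind [addr, addr + len) to a single node. Returns 0 or -errno. */
static int bind_range_to_node(void *addr, size_t len, unsigned node)
{
    unsigned long mask;

    /* Assumption: the node number fits in a one-word node mask. */
    assert(node < 8 * sizeof(mask) - 1);
    mask = 1UL << node;
    if (syscall(SYS_mbind, addr, len, SKETCH_MPOL_BIND,
                &mask, 8 * sizeof(mask), 0UL) != 0)
        return -errno;
    return 0;
}

/* Place part i of the mapping on node i, for nodes 0..nnodes-1. */
static int place_ranges(void *base, size_t len, size_t page, unsigned nnodes)
{
    for (unsigned i = 0; i < nnodes; i++) {
        size_t start = range_start(len, page, nnodes, i);
        size_t end = range_start(len, page, nnodes, i + 1);
        int rc;

        if (end == start)
            continue;   /* fewer pages than nodes: nothing to bind */
        rc = bind_range_to_node((char *)base + start, end - start, i);
        if (rc)
            return rc;
    }
    return 0;
}
```

A task would call place_ranges(base, len, sysconf(_SC_PAGESIZE), nnodes)
right after mmap()ing the file MAP_SHARED. On a shmem segment the
resulting policies are shared by every task that attaches. On a
file-backed mapping today they land in the calling task's VMAs only, so
pages faulted in by the other tasks ignore them.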
>
> > You're usually gung-ho about locality on a NUMA platform, avoiding off
> > node access or page allocations, respecting the fast path, ... Why the
> > resistance here?
>
> Yes I want consistent memory policies. There are already consistency
> issues that need to be solved. Forcing in a Tru64 concept of file memory
> allocation policies will just make the situation worse.
It's NOT a Tru64 concept, Christoph. Another Red Herring. It's about
consistent support of memory policies on any object that I can map into
my address space. And if that object is a disk-based file that lives
in the page cache, and we want to preserve coherency between file
descriptor and shared, memory mapped access [believe me, we do], then
the policy applied to the object needs to affect all page allocations
for that file--even those caused by non-cooperating or buggy tasks, if
we allow them access to the files.
>
> And shmem is not really something that should be taken as a general rule.
I disagree. The shared policy support that shmem has is exactly what I
want for shared mmap'ed files. I'm willing to deal with the same issues
that shmem has in order to get shared, mapped file semantics for my
shared regions.
> Shmem allocations can be controlled via a kernel boot option. They exist
> even after a process terminates. etc etc.
Once again. If you have a use case for shared file policies persisting
after the process terminates [and I suspect not, 'cause you don't even
want them in the first place] then raise that as a requirement. We can
add that--as a subsequent patch. If you have a use case for policies
persisting over system reboot [shmem policies don't, by the way], I
expect the file system folks could come up with a way to attach policies
to files that get loaded when the file is opened or when mmap'ed. It
would still require the in-kernel mechanism to attach policies to the
in-memory structure[s]. This capability is useful without either.

And, Christoph, again, adding shared policy support to shared file
mappings doesn't add any warts or inconsistent behavior that isn't
already there with policy applied to mmap'ed files. Default behavior is
the same--wart-for-wart. Yes, shared policies on mmap'ed files will have
the same risks as shared policy on shmem does today--e.g., your
scenario--but we find the shared policies on shmem useful enough that
we've all been willing to manage that.

Later,
Lee