From: "Martin J. Bligh" <mbligh@aracnet.com>
To: Brent Casavant <bcasavan@sgi.com>
Cc: Andi Kleen <ak@suse.de>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
hugh@veritas.com
Subject: Re: [PATCH] Use MPOL_INTERLEAVE for tmpfs files
Date: Tue, 02 Nov 2004 17:30:25 -0800
Message-ID: <255590000.1099445425@flay>
In-Reply-To: <Pine.SGI.4.58.0411021832240.79056@kzerza.americas.sgi.com>
--On Tuesday, November 02, 2004 19:12:10 -0600 Brent Casavant <bcasavan@sgi.com> wrote:
> On Tue, 2 Nov 2004, Martin J. Bligh wrote:
>
>> > The manner I'm concerned about is when a long-lived file (long-lived
>> > meaning at least for the duration of the run of a large multithreaded app)
>> > is placed in memory as an accidental artifact of the CPU which happened
>> > to create the file.
>>
>> Agreed, I see that point - if it's a globally accessed file that's
>> created by one CPU, you want it spread around. However ... how the hell
>> is the VM meant to distinguish that? The correct way is for the application
>> to tell us that - i.e. use the NUMA API.
>
> Right. The application can already tell us that without this patch,
> but only if the file is mapped. Using writes, there doesn't appear to
> be any way to control this behavior (unless I overlooked something).
> So it is impossible to use normal system utilities (e.g. cp, dd, tar)
> and get appropriate placement.
Ah yes, I see your point.
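For reference, "use the NUMA API" on a mapped tmpfs file looks roughly
like the sketch below (my illustration, not code from the patch - the
path, nodemask and size are invented, error handling is omitted, and it
assumes libnuma's <numaif.h> mbind() wrapper). The point being there's
no analogous hook for plain write(2):

#include <fcntl.h>
#include <numaif.h>        /* mbind(), MPOL_INTERLEAVE */
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	unsigned long nodemask = 0xf;   /* nodes 0-3 (assumed present) */
	size_t len = 4UL << 20;         /* 4MB, just for illustration */
	int fd = open("/dev/shm/shared.dat", O_RDWR | O_CREAT, 0644);

	ftruncate(fd, len);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);

	/* Interleave across the nodes in nodemask; the policy applies
	 * to pages faulted in after this call, instead of them all
	 * landing on the faulting CPU's node. */
	mbind(p, len, MPOL_INTERLEAVE, &nodemask,
	      8 * sizeof(nodemask), 0);

	munmap(p, len);
	close(fd);
	return 0;
}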
> This change gives us a method to get a different default (i.e. write)
> behavior.
OK. It might still be more useful to have it per filehandle, but we'd
need a way to attach that info to an existing filehandle.
> Yep. Also, bear in mind that many applications/utilities are not going
> to be very savvy in regard to tmpfs placement, and really shouldn't be.
> I wouldn't expect a compiler to have special code to lay out the generated
> objects in a friendly manner. Currently, all it takes is one unaware
> application/utility to mess things up for us.
Well, on certain extreme workloads, yes. But not for most people.
> With local-by-default placement, as today, every single application and
> utility needs to cooperate in order to be successful.
>
> Which, I guess, might be something to consider (thinking out loud here)...
I don't like apps having to co-operate any more than you do, as they
obviously won't. However, remember that most workloads won't hit this,
and there are a couple of other approaches:
1) Your global switch is making more sense to me now.
2) We can balance nodes under mem pressure (we don't currently)
Number 2 fixes a variety of evils, and we've been talking about fixing
that for a while (see the earlier pagecache discussion ... do we have to
fix each case individually?).
What scares me is that what all these discussions are pointing towards
is "we want global striping for all allocations, because Linux is too
crap to deal with cross-node balancing pressure". I don't want to see
us go that way ;-). Some things are more obviously global than others
(e.g. shmem is more likely to be shared) ... whether the usage of tmpfs
is local or global is very much open to debate, though.
There are a few other indicators I could see us using as hints from the
OS that might help - the size of the file vs the amount of memory in the
node, for one; how many processes have the file open, and which nodes
they're on, for another. Neither is flawless, or even terribly simple,
but we're making heuristic guesses anyway.
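To make that concrete, a rough userspace-flavoured sketch of such a
heuristic (the thresholds are pure invention, and the inputs would of
course have to come from the VM itself):

#include <stdbool.h>

/* Guess whether a tmpfs file should be interleaved rather than
 * placed on the creating CPU's node. Thresholds are invented. */
bool should_interleave(unsigned long file_pages,
		       unsigned long local_node_pages,
		       int nodes_with_file_open)
{
	/* A file that would eat a big chunk of the local node's
	 * memory is a poor candidate for local placement. */
	if (file_pages > local_node_pages / 4)
		return true;

	/* A file open from more than one node looks shared. */
	if (nodes_with_file_open > 1)
		return true;

	return false;
}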
> An application which uses the NUMA API obviously cares about placement.
> If it causes data to be placed locally when it would otherwise be placed
> globally, it will not cause a problem for a seperate application/utility
> which doesn't care (much?) about locality.
99.9% of apps won't be NUMA-aware, nor should they have to be. Standard
code should just work out of the box ... I'm not going to take the viewpoint
that these are only specialist servers running one or two apps - they'll
run all sorts of stuff, esp with the advent of AMD in the marketplace,
and hopefully Intel will get their ass into gear and do local mem soon
as well.
> However, if an application/utility which does not care (much?) about
> locality fails to use the NUMA API and causes data to be placed locally,
> it may very well cause a problem for a separate application which does
> care about locality.
>
> Thus, to me, a default of spreading the allocation globally makes
> sense. The burden of adding code to "do the right thing" falls to
> the applications which care. No other program requires changes and
> as such everything will "just work".
>
> But, once again, I fess up to a bias. :)
Yeah ;-) I'm strongly against taking the road of "NUMA performance will
suck unless your app is NUMA-aware" (i.e. default to non-local alloc).
OTOH, I don't like us crapping out under imbalance either. However,
I think there are other ways to solve this (see above).
>> Another way might be a tmpfs mount option ... I'd prefer that to a sysctl
>> personally, but maybe others wouldn't. Hugh, is that nuts?
>
> I kind of like that -- it shouldn't be too hard to stuff into the tmpfs
> superblock. But I agree, Hugh knows better than I do.
OK, that'd be under the sysadmin's control fairly easily at least.
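Just to illustrate the shape of it (the option name below is my
invention - tmpfs doesn't accept anything like it today), the knob
could be as simple as a remount with a policy option:

#include <sys/mount.h>

/* Ask tmpfs to interleave allocations for everything on this mount.
 * "mpol=interleave" is a hypothetical option name. */
int make_shm_interleaved(void)
{
	return mount("tmpfs", "/dev/shm", "tmpfs",
		     MS_REMOUNT, "mpol=interleave");
}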
M.