Re: [NUMA] Display and modify the memory policy of a process through /proc/<pid>/numa_policy

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Paul Jackson <pj@sgi.com>
To: Christoph Lameter <clameter@engr.sgi.com>
Cc: ak@suse.de, kenneth.w.chen@intel.com, linux-mm@kvack.org,
	linux-ia64@vger.kernel.org
Subject: Re: [NUMA] Display and modify the memory policy of a process through /proc/<pid>/numa_policy
Date: Sat, 16 Jul 2005 15:39:01 -0700	[thread overview]
Message-ID: <20050716153901.3082f1d8.pj@sgi.com> (raw)
In-Reply-To: <Pine.LNX.4.62.0507160808570.21470@schroedinger.engr.sgi.com>

Christoph wrote:
> We can number the vma's if that makes you feel better and refer to the 
> number of the vma.

I really doubt you want to go down that path.

VMA's are a kernel internel detail.  They can come and go, be merged
and split, in the dark of the night, transparent to user space.

One might argue that virtual address ranges (rather than VMAs) are
appropriate to be manipulated from outside the task, on the other
side of the position that Andi takes.  Not that I am arguing such --
Andi is making stronger points against such than I am able to refute.

But VMA's are not really visible, except via diagnostic displays
(and Andi makes a good case against even those) to user space; not
even the VMA's within one's own task.

No, I don't think you want to consider numbering VMA's for the purposes
of manipulating them.

===

My intuition is that we are seeing a clash of computing models here.

Main memory is becoming hundreds, even thousands, of times slower
than internal CPU operations.  And main memory is, under the rubric
"NUMA", becoming no longer a monolithic resource on larger systems.
Within a few years, high end workstations will join the ranks of
NUMA systems, just as they have already joined the ranks of SMP
systems.

What was once the private business of each individual task, it's
address space and the placement of its memory, is now becoming the
proper business of system wide administration.  This is because memory
placement can have substantial affects on system and job performance.

We see a variation of this clash on issues of how the kernel should
consume and place its internal memory for caches, buffers and such.
What used to be the private business of the kernel, guided by the
rule that it is best to consume almost all available memory to
cache something, is becoming a system problem, as it can be counter
productive on NUMA systems.

Folks like SGI, on the high end of big honkin NUMA iron, are seeing
it first.  As system architectures become more complex, and scaled
down NUMA architectures become in more widespread use, others will
see it as well, though with no doubt different tradeoffs and ordering
of requirements than SGI and its competitors notice currently.

However, I would presume that Andi is entirely correct, and that
the architecture of the kernel does not allow one task safely to
manipulate another's address space or memory placement there under,
at least not in a way that us mere mortals can understand.

A rock and a hard place.

Just brainstorming ... if one could load a kernel module that could
be called in the context of a target task, either when it is returning
from kernel space or was entering or leaving a timer/resched interrupt
taken while in user space, where that module could, if it was so
coded, munge the tasks memory placement, then would this provide a
basis for solutions to some of these problems?

I am presuming that the hooks for such a module, given it was under
GPL license, would be modest, and of minimum burden to the great
majority of systems that had no such need.

In short, when memory management and placement has such a dominant
impact on overall system performance, it cannot remain solely the
private business of each application.  We need to look for a safe and
simple means to enable external (from outside the task) management
of a tasks memory, without requiring massive surgery on a large body
of critical code that is (quite properly) not designed to handle such.

And we have to co-exist with the folks pushing Linux in the other
direction, embedded in wrist watches or whatever.  Those folks will
properly refuse to waste any non-trivial number brain or CPU cycles
on NUMA requirements.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

next prev parent reply	other threads:[~2005-07-16 22:39 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-07-15  1:39 Christoph Lameter
2005-07-15  3:50 ` Paul Jackson
2005-07-15  4:52 ` Chen, Kenneth W
2005-07-15  5:07   ` Christoph Lameter
2005-07-15  5:55     ` Chen, Kenneth W
2005-07-15  6:05     ` Paul Jackson
2005-07-15 11:46       ` Andi Kleen
2005-07-15 16:06       ` Christoph Lameter
2005-07-15 21:04         ` Paul Jackson
2005-07-15 21:12           ` Andi Kleen
2005-07-15 21:20             ` Christoph Lameter
2005-07-15 21:47               ` Andi Kleen
2005-07-15 21:55                 ` Christoph Lameter
2005-07-15 22:07                   ` Andi Kleen
2005-07-15 22:30                     ` Christoph Lameter
2005-07-15 22:37                       ` Andi Kleen
2005-07-15 22:49                         ` Christoph Lameter
2005-07-15 22:56                           ` Andi Kleen
2005-07-15 23:11                             ` Christoph Lameter
2005-07-15 23:44                               ` Andi Kleen
2005-07-15 23:56                                 ` Christoph Lameter
2005-07-16  2:01                                   ` Andi Kleen
2005-07-16 15:14                                     ` Christoph Lameter
2005-07-16 22:39                                       ` Paul Jackson [this message]
2005-07-16 23:30                                     ` Paul Jackson
2005-07-17  1:55                                       ` Christoph Lameter
2005-07-17  3:50                                         ` Paul Jackson
2005-07-17  5:56                                           ` Christoph Lameter
2005-07-17  7:22                                             ` Paul Jackson
2005-07-17  3:21                                       ` Christoph Lameter
2005-07-17  4:51                                         ` Paul Jackson
2005-07-17  6:00                                           ` Christoph Lameter
2005-07-17  8:17                                             ` Paul Jackson
2005-07-16  0:00                                 ` David Singleton
2005-07-16  0:16                                 ` Steve Neuner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050716153901.3082f1d8.pj@sgi.com \
    --to=pj@sgi.com \
    --cc=ak@suse.de \
    --cc=clameter@engr.sgi.com \
    --cc=kenneth.w.chen@intel.com \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox