linux-mm.kvack.org archive mirror
From: "Paul Menage" <menage@google.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: linux-mm@kvack.org, akpm@osdl.org
Subject: Re: [RFC][PATCH 1/1] Expose per-node reclaim and migration to userspace
Date: Thu, 30 Nov 2006 01:45:49 -0800
Message-ID: <6599ad830611300145gae22510te7eaa63edf539ad1@mail.gmail.com>
In-Reply-To: <456EA28C.8070508@yahoo.com.au>

On 11/30/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >> AFAIK they do that in their higher level APIs (at least HPC numa does).
> >
> >
> > Could you point me at an example?
>
> kernel/cpuset.c:cpuset_migrate_mm

No, that doesn't really do what we want. It basically just calls
do_migrate_pages, which has the drawbacks of:

- it has no way to try to migrate memory from one source node to
multiple destination nodes.

- it doesn't (as far as I can tell) migrate unmapped file pages in the
page cache.

- it scans every page table entry of every mm in the process. If your
nodes are relatively small compared to your processes, this is likely
to be much more heavyweight than just trying to migrate each page in a
node. (I realise that there are some unsolved implementation issues
with migrating pages without holding the mmap_sem of an mm that's
mapping them; that's something that we would need to solve.)
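
To make the third point concrete, here is a rough toy model in C. It is
not kernel code: struct toy_mm and both helper functions are hypothetical
names made up for this illustration, and a "PTE" is reduced to the id of
the node backing the page. It only compares how many entries a
per-mm page table walk touches with how many pages actually live on the
node being emptied, and it ignores shared mappings and unmapped page
cache pages.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: each "PTE" records the node backing the page (-1 = not present). */
struct toy_mm {
    const int *pte_node;   /* one entry per page table entry */
    size_t nr_ptes;
};

/* Work done by a per-mm walk: every PTE of every mm is visited. */
static size_t ptes_visited_by_mm_scan(const struct toy_mm *mms, size_t nr_mms)
{
    size_t visited = 0;

    for (size_t i = 0; i < nr_mms; i++)
        visited += mms[i].nr_ptes;
    return visited;
}

/*
 * Work a per-node walk would need: one visit per page resident on the node.
 * The model counts those pages by inspecting the toy PTEs; a real per-node
 * walk would start from the node's own page lists instead.
 */
static size_t pages_visited_by_node_scan(const struct toy_mm *mms,
                                         size_t nr_mms, int node)
{
    size_t visited = 0;

    for (size_t i = 0; i < nr_mms; i++)
        for (size_t j = 0; j < mms[i].nr_ptes; j++)
            if (mms[i].pte_node[j] == node)
                visited++;
    return visited;
}
```

With small nodes and large address spaces the second number is a small
fraction of the first, which is the imbalance described above.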

>
> How about "try to change the memory reservation charge of this
> 'container' from xMB to yMB"? Underneath that API, your fakenode
> controller would do the node reclaim and consolidation stuff --
> but it could be implemented completely differently in the case of
> a different type of controller.

How would it make decisions such as which node to free up? For example,
userspace might have a strong preference for keeping a job on one
particular real node, or for moving it to a different one. I think that
policy decisions like this belong in userspace, in the same way that
the existing cpusets API provides a way to say "this cpuset uses these
nodes" rather than "this cpuset should have N nodes".

If the API were expressive enough to say "try to shrink this cpuset by
X MB, with amount Y of effort, trying to evict nodes in the priority
order A,B,C", that might be a good start.
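
As a sketch of what such a request might carry, something like the
following C could work. Everything here is hypothetical: struct
shrink_request, toy_shrink() and TOY_MAX_EFFORT are invented for
illustration and are not part of any existing or proposed kernel
interface. The reclaim rule (a node can give up a share of its memory
proportional to the effort value) is made up purely so that the example
does something; real reclaim behaves nothing like this.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_EFFORT 4u   /* hypothetical upper bound for the effort value */

/* Hypothetical request: "shrink by X MB, with effort Y, evicting nodes in order A,B,C". */
struct shrink_request {
    unsigned long target_mb;   /* X: how much to free */
    unsigned int effort;       /* Y: 0 = no effort, TOY_MAX_EFFORT = maximum */
    const int *evict_order;    /* A,B,C: node ids, most preferred victim first */
    size_t nr_nodes;           /* number of entries in evict_order */
};

/*
 * Toy policy walk: visit nodes in the order userspace asked for and stop
 * once the target is met. Returns the number of MB actually freed, which
 * may be less than the target.
 */
static unsigned long toy_shrink(const struct shrink_request *req,
                                unsigned long *node_used_mb)
{
    unsigned long freed = 0;

    assert(req->effort <= TOY_MAX_EFFORT);
    for (size_t i = 0; i < req->nr_nodes && freed < req->target_mb; i++) {
        int node = req->evict_order[i];
        /* Made-up rule: more effort frees a larger share of the node. */
        unsigned long reclaimable =
            node_used_mb[node] * req->effort / TOY_MAX_EFFORT;
        unsigned long want = req->target_mb - freed;
        unsigned long take = reclaimable < want ? reclaimable : want;

        node_used_mb[node] -= take;
        freed += take;
    }
    return freed;
}
```

The point of the shape is that the kernel side only executes the walk;
which nodes appear in evict_order, and in what order, stays a userspace
decision.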

>
> >> The cpusets code is definitely similar to what memory resource control
> >> needs. I don't think that a resource control API needs to be tied to
> >> such granular, hard limits as the fakenodes code provides though. But
> >> maybe I'm wrong and it really would be acceptable for everyone.
> >
> >
> > Ah. This isn't intended to be specifically a "resource control API".
> > It's more intended to be an API that could be useful for certain kinds
> > of resource control, but could also be generically useful.
>
> If it is exporting any kind of implementation details, then it needs
> to be justified with a specific user that can't be implemented in a
> better way, IMO.

It's not really exporting any more implementation details than the
existing cpusets API (i.e. explicitly binding a job to a set of nodes
chosen by userspace). The only true exposed implementation detail is
the "priority" value from try_to_free_pages, and that could be
abstracted away as a value in some range 0-N where 0 means "try very
hard" and N means "hardly try at all", and it wouldn't have to be
directly linked to the try_to_free_pages() priority.
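
One possible way to do that abstraction, sketched in C under some
assumptions: effort_to_reclaim_priority() is a hypothetical helper name,
the linear scaling is just one choice among many, and DEF_PRIORITY is
redefined locally with the value used in include/linux/mmzone.h so the
snippet stands alone. As far as I understand the reclaim code, it starts
at DEF_PRIORITY and counts down towards 0, scanning a larger share of
the LRU lists at each step, so both scales already agree that 0 is the
most aggressive setting.

```c
#include <assert.h>

#define DEF_PRIORITY 12   /* local copy of the value in include/linux/mmzone.h */

/*
 * Map an abstract value in [0, n] (0 = "try very hard", n = "hardly try
 * at all") onto a reclaim priority in [0, DEF_PRIORITY]. Linear scaling,
 * rounded to the nearest integer.
 */
static int effort_to_reclaim_priority(unsigned int value, unsigned int n)
{
    assert(n > 0 && value <= n);
    return (int)((value * DEF_PRIORITY + n / 2) / n);
}
```

Because the mapping lives in one function, the range exposed to
userspace could stay stable even if the internal priority scale changed.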

Paul

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
