linux-mm.kvack.org archive mirror
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Paul Menage <menage@google.com>
Cc: linux-mm@kvack.org, akpm@osdl.org
Subject: Re: [RFC][PATCH 1/1] Expose per-node reclaim and migration to userspace
Date: Thu, 30 Nov 2006 22:04:43 +1100
Message-ID: <456EBACB.9080304@yahoo.com.au>
In-Reply-To: <6599ad830611300240x388ef00s60183bc3a105ed2a@mail.gmail.com>

Paul Menage wrote:
> On 11/30/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>>
>> I know it doesn't do what you want. It is an example of using page
>> migration under a higher level API, which I thought is what you
>> wanted to see.
> 
> 
> I'd been talking about the possibility of doing "try to move all
> memory from this node to this other set of nodes"; that wasn't an
> example of such an API.

Oh, well I was talking about using a higher level API rather than
calling migrate directly!

>> Now you're talking about physical nodes as well, which is definitely
>> a problem you get when mixing the two.
>>
>> But there is no reason why you shouldn't be able to specify physical
>> nodes, while also altering the reservation. Even if that does mean
>> hiding the fake nodes from the cpuset interface.
> 
> 
> I think it should be possible to expose the real numa topology via the
> fake topology (e.g. all fake nodes on the same real node appear to be
> fairly close together, compared to any fake nodes on a different real
> node). So I don't think it's necessary to have a separate abstraction
> for fake vs physical nodes.

Well if you want to do (real) node affinity then you need some
separation of course.

But I'm not sure that there is a good reason to use the same
abstraction. Maybe there is, but I think it needs more discussion
(unless I missed something in the past couple of weeks where you
managed to get all memory resource controller groups to agree with
your fakenodes approach).

>> >> If it is exporting any kind of implementation details, then it needs
>> >> to be justified with a specific user that can't be implemented in a
>> >> better way, IMO.
>> >
>> >
>> > It's not really exporting any more implementation details than the
>> > existing cpusets API (i.e. explicitly binding a job to a set of nodes
>> > chosen by userspace). The only true exposed implementation detail is
>> > the "priority" value from try_to_free_pages, and that could be
>> > abstracted away as a value in some range 0-N where 0 means "try very
>> > hard" and N means "hardly try at all", and it wouldn't have to be
>> > directly linked to the try_to_free_pages() priority.
>>
>> Or the fact that memory reservation is implemented with nodes.
> 
> 
> Right, but to me that's a pretty fundamental design decision, rather
> than an implementation detail.

It is a design of the implementation.

The policy is to be able to reserve memory for specific groups of tasks.
And the best API is one where userspace specifies policy. Now there
might be a few tweaks or lower level hints or calls needed to make the
implementation work really optimally. But those should be added later,
and only when they are found to be required (not just maybe useful).

So I see nothing wrong with your exposing these things to userspace if
the goal is to test implementation or get a prototype working quickly.
But if you're talking about the upstream kernel, then I think you need
to start at a much higher level.

>> I'm
>> still not convinced that idea is the best way to export memory
>> control to userspace, regardless of whether it is quick and easy to
>> develop (or even deploy, at Google).
> 
> 
> Maybe not the best way for all memory control, but it has certain big
> advantages, such as leveraging the existing numa support, and not
> requiring additional per-page overhead or LRU complexity.

Oh I agree. And I think it is one of the better implementations I have
seen. But I don't like the API.

-- 
SUSE Labs, Novell Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


Thread overview: 54+ messages
2006-11-29  3:06 [RFC][PATCH 0/1] Node-based reclaim/migration menage
2006-11-29  3:06 ` [RFC][PATCH 1/1] Expose per-node reclaim and migration to userspace menage
2006-11-29  6:07   ` Nick Piggin
2006-11-29 21:57     ` Paul Menage
2006-11-30  4:13       ` Christoph Lameter
2006-11-30  4:18         ` Paul Menage
2006-11-30  7:38       ` Nick Piggin
2006-11-30  7:57         ` Paul Menage
2006-11-30  8:26           ` Nick Piggin
2006-11-30  8:39             ` Paul Menage
2006-11-30  8:55               ` Nick Piggin
2006-11-30  9:06                 ` Paul Menage
2006-11-30  9:21                   ` Nick Piggin
2006-11-30  9:45                     ` Paul Menage
2006-11-30 10:15                       ` Nick Piggin
2006-11-30 10:40                         ` Paul Menage
2006-11-30 11:04                           ` Nick Piggin [this message]
2006-11-30 11:23                             ` Paul Menage
2006-11-30 11:35                               ` Nick Piggin
2006-11-30  0:18   ` KAMEZAWA Hiroyuki
2006-11-30  0:25     ` Paul Menage
2006-11-30  0:38       ` KAMEZAWA Hiroyuki
2006-11-30  4:15       ` Christoph Lameter
2006-11-30  4:10   ` Christoph Lameter
2006-11-30  0:31 ` [RFC][PATCH 0/1] Node-based reclaim/migration KAMEZAWA Hiroyuki
2006-11-30  0:31   ` Paul Menage
2006-11-30  4:11     ` KAMEZAWA Hiroyuki
2006-11-30  4:17     ` Christoph Lameter
2006-11-30 10:45       ` Paul Menage
2006-11-30 11:12         ` KAMEZAWA Hiroyuki
2006-11-30 11:25           ` Paul Menage
2006-11-30 12:18             ` KAMEZAWA Hiroyuki
2006-11-30 18:28             ` Christoph Lameter
2006-11-30 18:35               ` Paul Menage
2006-11-30 18:39                 ` Christoph Lameter
2006-11-30 19:09                   ` Paul Menage
2006-11-30 19:42                     ` Christoph Lameter
2006-11-30 19:53                       ` Paul Menage
2006-11-30 20:00                         ` Christoph Lameter
2006-11-30 20:07                           ` Paul Menage
2006-11-30 20:15                             ` Christoph Lameter
2006-11-30 21:33                               ` Paul Menage
2006-11-30 23:41                                 ` Christoph Lameter
2006-11-30 23:48                                   ` Paul Menage
2006-12-01  2:23                                     ` Christoph Lameter
2006-12-01 19:32                                       ` Paul Menage
2006-12-01 19:56                                         ` Christoph Lameter
2006-12-01  2:44                                     ` KAMEZAWA Hiroyuki
2006-12-01  2:43                                       ` Christoph Lameter
2006-12-01  2:59                                         ` KAMEZAWA Hiroyuki
2006-12-01  2:44                                       ` Christoph Lameter
2006-12-01  3:10                                         ` KAMEZAWA Hiroyuki
2006-12-01  5:28                                           ` Christoph Lameter
2006-11-30  4:04 ` Christoph Lameter
