Re: [patch00/05]: Containers(V2)- Introduction

* Re: [patch00/05]: Containers(V2)- Introduction
       [not found] <1158718568.29000.44.camel@galaxy.corp.google.com>
@ 2006-09-20  5:39 ` Nick Piggin
  2006-09-20 16:26   ` Christoph Lameter
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20  5:39 UTC (permalink / raw)
  To: rohitseth
  Cc: no To-header on input, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management

Rohit Seth wrote:

>Containers:
>
>
[...]

>This is based on a lot of discussion over the last month or so.  I hope this
>patch set is something that we can agree on, and that more support can be
>added on top of it.  Please provide feedback and add other extensions that
>are useful to the TODO list.
>

Hi Rohit,

Sorry for the late reply. I was just about to comment on your earlier
patchset but I will do so here instead.

Anyway I don't think I have much to say other than: this is almost
exactly as I had imagined the memory resource tracking should look
like. Just a small number of hooks and a very simple set of rules for
tracking allocations. Also, the possibility to track kernel
allocations as a whole rather than at individual callsites (which
shouldn't be too difficult to implement).

If anything I would perhaps even argue for further cutting down the
number of hooks and adding them back as they prove to be needed.

I'm not sure about containers & workload management people, but from
a core mm/ perspective I see no reason why this couldn't get in,
given review and testing. Great!

Nick
--

Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20  5:39 ` [patch00/05]: Containers(V2)- Introduction Nick Piggin
@ 2006-09-20 16:26   ` Christoph Lameter
  2006-09-20 16:56     ` Nick Piggin
  2006-09-20 16:27   ` Rohit Seth
       [not found]   ` <1158751720.8970.67.camel@twins>
  2 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 16:26 UTC (permalink / raw)
  To: Nick Piggin
  Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 20 Sep 2006, Nick Piggin wrote:

> I'm not sure about containers & workload management people, but from
> a core mm/ perspective I see no reason why this couldn't get in,
> given review and testing. Great!

Nack. We already have the ability to manage workloads. We may want to 
extend the existing functionality but this is duplicating what is already 
available through cpusets.


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20  5:39 ` [patch00/05]: Containers(V2)- Introduction Nick Piggin
  2006-09-20 16:26   ` Christoph Lameter
@ 2006-09-20 16:27   ` Rohit Seth
       [not found]   ` <1158751720.8970.67.camel@twins>
  2 siblings, 0 replies; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 16:27 UTC (permalink / raw)
  To: Nick Piggin; +Cc: CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 2006-09-20 at 15:39 +1000, Nick Piggin wrote:


> Anyway I don't think I have much to say other than: this is almost
> exactly as I had imagined the memory resource tracking should look
> like. Just a small number of hooks and a very simple set of rules for
> tracking allocations. Also, the possibility to track kernel
> allocations as a whole rather than at individual callsites (which
> shouldn't be too difficult to implement).
> 

I've started looking in that direction.  First shot could just be
tracking kernel memory consumption w/o worrying about whether it is slab
or PT etc.  Hopefully next patchset will have that support integrated.

> If anything I would perhaps even argue for further cutting down the
> number of hooks and add them back as they prove to be needed.
> 

I think the current set of changes (and tracking of different
components) is necessary for the memory handler to do the right thing.  Plus
it is possible that userland management tools can also make use of this
information.

> I'm not sure about containers & workload management people, but from
> a core mm/ perspective I see no reason why this couldn't get in,
> given review and testing. Great!
> 

That is great to know. Thanks.  Hopefully it is getting enough coverage
to get there.

-rohit



* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 16:26   ` Christoph Lameter
@ 2006-09-20 16:56     ` Nick Piggin
  2006-09-20 17:08       ` Christoph Lameter
  2006-09-20 17:40       ` Alan Cox
  0 siblings, 2 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 16:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

Christoph Lameter wrote:
> On Wed, 20 Sep 2006, Nick Piggin wrote:
> 
> 
>>I'm not sure about containers & workload management people, but from
>>a core mm/ perspective I see no reason why this couldn't get in,
>>given review and testing. Great!
> 
> 
> Nack. We already have the ability to manage workloads. We may want to 
> extend the existing functionality but this is duplicating what is already 
> available through cpusets.

If it wasn't clear, I was talking specifically about the hooks for page
tracking rather than the whole patchset. If anybody wants such page
tracking infrastructure in the kernel, then this (as opposed to the
huge beancounters stuff) is what it should look like.

But as I said above, I don't know what the containers and workload
management people want exactly... The recent discussions about using
nodes and cpusets for memory workload management do seem like a
promising idea, and if it would avoid the need for this kind of
per-page tracking entirely, then that would probably be even better.

-- 
SUSE Labs, Novell Inc.

* Re: [patch00/05]: Containers(V2)- Introduction
       [not found]       ` <1158767787.3278.103.camel@taijtu>
@ 2006-09-20 17:00         ` Nick Piggin
  2006-09-20 17:23           ` [ckrm-tech] " Paul Menage
                             ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 17:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rohitseth, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management, Christoph Lameter

(this time to the lists as well)

Peter Zijlstra wrote:

 > I'd much rather containerize the whole reclaim code, which should not
 > be too hard since he already adds a container pointer to struct page.

Yes, and I tend to agree with you. I probably wasn't clear, but I was
mainly talking about just the memory resource tracking part of this
patchset.

I am less willing to make a judgement about reclaim, because I don't
know very much about the workloads or the guarantees they attempt to
provide.

 > Esp. when we get some of my page reclaim abstractions merged, moving the
 > reclaim from struct zone to a container is not a lot of work. (this is
 > basically what one of the ckrm mm policies did too)

I do agree that it would be nicer to not have a completely different
scheme for doing their own page reclaim, but rather use the existing
code (*provided* that it is designed in the same, minimally intrusive
manner as the page tracking).

I can understand how it is attractive to create a new subsystem to
solve a particular problem, but once it is in the kernel it has to be
maintained regardless, so if it can be done in a way that shares more
of the current infrastructure (nicely) then that would be a better
solution.

I like that they're investigating the use of memory nodes for this.
It seems like the logical starting place.

 > I still have to reread what Rohit does for file backed pages, that gave
 > my head a spin.
 > I've been thinking a bit on that problem, and it would be possible to
 > share all address_space pages equally between attached containers, this
 > would lose some accuracy, since one container could read 10% of the file
 > and another 90%, but I don't think that is a common scenario.

Yeah, I'm not sure about that. I don't think really complex schemes
are needed... but again I might need more knowledge of their workloads
and problems.

-- 
SUSE Labs, Novell Inc.

* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 16:56     ` Nick Piggin
@ 2006-09-20 17:08       ` Christoph Lameter
  2006-09-20 17:19         ` Nick Piggin
  2006-09-20 17:40       ` Alan Cox
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 17:08 UTC (permalink / raw)
  To: Nick Piggin
  Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Thu, 21 Sep 2006, Nick Piggin wrote:

> If it wasn't clear, I was talking specifically about the hooks for page
> tracking rather than the whole patchset. If anybody wants such page
> tracking infrastructure in the kernel, then this (as opposed to the
> huge beancounters stuff) is what it should look like.

Could you point to the patch and a description for what is meant here by 
page tracking (did not see that in the patch, maybe I could not find it)? 
If these are just statistics then we likely already have 
them.


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:08       ` Christoph Lameter
@ 2006-09-20 17:19         ` Nick Piggin
  2006-09-20 17:30           ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 17:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

Christoph Lameter wrote:
> On Thu, 21 Sep 2006, Nick Piggin wrote:
> 
> 
>>If it wasn't clear, I was talking specifically about the hooks for page
>>tracking rather than the whole patchset. If anybody wants such page
>>tracking infrastructure in the kernel, then this (as opposed to the
>>huge beancounters stuff) is what it should look like.
> 
> 
> Could you point to the patch and a description for what is meant here by 
> page tracking (did not see that in the patch, maybe I could not find it)? 
> If these are just statistics then we likely already have 
> them.
> 

Patch 2/5 in this series provides hooks, and they are pretty unintrusive.

  mm/filemap.c              |    4 ++++
  mm/page_alloc.c           |    3 +++
  mm/rmap.c                 |    8 +++++++-
  mm/swap.c                 |    1 +
  mm/vmscan.c               |    1 +

-- 
SUSE Labs, Novell Inc.

* Re: [ckrm-tech] [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:00         ` Nick Piggin
@ 2006-09-20 17:23           ` Paul Menage
  2006-09-20 17:36           ` Alan Cox
  2006-09-20 17:50           ` Rohit Seth
  2 siblings, 0 replies; 25+ messages in thread
From: Paul Menage @ 2006-09-20 17:23 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, CKRM-Tech, linux-kernel, Linux Memory Management,
	rohitseth, devel, Christoph Lameter

On 9/20/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Yeah, I'm not sure about that. I don't think really complex schemes
> are needed... but again I might need more knowledge of their workloads
> and problems.
>

The basic need for separating files into containers distinct from the
tasks that are using them arises when you have several "jobs" all
working with the same large data set. (Possibly read-only data files,
or possibly one job is updating a dataset that's being used by other
jobs).
For automated job-tracking and scheduling, it's important to be able
to distinguish shared usage from individual usage (e.g. to be able to
answer questions like "if I kill job X, how much memory do I get back?" and
"how do I recover 1G of memory on this machine?").

As an example, assume two jobs each with 100M of anonymous memory both
mapping the same 1G file, for a total usage of 1.2G.

Any setup that doesn't let you distinguish shared and private usage
makes it hard to answer that kind of scheduling question. E.g.:

- first user gets charged for the page -> first job reported as 1.1G,
and the second as 0.1G.

- page charges get shared between all users of the page -> two tasks
using 0.6G each.

- all users get charged for the page -> two tasks using 1.1G each.

But in fact killing either one of these jobs individually would only
free up 100M.

By explicitly letting userspace see that there are two jobs each with
a private usage of 100M, and they're sharing a dataset of 1G, it's
possible to make more informed decisions.
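
The arithmetic behind the three schemes above can be checked with a toy
model. This is only a sketch: the function names, the scheme labels and
the choice of 1G = 1000M are assumptions made for illustration, and none
of it comes from the patch set.

```python
# Toy model of the accounting schemes listed above. All names are
# illustrative assumptions; nothing here comes from the patch set.

def charged(private_mb, shared_mb, scheme):
    """Return {job: MB charged} for jobs that all map one shared file."""
    jobs = list(private_mb)
    if scheme == "first_user":
        # Whoever touched the file first (here: the first job listed) pays for it.
        return {job: private_mb[job] + (shared_mb if i == 0 else 0)
                for i, job in enumerate(jobs)}
    if scheme == "split":
        # The shared pages are divided evenly among the jobs using them.
        return {job: private_mb[job] + shared_mb / len(jobs) for job in jobs}
    if scheme == "all_users":
        # Every job is charged for every shared page it uses.
        return {job: private_mb[job] + shared_mb for job in jobs}
    if scheme == "separate":
        # Shared data is reported as its own entry, apart from any job.
        report = dict(private_mb)
        report["shared dataset"] = shared_mb
        return report
    raise ValueError(f"unknown scheme: {scheme}")


def freed_by_killing(private_mb, job):
    """Memory returned if only `job` dies and another job still maps the file."""
    return private_mb[job]


if __name__ == "__main__":
    private = {"job A": 100, "job B": 100}   # MB of anonymous memory each
    shared = 1000                            # the 1G file, taken as 1000 MB
    for scheme in ("first_user", "split", "all_users", "separate"):
        print(scheme, charged(private, shared, scheme))
    print("killing job A frees", freed_by_killing(private, "job A"), "MB")
```

Only the "separate" report lines up with what killing a single job
actually frees, which is the point of exposing shared usage on its own.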

The issue of telling the kernel exactly which files/directories need
to be accounted separately can be handled by userspace.

It could be done by per-page accounting, or by constraining particular
files to particular memory zones, or by just tracking/limiting the
number of pages from each address_space in the pagecache, but I think
that it's important that the kernel at least provide the primitive
support for this.

Paul


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:36           ` Alan Cox
@ 2006-09-20 17:30             ` Nick Piggin
  0 siblings, 0 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 17:30 UTC (permalink / raw)
  To: Alan Cox
  Cc: Peter Zijlstra, rohitseth, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management, Christoph Lameter

Alan Cox wrote:
> On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
> 
>> > I've been thinking a bit on that problem, and it would be possible to
>> > share all address_space pages equally between attached containers, this
>> > would lose some accuracy, since one container could read 10% of the file
>> > and another 90%, but I don't think that is a common scenario.
>>
>>
>>Yeah, I'm not sure about that. I don't think really complex schemes
>>are needed... but again I might need more knowledge of their workloads
>>and problems.
> 
> 
> Any scenario which permits "cheating" will be a scenario that happens
> because people will try and cheat.

That's true, and that's one reason why I've advocated the solution
implemented by Rohit's patches, that is: just throw in the towel and
be happy to count just pages.

Look at the beancounter stuff: it has hooks (in the form of gfp
flags) throughout the tree, and they still manage to miss accounting
user-exploitable memory overallocation from some callers. Maintaining
that will be much more difficult and error-prone.

-- 
SUSE Labs, Novell Inc.

* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:19         ` Nick Piggin
@ 2006-09-20 17:30           ` Christoph Lameter
  2006-09-20 18:03             ` Nick Piggin
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 17:30 UTC (permalink / raw)
  To: Nick Piggin
  Cc: rohitseth, pj, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Thu, 21 Sep 2006, Nick Piggin wrote:

> Patch 2/5 in this series provides hooks, and they are pretty unintrusive.

Ok. We shadow existing vm counters and add stuff to the address_space
structure. The task add / remove is duplicating what some of the cpuset
hooks do. That clearly shows that we are just duplicating functionality.

The mapping things are new.


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:00         ` Nick Piggin
  2006-09-20 17:23           ` [ckrm-tech] " Paul Menage
@ 2006-09-20 17:36           ` Alan Cox
  2006-09-20 17:30             ` Nick Piggin
  2006-09-20 17:50           ` Rohit Seth
  2 siblings, 1 reply; 25+ messages in thread
From: Alan Cox @ 2006-09-20 17:36 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, rohitseth, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management, Christoph Lameter

On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
>  > I've been thinking a bit on that problem, and it would be possible to
>  > share all address_space pages equally between attached containers, this
>  > would lose some accuracy, since one container could read 10% of the file
>  > and another 90%, but I don't think that is a common scenario.
> 
> 
> Yeah, I'm not sure about that. I don't think really complex schemes
> are needed... but again I might need more knowledge of their workloads
> and problems.

Any scenario which permits "cheating" will be a scenario that happens
because people will try and cheat.

Alan


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 16:56     ` Nick Piggin
  2006-09-20 17:08       ` Christoph Lameter
@ 2006-09-20 17:40       ` Alan Cox
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Cox @ 2006-09-20 17:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Christoph Lameter, rohitseth, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management

On Thu, 2006-09-21 at 02:56 +1000, Nick Piggin wrote:
> But as I said above, I don't know what the containers and workload
> management people want exactly... The recent discussions about using
> nodes and cpusets for memory workload management do seem like a
> promising idea, and if it would avoid the need for this kind of
> per-page tracking entirely, then that would probably be even better.

I think you can roughly break it down to

- I want one group of users not to be able to screw another group of
users or the box but don't care about anything else. The basic
beancounter stuff handles this. Generally they also want maximal
sharing.

- I want to charge people for portions of machine use (mostly accounting
and some fairness)

- I don't want any user to be able to hog the system to the harm of the
others but don't care about overcommit of idle resources (think about
the 5000 apaches on a box case)

- I want to be able to divide resources reasonably accurately all the
time between groups of users

Alan


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:00         ` Nick Piggin
  2006-09-20 17:23           ` [ckrm-tech] " Paul Menage
  2006-09-20 17:36           ` Alan Cox
@ 2006-09-20 17:50           ` Rohit Seth
  2006-09-20 17:52             ` Christoph Lameter
  2006-09-20 18:37             ` Peter Zijlstra
  2 siblings, 2 replies; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 17:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management, Christoph Lameter

On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
> (this time to the lists as well)
> 
> Peter Zijlstra wrote:
> 
>  > I'd much rather containerize the whole reclaim code, which should not
>  > be too hard since he already adds a container pointer to struct page.
> 
> 

Right now the memory handler in this container subsystem is written in
such a way that when the existing kernel reclaimer kicks in, it operates
first on the pages of containers that are over their limit.  But
in general I like the notion of containerizing the whole reclaim code.

>  > I still have to reread what Rohit does for file backed pages, that gave
>  > my head a spin.

Please let me know if there is any specific part that isn't making much
sense.

-rohit



* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:50           ` Rohit Seth
@ 2006-09-20 17:52             ` Christoph Lameter
  2006-09-20 18:06               ` Peter Zijlstra
  2006-09-20 18:37             ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 17:52 UTC (permalink / raw)
  To: Rohit Seth
  Cc: Nick Piggin, Peter Zijlstra, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management

On Wed, 20 Sep 2006, Rohit Seth wrote:

> Right now the memory handler in this container subsystem is written in
> such a way that when existing kernel reclaimer kicks in, it will first
> operate on those (container with pages over the limit) pages first.  But
> in general I like the notion of containerizing the whole reclaim code.

Which comes naturally with cpusets.


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:30           ` Christoph Lameter
@ 2006-09-20 18:03             ` Nick Piggin
  0 siblings, 0 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 18:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: rohitseth, pj, CKRM-Tech, devel, linux-kernel, Linux Memory Management

Christoph Lameter wrote:
> On Thu, 21 Sep 2006, Nick Piggin wrote:
> 
> 
>>Patch 2/5 in this series provides hooks, and they are pretty unintrusive.
> 
> 
> Ok. We shadow existing vm counters and add stuff to the address_space
> structure. The task add / remove is duplicating what some of the cpuset 
> hooks do. That clearly shows that we are just duplicating functionality.

I don't think so. To start with, the point about containers is they are
not per address_space.

But secondly, these are hooks from the container subsystem into the mm
subsystem. As such, they might do something a bit more or different
than simple statistics, and we don't want to teach the core mm/ about
what that might be. You also want to be able to configure them out
entirely.

I think it is fine to add some new hooks at fundamental (i.e. mm-agnostic)
points. Without getting to the fine details about exactly how the hooks
are implemented, or what information needs to be tracked, I think we can
say that they are not much burden for mm/ to bear (if they turn out to
be usable).

-- 
SUSE Labs, Novell Inc.

* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:52             ` Christoph Lameter
@ 2006-09-20 18:06               ` Peter Zijlstra
  2006-09-20 18:14                 ` Rohit Seth
                                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Peter Zijlstra @ 2006-09-20 18:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rohit Seth, Nick Piggin, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management

On Wed, 2006-09-20 at 10:52 -0700, Christoph Lameter wrote:
> On Wed, 20 Sep 2006, Rohit Seth wrote:
> 
> > Right now the memory handler in this container subsystem is written in
> > such a way that when existing kernel reclaimer kicks in, it will first
> > operate on those (container with pages over the limit) pages first.  But
> > in general I like the notion of containerizing the whole reclaim code.
> 
> Which comes naturally with cpusets.

How are shared mappings dealt with? Are pages charged to the set that
first faults them in?


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:06               ` Peter Zijlstra
@ 2006-09-20 18:14                 ` Rohit Seth
  2006-09-20 18:27                   ` Peter Zijlstra
  2006-09-20 19:48                 ` Paul Jackson
  2006-09-20 19:48                 ` Christoph Lameter
  2 siblings, 1 reply; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 18:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Nick Piggin, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management

On Wed, 2006-09-20 at 20:06 +0200, Peter Zijlstra wrote:
> On Wed, 2006-09-20 at 10:52 -0700, Christoph Lameter wrote:
> > On Wed, 20 Sep 2006, Rohit Seth wrote:
> > 
> > > Right now the memory handler in this container subsystem is written in
> > > such a way that when existing kernel reclaimer kicks in, it will first
> > > operate on those (container with pages over the limit) pages first.  But
> > > in general I like the notion of containerizing the whole reclaim code.
> > 
> > Which comes naturally with cpusets.
> 
> How are shared mappings dealt with, are pages charged to the set that
> first faults them in?
> 

For anonymous pages (simpler case), they get charged to the faulting
task's container.

For filesystem pages (which could be shared across tasks running in
different containers): every time a new file mapping is created, it is
bound to the container of the process creating that mapping.  All
subsequent pages belonging to this mapping will belong to this container,
irrespective of which tasks in which containers access these pages.
Currently, I've not implemented a mechanism to allow a file to be
specifically moved into or out of a container. But when that gets
implemented, all pages belonging to a mapping will also move out of
the container (or into a new container).
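
As a rough illustration of the charging rule just described, here is a
toy model. It is a sketch under stated assumptions: the class and
function names are hypothetical and are not taken from the patch set,
and a real implementation would live in the kernel's C code rather than
look like this.

```python
# Toy model of the charging rule described above. Class and function names
# are hypothetical and are not taken from the patch set.

class Container:
    def __init__(self, name):
        self.name = name
        self.anon_pages = 0
        self.file_pages = 0


class FileMapping:
    """Stands in for a file's page cache mapping, bound to one container."""
    def __init__(self, creator):
        # Bound once, to the container of the task that created the mapping.
        self.container = creator


def charge_anon_page(faulting):
    # Anonymous pages are charged to the faulting task's container.
    faulting.anon_pages += 1


def charge_file_page(mapping, accessing):
    # File pages are charged to the mapping's container; the container of
    # the task that touched the page (`accessing`) is deliberately ignored.
    mapping.container.file_pages += 1


if __name__ == "__main__":
    a, b = Container("A"), Container("B")
    data = FileMapping(creator=a)
    charge_file_page(data, accessing=a)
    charge_file_page(data, accessing=b)   # B reads the file, A is charged
    charge_anon_page(b)
    print(a.name, a.anon_pages, a.file_pages)
    print(b.name, b.anon_pages, b.file_pages)
```

In this model a task in container B that reads the file adds to A's
file page count, which is the behaviour the shared-mapping question is
probing.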

-rohit


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:14                 ` Rohit Seth
@ 2006-09-20 18:27                   ` Peter Zijlstra
  2006-09-20 18:33                     ` [ckrm-tech] " Paul Menage
  2006-09-20 18:38                     ` Rohit Seth
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2006-09-20 18:27 UTC (permalink / raw)
  To: rohitseth
  Cc: Christoph Lameter, Nick Piggin, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management

On Wed, 2006-09-20 at 11:14 -0700, Rohit Seth wrote:
> On Wed, 2006-09-20 at 20:06 +0200, Peter Zijlstra wrote:
> > On Wed, 2006-09-20 at 10:52 -0700, Christoph Lameter wrote:
> > > On Wed, 20 Sep 2006, Rohit Seth wrote:
> > > 
> > > > Right now the memory handler in this container subsystem is written in
> > > > such a way that when existing kernel reclaimer kicks in, it will first
> > > > operate on those (container with pages over the limit) pages first.  But
> > > > in general I like the notion of containerizing the whole reclaim code.
> > > 
> > > Which comes naturally with cpusets.
> > 
> > How are shared mappings dealt with, are pages charged to the set that
> > first faults them in?
> > 
> 
> For anonymous pages (simpler case), they get charged to the faulting
> task's container.
> 
> For filesystem pages (could be shared across tasks running different
> containers): Every time a new file mapping is created, it is bound to a
> container of the process creating that mapping.  All subsequent pages
> belonging to this mapping will belong to this container, irrespective of
> different tasks running in different containers accessing these pages.
> Currently, I've not implemented a mechanism to allow a file to be
> specifically moved into or out of container. But when that gets
> implemented then all pages belonging to a mapping will also move out of
> container (or into a new container).

Yes, I read that in your patches, I was wondering how the cpuset
approach would handle this.

Neither is really satisfactory for shared mappings.


* Re: [ckrm-tech] [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:27                   ` Peter Zijlstra
@ 2006-09-20 18:33                     ` Paul Menage
  2006-09-20 18:38                     ` Rohit Seth
  1 sibling, 0 replies; 25+ messages in thread
From: Paul Menage @ 2006-09-20 18:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rohitseth, Nick Piggin, CKRM-Tech, linux-kernel,
	Linux Memory Management, devel, Christoph Lameter

On 9/20/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Yes, I read that in your patches, I was wondering how the cpuset
> approach would handle this.

The VM currently has support for letting vmas define their own memory
policies - so specifying that a file-backed vma gets its memory from a
particular set of memory nodes would accomplish that for the fake-node
approach. The mechanism for setting up the per-file/per-vma policies
would probably involve something originating in struct inode or struct
address_space.
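
To make the lookup order concrete, here is a minimal sketch of how a
per-file policy could take precedence over the task's default. It is a toy
model under stated assumptions, not kernel code: every toy_* name is
hypothetical, and a real implementation would go through the existing
mempolicy machinery rather than plain bitmasks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: bit n set means memory node n (real or fake) may be used. */
typedef uint64_t toy_nodemask_t;

struct toy_policy { toy_nodemask_t nodes; };

/* Hypothetical stand-in for a policy hung off the file's address_space. */
struct toy_address_space { const struct toy_policy *policy; };

struct toy_vma {
    const struct toy_policy *policy;          /* per-vma policy, may be NULL */
    const struct toy_address_space *mapping;  /* backing file, NULL if anonymous */
};

struct toy_task { struct toy_policy task_policy; };

/*
 * Resolve the nodes an allocation for this vma may come from:
 * vma policy first, then the file's policy, then the task default.
 */
static toy_nodemask_t toy_effective_nodes(const struct toy_task *t,
                                          const struct toy_vma *vma)
{
    assert(t != NULL);
    if (vma && vma->policy)
        return vma->policy->nodes;
    if (vma && vma->mapping && vma->mapping->policy)
        return vma->mapping->policy->nodes;
    return t->task_policy.nodes;
}
```

With this ordering, every task mapping the same file would draw that file's
pages from the same set of fake nodes, whichever cpuset the task runs in.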

Paul


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:50           ` Rohit Seth
  2006-09-20 17:52             ` Christoph Lameter
@ 2006-09-20 18:37             ` Peter Zijlstra
  2006-09-20 18:57               ` Rohit Seth
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2006-09-20 18:37 UTC (permalink / raw)
  To: rohitseth
  Cc: Nick Piggin, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management, Christoph Lameter

On Wed, 2006-09-20 at 10:50 -0700, Rohit Seth wrote:
> On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
> > (this time to the lists as well)
> > 
> > Peter Zijlstra wrote:
> > 
> >  > I'd much rather containerize the whole reclaim code, which should not
> >  > be too hard since he already adds a container pointer to struct page.
> > 
> > 
> 
> Right now the memory handler in this container subsystem is written in
> such a way that when existing kernel reclaimer kicks in, it will first
> operate on those (container with pages over the limit) pages first.  But
> in general I like the notion of containerizing the whole reclaim code.

Patch 5/5 seems to have a horrid deactivation scheme.

> >  > I still have to reread what Rohit does for file backed pages, that gave
> >  > my head a spin.
> 
> Please let me know if there is any specific part that isn't making much
> sense.

Well, the whole over-the-limit handler is quite painful; having given it
a second reading, it isn't all that complex after all, just odd.

You just start invalidating whole files for file-backed pages. Granted,
this will get you below the threshold, but you might just have destroyed
your working set.

Pretty much the same for your anonymous memory handler: you scan through
the pages in linear fashion and demote the first that you encounter.

Both things pretty thoroughly destroy the existing kernel reclaim.


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:27                   ` Peter Zijlstra
  2006-09-20 18:33                     ` [ckrm-tech] " Paul Menage
@ 2006-09-20 18:38                     ` Rohit Seth
  1 sibling, 0 replies; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 18:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Nick Piggin, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management

On Wed, 2006-09-20 at 20:27 +0200, Peter Zijlstra wrote:

> Yes, I read that in your patches, I was wondering how the cpuset
> approach would handle this.
> 
> Neither are really satisfactory for shared mappings.
> 

In which way?  We could have a per-container flag indicating whether to
charge this container for a shared mapping that it initiates, or to
charge the container where the mapping belongs... or is there something
different that you are referring to?
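
A rough sketch of what such a flag could look like, to make the question
concrete. This is only one reading of the idea: the struct layouts, the
flag name charge_to_faulter and the helper are all hypothetical and do
not come from the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types; none of these names come from the patch set. */
struct toy_container {
    long file_pages;         /* file-backed pages charged to this container */
    bool charge_to_faulter;  /* the proposed per-container flag */
};

struct toy_mapping {
    struct toy_container *owner;  /* container that created the mapping */
};

/*
 * Pick the container to charge for a new file-backed page. If the
 * faulting task's container has the flag set, it pays for the page;
 * otherwise the charge goes to the container the mapping belongs to.
 */
static struct toy_container *
toy_charge_file_page(struct toy_mapping *m, struct toy_container *faulter)
{
    struct toy_container *target;

    assert(m != NULL && m->owner != NULL && faulter != NULL);
    target = faulter->charge_to_faulter ? faulter : m->owner;
    target->file_pages++;
    return target;
}
```

Either choice still charges each page to exactly one container, so a page
shared by several containers is never split between them.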

-rohit


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:37             ` Peter Zijlstra
@ 2006-09-20 18:57               ` Rohit Seth
  0 siblings, 0 replies; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 18:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nick Piggin, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management, Christoph Lameter

On Wed, 2006-09-20 at 20:37 +0200, Peter Zijlstra wrote:
> On Wed, 2006-09-20 at 10:50 -0700, Rohit Seth wrote:
> > On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
> > > (this time to the lists as well)
> > > 
> > > Peter Zijlstra wrote:
> > > 
> > >  > I'd much rather containerize the whole reclaim code, which should not
> > >  > be too hard since he already adds a container pointer to struct page.
> > > 
> > > 
> > 
> > Right now the memory handler in this container subsystem is written in
> > such a way that when existing kernel reclaimer kicks in, it will first
> > operate on those (container with pages over the limit) pages first.  But
> > in general I like the notion of containerizing the whole reclaim code.
> 
> Patch 5/5 seems to have a horrid deactivation scheme.
> 
> > >  > I still have to reread what Rohit does for file backed pages, that gave
> > >  > my head a spin.
> > 
> > Please let me know if there is any specific part that isn't making much
> > sense.
> 
> Well, the whole over the limit handler is quite painful, having taken a
> second reading it isn't all that complex after all, just odd.
> 

It is very basic right now.  

> You just start invalidating whole files for file backed pages. Granted,
> this will get you below the threshold. but you might just have destroyed
> your working set.
> 

When a container has gone over the limit, it is okay to penalize it.  I
agree that I'm not making an attempt to maintain the current working
set.  Any suggestions that I can incorporate to improve this algorithm
would be much appreciated.

> Pretty much the same for you anonymous memory handler, you scan through
> the pages in linear fashion and demote the first that you encounter.
> 
> Both things pretty thoroughly destroy the existing kernel reclaim.
> 

I agree that within a container I need to add more smarts, for example
to avoid a linear search.  Simple additions like remembering the last
task or last mapping visited could be useful. And I definitely want to
improve on that.

Though it should not destroy the existing kernel reclaim.  Pages
belonging to an over-the-limit container should be the first ones to
either get flushed out to the FS or swapped if necessary.  (That is the
cost you will have to pay if you, for example, want to contain your
tar to a 100MB memory footprint.)
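
As a sketch of the "last position visited" idea, assuming a container
tracks its pages in a flat array: the scan resumes from a saved cursor
instead of restarting at slot 0, so the same pages are not hit on every
pass. All names here are hypothetical, and the patch set itself walks
tasks and mappings rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Toy container: in_use[i] != 0 means slot i holds a charged page. */
struct toy_container {
    size_t nr_pages;        /* pages currently charged */
    size_t limit;           /* page limit for this container */
    size_t nr_slots;        /* number of entries in in_use[] */
    size_t cursor;          /* slot where the previous scan stopped */
    unsigned char *in_use;
};

/*
 * Evict pages until the container is back at its limit, starting from
 * the saved cursor and scanning each slot at most once per call.
 * Returns the number of pages evicted.
 */
static size_t toy_reclaim_over_limit(struct toy_container *c)
{
    size_t evicted = 0, scanned = 0;

    assert(c != NULL && c->in_use != NULL);
    while (c->nr_pages > c->limit && scanned < c->nr_slots) {
        size_t i = c->cursor;

        c->cursor = (c->cursor + 1) % c->nr_slots;
        scanned++;
        if (c->in_use[i]) {
            c->in_use[i] = 0;   /* stands in for writeback or swap-out */
            c->nr_pages--;
            evicted++;
        }
    }
    return evicted;
}
```

This does nothing to protect the working set; it only spreads the penalty
across the container's pages instead of always hitting the first ones.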

-rohit


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:06               ` Peter Zijlstra
  2006-09-20 18:14                 ` Rohit Seth
@ 2006-09-20 19:48                 ` Paul Jackson
  2006-09-20 19:48                 ` Christoph Lameter
  2 siblings, 0 replies; 25+ messages in thread
From: Paul Jackson @ 2006-09-20 19:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: clameter, rohitseth, nickpiggin, ckrm-tech, devel, linux-kernel,
	linux-mm

Peter wrote:
> > Which comes naturally with cpusets.
> 
> How are shared mappings dealt with, are pages charged to the set that
> first faults them in?

Cpusets does not attempt to manage how much memory a task can allocate,
but where it can allocate it.  If a task can find an existing page to
share, and avoid the allocation, then it entirely avoids dealing with
cpusets in that case.

Cpusets pays no attention to how often a page is shared.  It controls
which tasks can allocate a given free page, based on the node on which
that page resides.  If that node is allowed in a task's 'nodemask_t
mems_allowed' (a task struct field), then the task can allocate
that page, so far as cpusets is concerned.

Cpusets does not care who links to a page, once it is allocated.

Every page is assigned to one specific node, and may only be allocated
by tasks allowed to allocate from that node.

These cpusets can overlap - which so far as memory goes, roughly means
that the various mems_allowed nodemask_t's of different tasks can overlap.

Here's an oddball example configuration that might make this easier to
think about.

    Let's say we have a modest sized NUMA system with an extra bank
    of memory added, in addition to the per-node memory.  Let's say
    the extra bank is a huge pile of cheaper (slower) memory, off a
    slower bus.

    Normal sized tasks running on one or more of the NUMA nodes just
    get to fight for the CPUs and memory on those nodes allowed them.

    Let's say an occasional big memory job is to be allowed to use
    some of the extra cheap memory, and we use the idea of Andrew
    and others to split that memory into fake nodes to manage the
    portion of memory available to specified tasks.

    Then one of these big jobs could be in a cpuset that let it use
    one or more of the CPUs and memory on the node it ran on, plus
    some number of the fake nodes on the extra cheap memory.

    Other jobs could be allowed, using cpusets, to use any combination
    of the same or overlapping CPUs or nodes, and/or other disjoint
    CPUs or nodes, fake or real.

Another example, restating some of the above.

    If say some application happened to fault in a libc.so page,
    it would be required to place that page on one of the nodes
    allowed to it.  If another application comes along later and
    ends up wanting shared references to that same page, it could
    certainly do so, regardless of its cpuset settings.  It would
    not be allocating a new page for this, so would not encounter
    the cpuset constraints on where it could allocate such a page.
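
The rule above boils down to roughly the following. This is a toy model,
not the kernel's code: the toy_* names and the 64-bit mask are
illustrative assumptions standing in for nodemask_t and the real
allocator paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_NODES 64

/* Toy stand-in for nodemask_t: bit n set means node n is allowed. */
typedef uint64_t toy_nodemask_t;

struct toy_task {
    toy_nodemask_t mems_allowed;
};

struct toy_page {
    int node;      /* node the page resides on, fixed at allocation */
    int mapcount;  /* how many tasks currently share the page */
};

/* A task may allocate a free page only from a node in its mask. */
static bool toy_may_allocate(const struct toy_task *t, int node)
{
    assert(t != NULL);
    if (node < 0 || node >= TOY_MAX_NODES)
        return false;
    return ((t->mems_allowed >> node) & 1u) != 0;
}

/*
 * Sharing an already allocated page does not consult mems_allowed at
 * all, which is why this helper takes no task argument.
 */
static void toy_share_page(struct toy_page *p)
{
    assert(p != NULL);
    p->mapcount++;
}
```

In the libc.so case, the first task passes toy_may_allocate() for the node
it picks; the second task only ever reaches toy_share_page().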

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:06               ` Peter Zijlstra
  2006-09-20 18:14                 ` Rohit Seth
  2006-09-20 19:48                 ` Paul Jackson
@ 2006-09-20 19:48                 ` Christoph Lameter
  2006-09-20 19:51                   ` [ckrm-tech] " Paul Menage
  2 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 19:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rohit Seth, Nick Piggin, CKRM-Tech, devel, linux-kernel,
	Linux Memory Management

On Wed, 20 Sep 2006, Peter Zijlstra wrote:

> > Which comes naturally with cpusets.
> 
> How are shared mappings dealt with, are pages charged to the set that
> first faults them in?

They are charged to the node from which they were allocated. If the 
process is restricted to the node (container), then all pages it allocates 
are charged to the container regardless of whether they are shared.


* Re: [ckrm-tech] [patch00/05]: Containers(V2)- Introduction
  2006-09-20 19:48                 ` Christoph Lameter
@ 2006-09-20 19:51                   ` Paul Menage
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Menage @ 2006-09-20 19:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Nick Piggin, CKRM-Tech, linux-kernel,
	Linux Memory Management, Rohit Seth, devel

On 9/20/06, Christoph Lameter <clameter@sgi.com> wrote:
> On Wed, 20 Sep 2006, Peter Zijlstra wrote:
>
> > > Which comes naturally with cpusets.
> >
> > How are shared mappings dealt with, are pages charged to the set that
> > first faults them in?
>
> They are charged to the node from which they were allocated. If the
> process is restricted to the node (container) then all pages allocated
> are charged to the container regardless if they are shared or not.
>

Or you could use the per-vma mempolicy support to bind a large data
file to a particular node, and track shared file usage that way.

Paul


end of thread, other threads:[~2006-09-20 19:51 UTC | newest]

Thread overview: 25+ messages
     [not found] <1158718568.29000.44.camel@galaxy.corp.google.com>
2006-09-20  5:39 ` [patch00/05]: Containers(V2)- Introduction Nick Piggin
2006-09-20 16:26   ` Christoph Lameter
2006-09-20 16:56     ` Nick Piggin
2006-09-20 17:08       ` Christoph Lameter
2006-09-20 17:19         ` Nick Piggin
2006-09-20 17:30           ` Christoph Lameter
2006-09-20 18:03             ` Nick Piggin
2006-09-20 17:40       ` Alan Cox
2006-09-20 16:27   ` Rohit Seth
     [not found]   ` <1158751720.8970.67.camel@twins>
     [not found]     ` <4511626B.9000106@yahoo.com.au>
     [not found]       ` <1158767787.3278.103.camel@taijtu>
2006-09-20 17:00         ` Nick Piggin
2006-09-20 17:23           ` [ckrm-tech] " Paul Menage
2006-09-20 17:36           ` Alan Cox
2006-09-20 17:30             ` Nick Piggin
2006-09-20 17:50           ` Rohit Seth
2006-09-20 17:52             ` Christoph Lameter
2006-09-20 18:06               ` Peter Zijlstra
2006-09-20 18:14                 ` Rohit Seth
2006-09-20 18:27                   ` Peter Zijlstra
2006-09-20 18:33                     ` [ckrm-tech] " Paul Menage
2006-09-20 18:38                     ` Rohit Seth
2006-09-20 19:48                 ` Paul Jackson
2006-09-20 19:48                 ` Christoph Lameter
2006-09-20 19:51                   ` [ckrm-tech] " Paul Menage
2006-09-20 18:37             ` Peter Zijlstra
2006-09-20 18:57               ` Rohit Seth
