* Re: [patch00/05]: Containers(V2)- Introduction [not found] <1158718568.29000.44.camel@galaxy.corp.google.com> @ 2006-09-20 5:39 ` Nick Piggin 2006-09-20 16:26 ` Christoph Lameter ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Nick Piggin @ 2006-09-20 5:39 UTC (permalink / raw) To: rohitseth Cc: no To-header on input, CKRM-Tech, devel, linux-kernel, Linux Memory Management Rohit Seth wrote: >Containers: > > [...] >This is based on lot of discussions over last month or so. I hope this >patch set is something that we can agree and more support can be added >on top of this. Please provide feedback and add other extensions that >are useful in the TODO list. > Hi Rohit, Sorry for the late reply. I was just about to comment on your earlier patchset but I will do so here instead. Anyway I don't think I have much to say other than: this is almost exactly as I had imagined the memory resource tracking should look like. Just a small number of hooks and a very simple set of rules for tracking allocations. Also, the possibility to track kernel allocations as a whole rather than at individual callsites (which shouldn't be too difficult to implement). If anything I would perhaps even argue for further cutting down the number of hooks and add them back as they prove to be needed. I'm not sure about containers & workload management people, but from a core mm/ perspective I see no reason why this couldn't get in, given review and testing. Great! Nick -- Send instant messages to your online friends http://au.messenger.yahoo.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 5:39 ` [patch00/05]: Containers(V2)- Introduction Nick Piggin
@ 2006-09-20 16:26 ` Christoph Lameter
  2006-09-20 16:56 ` Nick Piggin
  2006-09-20 16:27 ` Rohit Seth
  [not found] ` <1158751720.8970.67.camel@twins>
  2 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 16:26 UTC
To: Nick Piggin
Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 20 Sep 2006, Nick Piggin wrote:

> I'm not sure about containers & workload management people, but from
> a core mm/ perspective I see no reason why this couldn't get in,
> given review and testing. Great!

Nack. We already have the ability to manage workloads. We may want to
extend the existing functionality but this is duplicating what is already
available through cpusets.
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 16:26 ` Christoph Lameter
@ 2006-09-20 16:56 ` Nick Piggin
  2006-09-20 17:08 ` Christoph Lameter
  2006-09-20 17:40 ` Alan Cox
  0 siblings, 2 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 16:56 UTC
To: Christoph Lameter
Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

Christoph Lameter wrote:
> On Wed, 20 Sep 2006, Nick Piggin wrote:
>
>> I'm not sure about containers & workload management people, but from
>> a core mm/ perspective I see no reason why this couldn't get in,
>> given review and testing. Great!
>
> Nack. We already have the ability to manage workloads. We may want to
> extend the existing functionality but this is duplicating what is already
> available through cpusets.

If it wasn't clear, I was talking specifically about the hooks for page
tracking rather than the whole patchset. If anybody wants such page
tracking infrastructure in the kernel, then this (as opposed to the
huge beancounters stuff) is what it should look like.

But as I said above, I don't know what the containers and workload
management people want exactly... The recent discussions about using
nodes and cpusets for memory workload management does seem like a
promising idea, and if it would avoid the need for this kind of
per-page tracking entirely, then that would probably be even better.

--
SUSE Labs, Novell Inc.
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 16:56 ` Nick Piggin
@ 2006-09-20 17:08 ` Christoph Lameter
  2006-09-20 17:19 ` Nick Piggin
  2006-09-20 17:40 ` Alan Cox
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 17:08 UTC
To: Nick Piggin
Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Thu, 21 Sep 2006, Nick Piggin wrote:

> If it wasn't clear, I was talking specifically about the hooks for page
> tracking rather than the whole patchset. If anybody wants such page
> tracking infrastructure in the kernel, then this (as opposed to the
> huge beancounters stuff) is what it should look like.

Could you point to the patch and a description for what is meant here by
page tracking (did not see that in the patch, maybe I could not find it)?
If these are just statistics then we likely already have them.
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:08 ` Christoph Lameter
@ 2006-09-20 17:19 ` Nick Piggin
  2006-09-20 17:30 ` Christoph Lameter
  0 siblings, 1 reply; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 17:19 UTC
To: Christoph Lameter
Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

Christoph Lameter wrote:
> On Thu, 21 Sep 2006, Nick Piggin wrote:
>
>> If it wasn't clear, I was talking specifically about the hooks for page
>> tracking rather than the whole patchset. If anybody wants such page
>> tracking infrastructure in the kernel, then this (as opposed to the
>> huge beancounters stuff) is what it should look like.
>
> Could you point to the patch and a description for what is meant here by
> page tracking (did not see that in the patch, maybe I could not find it)?
> If these are just statistics then we likely already have them.

Patch 2/5 in this series provides hooks, and they are pretty unintrusive.

 mm/filemap.c    |    4 ++++
 mm/page_alloc.c |    3 +++
 mm/rmap.c       |    8 +++++++-
 mm/swap.c       |    1 +
 mm/vmscan.c     |    1 +

--
SUSE Labs, Novell Inc.
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:19 ` Nick Piggin
@ 2006-09-20 17:30 ` Christoph Lameter
  2006-09-20 18:03 ` Nick Piggin
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 17:30 UTC
To: Nick Piggin
Cc: rohitseth, pj, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Thu, 21 Sep 2006, Nick Piggin wrote:

> Patch 2/5 in this series provides hooks, and they are pretty unintrusive.

Ok. We shadow existing vm counters and add stuff to the address_space
structure. The task add / remove is duplicating what some of the cpuset
hooks do. That clearly shows that we are just duplicating functionality.

The mapping things are new.
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:30 ` Christoph Lameter
@ 2006-09-20 18:03 ` Nick Piggin
  0 siblings, 0 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 18:03 UTC
To: Christoph Lameter
Cc: rohitseth, pj, CKRM-Tech, devel, linux-kernel, Linux Memory Management

Christoph Lameter wrote:
> On Thu, 21 Sep 2006, Nick Piggin wrote:
>
>> Patch 2/5 in this series provides hooks, and they are pretty unintrusive.
>
> Ok. We shadow existing vm counters and add stuff to the address_space
> structure. The task add / remove is duplicating what some of the cpuset
> hooks do. That clearly shows that we are just duplicating functionality.

I don't think so. To start with, the point about containers is they are
not per address_space. But secondly, these are hooks from the container
subsystem into the mm subsystem. As such, they might do something a bit
more or different than simple statistics, and we don't want to teach the
core mm/ about what that might be. You also want to be able to configure
them out entirely.

I think it is fine to add some new hooks in fundamental (i.e. mm
agnostic) points. Without getting to the fine details about exactly how
the hooks are implemented, or what information needs to be tracked, I
think we can say that they are not much burden for mm/ to bear (if they
turn out to be usable).

--
SUSE Labs, Novell Inc.
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 16:56 ` Nick Piggin
  2006-09-20 17:08 ` Christoph Lameter
@ 2006-09-20 17:40 ` Alan Cox
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Cox @ 2006-09-20 17:40 UTC
To: Nick Piggin
Cc: Christoph Lameter, rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Thu, 2006-09-21 at 02:56 +1000, Nick Piggin wrote:
> But as I said above, I don't know what the containers and workload
> management people want exactly... The recent discussions about using
> nodes and cpusets for memory workload management does seem like a
> promising idea, and if it would avoid the need for this kind of
> per-page tracking entirely, then that would probably be even better.

I think you can roughly break it down to

- I want one group of users not to be able to screw another group of
  users or the box but don't care about anything else. The basic
  beancounter stuff handles this. Generally they also want maximal
  sharing.

- I want to charge people for portions of machine use (mostly accounting
  and some fairness)

- I don't want any user to be able to hog the system to the harm of the
  others but don't care about overcommit of idle resources (think about
  the 5000 apaches on a box case)

- I want to be able to divide resources reasonably accurately all the
  time between groups of users

Alan
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 5:39 ` [patch00/05]: Containers(V2)- Introduction Nick Piggin
  2006-09-20 16:26 ` Christoph Lameter
@ 2006-09-20 16:27 ` Rohit Seth
  [not found] ` <1158751720.8970.67.camel@twins>
  2 siblings, 0 replies; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 16:27 UTC
To: Nick Piggin; +Cc: CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 2006-09-20 at 15:39 +1000, Nick Piggin wrote:

> Anyway I don't think I have much to say other than: this is almost
> exactly as I had imagined the memory resource tracking should look
> like. Just a small number of hooks and a very simple set of rules for
> tracking allocations. Also, the possibility to track kernel
> allocations as a whole rather than at individual callsites (which
> shouldn't be too difficult to implement).

I've started looking in that direction. A first shot could just be
tracking kernel memory consumption w/o worrying about whether it is slab
or PT etc. Hopefully the next patchset will have that support integrated.

> If anything I would perhaps even argue for further cutting down the
> number of hooks and add them back as they prove to be needed.

I think the current set of changes (and the tracking of different
components) is necessary for the memory handler to do the right thing.
Plus it is possible that user land management tools can also make use of
this information.

> I'm not sure about containers & workload management people, but from
> a core mm/ perspective I see no reason why this couldn't get in,
> given review and testing. Great!

That is great to know. Thanks. Hopefully it is getting enough coverage
to get there.

-rohit
[parent not found: <1158751720.8970.67.camel@twins>]
[parent not found: <4511626B.9000106@yahoo.com.au>]
[parent not found: <1158767787.3278.103.camel@taijtu>]
* Re: [patch00/05]: Containers(V2)- Introduction
  [not found] ` <1158767787.3278.103.camel@taijtu>
@ 2006-09-20 17:00 ` Nick Piggin
  2006-09-20 17:23 ` [ckrm-tech] " Paul Menage
  ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 17:00 UTC
To: Peter Zijlstra
Cc: rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management, Christoph Lameter

(this time to the lists as well)

Peter Zijlstra wrote:

> I'd much rather containerize the whole reclaim code, which should not
> be too hard since he already adds a container pointer to struct page.

Yes, and I tend to agree with you. I probably wasn't clear, but I was
mainly talking about just the memory resource tracking part of this
patchset.

I am less willing to make a judgement about reclaim, because I don't
know very much about the workloads or the guarantees they attempt to
provide.

> Esp. when we get some of my page reclaim abstractions merged, moving the
> reclaim from struct zone to a container is not a lot of work. (this is
> basically what one of the ckrm mm policies did too)

I do agree that it would be nicer to not have a completely different
scheme for doing their own page reclaim, but rather use the existing
code (*provided* that it is designed in the same, minimally intrusive
manner as the page tracking).

I can understand how it is attractive to create a new subsystem to
solve a particular problem, but once it is in the kernel it has to be
maintained regardless, so if it can be done in a way that shares more
of the current infrastructure (nicely) then that would be a better
solution.

I like that they're investigating the use of memory nodes for this.
It seems like the logical starting place.

> I still have to reread what Rohit does for file backed pages, that gave
> my head a spin.
>
> I've been thinking a bit on that problem, and it would be possible to
> share all address_space pages equally between attached containers, this
> would lose some accuracy, since one container could read 10% of the file
> and another 90%, but I don't think that is a common scenario.

Yeah, I'm not sure about that. I don't think really complex schemes
are needed... but again I might need more knowledge of their workloads
and problems.

--
SUSE Labs, Novell Inc.
* Re: [ckrm-tech] [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:00 ` Nick Piggin
@ 2006-09-20 17:23 ` Paul Menage
  2006-09-20 17:36 ` Alan Cox
  2006-09-20 17:50 ` Rohit Seth
  2 siblings, 0 replies; 25+ messages in thread
From: Paul Menage @ 2006-09-20 17:23 UTC
To: Nick Piggin
Cc: Peter Zijlstra, CKRM-Tech, linux-kernel, Linux Memory Management, rohitseth, devel, Christoph Lameter

On 9/20/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Yeah, I'm not sure about that. I don't think really complex schemes
> are needed... but again I might need more knowledge of their workloads
> and problems.

The basic need for separating files into containers distinct from the
tasks that are using them arises when you have several "jobs" all
working with the same large data set. (Possibly read-only data files,
or possibly one job is updating a dataset that's being used by other
jobs.)

For automated job-tracking and scheduling, it's important to be able
to distinguish shared usage from individual usage (e.g. to be able to
answer questions such as "if I kill job X, how much memory do I get
back?" and "how do I recover 1G of memory on this machine?").

As an example, assume two jobs each with 100M of anonymous memory both
mapping the same 1G file, for a total usage of 1.2G.

Any setup that doesn't let you distinguish shared and private usage
makes it hard to answer that kind of scheduling question. E.g.:

- first user gets charged for the page -> first job reported as 1.1G,
  and the second as 0.1G.

- page charges get shared between all users of the page -> two tasks
  using 0.6G each.

- all users get charged for the page -> two tasks using 1.1G each.

But in fact killing either one of these jobs individually would only
free up 100M.

By explicitly letting userspace see that there are two jobs each with
a private usage of 100M, and they're sharing a dataset of 1G, it's
possible to make more informed decisions.

The issue of telling the kernel exactly which files/directories need
to be accounted separately can be handled by userspace.

It could be done by per-page accounting, or by constraining particular
files to particular memory zones, or by just tracking/limiting the
number of pages from each address_space in the pagecache, but I think
that it's important that the kernel at least provide the primitive
support for this.

Paul
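The 1.2G example in the message above can be checked with a little arithmetic. The following Python sketch is an illustration only and is not code from the patchset: the job names, the function names, and the choice to treat 1G as 1000M (so the totals match the figures quoted in the message) are all assumptions made here.

```python
# Illustrative model only; every name here is hypothetical, not kernel code.
ANON_MB = 100    # private anonymous memory per job, from the example
FILE_MB = 1000   # the shared "1G" file, taken as 1000M so the total is 1.2G

JOBS = ["job_a", "job_b"]  # assumed: job_a is the first to touch the file


def first_touch(jobs):
    """Charge the whole shared file to the first job that touched it."""
    return {j: ANON_MB + (FILE_MB if i == 0 else 0) for i, j in enumerate(jobs)}


def split_equally(jobs):
    """Share the file's charge evenly between all jobs mapping it."""
    return {j: ANON_MB + FILE_MB // len(jobs) for j in jobs}


def charge_everyone(jobs):
    """Charge the full file to every job mapping it."""
    return {j: ANON_MB + FILE_MB for j in jobs}


def freed_by_killing(job, jobs):
    """Memory returned if only `job` is killed.

    Simplification: the file's pages are assumed to stay resident for as
    long as any other job still maps the file.
    """
    others = [j for j in jobs if j != job]
    return ANON_MB + (0 if others else FILE_MB)


def explicit_view(jobs):
    """Report private and shared usage separately, as the message proposes."""
    return {"private": {j: ANON_MB for j in jobs}, "shared": FILE_MB}
```

Each of the three charging policies reports different per-job numbers (1100/100, 600/600, 1100/1100), yet `freed_by_killing` gives 100 for either job. Only `explicit_view`, which keeps private and shared usage apart, answers the "how much do I get back?" question directly.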
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:00 ` Nick Piggin
  2006-09-20 17:23 ` [ckrm-tech] " Paul Menage
@ 2006-09-20 17:36 ` Alan Cox
  2006-09-20 17:30 ` Nick Piggin
  2006-09-20 17:50 ` Rohit Seth
  2 siblings, 1 reply; 25+ messages in thread
From: Alan Cox @ 2006-09-20 17:36 UTC
To: Nick Piggin
Cc: Peter Zijlstra, rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management, Christoph Lameter

On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
> > I've been thinking a bit on that problem, and it would be possible to
> > share all address_space pages equally between attached containers, this
> > would lose some accuracy, since one container could read 10% of the file
> > and another 90%, but I don't think that is a common scenario.
>
> Yeah, I'm not sure about that. I don't think really complex schemes
> are needed... but again I might need more knowledge of their workloads
> and problems.

Any scenario which permits "cheating" will be a scenario that happens
because people will try and cheat.

Alan
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:36 ` Alan Cox
@ 2006-09-20 17:30 ` Nick Piggin
  0 siblings, 0 replies; 25+ messages in thread
From: Nick Piggin @ 2006-09-20 17:30 UTC
To: Alan Cox
Cc: Peter Zijlstra, rohitseth, CKRM-Tech, devel, linux-kernel, Linux Memory Management, Christoph Lameter

Alan Cox wrote:
> On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
>
>>> I've been thinking a bit on that problem, and it would be possible to
>>> share all address_space pages equally between attached containers, this
>>> would lose some accuracy, since one container could read 10% of the file
>>> and another 90%, but I don't think that is a common scenario.
>>
>> Yeah, I'm not sure about that. I don't think really complex schemes
>> are needed... but again I might need more knowledge of their workloads
>> and problems.
>
> Any scenario which permits "cheating" will be a scenario that happens
> because people will try and cheat.

That's true, and that's one reason why I've advocated the solution
implemented by Rohit's patches, that is: just throw in the towel and
be happy to count just pages.

Look at the beancounter stuff, and it has hooks (in the form of gfp
flags) throughout the tree, and they still manage to miss accounting
user exploitable memory overallocation from some callers. Maintaining
that will be much more difficult and error prone.

--
SUSE Labs, Novell Inc.
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:00 ` Nick Piggin
  2006-09-20 17:23 ` [ckrm-tech] " Paul Menage
  2006-09-20 17:36 ` Alan Cox
@ 2006-09-20 17:50 ` Rohit Seth
  2006-09-20 17:52 ` Christoph Lameter
  2006-09-20 18:37 ` Peter Zijlstra
  2 siblings, 2 replies; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 17:50 UTC
To: Nick Piggin
Cc: Peter Zijlstra, CKRM-Tech, devel, linux-kernel, Linux Memory Management, Christoph Lameter

On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
> (this time to the lists as well)
>
> Peter Zijlstra wrote:
>
> > I'd much rather containerize the whole reclaim code, which should not
> > be too hard since he already adds a container pointer to struct page.

Right now the memory handler in this container subsystem is written in
such a way that when the existing kernel reclaimer kicks in, it will first
operate on those (container with pages over the limit) pages. But
in general I like the notion of containerizing the whole reclaim code.

> > I still have to reread what Rohit does for file backed pages, that gave
> > my head a spin.

Please let me know if there is any specific part that isn't making much
sense.

-rohit
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:50 ` Rohit Seth
@ 2006-09-20 17:52 ` Christoph Lameter
  2006-09-20 18:06 ` Peter Zijlstra
  2006-09-20 18:37 ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 17:52 UTC
To: Rohit Seth
Cc: Nick Piggin, Peter Zijlstra, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 20 Sep 2006, Rohit Seth wrote:

> Right now the memory handler in this container subsystem is written in
> such a way that when the existing kernel reclaimer kicks in, it will first
> operate on those (container with pages over the limit) pages. But
> in general I like the notion of containerizing the whole reclaim code.

Which comes naturally with cpusets.
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 17:52 ` Christoph Lameter
@ 2006-09-20 18:06 ` Peter Zijlstra
  2006-09-20 18:14 ` Rohit Seth
  ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Peter Zijlstra @ 2006-09-20 18:06 UTC
To: Christoph Lameter
Cc: Rohit Seth, Nick Piggin, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 2006-09-20 at 10:52 -0700, Christoph Lameter wrote:
> On Wed, 20 Sep 2006, Rohit Seth wrote:
>
> > Right now the memory handler in this container subsystem is written in
> > such a way that when the existing kernel reclaimer kicks in, it will first
> > operate on those (container with pages over the limit) pages. But
> > in general I like the notion of containerizing the whole reclaim code.
>
> Which comes naturally with cpusets.

How are shared mappings dealt with, are pages charged to the set that
first faults them in?
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:06 ` Peter Zijlstra
@ 2006-09-20 18:14 ` Rohit Seth
  2006-09-20 18:27 ` Peter Zijlstra
  2006-09-20 19:48 ` Paul Jackson
  2006-09-20 19:48 ` Christoph Lameter
  2 siblings, 1 reply; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 18:14 UTC
To: Peter Zijlstra
Cc: Christoph Lameter, Nick Piggin, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 2006-09-20 at 20:06 +0200, Peter Zijlstra wrote:
> On Wed, 2006-09-20 at 10:52 -0700, Christoph Lameter wrote:
> > On Wed, 20 Sep 2006, Rohit Seth wrote:
> >
> > > Right now the memory handler in this container subsystem is written in
> > > such a way that when the existing kernel reclaimer kicks in, it will first
> > > operate on those (container with pages over the limit) pages. But
> > > in general I like the notion of containerizing the whole reclaim code.
> >
> > Which comes naturally with cpusets.
>
> How are shared mappings dealt with, are pages charged to the set that
> first faults them in?

For anonymous pages (simpler case), they get charged to the faulting
task's container.

For filesystem pages (could be shared across tasks running in different
containers): Every time a new file mapping is created, it is bound to the
container of the process creating that mapping. All subsequent pages
belonging to this mapping will belong to this container, irrespective of
different tasks running in different containers accessing these pages.
Currently, I've not implemented a mechanism to allow a file to be
specifically moved into or out of a container. But when that gets
implemented then all pages belonging to a mapping will also move out of
the container (or into a new container).

-rohit
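The two charging rules described in the message above (anonymous pages go to the faulting task's container, file pages go to the container that created the mapping) can be modelled in a few lines. This is a toy Python model written for illustration under those stated rules; the class and function names are invented here and do not correspond to anything in the patchset.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical stand-in for a container's page counters."""
    name: str
    anon_pages: int = 0
    file_pages: int = 0


@dataclass
class Task:
    name: str
    container: Container


@dataclass
class FileMapping:
    """A file mapping, bound to one container for its whole lifetime."""
    path: str
    owner: Container


def create_mapping(task, path):
    # The mapping is bound to the container of the task that creates it.
    return FileMapping(path=path, owner=task.container)


def fault_anon_page(task):
    # Anonymous pages are charged to the faulting task's own container.
    task.container.anon_pages += 1


def fault_file_page(task, mapping):
    # File pages are charged to the mapping's container, regardless of
    # which task (in which container) caused the fault.
    mapping.owner.file_pages += 1
```

For example, if a task in container `c1` creates a mapping and a task in container `c2` then faults in three of its pages, `c1.file_pages` becomes 3 while `c2.file_pages` stays 0. That asymmetry is what the follow-up questions about shared mappings are probing.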
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:14 ` Rohit Seth
@ 2006-09-20 18:27 ` Peter Zijlstra
  2006-09-20 18:33 ` [ckrm-tech] " Paul Menage
  2006-09-20 18:38 ` Rohit Seth
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2006-09-20 18:27 UTC
To: rohitseth
Cc: Christoph Lameter, Nick Piggin, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 2006-09-20 at 11:14 -0700, Rohit Seth wrote:
> On Wed, 2006-09-20 at 20:06 +0200, Peter Zijlstra wrote:
> > On Wed, 2006-09-20 at 10:52 -0700, Christoph Lameter wrote:
> > > On Wed, 20 Sep 2006, Rohit Seth wrote:
> > >
> > > > Right now the memory handler in this container subsystem is written in
> > > > such a way that when the existing kernel reclaimer kicks in, it will first
> > > > operate on those (container with pages over the limit) pages. But
> > > > in general I like the notion of containerizing the whole reclaim code.
> > >
> > > Which comes naturally with cpusets.
> >
> > How are shared mappings dealt with, are pages charged to the set that
> > first faults them in?
>
> For anonymous pages (simpler case), they get charged to the faulting
> task's container.
>
> For filesystem pages (could be shared across tasks running in different
> containers): Every time a new file mapping is created, it is bound to the
> container of the process creating that mapping. All subsequent pages
> belonging to this mapping will belong to this container, irrespective of
> different tasks running in different containers accessing these pages.
> Currently, I've not implemented a mechanism to allow a file to be
> specifically moved into or out of a container. But when that gets
> implemented then all pages belonging to a mapping will also move out of
> the container (or into a new container).

Yes, I read that in your patches, I was wondering how the cpuset
approach would handle this.

Neither is really satisfactory for shared mappings.
* Re: [ckrm-tech] [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:27 ` Peter Zijlstra
@ 2006-09-20 18:33 ` Paul Menage
  2006-09-20 18:38 ` Rohit Seth
  1 sibling, 0 replies; 25+ messages in thread
From: Paul Menage @ 2006-09-20 18:33 UTC
To: Peter Zijlstra
Cc: rohitseth, Nick Piggin, CKRM-Tech, linux-kernel, Linux Memory Management, devel, Christoph Lameter

On 9/20/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Yes, I read that in your patches, I was wondering how the cpuset
> approach would handle this.

The VM currently has support for letting vmas define their own memory
policies - so specifying that a file-backed vma gets its memory from a
particular set of memory nodes would accomplish that for the fake-node
approach. The mechanism for setting up the per-file/per-vma policies
would probably involve something originating in struct inode or struct
address_space.

Paul
* Re: [patch00/05]: Containers(V2)- Introduction
  2006-09-20 18:27 ` Peter Zijlstra
  2006-09-20 18:33 ` [ckrm-tech] " Paul Menage
@ 2006-09-20 18:38 ` Rohit Seth
  1 sibling, 0 replies; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 18:38 UTC
To: Peter Zijlstra
Cc: Christoph Lameter, Nick Piggin, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 2006-09-20 at 20:27 +0200, Peter Zijlstra wrote:

> Yes, I read that in your patches, I was wondering how the cpuset
> approach would handle this.
>
> Neither is really satisfactory for shared mappings.

In which way? We could have a per-container flag indicating whether to
charge a shared mapping to the container that initiates it or to the
container where the mapping belongs... or is there something different
that you are referring to?

-rohit
* Re: [patch00/05]: Containers(V2)- Introduction 2006-09-20 18:06 ` Peter Zijlstra 2006-09-20 18:14 ` Rohit Seth @ 2006-09-20 19:48 ` Paul Jackson 2006-09-20 19:48 ` Christoph Lameter 2 siblings, 0 replies; 25+ messages in thread
From: Paul Jackson @ 2006-09-20 19:48 UTC
To: Peter Zijlstra
Cc: clameter, rohitseth, nickpiggin, ckrm-tech, devel, linux-kernel, linux-mm

Peter wrote:
> > Which comes naturally with cpusets.
>
> How are shared mappings dealt with, are pages charged to the set that
> first faults them in?

Cpusets does not attempt to manage how much memory a task can allocate,
but where it can allocate it. If a task can find an existing page to
share, and avoid the allocation, then it entirely avoids dealing with
cpusets in that case.

Cpusets pays no attention to how often a page is shared. It controls
which tasks can allocate a given free page, based on the node on which
that page resides. If that node is allowed in a task's 'nodemask_t
mems_allowed' (a task struct field), then the task can allocate that
page, so far as cpusets is concerned.

Cpusets does not care who links to a page, once it is allocated.

Every page is assigned to one specific node, and may only be allocated
by tasks allowed to allocate from that node.

These cpusets can overlap - which so far as memory goes, roughly means
that the various mems_allowed nodemask_t's of different tasks can
overlap.

Here's an oddball example configuration that might make this easier to
think about. Let's say we have a modest sized NUMA system with an extra
bank of memory added, in addition to the per-node memory. Let's say the
extra bank is a huge pile of cheaper (slower) memory, off a slower bus.
Normal sized tasks running on one or more of the NUMA nodes just get to
fight for the CPUs and memory on those nodes allowed them.

Let's say an occasional big memory job is to be allowed to use some of
the extra cheap memory, and we use the idea of Andrew and others to
split that memory into fake nodes to manage the portion of memory
available to specified tasks. Then one of these big jobs could be in a
cpuset that let it use one or more of the CPUs and memory on the node
it ran on, plus some number of the fake nodes on the extra cheap
memory. Other jobs could be allowed, using cpusets, to use any
combination of the same or overlapping CPUs or nodes, and/or other
disjoint CPUs or nodes, fake or real.

Another example, restating some of the above. If say some application
happened to fault in a libc.so page, it would be required to place that
page on one of the nodes allowed to it. If another application comes
along later and ends up wanting shared references to that same page, it
could certainly do so, regardless of its cpuset settings. It would not
be allocating a new page for this, so would not encounter the cpuset
constraints on where it could allocate such a page.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
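The rule described above (cpusets is consulted when a page is allocated, never when an existing page is shared) can be modelled in a few lines of userspace C. This is a sketch under assumptions: the nodemask typedef is a 64-bit stand-in for the kernel's nodemask_t, and sketch_node_allowed, alloc_page_on and share_page are hypothetical helpers, not kernel functions. Only the mems_allowed field name comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64   /* arbitrary limit for this sketch */

/* Stand-in for the kernel's nodemask_t: one bit per memory node. */
typedef uint64_t nodemask;

struct task {
    nodemask mems_allowed;   /* nodes this task may allocate from */
};

struct page {
    int node;       /* every page lives on exactly one node */
    int mapcount;   /* number of users; the cpuset check never looks at this */
};

/* Allocation-time check: may this task take a free page from `node`? */
static bool sketch_node_allowed(const struct task *t, int node)
{
    assert(node >= 0 && node < MAX_NODES);
    return (t->mems_allowed >> node) & 1u;
}

/* Allocating a fresh page goes through the node check. */
static bool alloc_page_on(const struct task *t, struct page *pg, int node)
{
    if (!sketch_node_allowed(t, node))
        return false;
    pg->node = node;
    pg->mapcount = 1;
    return true;
}

/* Sharing an already-allocated page allocates nothing, so the task's
 * mems_allowed is not consulted at all. */
static void share_page(struct page *pg)
{
    pg->mapcount++;
}
```

In the libc.so example, the first application would go through alloc_page_on and be constrained by its mems_allowed; a later application would only call share_page, whatever its own mask says.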
* Re: [patch00/05]: Containers(V2)- Introduction 2006-09-20 18:06 ` Peter Zijlstra 2006-09-20 18:14 ` Rohit Seth 2006-09-20 19:48 ` Paul Jackson @ 2006-09-20 19:48 ` Christoph Lameter 2006-09-20 19:51 ` [ckrm-tech] " Paul Menage 2 siblings, 1 reply; 25+ messages in thread
From: Christoph Lameter @ 2006-09-20 19:48 UTC
To: Peter Zijlstra
Cc: Rohit Seth, Nick Piggin, CKRM-Tech, devel, linux-kernel, Linux Memory Management

On Wed, 20 Sep 2006, Peter Zijlstra wrote:

> > Which comes naturally with cpusets.
>
> How are shared mappings dealt with, are pages charged to the set that
> first faults them in?

They are charged to the node from which they were allocated. If the
process is restricted to the node (container) then all pages allocated
are charged to the container regardless if they are shared or not.
* Re: [ckrm-tech] [patch00/05]: Containers(V2)- Introduction 2006-09-20 19:48 ` Christoph Lameter @ 2006-09-20 19:51 ` Paul Menage 0 siblings, 0 replies; 25+ messages in thread
From: Paul Menage @ 2006-09-20 19:51 UTC
To: Christoph Lameter
Cc: Peter Zijlstra, Nick Piggin, CKRM-Tech, linux-kernel, Linux Memory Management, Rohit Seth, devel

On 9/20/06, Christoph Lameter <clameter@sgi.com> wrote:
> On Wed, 20 Sep 2006, Peter Zijlstra wrote:
>
> > > Which comes naturally with cpusets.
> >
> > How are shared mappings dealt with, are pages charged to the set that
> > first faults them in?
>
> They are charged to the node from which they were allocated. If the
> process is restricted to the node (container) then all pages allocated
> are charged to the container regardless if they are shared or not.
>

Or you could use the per-vma mempolicy support to bind a large data
file to a particular node, and track shared file usage that way.

Paul
* Re: [patch00/05]: Containers(V2)- Introduction 2006-09-20 17:50 ` Rohit Seth 2006-09-20 17:52 ` Christoph Lameter @ 2006-09-20 18:37 ` Peter Zijlstra 2006-09-20 18:57 ` Rohit Seth 1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2006-09-20 18:37 UTC
To: rohitseth
Cc: Nick Piggin, CKRM-Tech, devel, linux-kernel, Linux Memory Management, Christoph Lameter

On Wed, 2006-09-20 at 10:50 -0700, Rohit Seth wrote:
> On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
> > (this time to the lists as well)
> >
> > Peter Zijlstra wrote:
> >
> > > I'd much rather containerize the whole reclaim code, which should not
> > > be too hard since he already adds a container pointer to struct page.
> >
> >
>
> Right now the memory handler in this container subsystem is written in
> such a way that when existing kernel reclaimer kicks in, it will first
> operate on those (container with pages over the limit) pages first. But
> in general I like the notion of containerizing the whole reclaim code.

Patch 5/5 seems to have a horrid deactivation scheme.

> > > I still have to reread what Rohit does for file backed pages, that gave
> > > my head a spin.
>
> Please let me know if there is any specific part that isn't making much
> sense.

Well, the whole over-the-limit handler is quite painful; having taken a
second reading, it isn't all that complex after all, just odd.

You just start invalidating whole files for file backed pages. Granted,
this will get you below the threshold, but you might just have destroyed
your working set.

Pretty much the same for your anonymous memory handler, you scan through
the pages in linear fashion and demote the first that you encounter.

Both things pretty thoroughly destroy the existing kernel reclaim.
* Re: [patch00/05]: Containers(V2)- Introduction 2006-09-20 18:37 ` Peter Zijlstra @ 2006-09-20 18:57 ` Rohit Seth 0 siblings, 0 replies; 25+ messages in thread
From: Rohit Seth @ 2006-09-20 18:57 UTC
To: Peter Zijlstra
Cc: Nick Piggin, CKRM-Tech, devel, linux-kernel, Linux Memory Management, Christoph Lameter

On Wed, 2006-09-20 at 20:37 +0200, Peter Zijlstra wrote:
> On Wed, 2006-09-20 at 10:50 -0700, Rohit Seth wrote:
> > On Thu, 2006-09-21 at 03:00 +1000, Nick Piggin wrote:
> > > (this time to the lists as well)
> > >
> > > Peter Zijlstra wrote:
> > >
> > > > I'd much rather containerize the whole reclaim code, which should not
> > > > be too hard since he already adds a container pointer to struct page.
> > >
> > >
> >
> > Right now the memory handler in this container subsystem is written in
> > such a way that when existing kernel reclaimer kicks in, it will first
> > operate on those (container with pages over the limit) pages first. But
> > in general I like the notion of containerizing the whole reclaim code.
>
> Patch 5/5 seems to have a horrid deactivation scheme.
>
> > > > I still have to reread what Rohit does for file backed pages, that gave
> > > > my head a spin.
> >
> > Please let me know if there is any specific part that isn't making much
> > sense.
>
> Well, the whole over the limit handler is quite painful, having taken a
> second reading it isn't all that complex after all, just odd.
>

It is very basic right now.

> You just start invalidating whole files for file backed pages. Granted,
> this will get you below the threshold, but you might just have destroyed
> your working set.
>

When a container has gone over the limit, it is okay to penalize it. I
agree that I'm not making an attempt to maintain the current working
set. Any suggestions that I can incorporate to improve this algorithm
would be much appreciated.

> Pretty much the same for your anonymous memory handler, you scan through
> the pages in linear fashion and demote the first that you encounter.
>
> Both things pretty thoroughly destroy the existing kernel reclaim.
>

I agree that within a container I need to add more smarts to (for
example) not do a linear search. Simple additions like last task or
last mapping visited could be useful. And I definitely want to improve
on that.

Though it should not destroy the existing kernel reclaim. Pages
belonging to an over-the-limit container should be the first ones to
either get flushed out to FS or swapped if necessary. (That is the cost
you will have to pay if you, for example, want to contain your tar to a
100MB memory footprint.)

-rohit
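The "last mapping visited" idea mentioned above can be sketched as a scan that resumes where the previous one stopped instead of restarting from the front. This is an illustration under assumptions, not the handler from patch 5/5: the fixed-size resident array, the cursor field and reclaim_over_limit are hypothetical, and the sketch makes no attempt to preserve a working set the way an LRU-based reclaim would.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8   /* tiny fixed pool, enough for a sketch */

struct container {
    int resident[NPAGES];   /* 1 if the slot holds a resident page */
    size_t cursor;          /* slot where the previous scan stopped */
    size_t usage;           /* pages currently charged */
    size_t limit;           /* configured page limit */
};

/* Evict pages until usage is back within the limit. The scan resumes
 * at `cursor` and wraps around, so repeated calls spread evictions
 * over the container rather than hitting the same first pages.
 * Returns the number of pages evicted. */
static size_t reclaim_over_limit(struct container *c)
{
    size_t evicted = 0;

    assert(c->cursor < NPAGES);
    for (size_t scanned = 0;
         scanned < NPAGES && c->usage > c->limit;
         scanned++) {
        size_t i = c->cursor;

        c->cursor = (c->cursor + 1) % NPAGES;
        if (c->resident[i]) {
            c->resident[i] = 0;
            c->usage--;
            evicted++;
        }
    }
    return evicted;
}
```

A cursor only removes the bias toward the first pages in the list; it says nothing about which pages are hot, so the working-set concern raised earlier in the thread would still apply to it.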
Thread overview: 25+ messages
[not found] <1158718568.29000.44.camel@galaxy.corp.google.com>
2006-09-20 5:39 ` [patch00/05]: Containers(V2)- Introduction Nick Piggin
2006-09-20 16:26 ` Christoph Lameter
2006-09-20 16:56 ` Nick Piggin
2006-09-20 17:08 ` Christoph Lameter
2006-09-20 17:19 ` Nick Piggin
2006-09-20 17:30 ` Christoph Lameter
2006-09-20 18:03 ` Nick Piggin
2006-09-20 17:40 ` Alan Cox
2006-09-20 16:27 ` Rohit Seth
[not found] ` <1158751720.8970.67.camel@twins>
[not found] ` <4511626B.9000106@yahoo.com.au>
[not found] ` <1158767787.3278.103.camel@taijtu>
2006-09-20 17:00 ` Nick Piggin
2006-09-20 17:23 ` [ckrm-tech] " Paul Menage
2006-09-20 17:36 ` Alan Cox
2006-09-20 17:30 ` Nick Piggin
2006-09-20 17:50 ` Rohit Seth
2006-09-20 17:52 ` Christoph Lameter
2006-09-20 18:06 ` Peter Zijlstra
2006-09-20 18:14 ` Rohit Seth
2006-09-20 18:27 ` Peter Zijlstra
2006-09-20 18:33 ` [ckrm-tech] " Paul Menage
2006-09-20 18:38 ` Rohit Seth
2006-09-20 19:48 ` Paul Jackson
2006-09-20 19:48 ` Christoph Lameter
2006-09-20 19:51 ` [ckrm-tech] " Paul Menage
2006-09-20 18:37 ` Peter Zijlstra
2006-09-20 18:57 ` Rohit Seth