* Re: [ckrm-tech] RFC: Memory Controller
  [not found] ` <454638D2.7050306@in.ibm.com>
@ 2006-10-30 18:07 ` Balbir Singh
  2006-10-31  8:57 ` Pavel Emelianov
  [not found] ` <45470DF4.70405@openvz.org>
  1 sibling, 1 reply; 16+ messages in thread

From: Balbir Singh @ 2006-10-30 18:07 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
    linux-kernel, pj, matthltc, dipankar, rohitseth, menage, linux-mm

Balbir Singh wrote:

[snip]

>
>> I see that everyone agrees that we want to see three resources:
>> 1. kernel memory
>> 2. unreclaimable memory
>> 3. reclaimable memory
>> if this is right then let's save it somewhere
>> (e.g. http://wiki.openvz.org/Containers/UBC_discussion)
>> and go on discussing the next question - interface.
>
> I understand that kernel memory accounting is the first priority for
> containers, but accounting kernel memory requires too many changes
> to the VM core, hence I was hesitant to put it up as first priority.
>
> But in general I agree, these are the three important resources for
> accounting and control

I forgot to mention: I hope you were including the page cache in your
definition of reclaimable memory.

>
> [snip]
>

--
Balbir Singh,
Linux Technology Center,
IBM Software Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-30 18:07 ` [ckrm-tech] RFC: Memory Controller Balbir Singh
@ 2006-10-31  8:57 ` Pavel Emelianov
  2006-10-31  9:19 ` Balbir Singh
  0 siblings, 1 reply; 16+ messages in thread

From: Pavel Emelianov @ 2006-10-31 8:57 UTC (permalink / raw)
To: balbir
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
    matthltc, dipankar, rohitseth, menage, linux-mm

[snip]

>> But in general I agree, these are the three important resources for
>> accounting and control
>
> I forgot to mention: I hope you were including the page cache in
> your definition of reclaimable memory.

As far as the page cache is concerned, my opinion is the following.
(If I misunderstood you, please correct me.)

The page cache is designed to keep as many pages in memory as
possible to optimize performance. If we start limiting page cache
usage we hurt performance. What is to be controlled is _used_
resources (touched pages, opened file descriptors, mapped areas,
etc.), but not the cached ones. I see nothing bad if a page that
belongs to a file, but is not used by ANY task in a BC, stays in
memory. I think this is normal. If the kernel wants, it can push
this page out easily; it won't even need to try_to_unmap() it.
So cached pages must not be accounted.

I've also noticed that you've [snip]-ed one of my questions:

> How would you allocate memory on NUMA in advance?

Please, clarify this.
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31  8:57 ` Pavel Emelianov
@ 2006-10-31  9:19 ` Balbir Singh
  2006-10-31  9:25 ` Pavel Emelianov
  2006-10-31  9:42 ` Andrew Morton
  0 siblings, 2 replies; 16+ messages in thread

From: Balbir Singh @ 2006-10-31 9:19 UTC (permalink / raw)
To: Pavel Emelianov
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
    matthltc, dipankar, rohitseth, menage, linux-mm, Vaidyanathan S

Pavel Emelianov wrote:
> [snip]
>
>>> But in general I agree, these are the three important resources for
>>> accounting and control
>>
>> I forgot to mention: I hope you were including the page cache in
>> your definition of reclaimable memory.
>
> As far as the page cache is concerned, my opinion is the following.
> (If I misunderstood you, please correct me.)
>
> The page cache is designed to keep as many pages in memory as
> possible to optimize performance. If we start limiting page cache
> usage we hurt performance. What is to be controlled is _used_
> resources (touched pages, opened file descriptors, mapped areas,
> etc.), but not the cached ones. I see nothing bad if a page that
> belongs to a file, but is not used by ANY task in a BC, stays in
> memory. I think this is normal. If the kernel wants, it can push
> this page out easily; it won't even need to try_to_unmap() it.
> So cached pages must not be accounted.

The idea behind limiting the page cache is this:

1. Let's say one container fills up the page cache.
2. The other containers will not be able to allocate memory (even
   though they are within their limits) without the overhead of
   having to flush the page cache and free up the occupied cache.
   The kernel will have to pageout() the dirty pages in the page
   cache.

Since it is easy to push the page out (as you said), it should be
easy to impose a limit on the page cache usage of a container.

> I've also noticed that you've [snip]-ed one of my questions:
>
> > How would you allocate memory on NUMA in advance?
>
> Please, clarify this.
I am not quite sure I understand the question. Could you please
rephrase it and highlight some of the difficulty?

--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31  9:19 ` Balbir Singh
@ 2006-10-31  9:25 ` Pavel Emelianov
  2006-10-31 10:10 ` Balbir Singh
  2006-10-31  9:42 ` Andrew Morton
  1 sibling, 1 reply; 16+ messages in thread

From: Pavel Emelianov @ 2006-10-31 9:25 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
    linux-kernel, pj, matthltc, dipankar, rohitseth, menage, linux-mm,
    Vaidyanathan S

Balbir Singh wrote:
> Pavel Emelianov wrote:
>> [snip]
>>
>>>> But in general I agree, these are the three important resources
>>>> for accounting and control
>>>
>>> I forgot to mention: I hope you were including the page cache in
>>> your definition of reclaimable memory.
>>
>> As far as the page cache is concerned, my opinion is the following.
>> (If I misunderstood you, please correct me.)
>>
>> The page cache is designed to keep as many pages in memory as
>> possible to optimize performance. If we start limiting page cache
>> usage we hurt performance. What is to be controlled is _used_
>> resources (touched pages, opened file descriptors, mapped areas,
>> etc.), but not the cached ones. I see nothing bad if a page that
>> belongs to a file, but is not used by ANY task in a BC, stays in
>> memory. I think this is normal. If the kernel wants, it can push
>> this page out easily; it won't even need to try_to_unmap() it.
>> So cached pages must not be accounted.
>
> The idea behind limiting the page cache is this:
>
> 1. Let's say one container fills up the page cache.
> 2. The other containers will not be able to allocate memory (even
>    though they are within their limits) without the overhead of
>    having to flush the page cache and free up the occupied cache.
>    The kernel will have to pageout() the dirty pages in the page
>    cache.
>
> Since it is easy to push the page out (as you said), it should be
> easy to impose a limit on the page cache usage of a container.

If a group is limited in memory _consumption_ it won't fill
the page cache...
>> I've also noticed that you've [snip]-ed one of my questions:
>>
>> > How would you allocate memory on NUMA in advance?
>>
>> Please, clarify this.
>
> I am not quite sure I understand the question. Could you please
> rephrase it and highlight some of the difficulty?

I'd like to provide a guarantee for a newly created group. According
to your idea I have to preallocate some pages in advance. OK. How do
I select a NUMA node to allocate them from?
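The accounting rule Pavel argues for above — charge a group only for pages its tasks actually touch, never for clean cached pages — can be sketched as a toy userspace model. This is purely illustrative: the class and method names are invented for the sketch and are not the beancounter patch's API.

```python
# Toy model of "account used pages, not cached pages": a group is
# charged when a task touches a page (as a page-fault hook would do),
# while pages that merely sit in the page cache cost it nothing.

class BeanCounter:
    def __init__(self, limit):
        self.limit = limit      # max physical pages the group may consume
        self.used = 0           # pages charged on touch, not on caching

    def touch_page(self):
        """Charge one physical page; fail (like -ENOMEM) over the limit."""
        if self.used + 1 > self.limit:
            return False
        self.used += 1
        return True

    def release_page(self):
        self.used -= 1

    def cache_page(self):
        """A page entering the page cache is NOT charged to the group."""
        return True             # always succeeds; no accounting at all

bc = BeanCounter(limit=2)
assert bc.touch_page() and bc.touch_page()
assert not bc.touch_page()      # a third touched page exceeds the limit
assert bc.cache_page()          # cached pages never count against it
```

Under this model Balbir's scenario (one container filling the page cache) is addressed indirectly: the cache stays unlimited, but since cached pages are trivially reclaimable the consumption limit is what actually constrains each group.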
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31  9:25 ` Pavel Emelianov
@ 2006-10-31 10:10 ` Balbir Singh
  2006-10-31 10:19 ` Pavel Emelianov
  0 siblings, 1 reply; 16+ messages in thread

From: Balbir Singh @ 2006-10-31 10:10 UTC (permalink / raw)
To: Pavel Emelianov
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
    matthltc, dipankar, rohitseth, menage, linux-mm, Vaidyanathan S

Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Pavel Emelianov wrote:
>>> [snip]
>>>
>>>>> But in general I agree, these are the three important resources
>>>>> for accounting and control
>>>>
>>>> I forgot to mention: I hope you were including the page cache in
>>>> your definition of reclaimable memory.
>>>
>>> As far as the page cache is concerned, my opinion is the following.
>>> (If I misunderstood you, please correct me.)
>>>
>>> The page cache is designed to keep as many pages in memory as
>>> possible to optimize performance. If we start limiting page cache
>>> usage we hurt performance. What is to be controlled is _used_
>>> resources (touched pages, opened file descriptors, mapped areas,
>>> etc.), but not the cached ones. I see nothing bad if a page that
>>> belongs to a file, but is not used by ANY task in a BC, stays in
>>> memory. I think this is normal. If the kernel wants, it can push
>>> this page out easily; it won't even need to try_to_unmap() it.
>>> So cached pages must not be accounted.
>>
>> The idea behind limiting the page cache is this:
>>
>> 1. Let's say one container fills up the page cache.
>> 2. The other containers will not be able to allocate memory (even
>>    though they are within their limits) without the overhead of
>>    having to flush the page cache and free up the occupied cache.
>>    The kernel will have to pageout() the dirty pages in the page
>>    cache.
>>
>> Since it is easy to push the page out (as you said), it should be
>> easy to impose a limit on the page cache usage of a container.
>
> If a group is limited in memory _consumption_ it won't fill
> the page cache...
>

So you mean the memory _consumption_ limit is already controlling
the page cache? That's what we need: the ability for a container
not to fill up the page cache :)

I don't remember exactly, but do you account for dirty page cache
usage in the latest patches of BC?

>>> I've also noticed that you've [snip]-ed one of my questions:
>>>
>>> > How would you allocate memory on NUMA in advance?
>>>
>>> Please, clarify this.
>>
>> I am not quite sure I understand the question. Could you please
>> rephrase it and highlight some of the difficulty?
>
> I'd like to provide a guarantee for a newly created group. According
> to your idea I have to preallocate some pages in advance. OK. How do
> I select a NUMA node to allocate them from?

The idea of pre-allocation was discussed as a possibility in case
somebody needed hard guarantees, but most of us don't need it. It was
in the RFC for the sake of completeness.

Coming back to your question: why do you need to select a NUMA node?
For performance?

--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31 10:10 ` Balbir Singh
@ 2006-10-31 10:19 ` Pavel Emelianov
  0 siblings, 0 replies; 16+ messages in thread

From: Pavel Emelianov @ 2006-10-31 10:19 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
    linux-kernel, pj, matthltc, dipankar, rohitseth, menage, linux-mm,
    Vaidyanathan S

[snip]

>>> Since it is easy to push the page out (as you said), it should be
>>> easy to impose a limit on the page cache usage of a container.
>>
>> If a group is limited in memory _consumption_ it won't fill
>> the page cache...
>
> So you mean the memory _consumption_ limit is already controlling
> the page cache? That's what we need: the ability for a container
> not to fill up the page cache :)

I mean page cache limiting is not needed. We need to make sure a
group eats fewer than N physical pages. That can be achieved by
controlling page faults, setup_arg_pages(), etc. The page cache is
not to be touched.

> I don't remember exactly, but do you account for dirty page cache
> usage in the latest patches of BC?

We do not account for the page cache itself. We track only physical
pages regardless of where they are.

[snip]

> The idea of pre-allocation was discussed as a possibility in case
> somebody needed hard guarantees, but most of us don't need it. It
> was in the RFC for the sake of completeness.
>
> Coming back to your question: why do you need to select a NUMA
> node? For performance?

Of course! Otherwise why would we need kmem_cache_alloc_node() and
similar calls in the kernel?

The second question is: what if two processes from different
beancounters try to share one page? I remember that the current
solution is to take the page from the first user's reserve. OK.
Consider then that this first user stops using the page. When this
happens, one page must be put back into its reserve, right? But
where do we get this page from?
Note that providing a guarantee through limiting doesn't care where
the page comes from.
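The shared-page question Pavel raises above can be made concrete with a small toy model: the page is charged to its first user's beancounter, and when that user drops the mapping the charge has to migrate to one of the remaining users. All names here are invented for illustration; the model only mirrors the semantics under discussion, not any actual patch.

```python
# Toy model of shared-page accounting between beancounters: the first
# mapper pays for the page; on unmap by the owner, the charge migrates
# to a surviving user (the open question is which reserve it comes from).

class SharedPage:
    def __init__(self):
        self.users = []         # beancounters currently mapping this page
        self.owner = None       # the beancounter charged for it

    def map(self, bc):
        if self.owner is None:
            self.owner = bc     # first user pays
            bc['used'] += 1
        self.users.append(bc)

    def unmap(self, bc):
        self.users.remove(bc)
        if bc is self.owner:
            bc['used'] -= 1
            # The charge must move to a remaining user. Which one to
            # pick, and from which node's reserve, is exactly the
            # unresolved question in the thread.
            self.owner = self.users[0] if self.users else None
            if self.owner is not None:
                self.owner['used'] += 1

a, b = {'used': 0}, {'used': 0}
page = SharedPage()
page.map(a)
page.map(b)
assert (a['used'], b['used']) == (1, 0)   # only the first user is charged
page.unmap(a)
assert (a['used'], b['used']) == (0, 1)   # the charge migrated to b
```

A pure limiting scheme sidesteps the migration problem, which is Pavel's point: with no per-group reserve, there is no reserve to return the page to.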
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31  9:19 ` Balbir Singh
  2006-10-31  9:25 ` Pavel Emelianov
@ 2006-10-31  9:42 ` Andrew Morton
  2006-10-31 10:36 ` Balbir Singh
  1 sibling, 1 reply; 16+ messages in thread

From: Andrew Morton @ 2006-10-31 9:42 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
    linux-kernel, pj, matthltc, dipankar, rohitseth, menage, linux-mm,
    Vaidyanathan S

On Tue, 31 Oct 2006 14:49:12 +0530
Balbir Singh <balbir@in.ibm.com> wrote:

> The idea behind limiting the page cache is this:
>
> 1. Let's say one container fills up the page cache.
> 2. The other containers will not be able to allocate memory (even
>    though they are within their limits) without the overhead of
>    having to flush the page cache and free up the occupied cache.
>    The kernel will have to pageout() the dirty pages in the page
>    cache.

There's a vast difference between clean pagecache and dirty pagecache
in this context. It is terribly imprecise to use the term "pagecache".
And it would be a poor implementation which failed to distinguish
between clean pagecache and dirty pagecache.
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31  9:42 ` Andrew Morton
@ 2006-10-31 10:36 ` Balbir Singh
  0 siblings, 0 replies; 16+ messages in thread

From: Balbir Singh @ 2006-10-31 10:36 UTC (permalink / raw)
To: Andrew Morton
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
    linux-kernel, pj, matthltc, dipankar, rohitseth, menage, linux-mm,
    Vaidyanathan S

Andrew Morton wrote:
> On Tue, 31 Oct 2006 14:49:12 +0530
> Balbir Singh <balbir@in.ibm.com> wrote:
>
>> The idea behind limiting the page cache is this:
>>
>> 1. Let's say one container fills up the page cache.
>> 2. The other containers will not be able to allocate memory (even
>>    though they are within their limits) without the overhead of
>>    having to flush the page cache and free up the occupied cache.
>>    The kernel will have to pageout() the dirty pages in the page
>>    cache.
>
> There's a vast difference between clean pagecache and dirty
> pagecache in this context. It is terribly imprecise to use the term
> "pagecache". And it would be a poor implementation which failed to
> distinguish between clean pagecache and dirty pagecache.

Yes, I agree, it will be a good idea to distinguish between the two.

--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
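Andrew's clean-versus-dirty distinction can be illustrated with a tiny cost model: reclaiming a clean pagecache page is essentially free (the page is simply dropped, since the on-disk copy is current), while a dirty page must be written back first. The sketch below is an illustration of that asymmetry only, not kernel reclaim logic.

```python
# Sketch of why clean and dirty pagecache differ for reclaim: clean
# pages are dropped outright, dirty pages cost a writeback each.

def reclaim(pages, want):
    """Reclaim `want` pages, preferring clean ones.
    Returns (pages_freed, writebacks_performed)."""
    freed = writebacks = 0
    # Pass 1: clean pages can simply be dropped, no I/O.
    for p in pages:
        if freed == want:
            break
        if not p['dirty']:
            freed += 1
    # Pass 2: dirty pages must be written out before they can be freed.
    for p in pages:
        if freed == want:
            break
        if p['dirty']:
            writebacks += 1
            p['dirty'] = False
            freed += 1
    return freed, writebacks

cache = [{'dirty': False}, {'dirty': True}, {'dirty': False}]
assert reclaim(cache, 2) == (2, 0)   # two clean pages: no I/O needed
assert reclaim(cache, 3) == (3, 1)   # the dirty page forces one writeback
```

A controller that limits "pagecache" without this distinction would charge a container the same for pages that cost nothing to reclaim and pages that each require disk I/O, which is the poor implementation Andrew warns against.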
[parent not found: <45470DF4.70405@openvz.org>]
* Re: [ckrm-tech] RFC: Memory Controller
  [not found] ` <45470DF4.70405@openvz.org>
@ 2006-10-31 10:54 ` Balbir Singh
  2006-10-31 11:15 ` Pavel Emelianov
  0 siblings, 1 reply; 16+ messages in thread

From: Balbir Singh @ 2006-10-31 10:54 UTC (permalink / raw)
To: Pavel Emelianov
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
    matthltc, dipankar, rohitseth, menage, linux-mm

Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Pavel Emelianov wrote:
>>> [snip]
>>>
>>>> Reclaimable memory
>>>>
>>>> (i) Anonymous pages - Anonymous pages are pages allocated by the
>>>> user space, they are mapped into the user page tables, but not
>>>> backed by a file.
>>>
>>> I do not agree with such a classification.
>>> When one maps a file, the kernel can remove a page from the
>>> address space as there is already space on disk for it. When one
>>> maps an anonymous page, the kernel won't remove this page for
>>> sure, as the system may simply be configured to be swapless.
>>
>> Yes, I agree, if there is no swap space then anonymous memory is
>> pinned. Assuming that we'll end up using an abstraction on top of
>> the existing reclaim mechanism, the mechanism would know if a
>> particular type of memory is reclaimable or not.
>
> If memory is considered to be unreclaimable then actions should be
> taken at mmap() time, not later! Rejecting mmap() is the only way to
> limit the user's unreclaimable memory consumption.

That's like disabling memory over-commit in the regular kernel.
Don't you think this should again be based on the system's
configuration of over-commit?

[snip]

>> I understand that kernel memory accounting is the first priority
>> for containers, but accounting kernel memory requires too many
>> changes to the VM core, hence I was hesitant to put it up as first
>> priority.
>
> Among all the kernel-code-intrusive patches in the BC patch set, the
> kmemsize hooks are the most "conservative" - only one place is
> heavily patched - the slab allocator.
> The buddy allocator is patched too, but _significantly_ less. The
> rest of the patch adds __GFP_BC flags to some allocations and
> SLAB_BC to some kmem_caches.
>
> The user memory controlling patch is much heavier...

Please see how Rohit's memory controller patches user-level memory
accounting. It seems much simpler.

> I'd set the priorities of development this way:
>
> 1. core infrastructure (mainly headers)
> 2. interface
> 3. kernel memory hooks and accounting
> 4. mappings hooks and accounting
> 5. physical pages hooks and accounting
> 6. user pages reclamation
> 7. moving threads between beancounters
> 8. make beancounters persistent

I would prefer a different order.

As for 1 & 2, we could use any interface for now and then start
developing the controller. As we develop the new controller, we are
likely to find the need to add to or enhance the interface, so
freezing 1 & 2 might not be a good idea.

I would put 4, 5 and 6 ahead of 3, based on the changes I see in
Rohit's memory controller.

Then take up the rest.

--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31 10:54 ` Balbir Singh
@ 2006-10-31 11:15 ` Pavel Emelianov
  2006-10-31 12:39 ` Balbir Singh
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread

From: Pavel Emelianov @ 2006-10-31 11:15 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
    linux-kernel, pj, matthltc, dipankar, rohitseth, menage, linux-mm

[snip]

> That's like disabling memory over-commit in the regular kernel.

Nope. We limit only unreclaimable mappings. Allowing the user to
break limits defeats the purpose of a limit.

Or do you disagree that allowing unlimited unreclaimable mappings
leaves you no way to cut groups off gracefully?

[snip]

> Please see how Rohit's memory controller patches user-level memory
> accounting. It seems much simpler.

Could you send me a URL to get the patch from, please? Or send the
patch itself directly to me. Thank you.

[snip]

> I would prefer a different order.
>
> As for 1 & 2, we could use any interface for now and then start
> developing the controller. As we develop the new controller, we are
> likely to find the need to add to or enhance the interface, so
> freezing 1 & 2 might not be a good idea.

Paul Menage won't agree. He believes that the interface must come
first. I also remind you that the latest beancounter patch provides
all the stuff we're discussing. It can move tasks, limit all three
resources discussed, reclaim memory and so on. And a configfs
interface could be attached easily.

> I would put 4, 5 and 6 ahead of 3, based on the changes I see in
> Rohit's memory controller.
>
> Then take up the rest.

I'll review Rohit's patches and comment.
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31 11:15 ` Pavel Emelianov
@ 2006-10-31 12:39 ` Balbir Singh
  2006-10-31 14:19 ` Pavel Emelianov
  0 siblings, 1 reply; 16+ messages in thread

From: Balbir Singh @ 2006-10-31 12:39 UTC (permalink / raw)
To: Pavel Emelianov
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
    matthltc, dipankar, rohitseth, menage, linux-mm

Pavel Emelianov wrote:
>> That's like disabling memory over-commit in the regular kernel.
>
> Nope. We limit only unreclaimable mappings. Allowing the user to
> break limits defeats the purpose of a limit.
>
> Or do you disagree that allowing unlimited unreclaimable mappings
> leaves you no way to cut groups off gracefully?

A quick code review showed that most of the accounting is the same.

Most of the mmap accounting code seems to do the equivalent of
security_vm_enough_memory() when VM_ACCOUNT is set. Maybe we could
merge the accounting code to handle containers as well. I looked at:

  do_mmap_pgoff
  acct_stack_growth
  __do_brk
  do_mremap

> [snip]
>
>> Please see how Rohit's memory controller patches user-level memory
>> accounting. It seems much simpler.
>
> Could you send me a URL to get the patch from, please? Or send the
> patch itself directly to me. Thank you.

Please see http://lkml.org/lkml/2006/9/19/283

> [snip]
>
>> I would prefer a different order.
>>
>> As for 1 & 2, we could use any interface for now and then start
>> developing the controller. As we develop the new controller, we are
>> likely to find the need to add to or enhance the interface, so
>> freezing 1 & 2 might not be a good idea.
>
> Paul Menage won't agree. He believes that the interface must come
> first. I also remind you that the latest beancounter patch provides
> all the stuff we're discussing. It can move tasks, limit all three
> resources discussed, reclaim memory and so on. And a configfs
> interface could be attached easily.
>

I think the interface should depend on the controllers and not the
other way around. I fear that the infrastructure discussion might
hold us back and no fruitful work will happen on the controllers.
Once we add and agree on the controller, we can then look at the
interface requirements (like persistence if kernel memory is being
tracked, etc.). What do you think?

>> I would put 4, 5 and 6 ahead of 3, based on the changes I see in
>> Rohit's memory controller.
>>
>> Then take up the rest.
>
> I'll review Rohit's patches and comment.

OK.

--
Thanks,
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31 12:39 ` Balbir Singh
@ 2006-10-31 14:19 ` Pavel Emelianov
  0 siblings, 0 replies; 16+ messages in thread

From: Pavel Emelianov @ 2006-10-31 14:19 UTC (permalink / raw)
To: balbir, menage
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
    linux-kernel, pj, matthltc, dipankar, rohitseth, linux-mm

[snip]

> A quick code review showed that most of the accounting is the same.
>
> Most of the mmap accounting code seems to do the equivalent of
> security_vm_enough_memory() when VM_ACCOUNT is set. Maybe we could
> merge the accounting code to handle containers as well. I looked at:
>
>   do_mmap_pgoff
>   acct_stack_growth
>   __do_brk
>   do_mremap

I'm sure this is possible. I'll take this into account in the next
patch series. Thank you.

>> [snip]
>>
>>> Please see how Rohit's memory controller patches user-level memory
>>> accounting. It seems much simpler.
>>
>> Could you send me a URL to get the patch from, please? Or send the
>> patch itself directly to me. Thank you.
>
> Please see http://lkml.org/lkml/2006/9/19/283

Thanks. I'll review it in a couple of days and comment.

[snip]

> I think the interface should depend on the controllers and not the
> other way around. I fear that the infrastructure discussion might
> hold us back and no fruitful work will happen on the controllers.
> Once we add and agree on the controller, we can then look at the
> interface requirements (like persistence if kernel memory is being
> tracked, etc.). What do you think?

I do agree with you. But we also have to reach an agreement with Paul
on this...
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31 11:15 ` Pavel Emelianov
  2006-10-31 12:39 ` Balbir Singh
@ 2006-10-31 16:54 ` Paul Menage
  2006-11-01  6:00 ` David Rientjes
  2 siblings, 0 replies; 16+ messages in thread

From: Paul Menage @ 2006-10-31 16:54 UTC (permalink / raw)
To: Pavel Emelianov
Cc: balbir, vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel,
    pj, matthltc, dipankar, rohitseth, linux-mm

On 10/31/06, Pavel Emelianov <xemul@openvz.org> wrote:
>
> Paul Menage won't agree. He believes that the interface must come
> first.

No, I'm just trying to get agreement on the generic infrastructure for
process containers and extensibility - the actual API to the memory
controller (i.e. what limits, what to track, etc.) can presumably be
fitted into the generic mechanism fairly easily (or else the
infrastructure probably isn't generic enough).

Paul
* Re: [ckrm-tech] RFC: Memory Controller
  2006-10-31 11:15 ` Pavel Emelianov
  2006-10-31 12:39 ` Balbir Singh
  2006-10-31 16:54 ` Paul Menage
@ 2006-11-01  6:00 ` David Rientjes
  2006-11-01  8:05 ` Pavel Emelianov
  2 siblings, 1 reply; 16+ messages in thread

From: David Rientjes @ 2006-11-01 6:00 UTC (permalink / raw)
To: Pavel Emelianov
Cc: balbir, vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel,
    pj, matthltc, dipankar, rohitseth, menage, linux-mm

On Tue, 31 Oct 2006, Pavel Emelianov wrote:

> Paul Menage won't agree. He believes that the interface must come
> first. I also remind you that the latest beancounter patch provides
> all the stuff we're discussing. It can move tasks, limit all three
> resources discussed, reclaim memory and so on. And a configfs
> interface could be attached easily.

There are really two different interfaces: those to the controller and
those to the container. While configfs (or a simpler fs implementation
solely for our purposes) is the most logical because of its inherent
hierarchical nature, it seems like the only criticism of it has come
from UBC. From my understanding of beancounter, it could be
implemented on top of any such container abstraction anyway.

David
* Re: [ckrm-tech] RFC: Memory Controller
  2006-11-01  6:00 ` David Rientjes
@ 2006-11-01  8:05 ` Pavel Emelianov
  2006-11-01  8:35 ` David Rientjes
  0 siblings, 1 reply; 16+ messages in thread

From: Pavel Emelianov @ 2006-11-01 8:05 UTC (permalink / raw)
To: David Rientjes
Cc: Pavel Emelianov, balbir, vatsa, dev, sekharan, ckrm-tech,
    haveblue, linux-kernel, pj, matthltc, dipankar, rohitseth, menage,
    linux-mm

David Rientjes wrote:
> On Tue, 31 Oct 2006, Pavel Emelianov wrote:
>
>> Paul Menage won't agree. He believes that the interface must come
>> first. I also remind you that the latest beancounter patch provides
>> all the stuff we're discussing. It can move tasks, limit all three
>> resources discussed, reclaim memory and so on. And a configfs
>> interface could be attached easily.
>
> There are really two different interfaces: those to the controller
> and those to the container. While configfs (or a simpler fs
> implementation solely for our purposes) is the most logical because
> of its inherent hierarchical nature, it seems like the only
> criticism of it has come from UBC. From my understanding of
> beancounter, it could be implemented on top of any such container
> abstraction anyway.

Beancounters may be implemented on top of any (or nearly any)
userspace interface, no question. But we're trying to come to an
agreement here, so I'm just stating my point of view.

I don't mind having a filesystem-based interface, I just believe that
configfs is not so good for it. I've already answered that having our
own filesystem for it sounds better than having configfs.

Maybe we can summarize what we have come to?

> David
* Re: [ckrm-tech] RFC: Memory Controller
  2006-11-01  8:05 ` Pavel Emelianov
@ 2006-11-01  8:35 ` David Rientjes
  0 siblings, 0 replies; 16+ messages in thread

From: David Rientjes @ 2006-11-01 8:35 UTC (permalink / raw)
To: Pavel Emelianov
Cc: balbir, vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel,
    pj, matthltc, dipankar, rohitseth, menage, linux-mm

On Wed, 1 Nov 2006, Pavel Emelianov wrote:

> Beancounters may be implemented on top of any (or nearly any)
> userspace interface, no question. But we're trying to come to an
> agreement here, so I'm just stating my point of view.
>
> I don't mind having a filesystem-based interface, I just believe
> that configfs is not so good for it. I've already answered that
> having our own filesystem for it sounds better than having configfs.
>
> Maybe we can summarize what we have come to?

I've seen nothing but praise for Paul Menage's suggestion of
implementing a single-level containers abstraction for processes and
attaching these to various resource controller (disk, network, memory,
cpu) nodes. The question of whether to use configfs or not is really
at the forefront of that discussion because making any progress in
implementation is difficult without first deciding upon it, and the
containers abstraction patchset uses configfs as its interface.

The original objection against configfs was against the lifetime of
the resource controller. But this is actually a two-part question
since there are two interfaces: one for the containers, one for the
controllers. At present it seems like the only discussion taking place
is that of the container, so this objection can wait.

After boot, there are one of two options:

 - require the user to mount the configfs filesystem with a single
   system-wide container as default
   i.  include all processes in that container by default
   ii.
include no processes in that container, force the user to add them - create the entire container abstraction upon boot and attach all processes to it in a manner similar to procfs [ In both scenarios, kernel behavior is unchanged if no resource controller node is attached to any container as if the container(s) didn't exist. ] Another objection against configfs was the fact that you must define CONFIG_CONFIGFS_FS to use CONFIG_CONTAINERS. This objection does not make much sense since it seems like we are falling the direction of abandoning the syscall approach here and looking toward an fs approach in the first place. So CONFIG_CONTAINERS will need to include its own lightweight filesystem if we cannot use CONFIG_CONFIGFS_FS, but it seems redundant since this is what configfs is for: a configurable filesystem to interface to the kernel. We definitely do not want two or more interfaces to _containers_ so we are reimplementing an already existing infrastructure. The criticism that users can create containers and then not use them shouldn't be an issue if it is carefully implemented. In fact, I proposed that all processes are initially attached to a single system-wide container at boot regardless if you've loaded any controllers or not just like how UMA machines work with node 0 for system-wide memory. We should incur no overhead for having empty or _full_ containers if we haven't loaded controllers or have configured them properly to include the right containers. So if we re-read Paul Menage's containers abstraction away from cpusets patchset that uses configfs, we can see that we are almost there with the exception of making it a single-layer "hierarchy" as he has already proposed. Resource controller "nodes" that these containers can be attached to are a separate issue at this point and shouldn't be confused. David -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
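David's option (i), a single system-wide default container holding every process after boot, can be sketched in the same spirit. The layout and file names below are invented, and sample pids plus a temp directory stand in for the real system so the commands are runnable.

```shell
# Mock container fs; "root" is the default container created "at boot".
cfs=$(mktemp -d)
mkdir "$cfs/root"

# At boot, every task is attached to the default container (sample pids
# stand in for the real process list, analogous to node 0 holding all
# memory on a UMA machine):
printf '%s\n' 1 2 "$$" > "$cfs/root/tasks"

# Userspace later creates a sibling container (single-level hierarchy)
# and moves one task over; kernel behavior would be unchanged until a
# resource controller node is attached to a container:
mkdir "$cfs/batch"
grep -vx "$$" "$cfs/root/tasks" > "$cfs/root/tasks.new"
mv "$cfs/root/tasks.new" "$cfs/root/tasks"
echo "$$" > "$cfs/batch/tasks"

wc -l < "$cfs/root/tasks"   # the remaining tasks stay in the default container
```

This mirrors the "no overhead for empty or full containers" point: until a controller is attached, the directories and task lists are pure bookkeeping.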
end of thread, other threads:[~2006-11-01 8:35 UTC | newest]
Thread overview: 16+ messages
-- links below jump to the message on this page --
[not found] <20061030103356.GA16833@in.ibm.com>
[not found] ` <4545D51A.1060808@in.ibm.com>
[not found] ` <4546212B.4010603@openvz.org>
[not found] ` <454638D2.7050306@in.ibm.com>
2006-10-30 18:07 ` [ckrm-tech] RFC: Memory Controller Balbir Singh
2006-10-31 8:57 ` Pavel Emelianov
2006-10-31 9:19 ` Balbir Singh
2006-10-31 9:25 ` Pavel Emelianov
2006-10-31 10:10 ` Balbir Singh
2006-10-31 10:19 ` Pavel Emelianov
2006-10-31 9:42 ` Andrew Morton
2006-10-31 10:36 ` Balbir Singh
[not found] ` <45470DF4.70405@openvz.org>
2006-10-31 10:54 ` Balbir Singh
2006-10-31 11:15 ` Pavel Emelianov
2006-10-31 12:39 ` Balbir Singh
2006-10-31 14:19 ` Pavel Emelianov
2006-10-31 16:54 ` Paul Menage
2006-11-01 6:00 ` David Rientjes
2006-11-01 8:05 ` Pavel Emelianov
2006-11-01 8:35 ` David Rientjes