* Re: [ckrm-tech] RFC: Memory Controller
[not found] ` <454638D2.7050306@in.ibm.com>
@ 2006-10-30 18:07 ` Balbir Singh
2006-10-31 8:57 ` Pavel Emelianov
[not found] ` <45470DF4.70405@openvz.org>
1 sibling, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2006-10-30 18:07 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
linux-kernel, pj, matthltc, dipankar, rohitseth, menage,
linux-mm
Balbir Singh wrote:
[snip]
>
>> I see that everyone agrees that we want to see three resources:
>> 1. kernel memory
>> 2. unreclaimable memory
>> 3. reclaimable memory
>> If this is right, then let's record it somewhere
>> (e.g. http://wiki.openvz.org/Containers/UBC_discussion)
>> and go on discussing the next question - the interface.
>
> I understand that kernel memory accounting is the first priority for
> containers, but accounting kernel memory requires too many changes
> to the VM core, hence I was hesitant to put it up as first priority.
>
> But in general I agree, these are the three important resources for
> accounting and control
I forgot to mention: I hope you were including the page cache in
your definition of reclaimable memory.
>
> [snip]
>
--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href="mailto:dont@kvack.org">email@kvack.org</a>
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-30 18:07 ` [ckrm-tech] RFC: Memory Controller Balbir Singh
@ 2006-10-31 8:57 ` Pavel Emelianov
2006-10-31 9:19 ` Balbir Singh
0 siblings, 1 reply; 16+ messages in thread
From: Pavel Emelianov @ 2006-10-31 8:57 UTC (permalink / raw)
To: balbir
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
matthltc, dipankar, rohitseth, menage, linux-mm
[snip]
>> But in general I agree, these are the three important resources for
>> accounting and control
>
> I forgot to mention: I hope you were including the page cache in
> your definition of reclaimable memory.
As far as the page cache is concerned, my opinion is the following.
(If I misunderstood you, please correct me.)
The page cache is designed to keep as many pages in memory as
possible to optimize performance. If we start limiting page cache
usage, we cut performance. What should be controlled is _used_
resources (touched pages, open file descriptors, mapped areas,
etc.), not cached ones. I see nothing bad in a page that belongs
to a file, but is not used by ANY task in the BC, staying in
memory. I think this is normal. If the kernel wants, it can push
this page out easily; it won't even need to try_to_unmap() it.
So cached pages must not be accounted.
I've also noticed that you snipped one of my questions.
> How would you allocate memory on NUMA in advance?
Please clarify this.
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 8:57 ` Pavel Emelianov
@ 2006-10-31 9:19 ` Balbir Singh
2006-10-31 9:25 ` Pavel Emelianov
2006-10-31 9:42 ` Andrew Morton
0 siblings, 2 replies; 16+ messages in thread
From: Balbir Singh @ 2006-10-31 9:19 UTC (permalink / raw)
To: Pavel Emelianov
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
matthltc, dipankar, rohitseth, menage, linux-mm, Vaidyanathan S
Pavel Emelianov wrote:
> [snip]
>
>>> But in general I agree, these are the three important resources for
>>> accounting and control
>> I forgot to mention: I hope you were including the page cache in
>> your definition of reclaimable memory.
>
> As far as the page cache is concerned, my opinion is the following.
> (If I misunderstood you, please correct me.)
>
> The page cache is designed to keep as many pages in memory as
> possible to optimize performance. If we start limiting page cache
> usage, we cut performance. What should be controlled is _used_
> resources (touched pages, open file descriptors, mapped areas,
> etc.), not cached ones. I see nothing bad in a page that belongs
> to a file, but is not used by ANY task in the BC, staying in
> memory. I think this is normal. If the kernel wants, it can push
> this page out easily; it won't even need to try_to_unmap() it.
> So cached pages must not be accounted.
>
The idea behind limiting the page cache is this:
1. Let's say one container fills up the page cache.
2. The other containers will not be able to allocate memory (even
though they are within their limits) without the overhead of having
to flush the page cache and free up the occupied cache. The kernel
will have to pageout() the dirty pages in the page cache.
Since it is easy to push a page out (as you said), it should be
easy to impose a limit on the page cache usage of a container.
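The trade-off in the two points above can be sketched in a few lines. This is a toy model only: the structure and function names below are invented for illustration and are not from any posted patch. A container charges each new page against a hard limit, and a cached-but-unused page is pushed out (cheaply, since nothing maps it) before the allocation is failed outright.

```c
#include <stdbool.h>

/* Invented per-container counters, in pages. */
struct container {
    unsigned long limit;       /* hard limit */
    unsigned long rss;         /* pages in active use */
    unsigned long page_cache;  /* cached-but-unused pages */
};

/*
 * Charge one page against the container.  If the charge would exceed
 * the limit, drop a cached page first before failing the allocation.
 */
static bool charge_page(struct container *c)
{
    if (c->rss + c->page_cache + 1 > c->limit) {
        if (c->page_cache == 0)
            return false;      /* nothing easy to reclaim */
        c->page_cache--;       /* push a cached page out */
    }
    c->rss++;
    return true;
}
```

Under this model a container that fills its budget with page cache still satisfies new allocations cheaply, while one that fills it with used pages hits the hard limit.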
>
> I've also noticed that you snipped one of my questions.
>
> > How would you allocate memory on NUMA in advance?
>
> Please clarify this.
I am not quite sure I understand the question. Could you please rephrase
it and highlight some of the difficulty?
--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 9:19 ` Balbir Singh
@ 2006-10-31 9:25 ` Pavel Emelianov
2006-10-31 10:10 ` Balbir Singh
2006-10-31 9:42 ` Andrew Morton
1 sibling, 1 reply; 16+ messages in thread
From: Pavel Emelianov @ 2006-10-31 9:25 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
linux-kernel, pj, matthltc, dipankar, rohitseth, menage,
linux-mm, Vaidyanathan S
Balbir Singh wrote:
> Pavel Emelianov wrote:
>> [snip]
>>
>>>> But in general I agree, these are the three important resources for
>>>> accounting and control
>>> I forgot to mention: I hope you were including the page cache in
>>> your definition of reclaimable memory.
>> As far as the page cache is concerned, my opinion is the following.
>> (If I misunderstood you, please correct me.)
>>
>> The page cache is designed to keep as many pages in memory as
>> possible to optimize performance. If we start limiting page cache
>> usage, we cut performance. What should be controlled is _used_
>> resources (touched pages, open file descriptors, mapped areas,
>> etc.), not cached ones. I see nothing bad in a page that belongs
>> to a file, but is not used by ANY task in the BC, staying in
>> memory. I think this is normal. If the kernel wants, it can push
>> this page out easily; it won't even need to try_to_unmap() it.
>> So cached pages must not be accounted.
>>
>
> The idea behind limiting the page cache is this:
>
> 1. Let's say one container fills up the page cache.
> 2. The other containers will not be able to allocate memory (even
> though they are within their limits) without the overhead of having
> to flush the page cache and free up the occupied cache. The kernel
> will have to pageout() the dirty pages in the page cache.
>
> Since it is easy to push a page out (as you said), it should be
> easy to impose a limit on the page cache usage of a container.
If a group is limited in memory _consumption_, it won't fill
the page cache...
>> I've also noticed that you snipped one of my questions.
>>
>> > How would you allocate memory on NUMA in advance?
>>
>> Please clarify this.
>
> I am not quite sure I understand the question. Could you please rephrase
> it and highlight some of the difficulty?
I'd like to provide a guarantee for a newly created group. According
to your idea, I have to preallocate some pages in advance. OK. How do
I select a NUMA node to allocate them from?
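One naive answer to the question above, purely as a sketch (the policy, function, and inputs are invented here, not from any patch): preallocate the guarantee from the node with the most free pages. Its very simplicity shows what the question is pointing at, since it ignores node distance and where the group's tasks will actually run.

```c
/* Toy placement policy: pick the emptiest node.  A real kernel policy
 * would also weigh NUMA distance maps and likely task placement. */
static int pick_node_for_guarantee(const unsigned long *free_pages,
                                   int nr_nodes)
{
    int best = 0;

    for (int i = 1; i < nr_nodes; i++)
        if (free_pages[i] > free_pages[best])
            best = i;           /* prefer the node with most free pages */
    return best;
}
```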
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 9:19 ` Balbir Singh
2006-10-31 9:25 ` Pavel Emelianov
@ 2006-10-31 9:42 ` Andrew Morton
2006-10-31 10:36 ` Balbir Singh
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-10-31 9:42 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
linux-kernel, pj, matthltc, dipankar, rohitseth, menage,
linux-mm, Vaidyanathan S
On Tue, 31 Oct 2006 14:49:12 +0530
Balbir Singh <balbir@in.ibm.com> wrote:
> The idea behind limiting the page cache is this:
>
> 1. Let's say one container fills up the page cache.
> 2. The other containers will not be able to allocate memory (even
> though they are within their limits) without the overhead of having
> to flush the page cache and free up the occupied cache. The kernel
> will have to pageout() the dirty pages in the page cache.
There's a vast difference between clean pagecache and dirty pagecache in this
context. It is terribly imprecise to use the term "pagecache". And it would be
a poor implementation which failed to distinguish between clean pagecache and
dirty pagecache.
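The distinction can be stated in a few lines. This is a hedged sketch; the enum, struct, and cost model below are invented for illustration. A clean cached page can simply be dropped, while a dirty one must be written back first.

```c
/* Invented two-state model of a cached page. */
enum page_state { PAGE_CLEAN, PAGE_DIRTY };

struct cache_page {
    enum page_state state;
};

/* Returns the number of writebacks needed before the page is free. */
static int reclaim_cost(struct cache_page *p)
{
    if (p->state == PAGE_CLEAN)
        return 0;              /* drop immediately, no I/O */
    p->state = PAGE_CLEAN;     /* write back first, then droppable */
    return 1;
}
```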
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 9:25 ` Pavel Emelianov
@ 2006-10-31 10:10 ` Balbir Singh
2006-10-31 10:19 ` Pavel Emelianov
0 siblings, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2006-10-31 10:10 UTC (permalink / raw)
To: Pavel Emelianov
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
matthltc, dipankar, rohitseth, menage, linux-mm, Vaidyanathan S
Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Pavel Emelianov wrote:
>>> [snip]
>>>
>>>>> But in general I agree, these are the three important resources for
>>>>> accounting and control
>>>> I forgot to mention: I hope you were including the page cache in
>>>> your definition of reclaimable memory.
>>> As far as the page cache is concerned, my opinion is the following.
>>> (If I misunderstood you, please correct me.)
>>>
>>> The page cache is designed to keep as many pages in memory as
>>> possible to optimize performance. If we start limiting page cache
>>> usage, we cut performance. What should be controlled is _used_
>>> resources (touched pages, open file descriptors, mapped areas,
>>> etc.), not cached ones. I see nothing bad in a page that belongs
>>> to a file, but is not used by ANY task in the BC, staying in
>>> memory. I think this is normal. If the kernel wants, it can push
>>> this page out easily; it won't even need to try_to_unmap() it.
>>> So cached pages must not be accounted.
>>>
>> The idea behind limiting the page cache is this:
>>
>> 1. Let's say one container fills up the page cache.
>> 2. The other containers will not be able to allocate memory (even
>> though they are within their limits) without the overhead of having
>> to flush the page cache and free up the occupied cache. The kernel
>> will have to pageout() the dirty pages in the page cache.
>>
>> Since it is easy to push a page out (as you said), it should be
>> easy to impose a limit on the page cache usage of a container.
>
> If a group is limited in memory _consumption_, it won't fill
> the page cache...
>
So you mean the memory _consumption_ limit already controls
the page cache? That's what we need: the ability for a container
not to fill up the page cache :)
I don't remember exactly, but do you account for dirty page cache
usage in the latest BC patches?
>>> I've also noticed that you snipped one of my questions.
>>>
>>> > How would you allocate memory on NUMA in advance?
>>>
>>> Please clarify this.
>> I am not quite sure I understand the question. Could you please rephrase
>> it and highlight some of the difficulty?
>
> I'd like to provide a guarantee for a newly created group. According
> to your idea, I have to preallocate some pages in advance. OK. How do
> I select a NUMA node to allocate them from?
The idea of pre-allocation was discussed as a possibility in case
somebody needed hard guarantees, but most of us don't need it.
It was in the RFC for the sake of completeness.
Coming back to your question:
Why do you need to select a NUMA node? For performance?
--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 10:10 ` Balbir Singh
@ 2006-10-31 10:19 ` Pavel Emelianov
0 siblings, 0 replies; 16+ messages in thread
From: Pavel Emelianov @ 2006-10-31 10:19 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
linux-kernel, pj, matthltc, dipankar, rohitseth, menage,
linux-mm, Vaidyanathan S
[snip]
>>> Since it is easy to push the page out (as you said), it should be
>>> easy to impose a limit on the page cache usage of a container.
>> If a group is limited in memory _consumption_, it won't fill
>> the page cache...
>>
>
> So you mean the memory _consumption_ limit already controls
> the page cache? That's what we need: the ability for a container
> not to fill up the page cache :)
I mean that page cache limiting is not needed. We need to make
sure a group eats fewer than N physical pages. That can be
achieved by controlling page faults, setup_arg_pages(), etc.
The page cache is not to be touched.
> I don't remember exactly, but do you account for dirty page cache
> usage in the latest BC patches?
We do not account for page cache itself. We track only
physical pages regardless of where they are.
[snip]
> The idea of pre-allocation was discussed as a possibility in case
> somebody needed hard guarantees, but most of us don't need it.
> It was in the RFC for the sake of completeness.
>
> Coming back to your question
>
> Why do you need to select a NUMA node? For performance?
Of course! Otherwise why would the kernel need kmem_cache_alloc_node()
and similar calls?
The second question is: what if two processes from different
beancounters try to share one page? I remember that the current
solution is to take the page from the first user's reserve. OK.
Consider then that this first user stops using the page. When
this happens, one page must be put back into its reserve, right?
But where do we get this page from?
Note that making a guarantee through limiting doesn't care where
the page comes from.
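The recharging problem described above can be sketched like this. Every name below is invented for illustration, and the real beancounter accounting is far more involved: the first user pays for a shared page, and when that user drops it the charge has to migrate to a surviving user.

```c
#include <stddef.h>

/* Invented beancounter with a single page counter. */
struct bc {
    unsigned long held;        /* pages charged to this beancounter */
};

struct shared_page {
    struct bc *charged_to;     /* who currently pays for the page */
    struct bc *users[4];
    int nr_users;
};

static void page_add_user(struct shared_page *p, struct bc *b)
{
    p->users[p->nr_users++] = b;
    if (p->charged_to == NULL) {       /* first toucher pays */
        p->charged_to = b;
        b->held++;
    }
}

static void page_del_user(struct shared_page *p, struct bc *b)
{
    int i, j;

    for (i = 0; i < p->nr_users; i++)
        if (p->users[i] == b)
            break;
    if (i == p->nr_users)
        return;                        /* not a user of this page */
    for (j = i; j < p->nr_users - 1; j++)
        p->users[j] = p->users[j + 1];
    p->nr_users--;

    if (p->charged_to == b) {          /* recharge a surviving user */
        b->held--;
        p->charged_to = p->nr_users ? p->users[0] : NULL;
        if (p->charged_to)
            p->charged_to->held++;
    }
}
```

The open question in the text is exactly the last branch: the surviving user's beancounter is charged a page it never budgeted for, and under a reserve scheme that page has to come from somewhere.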
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 9:42 ` Andrew Morton
@ 2006-10-31 10:36 ` Balbir Singh
0 siblings, 0 replies; 16+ messages in thread
From: Balbir Singh @ 2006-10-31 10:36 UTC (permalink / raw)
To: Andrew Morton
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
linux-kernel, pj, matthltc, dipankar, rohitseth, menage,
linux-mm, Vaidyanathan S
Andrew Morton wrote:
> On Tue, 31 Oct 2006 14:49:12 +0530
> Balbir Singh <balbir@in.ibm.com> wrote:
>
>> The idea behind limiting the page cache is this:
>>
>> 1. Let's say one container fills up the page cache.
>> 2. The other containers will not be able to allocate memory (even
>> though they are within their limits) without the overhead of having
>> to flush the page cache and free up the occupied cache. The kernel
>> will have to pageout() the dirty pages in the page cache.
>
> There's a vast difference between clean pagecache and dirty pagecache in this
> context. It is terribly imprecise to use the term "pagecache". And it would be
> a poor implementation which failed to distinguish between clean pagecache and
> dirty pagecache.
>
Yes, I agree, it will be a good idea to distinguish between the two.
--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
[not found] ` <45470DF4.70405@openvz.org>
@ 2006-10-31 10:54 ` Balbir Singh
2006-10-31 11:15 ` Pavel Emelianov
0 siblings, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2006-10-31 10:54 UTC (permalink / raw)
To: Pavel Emelianov
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
matthltc, dipankar, rohitseth, menage, linux-mm
Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Pavel Emelianov wrote:
>>> [snip]
>>>
>>>> Reclaimable memory
>>>>
>>>> (i) Anonymous pages - Anonymous pages are pages allocated by the user space,
>>>> they are mapped into the user page tables, but not backed by a file.
>>> I do not agree with such a classification.
>>> When one maps a file, the kernel can remove the page from the
>>> address space, as there is already space on disk for it. When one
>>> maps an anonymous page, the kernel won't remove it for sure, as
>>> the system may simply be configured to be swapless.
>> Yes, I agree: if there is no swap space, then anonymous memory is pinned.
>> Assuming that we'll end up using an abstraction on top of the
>> existing reclaim mechanism, the mechanism would know whether a
>> particular type of memory is reclaimable or not.
>
> If memory is considered unreclaimable, then action should be
> taken at mmap() time, not later! Rejecting mmap() is the only way
> to limit the user's unreclaimable memory consumption.
That's like disabling memory over-commit in the regular kernel.
Don't you think this should again be based on the system's
over-commit configuration?
[snip]
>
>> I understand that kernel memory accounting is the first priority for
>> containers, but accounting kernel memory requires too many changes
>> to the VM core, hence I was hesitant to put it up as first priority.
>
> Among all the kernel-code-intrusive patches in the BC patch set,
> the kmemsize hooks are the most "conservative": only one place is
> heavily patched - the slab allocator. The buddy allocator is patched
> too, but _significantly_ less. The rest of the patch adds __GFP_BC
> flags to some allocations and SLAB_BC to some kmem_caches.
>
> The user memory controlling patch is much heavier...
>
Please see how Rohit's memory controller does its user-level
patching. It seems much simpler.
> I'd set priorities of development that way:
>
> 1. core infrastructure (mainly headers)
> 2. interface
> 3. kernel memory hooks and accounting
> 4. mappings hooks and accounting
> 5. physical pages hooks and accounting
> 6. user pages reclamation
> 7. moving threads between beancounters
> 8. make beancounter persistent
I would prefer a different order.
1 & 2: for now we could use any interface and then start developing
the controller. As we develop the new controller, we are likely to
find the need to add to or enhance the interface, so freezing 1 & 2
might not be a good idea.
I would put 4, 5 and 6 ahead of 3, based on the changes I see in
Rohit's memory controller.
Then take up the rest.
--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 10:54 ` Balbir Singh
@ 2006-10-31 11:15 ` Pavel Emelianov
2006-10-31 12:39 ` Balbir Singh
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Pavel Emelianov @ 2006-10-31 11:15 UTC (permalink / raw)
To: balbir
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
linux-kernel, pj, matthltc, dipankar, rohitseth, menage,
linux-mm
[snip]
> That's like disabling memory over-commit in the regular kernel.
Nope. We limit only unreclaimable mappings. Allowing the user
to break limits defeats the point of a limit.
Or do you not agree that allowing unlimited unreclaimable
mappings leaves you no graceful way to cut groups back?
[snip]
> Please see how Rohit's memory controller does its user-level
> patching. It seems much simpler.
Could you send me a URL to get the patch from, please?
Or send the patch itself directly to me. Thank you.
[snip]
> I would prefer a different set
>
> 1 & 2, for now we could use any interface and then start developing the
> controller. As we develop the new controller, we are likely to find the
> need to add/enhance the interface, so freezing in on 1 & 2 might not be
> a good idea.
Paul Menage won't agree. He believes that the interface must come
first. I also remind you that the latest beancounter patch provides
all the stuff we're discussing. It can move tasks, limit all three
resources discussed, reclaim memory, and so on. And a configfs
interface could be attached easily.
> I would put 4, 5 and 6 ahead of 3, based on the changes I see in Rohit's
> memory controller.
>
> Then take up the rest.
I'll review Rohit's patches and comment.
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 11:15 ` Pavel Emelianov
@ 2006-10-31 12:39 ` Balbir Singh
2006-10-31 14:19 ` Pavel Emelianov
2006-10-31 16:54 ` Paul Menage
2006-11-01 6:00 ` David Rientjes
2 siblings, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2006-10-31 12:39 UTC (permalink / raw)
To: Pavel Emelianov
Cc: vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel, pj,
matthltc, dipankar, rohitseth, menage, linux-mm
Pavel Emelianov wrote:
>> That's like disabling memory over-commit in the regular kernel.
>
> Nope. We limit only unreclaimable mappings. Allowing the user
> to break limits defeats the point of a limit.
>
> Or do you not agree that allowing unlimited unreclaimable
> mappings leaves you no graceful way to cut groups back?
>
A quick code review showed that most of the accounting is the
same.
Most of the mmap accounting code seems to do the equivalent of
security_vm_enough_memory() when VM_ACCOUNT is set. Maybe we
could merge the accounting code to handle containers as well.
I looked at:
do_mmap_pgoff
acct_stack_growth
__do_brk
do_mremap
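The common check behind those call sites, in miniature. This is a hedged sketch loosely modeled on the strict (no-overcommit) behaviour of the kernel's __vm_enough_memory(); the struct and names below are simplified inventions, not the real interface. Mappings are charged at mmap() time and rejected up front when the budget is exhausted.

```c
#include <stdbool.h>

/* Invented commit accounting, in pages. */
struct commit_state {
    unsigned long committed;   /* pages already promised */
    unsigned long allowed;     /* total pages we may promise */
};

/* Charge a mapping of 'pages' pages; reject it if the budget would
 * be exceeded. */
static bool vm_enough_memory(struct commit_state *s, unsigned long pages)
{
    if (s->committed + pages > s->allowed)
        return false;          /* reject the mapping up front */
    s->committed += pages;
    return true;
}
```

A per-container instance of the same test is roughly what merging the two accounting paths would amount to.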
> [snip]
>
>> Please see how Rohit's memory controller does its user-level
>> patching. It seems much simpler.
>
> Could you send me a URL to get the patch from, please?
> Or send the patch itself directly to me. Thank you.
Please see http://lkml.org/lkml/2006/9/19/283
>
> [snip]
>
>> I would prefer a different set
>>
>> 1 & 2, for now we could use any interface and then start developing the
>> controller. As we develop the new controller, we are likely to find the
>> need to add/enhance the interface, so freezing in on 1 & 2 might not be
>> a good idea.
>
> Paul Menage won't agree. He believes that the interface must come
> first. I also remind you that the latest beancounter patch provides
> all the stuff we're discussing. It can move tasks, limit all three
> resources discussed, reclaim memory, and so on. And a configfs
> interface could be attached easily.
>
I think the interface should depend on the controllers, not
the other way around. I fear that the infrastructure discussion might
hold us back and no fruitful work will happen on the controllers.
Once we add and agree on a controller, we can then look at the
interface requirements (like persistence if kernel memory is being
tracked, etc.). What do you think?
>> I would put 4, 5 and 6 ahead of 3, based on the changes I see in Rohit's
>> memory controller.
>>
>> Then take up the rest.
>
> I'll review Rohit's patches and comment.
ok
--
Thanks,
Balbir Singh,
Linux Technology Center,
IBM Software Labs
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 12:39 ` Balbir Singh
@ 2006-10-31 14:19 ` Pavel Emelianov
0 siblings, 0 replies; 16+ messages in thread
From: Pavel Emelianov @ 2006-10-31 14:19 UTC (permalink / raw)
To: balbir, menage
Cc: Pavel Emelianov, vatsa, dev, sekharan, ckrm-tech, haveblue,
linux-kernel, pj, matthltc, dipankar, rohitseth, linux-mm
[snip]
> A quick code review showed that most of the accounting is the
> same.
>
> Most of the mmap accounting code seems to do the equivalent of
> security_vm_enough_memory() when VM_ACCOUNT is set. Maybe we
> could merge the accounting code to handle containers as well.
>
> I looked at:
>
> do_mmap_pgoff
> acct_stack_growth
> __do_brk
> do_mremap
I'm sure this is possible. I'll take this into account
in the next patch series. Thank you.
>> [snip]
>>
>>> Please see how Rohit's memory controller does its user-level
>>> patching. It seems much simpler.
>> Could you send me a URL to get the patch from, please?
>> Or send the patch itself directly to me. Thank you.
>
> Please see http://lkml.org/lkml/2006/9/19/283
Thanks. I'll review it in a couple of days and comment.
[snip]
> I think the interface should depend on the controllers, not
> the other way around. I fear that the infrastructure discussion might
> hold us back and no fruitful work will happen on the controllers.
> Once we add and agree on a controller, we can then look at the
> interface requirements (like persistence if kernel memory is being
> tracked, etc.). What do you think?
I do agree with you. But we have to reach an agreement with
Paul on this as well...
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 11:15 ` Pavel Emelianov
2006-10-31 12:39 ` Balbir Singh
@ 2006-10-31 16:54 ` Paul Menage
2006-11-01 6:00 ` David Rientjes
2 siblings, 0 replies; 16+ messages in thread
From: Paul Menage @ 2006-10-31 16:54 UTC (permalink / raw)
To: Pavel Emelianov
Cc: balbir, vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel,
pj, matthltc, dipankar, rohitseth, linux-mm
On 10/31/06, Pavel Emelianov <xemul@openvz.org> wrote:
>
> Paul Menage won't agree. He believes that interface must come first.
No, I'm just trying to get agreement on the generic infrastructure for
process containers and extensibility - the actual API to the memory
controller (i.e. what limits, what to track, etc) can presumably be
fitted into the generic mechanism fairly easily (or else the
infrastructure probably isn't generic enough).
Paul
* Re: [ckrm-tech] RFC: Memory Controller
2006-10-31 11:15 ` Pavel Emelianov
2006-10-31 12:39 ` Balbir Singh
2006-10-31 16:54 ` Paul Menage
@ 2006-11-01 6:00 ` David Rientjes
2006-11-01 8:05 ` Pavel Emelianov
2 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2006-11-01 6:00 UTC (permalink / raw)
To: Pavel Emelianov
Cc: balbir, vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel,
pj, matthltc, dipankar, rohitseth, menage, linux-mm
On Tue, 31 Oct 2006, Pavel Emelianov wrote:
> Paul Menage won't agree. He believes that the interface must come
> first. I also remind you that the latest beancounter patch provides
> all the stuff we're discussing. It can move tasks, limit all three
> resources discussed, reclaim memory, and so on. And a configfs
> interface could be attached easily.
>
>
There are really two different interfaces: those to the controller and
those to the container. While configfs (or a simpler fs implementation
solely for our purposes) is the most logical choice because of its
inherently hierarchical nature, it seems like the only criticism of it
has come from UBC. From my understanding of beancounters, they could be
implemented on top of any such container abstraction anyway.
David
* Re: [ckrm-tech] RFC: Memory Controller
2006-11-01 6:00 ` David Rientjes
@ 2006-11-01 8:05 ` Pavel Emelianov
2006-11-01 8:35 ` David Rientjes
0 siblings, 1 reply; 16+ messages in thread
From: Pavel Emelianov @ 2006-11-01 8:05 UTC (permalink / raw)
To: David Rientjes
Cc: Pavel Emelianov, balbir, vatsa, dev, sekharan, ckrm-tech,
haveblue, linux-kernel, pj, matthltc, dipankar, rohitseth,
menage, linux-mm
David Rientjes wrote:
> On Tue, 31 Oct 2006, Pavel Emelianov wrote:
>
>> Paul Menage won't agree. He believes that the interface must come
>> first. I also remind you that the latest beancounter patch provides
>> all the stuff we're discussing. It can move tasks, limit all three
>> resources discussed, reclaim memory, and so on. And a configfs
>> interface could be attached easily.
>>
>>
>
> There are really two different interfaces: those to the controller and
> those to the container. While configfs (or a simpler fs implementation
> solely for our purposes) is the most logical choice because of its
> inherently hierarchical nature, it seems like the only criticism of it
> has come from UBC. From my understanding of beancounters, they could be
> implemented on top of any such container abstraction anyway.
Beancounters may be implemented on top of any (or nearly any)
userspace interface, no question. But we're trying to come to an
agreement here, so I'm just stating my point of view.
I don't mind having a filesystem-based interface; I just believe
that configfs is not a good fit for it. I've already answered that
having our own filesystem sounds better than using configfs.
Maybe we can summarize what we have come to?
> David
>
* Re: [ckrm-tech] RFC: Memory Controller
2006-11-01 8:05 ` Pavel Emelianov
@ 2006-11-01 8:35 ` David Rientjes
0 siblings, 0 replies; 16+ messages in thread
From: David Rientjes @ 2006-11-01 8:35 UTC (permalink / raw)
To: Pavel Emelianov
Cc: balbir, vatsa, dev, sekharan, ckrm-tech, haveblue, linux-kernel,
pj, matthltc, dipankar, rohitseth, menage, linux-mm
On Wed, 1 Nov 2006, Pavel Emelianov wrote:
> Beancounters may be implemented on top of any (or nearly any) userspace
> interface, no question. But we're trying to come to an agreement here,
> so I am just stating my point of view.
>
> I don't mind having a filesystem-based interface; I just believe that
> configfs is not well suited for it. I've already answered that having
> our own filesystem for it sounds better than using configfs.
>
> Maybe we can summarize what we have come to?
>
I've seen nothing but praise for Paul Menage's suggestion of implementing
a single-level containers abstraction for processes and attaching
these to various resource controller (disk, network, memory, cpu) nodes.
The question of whether to use configfs or not is really at the forefront
of that discussion because making any progress in implementation is
difficult without first deciding upon it, and the containers abstraction
patchset uses configfs as its interface.
The original objection against configfs concerned the lifetime of the
resource controller. But this is actually a two-part question, since there
are two interfaces: one for the containers and one for the controllers. At
present it seems like the only discussion taking place is that of the
container, so this objection can wait. After boot, there are two
options:
- require the user to mount the configfs filesystem with a single
system-wide container as default
i. include all processes in that container by default
ii. include no processes in that container, force the user to add them
- create the entire container abstraction upon boot and attach all
processes to it in a manner similar to procfs
[ In both scenarios, kernel behavior is unchanged if no resource
controller node is attached to any container as if the container(s)
didn't exist. ]
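The two boot-time options above can be made concrete with a small toy
model (ordinary Python, not kernel code; every class and parameter name
here is an assumption invented for illustration, not anything from the
actual patchset):

```python
class Container:
    """A named set of task pids; the 'single system-wide container'."""
    def __init__(self, name):
        self.name = name
        self.tasks = set()


class ContainerFS:
    """Toy model of when the container abstraction comes into being."""

    def __init__(self, auto_create=False, adopt_all=False, boot_tasks=()):
        self._boot_tasks = set(boot_tasks)
        self._adopt_all = adopt_all
        self.root = None
        if auto_create:
            # Option 2: build the whole abstraction at boot, procfs-style,
            # and attach every process to the root container.
            self.root = Container("root")
            self.root.tasks |= self._boot_tasks

    def mount(self):
        # Option 1: the root container appears only when the user mounts
        # the filesystem.
        if self.root is None:
            self.root = Container("root")
            if self._adopt_all:
                # Option 1.i: include all processes by default.
                self.root.tasks |= self._boot_tasks
            # Option 1.ii: start empty; the user adds tasks by hand.

    def add_task(self, pid):
        self.root.tasks.add(pid)
```

Either way, the root container exists before any resource controller is
attached to it, so attaching controllers later is purely additive.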
Another objection against configfs was the fact that you must define
CONFIG_CONFIGFS_FS to use CONFIG_CONTAINERS. This objection does not make
much sense, since it seems like we are leaning toward abandoning the
syscall approach here and looking toward an fs approach in the first
place. CONFIG_CONTAINERS would then need to include its own lightweight
filesystem if we cannot use CONFIG_CONFIGFS_FS, but that seems redundant,
since this is exactly what configfs is for: a configurable filesystem to
interface to the kernel. We definitely do not want two or more interfaces
to _containers_, or we would be reimplementing an already existing
infrastructure.
The criticism that users can create containers and then not use them
shouldn't be an issue if the abstraction is carefully implemented. In
fact, I proposed that all processes are initially attached to a single
system-wide container at boot, regardless of whether you've loaded any
controllers, just as UMA machines work with node 0 for system-wide
memory. We should incur no overhead for having empty or _full_ containers
if we haven't loaded controllers or have configured them properly to
include the right containers.
So if we re-read Paul Menage's patchset abstracting containers away from
cpusets, which uses configfs, we can see that we are almost there, with
the exception of making it a single-layer "hierarchy" as he has already
proposed. The resource controller "nodes" that these containers can be
attached to are a separate issue at this point and shouldn't be confused
with it.
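The shape of that abstraction, a flat set of containers with resource
controller nodes optionally attached, can be sketched as a toy model
(plain Python; the names, the per-container charging scheme, and the
`charge` method are all assumptions made up for this example, not the
patchset's API):

```python
class Controller:
    """A resource controller node (e.g. memory) with a per-node limit.

    It tracks usage separately for each container attached to it.
    """

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self.usage = {}  # container -> units charged so far

    def charge(self, container, amount):
        used = self.usage.get(container, 0) + amount
        if used > self.limit:
            return False  # over the limit: refuse the charge
        self.usage[container] = used
        return True


class Container:
    """One node in the single-layer 'hierarchy': a named set of tasks."""

    def __init__(self, name):
        self.name = name
        self.tasks = set()
        self.controllers = []  # controller nodes this container is attached to

    def charge(self, amount):
        # With no controller attached, this is a no-op that always
        # succeeds, mirroring the bracketed note above: kernel behavior
        # is unchanged as if the container didn't exist.
        return all(c.charge(self, amount) for c in self.controllers)
```

A container with no controllers accepts any charge, while attaching a
memory controller makes the limit bite; the containers themselves stay a
flat namespace, with all policy living in the controller nodes.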
David
end of thread, other threads:[~2006-11-01 8:35 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20061030103356.GA16833@in.ibm.com>
[not found] ` <4545D51A.1060808@in.ibm.com>
[not found] ` <4546212B.4010603@openvz.org>
[not found] ` <454638D2.7050306@in.ibm.com>
2006-10-30 18:07 ` [ckrm-tech] RFC: Memory Controller Balbir Singh
2006-10-31 8:57 ` Pavel Emelianov
2006-10-31 9:19 ` Balbir Singh
2006-10-31 9:25 ` Pavel Emelianov
2006-10-31 10:10 ` Balbir Singh
2006-10-31 10:19 ` Pavel Emelianov
2006-10-31 9:42 ` Andrew Morton
2006-10-31 10:36 ` Balbir Singh
[not found] ` <45470DF4.70405@openvz.org>
2006-10-31 10:54 ` Balbir Singh
2006-10-31 11:15 ` Pavel Emelianov
2006-10-31 12:39 ` Balbir Singh
2006-10-31 14:19 ` Pavel Emelianov
2006-10-31 16:54 ` Paul Menage
2006-11-01 6:00 ` David Rientjes
2006-11-01 8:05 ` Pavel Emelianov
2006-11-01 8:35 ` David Rientjes