* Re: [RFC][PATCH 1/3] memcg: documentation for controll file
2008-05-20 9:05 [RFC][PATCH 1/3] memcg: documentation for controll file KAMEZAWA Hiroyuki
@ 2008-05-20 9:04 ` Pavel Emelyanov
2008-05-20 9:23 ` KAMEZAWA Hiroyuki
2008-05-20 9:08 ` [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup KAMEZAWA Hiroyuki
2008-05-20 9:09 ` [RFC][PATCH 3/3] memcg: per node information KAMEZAWA Hiroyuki
2 siblings, 1 reply; 13+ messages in thread
From: Pavel Emelyanov @ 2008-05-20 9:04 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: LKML, linux-mm, menage, balbir, lizf, yamamoto
KAMEZAWA Hiroyuki wrote:
> Add a documentation for memory resource controller's files.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
I have described, in section 4 of Documentation/controllers/resource_counter.txt,
the files that should be created by a control group that uses a res_counter.
Maybe it's worth adding a reference to that file, or even reworking this
text? What do you think?
> Index: mm-2.6.26-rc2-mm1/Documentation/controllers/memory_files.txt
> ===================================================================
> --- /dev/null
> +++ mm-2.6.26-rc2-mm1/Documentation/controllers/memory_files.txt
> @@ -0,0 +1,76 @@
> +Files under memory resource controller and its resource counter.
> +(See controllers/memory.txt about memory resource controller)
> +
> +* memory.usage_in_bytes
> + (read)
> + Currently accounted memory usage under the memory controller, in bytes.
> + This is a multiple of PAGE_SIZE.
> +
> + Even if there are no tasks under the controller, some page caches and
> + swap caches are still accounted. (See memory.force_empty below.)
> +
> + (write)
> + no write operation
> +
> +* memory.limit_in_bytes
> + (read)
> + Current usage limit of this memory resource controller, in bytes.
> + (write)
> + Set the limit of this memory resource controller.
> + A user can use the 'K', 'M', 'G' suffixes to specify the limit.
> +
> + (Example) You can set a limit of 400M as follows:
> + % echo 400M > /path_to_cgroup/memory.limit_in_bytes
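The 'K'/'M'/'G' suffix handling described above behaves like the kernel's memparse() helper; below is a minimal userspace sketch of the same parsing (parse_bytes is an illustrative name, not the kernel's implementation):

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "400M"-style strings into bytes, as memory.limit_in_bytes
 * accepts. A sketch of memparse()-like behavior, not kernel code. */
static unsigned long long parse_bytes(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10;
	}
	return v;
}
```

With this, "400M" parses to 400 << 20 bytes and a bare "4096" stays 4096.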
> +
> +* memory.max_usage_in_bytes
> + (read)
> + Recorded maximum memory usage under this memory controller.
> +
> + (write)
> + Reset the record to 0.
> +
> + (example usage)
> + 1. Create a cgroup:
> + % mkdir /path_to_cgroup/my_cgroup
> +
> + 2. Enter the cgroup:
> + % echo $$ > /path_to_cgroup/my_cgroup/tasks
> +
> + 3. Run your program:
> + % Run......
> +
> + 4. See how much you used:
> + % cat /path_to_cgroup/my_cgroup/memory.max_usage_in_bytes
> +
> + Now you know how much memory your application used. This can be
> + a good hint for setting limit_in_bytes to a proper value.
> +
> +* memory.force_empty
> + (read)
> + not allowed.
> + (write)
> + Drop all charges under this cgroup. This can be done only when
> + there is no task in this cgroup. This file exists for debugging purposes.
> +
> +* memory.stat
> + (read)
> + Shows 6 values (may change in the future):
> + cache .... usage accounted as file cache.
> + anon/swapcache .... usage accounted as anonymous memory or swap cache.
> + pgpgin .... # of page-ins under this cgroup.
> + pgpgout .... # of page-outs under this cgroup.
> + active .... amount of memory treated as 'active'.
> + inactive .... amount of memory treated as 'inactive'.
> + (write)
> + not allowed
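Since memory.stat is a plain "name value" list, a consumer can be sketched with sscanf(); stat_value and the sample buffer in the test are illustrative, not any cgroup API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find one "name value" line in a memory.stat-style buffer.
 * Field names ("cache", "pgpgin", ...) follow the list above;
 * the buffer stands in for a read(2) of the file. */
static long long stat_value(const char *buf, const char *key)
{
	char name[64];
	long long val;
	const char *p = buf;

	while (p && *p) {
		if (sscanf(p, "%63s %lld", name, &val) == 2 &&
		    strcmp(name, key) == 0)
			return val;
		p = strchr(p, '\n');	/* advance to the next line */
		if (p)
			p++;
	}
	return -1;			/* key not present */
}
```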
> +
> +* memory.failcnt
> + (read)
> + The number of times memory allocation was blocked.
> + Until usage reaches the limit, memory allocations are not blocked.
> + Once the limit is reached, allocations are blocked and memory is
> + reclaimed from the LRU.
> +
> + (write)
> + Reset to 0.
> +
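The failcnt/max_usage semantics described above can be modeled with a toy charge function; struct counter below is an illustrative stand-in for the kernel's res_counter, not its actual layout:

```c
#include <assert.h>

/* Toy model of the charge path these files report on: a charge that
 * would push usage past the limit fails and bumps failcnt; the
 * high-water mark is what max_usage_in_bytes would show.
 * Illustrative only, not the kernel's struct res_counter. */
struct counter {
	unsigned long long usage, limit, failcnt, max_usage;
};

static int counter_charge(struct counter *c, unsigned long long sz)
{
	if (c->usage + sz > c->limit) {
		c->failcnt++;		/* allocation blocked */
		return -1;
	}
	c->usage += sz;
	if (c->usage > c->max_usage)
		c->max_usage = c->usage;
	return 0;
}
```

Writing the max_usage or failcnt file then simply zeroes the corresponding field.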
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH 1/3] memcg: documentation for controll file
2008-05-20 9:04 ` Pavel Emelyanov
@ 2008-05-20 9:23 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-05-20 9:23 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: LKML, linux-mm, menage, balbir, lizf, yamamoto
On Tue, 20 May 2008 13:04:33 +0400
Pavel Emelyanov <xemul@openvz.org> wrote:
> KAMEZAWA Hiroyuki wrote:
> > Add a documentation for memory resource controller's files.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> I have described some files, that should be created by a control group,
> which uses a res_counter in Documentation/controllers/resource_counter.txt
> section 4.
>
Ah, sorry. I missed it.
> Maybe it's worth adding a reference to this file, or even rework this
> text? How do you think?
>
I'll drop the res_counter parameters and just show the memory
controller's special files and some how-to-use text (maybe adding
it to memory.txt).
Thanks,
-Kame
* [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
2008-05-20 9:05 [RFC][PATCH 1/3] memcg: documentation for controll file KAMEZAWA Hiroyuki
2008-05-20 9:04 ` Pavel Emelyanov
@ 2008-05-20 9:08 ` KAMEZAWA Hiroyuki
2008-05-20 9:23 ` Pavel Emelyanov
2008-05-20 18:46 ` Paul Menage
2008-05-20 9:09 ` [RFC][PATCH 3/3] memcg: per node information KAMEZAWA Hiroyuki
2 siblings, 2 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-05-20 9:08 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: LKML, linux-mm, menage, balbir, xemul, lizf, yamamoto
Does anyone have a better idea?
==
Currently, cgroup's seq_file interface supports only single_open().
This patch allows arbitrary seq_ops to be passed.
For example, per-cpu or per-node status output can be very large
in general, and such files tend to use their own start/next/stop ops.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/cgroup.h | 9 +++++++++
kernel/cgroup.c | 32 +++++++++++++++++++++++++++++---
2 files changed, 38 insertions(+), 3 deletions(-)
Index: mm-2.6.26-rc2-mm1/include/linux/cgroup.h
===================================================================
--- mm-2.6.26-rc2-mm1.orig/include/linux/cgroup.h
+++ mm-2.6.26-rc2-mm1/include/linux/cgroup.h
@@ -232,6 +232,11 @@ struct cftype {
*/
int (*read_seq_string) (struct cgroup *cont, struct cftype *cft,
struct seq_file *m);
+ /*
+ * If this is not NULL, read ops will use this instead of
+ * single_open(). Useful for showing very large data.
+ */
+ struct seq_operations *seq_ops;
ssize_t (*write) (struct cgroup *cgrp, struct cftype *cft,
struct file *file,
@@ -285,6 +290,10 @@ int cgroup_path(const struct cgroup *cgr
int cgroup_task_count(const struct cgroup *cgrp);
+
+struct cgroup *cgroup_of_seqfile(struct seq_file *m);
+struct cftype *cftype_of_seqfile(struct seq_file *m);
+
/* Return true if the cgroup is a descendant of the current cgroup */
int cgroup_is_descendant(const struct cgroup *cgrp);
Index: mm-2.6.26-rc2-mm1/kernel/cgroup.c
===================================================================
--- mm-2.6.26-rc2-mm1.orig/kernel/cgroup.c
+++ mm-2.6.26-rc2-mm1/kernel/cgroup.c
@@ -1540,6 +1540,16 @@ struct cgroup_seqfile_state {
struct cgroup *cgroup;
};
+struct cgroup *cgroup_of_seqfile(struct seq_file *m)
+{
+ return ((struct cgroup_seqfile_state *)m->private)->cgroup;
+}
+
+struct cftype *cftype_of_seqfile(struct seq_file *m)
+{
+ return ((struct cgroup_seqfile_state *)m->private)->cft;
+}
+
static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value)
{
struct seq_file *sf = cb->state;
@@ -1563,8 +1573,14 @@ static int cgroup_seqfile_show(struct se
static int cgroup_seqfile_release(struct inode *inode, struct file *file)
{
struct seq_file *seq = file->private_data;
+ struct cgroup_seqfile_state *state = seq->private;
+ struct cftype *cft = state->cft;
+
kfree(seq->private);
- return single_release(inode, file);
+ if (!cft->seq_ops)
+ return single_release(inode, file);
+ else
+ return seq_release(inode, file);
}
static struct file_operations cgroup_seqfile_operations = {
@@ -1585,7 +1601,7 @@ static int cgroup_file_open(struct inode
cft = __d_cft(file->f_dentry);
if (!cft)
return -ENODEV;
- if (cft->read_map || cft->read_seq_string) {
+ if (cft->read_map || cft->read_seq_string || cft->seq_ops) {
struct cgroup_seqfile_state *state =
kzalloc(sizeof(*state), GFP_USER);
if (!state)
@@ -1593,7 +1609,17 @@ static int cgroup_file_open(struct inode
state->cft = cft;
state->cgroup = __d_cgrp(file->f_dentry->d_parent);
file->f_op = &cgroup_seqfile_operations;
- err = single_open(file, cgroup_seqfile_show, state);
+
+ if (!cft->seq_ops)
+ err = single_open(file, cgroup_seqfile_show, state);
+ else {
+ err = seq_open(file, cft->seq_ops);
+ if (!err) {
+ struct seq_file *sf;
+ sf = ((struct seq_file *)file->private_data);
+ sf->private = state;
+ }
+ }
if (err < 0)
kfree(state);
} else if (cft->open)
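The start/next/stop/show contract this patch plugs into can be simulated in userspace; the toy iterator below sketches the seq_file pattern (the names and fixed array are illustrative), showing why only one record needs to be formatted at a time even for very large outputs:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy version of the seq_operations contract: start() positions at
 * *pos, next() advances, show() emits one record. Not <linux/seq_file.h>. */
static const int nodes[] = { 10, 20, 30 };
#define NNODES 3

static const int *it_start(long *pos)
{
	return (*pos < NNODES) ? &nodes[*pos] : NULL;
}

static const int *it_next(const int *cur, long *pos)
{
	(*pos)++;
	return (*pos < NNODES) ? &nodes[*pos] : NULL;
}

static int it_show(const int *cur, char *out, size_t outsz)
{
	return snprintf(out, outsz, "%d\n", *cur);
}

/* Drive the iterator the way seq_read() would, record by record. */
static void dump(char *out, size_t outsz)
{
	long pos = 0;
	size_t off = 0;

	for (const int *p = it_start(&pos); p; p = it_next(p, &pos))
		off += it_show(p, out + off, outsz - off);
}
```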
* Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
2008-05-20 9:08 ` [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup KAMEZAWA Hiroyuki
@ 2008-05-20 9:23 ` Pavel Emelyanov
2008-05-20 18:46 ` Paul Menage
1 sibling, 0 replies; 13+ messages in thread
From: Pavel Emelyanov @ 2008-05-20 9:23 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: LKML, linux-mm, menage, balbir, lizf, yamamoto
KAMEZAWA Hiroyuki wrote:
> Does anyone have a better idea ?
> ==
>
> Currently, cgroup's seq_file interface just supports single_open.
> This patch allows arbitrary seq_ops if passed.
That's great :)
> For example, "status per cpu, status per node" can be very big
> in general and they tend to use its own start/next/stop ops.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
* Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
2008-05-20 9:08 ` [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup KAMEZAWA Hiroyuki
2008-05-20 9:23 ` Pavel Emelyanov
@ 2008-05-20 18:46 ` Paul Menage
2008-05-21 0:28 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 13+ messages in thread
From: Paul Menage @ 2008-05-20 18:46 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: LKML, linux-mm, balbir, xemul, lizf, yamamoto
On Tue, May 20, 2008 at 2:08 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Does anyone have a better idea ?
As a way of printing plain text files, it seems fine.
My concern is that it means that cgroups no longer has any idea about
the typing of the data being returned, which will make it harder to
integrate with a binary stats API. You'd end up having to have a
separate reporting method for the same data to use it. That's why the
"read_map" function specifically doesn't take a seq_file, but instead
takes a key/value callback abstraction, which currently maps into a
seq_file. For the binary stats API, we can use the same reporting
functions, and just map into the binary API output.
Maybe we can somehow combine the read_map() abstraction with the
seq_file's start/stop/next operations.
Paul
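The read_map() idea described here can be sketched as a key/value callback whose backend is swappable; the names and sample values below are illustrative, not the actual cgroup interface:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a typed key/value reporting abstraction: the controller
 * calls fill() per pair, and the same reporting function can feed a
 * text backend (below) or, later, a binary stats backend. */
struct map_cb {
	void *state;
	int (*fill)(struct map_cb *cb, const char *key,
		    unsigned long long val);
};

/* Text backend: append "key value\n" to a buffer. */
static int text_fill(struct map_cb *cb, const char *key,
		     unsigned long long val)
{
	char *buf = cb->state;

	sprintf(buf + strlen(buf), "%s %llu\n", key, val);
	return 0;
}

/* One reporting function, reusable for any backend; values are
 * made-up sample numbers. */
static void report_stats(struct map_cb *cb)
{
	cb->fill(cb, "cache", 4096);
	cb->fill(cb, "pgpgin", 10);
}
```

A binary backend would only swap fill(), leaving report_stats() untouched, which is the reuse Paul describes.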
> ==
>
> Currently, cgroup's seq_file interface just supports single_open.
> This patch allows arbitrary seq_ops if passed.
>
> For example, "status per cpu, status per node" can be very big
> in general and they tend to use its own start/next/stop ops.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>
> ---
> include/linux/cgroup.h | 9 +++++++++
> kernel/cgroup.c | 32 +++++++++++++++++++++++++++++---
> 2 files changed, 38 insertions(+), 3 deletions(-)
>
> Index: mm-2.6.26-rc2-mm1/include/linux/cgroup.h
> ===================================================================
> --- mm-2.6.26-rc2-mm1.orig/include/linux/cgroup.h
> +++ mm-2.6.26-rc2-mm1/include/linux/cgroup.h
> @@ -232,6 +232,11 @@ struct cftype {
> */
> int (*read_seq_string) (struct cgroup *cont, struct cftype *cft,
> struct seq_file *m);
> + /*
> + * If this is not NULL, read ops will use this instead of
> + * single_open(). Useful for showing very large data.
> + */
> + struct seq_operations *seq_ops;
>
> ssize_t (*write) (struct cgroup *cgrp, struct cftype *cft,
> struct file *file,
> @@ -285,6 +290,10 @@ int cgroup_path(const struct cgroup *cgr
>
> int cgroup_task_count(const struct cgroup *cgrp);
>
> +
> +struct cgroup *cgroup_of_seqfile(struct seq_file *m);
> +struct cftype *cftype_of_seqfile(struct seq_file *m);
> +
> /* Return true if the cgroup is a descendant of the current cgroup */
> int cgroup_is_descendant(const struct cgroup *cgrp);
>
> Index: mm-2.6.26-rc2-mm1/kernel/cgroup.c
> ===================================================================
> --- mm-2.6.26-rc2-mm1.orig/kernel/cgroup.c
> +++ mm-2.6.26-rc2-mm1/kernel/cgroup.c
> @@ -1540,6 +1540,16 @@ struct cgroup_seqfile_state {
> struct cgroup *cgroup;
> };
>
> +struct cgroup *cgroup_of_seqfile(struct seq_file *m)
> +{
> + return ((struct cgroup_seqfile_state *)m->private)->cgroup;
> +}
> +
> +struct cftype *cftype_of_seqfile(struct seq_file *m)
> +{
> + return ((struct cgroup_seqfile_state *)m->private)->cft;
> +}
> +
> static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value)
> {
> struct seq_file *sf = cb->state;
> @@ -1563,8 +1573,14 @@ static int cgroup_seqfile_show(struct se
> static int cgroup_seqfile_release(struct inode *inode, struct file *file)
> {
> struct seq_file *seq = file->private_data;
> + struct cgroup_seqfile_state *state = seq->private;
> + struct cftype *cft = state->cft;
> +
> kfree(seq->private);
> - return single_release(inode, file);
> + if (!cft->seq_ops)
> + return single_release(inode, file);
> + else
> + return seq_release(inode, file);
> }
>
> static struct file_operations cgroup_seqfile_operations = {
> @@ -1585,7 +1601,7 @@ static int cgroup_file_open(struct inode
> cft = __d_cft(file->f_dentry);
> if (!cft)
> return -ENODEV;
> - if (cft->read_map || cft->read_seq_string) {
> + if (cft->read_map || cft->read_seq_string || cft->seq_ops) {
> struct cgroup_seqfile_state *state =
> kzalloc(sizeof(*state), GFP_USER);
> if (!state)
> @@ -1593,7 +1609,17 @@ static int cgroup_file_open(struct inode
> state->cft = cft;
> state->cgroup = __d_cgrp(file->f_dentry->d_parent);
> file->f_op = &cgroup_seqfile_operations;
> - err = single_open(file, cgroup_seqfile_show, state);
> +
> + if (!cft->seq_ops)
> + err = single_open(file, cgroup_seqfile_show, state);
> + else {
> + err = seq_open(file, cft->seq_ops);
> + if (!err) {
> + struct seq_file *sf;
> + sf = ((struct seq_file *)file->private_data);
> + sf->private = state;
> + }
> + }
> if (err < 0)
> kfree(state);
> } else if (cft->open)
>
>
* Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
2008-05-20 18:46 ` Paul Menage
@ 2008-05-21 0:28 ` KAMEZAWA Hiroyuki
2008-05-21 5:06 ` Paul Menage
0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-05-21 0:28 UTC (permalink / raw)
To: Paul Menage; +Cc: LKML, linux-mm, balbir, xemul, lizf, yamamoto
On Tue, 20 May 2008 11:46:46 -0700
"Paul Menage" <menage@google.com> wrote:
> On Tue, May 20, 2008 at 2:08 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Does anyone have a better idea ?
>
> As a way of printing plain text files, it seems fine.
>
> My concern is that it means that cgroups no longer has any idea about
> the typing of the data being returned, which will make it harder to
> integrate with a binary stats API. You'd end up having to have a
> separate reporting method for the same data to use it. That's why the
> "read_map" function specifically doesn't take a seq_file, but instead
> takes a key/value callback abstraction, which currently maps into a
> seq_file. For the binary stats API, we can use the same reporting
> functions, and just map into the binary API output.
>
With the current interface, my concern is hotplug.
The file-per-node method requires deleting/adding files at hotplug.
A single file for all nodes cannot use the _maps_ method, because the
documentation for map files says:
==
The key/value pairs (and their ordering) should not
* change between reboots.
==
And the (*read) method isn't useful ;)
Can we add new stat files dynamically?
Thanks,
-Kame
* Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
2008-05-21 0:28 ` KAMEZAWA Hiroyuki
@ 2008-05-21 5:06 ` Paul Menage
2008-05-21 6:06 ` KAMEZAWA Hiroyuki
2008-05-21 13:08 ` Hirokazu Takahashi
0 siblings, 2 replies; 13+ messages in thread
From: Paul Menage @ 2008-05-21 5:06 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: LKML, linux-mm, balbir, xemul, lizf, yamamoto
On Tue, May 20, 2008 at 5:28 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> With current interface, my concern is hotplug.
>
> File-per-node method requires delete/add files at hotplug.
> A file for all nodes with _maps_ method cannot be used because
> maps file says
> ==
> The key/value pairs (and their ordering) should not
> * change between reboots.
> ==
OK, so we may need to extend the interface ...
The main reason for that restriction (not allowing the set of keys to
change) was to simplify and speed up userspace parsing and make any
future binary API simpler. But if it's not going to work, we can maybe
make that optional instead.
>
> And (*read) method isn't useful ;)
>
> Can we add new stat file dynamically ?
Yes, there's no reason we can't do that. Right now it's not possible
to remove a control file without deleting the cgroup, but I have a
patch that supports removal.
The question is whether it's better to have one file per CPU/node or
one large complex file.
Paul
* Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
2008-05-21 5:06 ` Paul Menage
@ 2008-05-21 6:06 ` KAMEZAWA Hiroyuki
2008-05-21 13:08 ` Hirokazu Takahashi
1 sibling, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-05-21 6:06 UTC (permalink / raw)
To: Paul Menage; +Cc: LKML, linux-mm, balbir, xemul, lizf, yamamoto
On Tue, 20 May 2008 22:06:48 -0700
"Paul Menage" <menage@google.com> wrote:
> >
> > And (*read) method isn't useful ;)
> >
> > Can we add new stat file dynamically ?
>
> Yes, there's no reason we can't do that. Right now it's not possible
> to remove a control file without deleting the cgroup, but I have a
> patch that supports removal.
>
Good news. I'll wait for it.
> The question is whether it's better to have one file per CPU/node or
> one large complex file.
>
For keeping the kernel simple, one file per entity (cpu/node/...) is better.
For keeping applications simple, one big file is better.
I think recent interfaces use the one-file-per-entity method, so I vote
for it for this numastat. One concern is the number of cpus/nodes: it can
be 1024...4096 depending on the environment.
Opening/closing 4096 files takes a fair amount of cpu time.
(And that's why the 'ps' command is slow on big systems.)
Thanks,
-Kame
* Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
2008-05-21 5:06 ` Paul Menage
2008-05-21 6:06 ` KAMEZAWA Hiroyuki
@ 2008-05-21 13:08 ` Hirokazu Takahashi
1 sibling, 0 replies; 13+ messages in thread
From: Hirokazu Takahashi @ 2008-05-21 13:08 UTC (permalink / raw)
To: menage
Cc: kamezawa.hiroyu, linux-kernel, linux-mm, balbir, xemul, lizf, yamamoto
Hi,
> > With current interface, my concern is hotplug.
> >
> > File-per-node method requires delete/add files at hotplug.
> > A file for all nodes with _maps_ method cannot be used because
> > maps file says
> > ==
> > The key/value pairs (and their ordering) should not
> > * change between reboots.
> > ==
>
> OK, so we may need to extend the interface ...
I also hope for it!
Now I'm working on dm-ioband --- an I/O bandwidth controller --- and
making it able to work under cgroups.
I realized it is quite hard to set a specific value for each block
device, because each machine has a varying number of devices, and
some of them are hot-added or hot-removed.
So I hope cgroups will support some method for handling hot-pluggable
resources.
> The main reason for that restriction (not allowing the set of keys to
> change) was to simplify and speed up userspace parsing and make any
> future binary API simpler. But if it's not going to work, we can maybe
> make that optional instead.
> >
> > And (*read) method isn't useful ;)
> >
> > Can we add new stat file dynamically ?
>
> Yes, there's no reason we can't do that. Right now it's not possible
> to remove a control file without deleting the cgroup, but I have a
> patch that supports removal.
>
> The question is whether it's better to have one file per CPU/node or
> one large complex file.
>
> Paul
>
* [RFC][PATCH 3/3] memcg: per node information
2008-05-20 9:05 [RFC][PATCH 1/3] memcg: documentation for controll file KAMEZAWA Hiroyuki
2008-05-20 9:04 ` Pavel Emelyanov
2008-05-20 9:08 ` [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup KAMEZAWA Hiroyuki
@ 2008-05-20 9:09 ` KAMEZAWA Hiroyuki
2008-05-20 9:33 ` Li Zefan
2 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-05-20 9:09 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: LKML, linux-mm, menage, balbir, xemul, lizf, yamamoto
Show per-node statistics in the following format:
node-id total active inactive
[root@iridium bench]# cat /opt/cgroup/memory.numa_stat
0 417611776 99586048 318025728
1 655360000 0 655360000
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memcontrol.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
Index: mm-2.6.26-rc2-mm1/mm/memcontrol.c
===================================================================
--- mm-2.6.26-rc2-mm1.orig/mm/memcontrol.c
+++ mm-2.6.26-rc2-mm1/mm/memcontrol.c
@@ -960,6 +960,66 @@ static int mem_control_stat_show(struct
return 0;
}
+#ifdef CONFIG_NUMA
+static void *memcg_numastat_start(struct seq_file *m, loff_t *pos)
+{
+ loff_t node = *pos;
+ struct pglist_data *pgdat = first_online_pgdat();
+
+ while (pgdat != NULL) {
+ if (!node)
+ break;
+ pgdat = next_online_pgdat(pgdat);
+ node--;
+ }
+ return pgdat;
+}
+
+static void *memcg_numastat_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+ struct pglist_data *pgdat = (struct pglist_data *)arg;
+
+ (*pos)++;
+ return next_online_pgdat(pgdat);
+}
+
+static void memcg_numastat_stop(struct seq_file *m, void *arg)
+{
+}
+
+static int memcg_numastat_show(struct seq_file *m, void *arg)
+{
+ struct pglist_data *pgdat = (struct pglist_data *)arg;
+ int nid = pgdat->node_id;
+ struct cgroup *cgrp = cgroup_of_seqfile(m);
+ struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+ struct mem_cgroup_per_zone *mz;
+ long active, inactive, total;
+ int zid;
+
+ active = 0;
+ inactive = 0;
+
+ for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+ mz = mem_cgroup_zoneinfo(memcg, nid, zid);
+ active += MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE);
+ inactive += MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE);
+ }
+ active *= PAGE_SIZE;
+ inactive *= PAGE_SIZE;
+ total = active + inactive;
+ /* Node Total Active Inactive (Total = Active + Inactive) */
+ return seq_printf(m, "%d %ld %ld %ld\n", nid, total, active, inactive);
+}
+
+struct seq_operations memcg_numastat_op = {
+ .start = memcg_numastat_start,
+ .next = memcg_numastat_next,
+ .stop = memcg_numastat_stop,
+ .show = memcg_numastat_show,
+};
+#endif
+
static struct cftype mem_cgroup_files[] = {
{
.name = "usage_in_bytes",
@@ -992,6 +1052,12 @@ static struct cftype mem_cgroup_files[]
.name = "stat",
.read_map = mem_control_stat_show,
},
+#ifdef CONFIG_NUMA
+ {
+ .name = "numa_stat",
+ .seq_ops = &memcg_numastat_op,
+ },
+#endif
};
static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
Index: mm-2.6.26-rc2-mm1/Documentation/controllers/memory_files.txt
===================================================================
--- mm-2.6.26-rc2-mm1.orig/Documentation/controllers/memory_files.txt
+++ mm-2.6.26-rc2-mm1/Documentation/controllers/memory_files.txt
@@ -74,3 +74,13 @@ Files under memory resource controller a
(write)
Reset to 0.
+* memory.numa_stat
+
+ This file appears only when the kernel is configured as NUMA.
+
+ (read)
+ Show per-node accounting information of acitve/inactive pages.
+ formated as following.
+ nodeid total active inactive
+
+ total = active + inactive.
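A consumer of the numa_stat format this patch documents can parse each line and check the stated invariant total = active + inactive; a small sketch (parse_numa_line is an illustrative name, not part of any API):

```c
#include <assert.h>
#include <stdio.h>

/* Parse one memory.numa_stat line in the documented
 * "node-id total active inactive" format, and verify the
 * invariant total = active + inactive. */
static int parse_numa_line(const char *line, int *nid,
			   long *total, long *active, long *inactive)
{
	if (sscanf(line, "%d %ld %ld %ld",
		   nid, total, active, inactive) != 4)
		return -1;			/* malformed line */
	return (*total == *active + *inactive) ? 0 : -1;
}
```

Applied to the sample output in the changelog, node 0's 99586048 + 318025728 does add up to 417611776.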
* Re: [RFC][PATCH 3/3] memcg: per node information
2008-05-20 9:09 ` [RFC][PATCH 3/3] memcg: per node information KAMEZAWA Hiroyuki
@ 2008-05-20 9:33 ` Li Zefan
2008-05-20 10:56 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 13+ messages in thread
From: Li Zefan @ 2008-05-20 9:33 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: LKML, linux-mm, menage, balbir, xemul, yamamoto
KAMEZAWA Hiroyuki wrote:
> static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
> Index: mm-2.6.26-rc2-mm1/Documentation/controllers/memory_files.txt
> ===================================================================
> --- mm-2.6.26-rc2-mm1.orig/Documentation/controllers/memory_files.txt
> +++ mm-2.6.26-rc2-mm1/Documentation/controllers/memory_files.txt
> @@ -74,3 +74,13 @@ Files under memory resource controller a
> (write)
> Reset to 0.
>
> +* memory.numa_stat
> +
> + This file appears only when the kernel is configured as NUMA.
> +
> + (read)
> + Show per-node accounting information of acitve/inactive pages.
> + formated as following.
formatted
> + nodeid total active inactive
2 spaces? ^^
> +
> + total = active + inactive.
>
>
* Re: [RFC][PATCH 3/3] memcg: per node information
2008-05-20 9:33 ` Li Zefan
@ 2008-05-20 10:56 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-05-20 10:56 UTC (permalink / raw)
To: Li Zefan; +Cc: LKML, linux-mm, menage, balbir, xemul, yamamoto
On Tue, 20 May 2008 17:33:20 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
> > Index: mm-2.6.26-rc2-mm1/Documentation/controllers/memory_files.txt
> > ===================================================================
> > --- mm-2.6.26-rc2-mm1.orig/Documentation/controllers/memory_files.txt
> > +++ mm-2.6.26-rc2-mm1/Documentation/controllers/memory_files.txt
> > @@ -74,3 +74,13 @@ Files under memory resource controller a
> > (write)
> > Reset to 0.
> >
> > +* memory.numa_stat
> > +
> > + This file appears only when the kernel is configured as NUMA.
> > +
> > + (read)
> > + Show per-node accounting information of acitve/inactive pages.
> > + formated as following.
>
> formatted
>
> > + nodeid total active inactive
>
> 2 spaces? ^^
>
Thanks, will fix.
-Kame