From: David Rientjes <rientjes@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"containers@lists.osdl.org" <containers@lists.osdl.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"yamamoto@valinux.co.jp" <yamamoto@valinux.co.jp>
Subject: Re: [PATCH] memory cgroup enhancements [1/5] force_empty for memory cgroup
Date: Tue, 16 Oct 2007 21:17:06 -0700 (PDT) [thread overview]
Message-ID: <alpine.DEB.0.9999.0710162113300.13648@chino.kir.corp.google.com> (raw)
In-Reply-To: <20071016192341.1c3746df.kamezawa.hiroyu@jp.fujitsu.com>
On Tue, 16 Oct 2007, KAMEZAWA Hiroyuki wrote:
> This patch adds an interface "memory.force_empty".
> Any write to this file will drop all charges in this cgroup if
> there is no task under.
>
> %echo 1 > /....../memory.force_empty
>
> will drop all charges of memory cgroup if cgroup's tasks is empty.
>
> This is useful to invoke rmdir() against memory cgroup successfully.
>
If there's no tasks in the cgroup, then how can there be any charges
against its memory controller? Is the memory not being uncharged when a
task exits or is moved to another cgroup?
If the only use of this is for rmdir, why not just make it part of the
rmdir operation on the memory cgroup if there are no tasks by default?
> Tested and worked well on x86_64/fake-NUMA system.
>
> Changelog v3 -> v4:
> - adjusted to 2.6.23-mm1
> - fixed typo
> - changes buf[2]="0" to static const
> Changelog v2 -> v3:
> - changed the name from force_reclaim to force_empty.
> Changelog v1 -> v2:
> - added a new interface force_reclaim.
> - changes spin_lock to spin_lock_irqsave().
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>
> mm/memcontrol.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 95 insertions(+), 7 deletions(-)
>
> Index: devel-2.6.23-mm1/mm/memcontrol.c
> ===================================================================
> --- devel-2.6.23-mm1.orig/mm/memcontrol.c
> +++ devel-2.6.23-mm1/mm/memcontrol.c
> @@ -480,6 +480,7 @@ void mem_cgroup_uncharge(struct page_cgr
> page = pc->page;
> /*
> * get page->cgroup and clear it under lock.
> + * force_empty can drop page->cgroup without checking refcnt.
> */
> if (clear_page_cgroup(page, pc) == pc) {
> mem = pc->mem_cgroup;
> @@ -489,13 +490,6 @@ void mem_cgroup_uncharge(struct page_cgr
> list_del_init(&pc->lru);
> spin_unlock_irqrestore(&mem->lru_lock, flags);
> kfree(pc);
> - } else {
> - /*
> - * Note:This will be removed when force-empty patch is
> - * applied. just show warning here.
> - */
> - printk(KERN_ERR "Race in mem_cgroup_uncharge() ?");
> - dump_stack();
Unrelated change?
> }
> }
> }
> @@ -543,6 +537,70 @@ retry:
> return;
> }
>
> +/*
> + * This routine traverse page_cgroup in given list and drop them all.
> + * This routine ignores page_cgroup->ref_cnt.
> + * *And* this routine doesn't relcaim page itself, just removes page_cgroup.
Reclaim?
> + */
> +static void
> +mem_cgroup_force_empty_list(struct mem_cgroup *mem, struct list_head *list)
> +{
> + struct page_cgroup *pc;
> + struct page *page;
> + int count = SWAP_CLUSTER_MAX;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&mem->lru_lock, flags);
> +
> + while (!list_empty(list)) {
> + pc = list_entry(list->prev, struct page_cgroup, lru);
> + page = pc->page;
> + /* Avoid race with charge */
> + atomic_set(&pc->ref_cnt, 0);
Don't you need {lock,unlock}_page_cgroup(page) here to avoid a true race
with mem_cgroup_charge() right after you set pc->ref_cnt to 0 here?
> + if (clear_page_cgroup(page, pc) == pc) {
> + css_put(&mem->css);
> + res_counter_uncharge(&mem->res, PAGE_SIZE);
> + list_del_init(&pc->lru);
> + kfree(pc);
> + } else
> + count = 1; /* being uncharged ? ...do relax */
> +
> + if (--count == 0) {
> + spin_unlock_irqrestore(&mem->lru_lock, flags);
> + cond_resched();
> + spin_lock_irqsave(&mem->lru_lock, flags);
Instead of this hack, it's probably easier to just goto a label placed
before spin_lock_irqsave() at the top of the function.
> + count = SWAP_CLUSTER_MAX;
> + }
> + }
> + spin_unlock_irqrestore(&mem->lru_lock, flags);
> +}
> +
> +/*
> + * make mem_cgroup's charge to be 0 if there is no binded task.
> + * This enables deleting this mem_cgroup.
> + */
> +
> +int mem_cgroup_force_empty(struct mem_cgroup *mem)
> +{
> + int ret = -EBUSY;
> + css_get(&mem->css);
> + while (!list_empty(&mem->active_list) ||
> + !list_empty(&mem->inactive_list)) {
> + if (atomic_read(&mem->css.cgroup->count) > 0)
> + goto out;
> + /* drop all page_cgroup in active_list */
> + mem_cgroup_force_empty_list(mem, &mem->active_list);
> + /* drop all page_cgroup in inactive_list */
> + mem_cgroup_force_empty_list(mem, &mem->inactive_list);
> + }
This implementation as a while loop looks very suspect since
mem_cgroup_force_empty_list() uses while (!list_empty(list)) as well.
Perhaps it's just easier here as
if (list_empty(&mem->active_list) && list_empty(&mem->inactive_list))
return 0;
> + ret = 0;
> +out:
> + css_put(&mem->css);
> + return ret;
> +}
> +
> +
> +
> int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> {
> *tmp = memparse(buf, &buf);
> @@ -628,6 +686,31 @@ static ssize_t mem_control_type_read(str
> ppos, buf, s - buf);
> }
>
> +
> +static ssize_t mem_force_empty_write(struct cgroup *cont,
> + struct cftype *cft, struct file *file,
> + const char __user *userbuf,
> + size_t nbytes, loff_t *ppos)
> +{
> + struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> + int ret;
> + ret = mem_cgroup_force_empty(mem);
> + if (!ret)
> + ret = nbytes;
> + return ret;
> +}
> +
> +static ssize_t mem_force_empty_read(struct cgroup *cont,
> + struct cftype *cft,
> + struct file *file, char __user *userbuf,
> + size_t nbytes, loff_t *ppos)
> +{
> + static const char buf[2] = "0";
> + return simple_read_from_buffer((void __user *)userbuf, nbytes,
> + ppos, buf, strlen(buf));
Reading memory.force_empty is pretty useless, so why allow it to be read
at all?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-10-17 4:17 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-16 10:19 [PATCH] memory cgroup enhancements [0/5] intro KAMEZAWA Hiroyuki
2007-10-16 10:23 ` [PATCH] memory cgroup enhancements [1/5] force_empty for memory cgroup KAMEZAWA Hiroyuki
2007-10-17 4:17 ` David Rientjes [this message]
2007-10-17 5:05 ` Balbir Singh
2007-10-17 5:26 ` KAMEZAWA Hiroyuki
2007-10-17 5:16 ` KAMEZAWA Hiroyuki
2007-10-17 5:38 ` David Rientjes
2007-10-17 5:50 ` KAMEZAWA Hiroyuki
2007-10-17 7:09 ` about page migration on UMA Jacky(GuangXiang Lee)
2007-10-17 6:44 ` KAMEZAWA Hiroyuki
2007-10-19 1:26 ` Christoph Lameter
2007-11-09 19:31 ` Jared Hulbert
2007-11-09 19:36 ` Christoph Lameter
2007-11-09 19:54 ` Jared Hulbert
2007-11-09 19:58 ` Christoph Lameter
2007-11-09 21:30 ` Dave Hansen
2007-11-12 0:50 ` KAMEZAWA Hiroyuki
2007-11-14 4:31 ` Jacky(GuangXiang Lee)
2007-11-14 17:05 ` Jared Hulbert
2007-10-16 10:25 ` [PATCH] memory cgroup enhancements [2/5] remember charge as cache KAMEZAWA Hiroyuki
2007-10-16 10:26 ` [PATCH] memory cgroup enhancements [3/5] record pc is on active list KAMEZAWA Hiroyuki
2007-10-17 4:17 ` David Rientjes
2007-10-17 5:16 ` KAMEZAWA Hiroyuki
2007-10-16 10:27 ` [PATCH] memory cgroup enhancements [4/5] memory cgroup statistics KAMEZAWA Hiroyuki
2007-10-16 10:28 ` [PATCH] memory cgroup enhancements [5/5] show statistics by memory.stat file per cgroup KAMEZAWA Hiroyuki
2007-10-16 18:20 ` [PATCH] memory cgroup enhancements [0/5] intro Balbir Singh
2007-10-16 18:28 ` Andrew Morton
2007-10-16 18:40 ` Balbir Singh
2007-10-17 5:19 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.0.9999.0710162113300.13648@chino.kir.corp.google.com \
--to=rientjes@google.com \
--cc=balbir@linux.vnet.ibm.com \
--cc=containers@lists.osdl.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=yamamoto@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox