From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>
Subject: [RFC][PATCH 2/14] memcg: rewrite force_empty
Date: Fri, 22 Aug 2008 20:31:14 +0900 [thread overview]
Message-ID: <20080822203114.bf6f08e4.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20080822202720.b7977aab.kamezawa.hiroyu@jp.fujitsu.com>
Current force_empty of memory resource controller just removes page_cgroup.
This maans the page is not accounted at all and create an in-use page which
has no page_cgroup.
This patch tries to move account to "root" cgroup. By this patch, force_empty
doesn't leak an account but move account to "root" cgroup. Maybe someone can
think of other enhancements as
1. move account to its parent.
2. move account to default-trash-can-cgroup somewhere.
3. move account to a cgroup specified by an admin.
I think a routine this patch adds is an enough generic and can be the base
patch for supporting above behavior (if someone wants.). But, for now, just
moves account to root group.
While moving mem_cgroup, lock_page(page) is held. This helps us for avoiding
race condition with accessing page_cgroup->mem_cgroup.
While under lock_page(), page_cgroup->mem_cgroup points to right cgroup.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
Documentation/controllers/memory.txt | 7 +-
mm/memcontrol.c | 85 ++++++++++++++++++++++++++---------
2 files changed, 69 insertions(+), 23 deletions(-)
Index: mmtom-2.6.27-rc3+/mm/memcontrol.c
===================================================================
--- mmtom-2.6.27-rc3+.orig/mm/memcontrol.c
+++ mmtom-2.6.27-rc3+/mm/memcontrol.c
@@ -368,6 +368,7 @@ int task_in_mem_cgroup(struct task_struc
void mem_cgroup_move_lists(struct page *page, enum lru_list lru)
{
struct page_cgroup *pc;
+ struct mem_cgroup *mem;
struct mem_cgroup_per_zone *mz;
unsigned long flags;
@@ -386,9 +387,14 @@ void mem_cgroup_move_lists(struct page *
pc = page_get_page_cgroup(page);
if (pc) {
+ mem = pc->mem_cgroup;
mz = page_cgroup_zoneinfo(pc);
spin_lock_irqsave(&mz->lru_lock, flags);
- __mem_cgroup_move_lists(pc, lru);
+ /*
+ * check against the race with force_empty.
+ */
+ if (likely(mem == pc->mem_cgroup))
+ __mem_cgroup_move_lists(pc, lru);
spin_unlock_irqrestore(&mz->lru_lock, flags);
}
unlock_page_cgroup(page);
@@ -830,19 +836,52 @@ int mem_cgroup_resize_limit(struct mem_c
return ret;
}
+int mem_cgroup_move_account(struct page *page, struct page_cgroup *pc,
+ struct mem_cgroup *from, struct mem_cgroup *to)
+{
+ struct mem_cgroup_per_zone *from_mz, *to_mz;
+ int nid, zid;
+ int ret = 1;
+
+ VM_BUG_ON(to->no_limit == 0);
+ VM_BUG_ON(!irqs_disabled());
+ VM_BUG_ON(!PageLocked(page));
+
+ nid = page_to_nid(page);
+ zid = page_zonenum(page);
+ from_mz = mem_cgroup_zoneinfo(from, nid, zid);
+ to_mz = mem_cgroup_zoneinfo(to, nid, zid);
+
+ if (res_counter_charge(&to->res, PAGE_SIZE)) {
+ /* Now, we assume no_limit...no failure here. */
+ return ret;
+ }
+
+ if (spin_trylock(&to_mz->lru_lock)) {
+ __mem_cgroup_remove_list(from_mz, pc);
+ css_put(&from->css);
+ res_counter_uncharge(&from->res, PAGE_SIZE);
+ pc->mem_cgroup = to;
+ css_get(&to->css);
+ __mem_cgroup_add_list(to_mz, pc);
+ ret = 0;
+ spin_unlock(&to_mz->lru_lock);
+ } else {
+ res_counter_uncharge(&to->res, PAGE_SIZE);
+ }
+
+ return ret;
+}
/*
- * This routine traverse page_cgroup in given list and drop them all.
- * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
+ * This routine moves all account to root cgroup.
*/
-#define FORCE_UNCHARGE_BATCH (128)
static void mem_cgroup_force_empty_list(struct mem_cgroup *mem,
struct mem_cgroup_per_zone *mz,
enum lru_list lru)
{
struct page_cgroup *pc;
struct page *page;
- int count = FORCE_UNCHARGE_BATCH;
unsigned long flags;
struct list_head *list;
@@ -853,22 +892,28 @@ static void mem_cgroup_force_empty_list(
pc = list_entry(list->prev, struct page_cgroup, lru);
page = pc->page;
get_page(page);
- spin_unlock_irqrestore(&mz->lru_lock, flags);
- /*
- * Check if this page is on LRU. !LRU page can be found
- * if it's under page migration.
- */
- if (PageLRU(page)) {
- __mem_cgroup_uncharge_common(page,
- MEM_CGROUP_CHARGE_TYPE_FORCE);
+ if (!trylock_page(page)) {
+ list_move(&pc->lru, list);
+ put_page(page):
+ spin_unlock_irqrestore(&mz->lru_lock, flags);
+ yield();
+ spin_lock_irqsave(&mz->lru_lock, flags);
+ continue;
+ }
+ if (mem_cgroup_move_account(page, pc, mem, &init_mem_cgroup)) {
+ /* some confliction */
+ list_move(&pc->lru, list);
+ unlock_page(page);
put_page(page);
- if (--count <= 0) {
- count = FORCE_UNCHARGE_BATCH;
- cond_resched();
- }
- } else
- cond_resched();
- spin_lock_irqsave(&mz->lru_lock, flags);
+ spin_unlock_irqrestore(&mz->lru_lock, flags);
+ yield();
+ spin_lock_irqsave(&mz->lru_lock, flags);
+ } else {
+ unlock_page(page);
+ put_page(page);
+ }
+ if (atomic_read(&mem->css.cgroup->count) > 0)
+ break;
}
spin_unlock_irqrestore(&mz->lru_lock, flags);
}
Index: mmtom-2.6.27-rc3+/Documentation/controllers/memory.txt
===================================================================
--- mmtom-2.6.27-rc3+.orig/Documentation/controllers/memory.txt
+++ mmtom-2.6.27-rc3+/Documentation/controllers/memory.txt
@@ -207,7 +207,8 @@ The memory.force_empty gives an interfac
# echo 1 > memory.force_empty
-will drop all charges in cgroup. Currently, this is maintained for test.
+will drop all charges in cgroup and move to default cgroup.
+Currently, this is maintained for test.
4. Testing
@@ -238,8 +239,8 @@ reclaimed.
A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
cgroup might have some charge associated with it, even though all
-tasks have migrated away from it. Such charges are automatically dropped at
-rmdir() if there are no tasks.
+tasks have migrated away from it. Such charges are automatically moved to
+root cgroup at rmidr() if there are no tasks.
5. TODO
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-08-22 11:31 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-22 11:27 [RFC][PATCH 0/14] Mem+Swap Controller v2 KAMEZAWA Hiroyuki
2008-08-22 11:30 ` [RFC][PATCH 1/14] memcg: unlimted root cgroup KAMEZAWA Hiroyuki
2008-08-22 22:51 ` Balbir Singh
2008-08-23 0:38 ` kamezawa.hiroyu
2008-08-25 3:19 ` KAMEZAWA Hiroyuki
2008-08-22 11:31 ` KAMEZAWA Hiroyuki [this message]
2008-08-25 3:21 ` [RFC][PATCH 2/14] memcg: rewrite force_empty KAMEZAWA Hiroyuki
2008-08-29 11:45 ` Daisuke Nishimura
2008-08-30 7:30 ` KAMEZAWA Hiroyuki
2008-08-22 11:32 ` [RFC][PATCH 3/14] memcg: atomic_flags KAMEZAWA Hiroyuki
2008-08-26 4:55 ` Balbir Singh
2008-08-26 23:50 ` KAMEZAWA Hiroyuki
2008-08-27 1:58 ` KAMEZAWA Hiroyuki
2008-08-26 8:46 ` kamezawa.hiroyu
2008-08-26 8:49 ` Balbir Singh
2008-08-26 23:41 ` KAMEZAWA Hiroyuki
2008-08-22 11:33 ` [RFC][PATCH 4/14] delay page_cgroup freeing KAMEZAWA Hiroyuki
2008-08-26 11:46 ` Balbir Singh
2008-08-26 23:55 ` KAMEZAWA Hiroyuki
2008-08-27 1:17 ` Balbir Singh
2008-08-27 1:39 ` KAMEZAWA Hiroyuki
2008-08-27 2:25 ` Balbir Singh
2008-08-27 2:46 ` KAMEZAWA Hiroyuki
2008-08-22 11:34 ` [RFC][PATCH 5/14] memcg: free page_cgroup by RCU KAMEZAWA Hiroyuki
2008-08-28 10:06 ` Balbir Singh
2008-08-28 10:44 ` KAMEZAWA Hiroyuki
2008-09-01 6:51 ` YAMAMOTO Takashi
2008-09-01 7:01 ` KAMEZAWA Hiroyuki
2008-08-22 11:35 ` [RFC][PATCH 6/14] memcg: lockless page cgroup KAMEZAWA Hiroyuki
2008-09-09 5:40 ` Daisuke Nishimura
2008-09-09 7:56 ` KAMEZAWA Hiroyuki
2008-09-09 8:11 ` Daisuke Nishimura
2008-09-09 11:11 ` KAMEZAWA Hiroyuki
2008-09-09 11:48 ` Balbir Singh
2008-09-09 14:24 ` Balbir Singh
2008-09-09 14:04 ` Balbir Singh
2008-08-22 11:36 ` [RFC][PATCH 7/14] memcg: add prefetch to spinlock KAMEZAWA Hiroyuki
2008-08-28 11:00 ` Balbir Singh
2008-08-22 11:37 ` [RFC][PATCH 8/14] memcg: make mapping null before uncharge KAMEZAWA Hiroyuki
2008-08-22 11:38 ` [RFC][PATCH 9/14] memcg: add page_cgroup.h file KAMEZAWA Hiroyuki
2008-08-22 11:39 ` [RFC][PATCH 10/14] memcg: replace res_counter KAMEZAWA Hiroyuki
2008-08-27 0:44 ` Daisuke Nishimura
2008-08-27 1:26 ` KAMEZAWA Hiroyuki
2008-08-22 11:40 ` [RFC][PATCH 11/14] memcg: mem_cgroup private ID KAMEZAWA Hiroyuki
2008-08-22 11:41 ` [RFC][PATCH 12/14] memcg: mem+swap controller Kconfig KAMEZAWA Hiroyuki
2008-08-22 11:41 ` [RFC][PATCH 13/14] memcg: mem+swap counter KAMEZAWA Hiroyuki
2008-08-28 8:51 ` Daisuke Nishimura
2008-08-28 9:32 ` KAMEZAWA Hiroyuki
2008-08-22 11:44 ` [RFC][PATCH 14/14]memcg: mem+swap accounting KAMEZAWA Hiroyuki
2008-09-01 7:15 ` Daisuke Nishimura
2008-09-01 7:58 ` KAMEZAWA Hiroyuki
2008-09-01 8:53 ` Daisuke Nishimura
2008-09-01 9:53 ` KAMEZAWA Hiroyuki
2008-09-01 10:21 ` Daisuke Nishimura
2008-09-02 2:21 ` Daisuke Nishimura
2008-09-02 11:09 ` Daisuke Nishimura
2008-09-02 11:40 ` KAMEZAWA Hiroyuki
2008-09-03 6:23 ` Daisuke Nishimura
2008-09-03 7:05 ` KAMEZAWA Hiroyuki
2008-08-22 13:20 ` [RFC][PATCH 0/14] Mem+Swap Controller v2 Balbir Singh
2008-08-22 15:34 ` kamezawa.hiroyu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080822203114.bf6f08e4.kamezawa.hiroyu@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=balbir@linux.vnet.ibm.com \
--cc=linux-mm@kvack.org \
--cc=nishimura@mxp.nes.nec.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox