linux-mm.kvack.org archive mirror
* [Experimental] [PATCH 0/5] page_cgroup->flags diet.
@ 2011-12-15  6:00 KAMEZAWA Hiroyuki
  2011-12-15  6:05 ` [RFC] [PATCH 1/5] memcg: simplify account moving check KAMEZAWA Hiroyuki
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15  6:00 UTC (permalink / raw)
  To: linux-mm
  Cc: akpm, hannes, Michal Hocko, Balbir Singh, Hugh Dickins, Ying Han,
	nishimura

To reduce the size of page_cgroup, we need to reduce the number of flags.

This patch series is a trial at removing flags, based on

linux-next + 
memcg-add-mem_cgroup_replace_page_cache-to-fix-lru-issue.patch +
4 patches for 'simplify LRU handling' I posted.

So, I'm not asking anyone to test this, but I'd like to hear other ideas
or comments on the implementation.

After this patch series, the page_cgroup flags are:

enum {
        /* flags for mem_cgroup */
        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
        PCG_USED, /* this object is in use. */
        PCG_MIGRATION, /* under page migration */
        __NR_PCG_FLAGS,
};

3 bits. I thought I could remove PCG_MIGRATION as well... but failed.

BTW, because of kmalloc()'s alignment, the low 3 bits of pc->mem_cgroup must
be 0. So, we could move these flags into the low bits of pc->mem_cgroup... I guess.

(But on second thought, we will eventually need to track blkio per page. So, I'm
 not sure whether we can remove page_cgroup in the end.)

Anyway, I'm dumping the current trial series. Please comment when you have time.
I'm not in a hurry and will not be able to respond quickly.

Thanks,
-Kame


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/


* [RFC] [PATCH 1/5] memcg: simplify account moving check
  2011-12-15  6:00 [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
@ 2011-12-15  6:05 ` KAMEZAWA Hiroyuki
  2011-12-15 10:04   ` Johannes Weiner
  2011-12-15  6:06 ` [RFC][PATCH 2/5] memcg: safer page stat updating KAMEZAWA Hiroyuki
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15  6:05 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, hannes, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura




* [RFC][PATCH 2/5] memcg: safer page stat updating
  2011-12-15  6:00 [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
  2011-12-15  6:05 ` [RFC] [PATCH 1/5] memcg: simplify account moving check KAMEZAWA Hiroyuki
@ 2011-12-15  6:06 ` KAMEZAWA Hiroyuki
  2011-12-15  6:07 ` [RFC][PATCH 3/5] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15  6:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, hannes, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura

A better (lockless) idea is welcome.


* [RFC][PATCH 3/5] memcg: remove PCG_FILE_MAPPED
  2011-12-15  6:00 [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
  2011-12-15  6:05 ` [RFC] [PATCH 1/5] memcg: simplify account moving check KAMEZAWA Hiroyuki
  2011-12-15  6:06 ` [RFC][PATCH 2/5] memcg: safer page stat updating KAMEZAWA Hiroyuki
@ 2011-12-15  6:07 ` KAMEZAWA Hiroyuki
  2011-12-15  6:08 ` [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15  6:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, hannes, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura




* [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit
  2011-12-15  6:00 [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2011-12-15  6:07 ` [RFC][PATCH 3/5] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
@ 2011-12-15  6:08 ` KAMEZAWA Hiroyuki
  2011-12-15 10:24   ` Johannes Weiner
  2011-12-15  6:10 ` [RFC] [PATCH 5/5] memcg: remove PCG_MOVE_LOCK KAMEZAWA Hiroyuki
  2011-12-15  8:44 ` [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
  5 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15  6:08 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, hannes, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura




* [RFC] [PATCH 5/5]  memcg: remove PCG_MOVE_LOCK
  2011-12-15  6:00 [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
                   ` (3 preceding siblings ...)
  2011-12-15  6:08 ` [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit KAMEZAWA Hiroyuki
@ 2011-12-15  6:10 ` KAMEZAWA Hiroyuki
  2011-12-15  8:44 ` [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
  5 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15  6:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, hannes, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura




* Re: [Experimental] [PATCH 0/5] page_cgroup->flags diet.
  2011-12-15  6:00 [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
                   ` (4 preceding siblings ...)
  2011-12-15  6:10 ` [RFC] [PATCH 5/5] memcg: remove PCG_MOVE_LOCK KAMEZAWA Hiroyuki
@ 2011-12-15  8:44 ` KAMEZAWA Hiroyuki
  5 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15  8:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, hannes, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura


This untested patch reduces the size of page_cgroup to 8 bytes.
After enough testing, we'll be ready to integrate page_cgroup as a member of
struct page (under a CONFIG option?).
I'll start tests when I can..

BTW, I haven't considered how to track the blkio owner for supporting buffered
I/O in the blkio cgroup. But I wonder whether it would be enough to add an
interface that ties a memcg to a blkio cgroup even if they are not bind-mounted...
Then, the blkio_id could be obtained via page -> memcg -> blkio_id.
Another idea is using page->private (or similar) to record who the writer is;
this info can be propagated to the buffer_head or bio.

The worst idea is adding a new field to page_cgroup.

==

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index f9441ca..48be740 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -23,8 +23,7 @@ enum {
  * then the page cgroup for pfn always exists.
  */
 struct page_cgroup {
-	unsigned long flags;
-	struct mem_cgroup *mem_cgroup;
+	unsigned long _flags; /* only the low 3 bits are flag bits */
 };
 
 void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
@@ -46,19 +45,19 @@ struct page *lookup_cgroup_page(struct page_cgroup *pc);
 
 #define TESTPCGFLAG(uname, lname)			\
 static inline int PageCgroup##uname(struct page_cgroup *pc)	\
-	{ return test_bit(PCG_##lname, &pc->flags); }
+	{ return test_bit(PCG_##lname, &pc->_flags); }
 
 #define SETPCGFLAG(uname, lname)			\
 static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
-	{ set_bit(PCG_##lname, &pc->flags);  }
+	{ set_bit(PCG_##lname, &pc->_flags);  }
 
 #define CLEARPCGFLAG(uname, lname)			\
 static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
-	{ clear_bit(PCG_##lname, &pc->flags);  }
+	{ clear_bit(PCG_##lname, &pc->_flags);  }
 
 #define TESTCLEARPCGFLAG(uname, lname)			\
 static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
-	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
+	{ return test_and_clear_bit(PCG_##lname, &pc->_flags);  }
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
@@ -68,18 +67,33 @@ SETPCGFLAG(Migration, MIGRATION)
 CLEARPCGFLAG(Migration, MIGRATION)
 TESTPCGFLAG(Migration, MIGRATION)
 
+#define PCG_FLAG_MASK	((1 << (__NR_PCG_FLAGS)) - 1)
+
 static inline void lock_page_cgroup(struct page_cgroup *pc)
 {
 	/*
 	 * Don't take this lock in IRQ context.
 	 * This lock is for pc->mem_cgroup, USED, CACHE, MIGRATION
 	 */
-	bit_spin_lock(PCG_LOCK, &pc->flags);
+	bit_spin_lock(PCG_LOCK, &pc->_flags);
 }
 
 static inline void unlock_page_cgroup(struct page_cgroup *pc)
 {
-	bit_spin_unlock(PCG_LOCK, &pc->flags);
+	bit_spin_unlock(PCG_LOCK, &pc->_flags);
+}
+
+static inline struct mem_cgroup *pc_to_memcg(struct page_cgroup *pc)
+{
+	return (struct mem_cgroup *)
+		((unsigned long)pc->_flags & ~PCG_FLAG_MASK);
+}
+
+static inline void
+pc_set_memcg(struct page_cgroup *pc, struct mem_cgroup *memcg)
+{
+	unsigned long val = pc->_flags & PCG_FLAG_MASK;
+	pc->_flags = (unsigned long)memcg | val;
 }
 
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 66e03ad..8750e5a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1103,7 +1103,7 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 		return &zone->lruvec;
 
 	pc = lookup_page_cgroup(page);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_memcg(pc);
 	VM_BUG_ON(!memcg);
 	mz = page_cgroup_zoneinfo(memcg, page);
 	/* compound_order() is stabilized through lru_lock */
@@ -1131,7 +1131,7 @@ void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
 		return;
 
 	pc = lookup_page_cgroup(page);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_memcg(pc);
 	VM_BUG_ON(!memcg);
 	mz = page_cgroup_zoneinfo(memcg, page);
 	/* huge page split is done under lru_lock. so, we have no races. */
@@ -1268,7 +1268,7 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 		return NULL;
 	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
 	smp_rmb();
-	mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
+	mz = page_cgroup_zoneinfo(pc_to_memcg(pc), page);
 	return &mz->reclaim_stat;
 }
 
@@ -1897,7 +1897,7 @@ bool __mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flag
 	bool need_unlock = false;
 
 	rcu_read_lock();
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_memcg(pc);
 	if (!memcg  || !PageCgroupUsed(pc))
 		goto out;
 	if (unlikely(mem_cgroup_stealed(memcg)) || PageTransHuge(page)) {
@@ -1926,7 +1926,7 @@ void __mem_cgroup_update_page_stat(struct page *page,
 				 enum mem_cgroup_page_stat_item idx, int val)
 {
 	struct page_cgroup *pc = lookup_page_cgroup(page);
-	struct mem_cgroup *memcg = pc->mem_cgroup;
+	struct mem_cgroup *memcg = pc_to_memcg(pc);
 
 	if (!memcg || !PageCgroupUsed(pc))
 		return;
@@ -2400,7 +2400,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		memcg = pc->mem_cgroup;
+		memcg = pc_to_memcg(pc);
 		if (memcg && !css_tryget(&memcg->css))
 			memcg = NULL;
 	} else if (PageSwapCache(page)) {
@@ -2434,7 +2434,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 	 * we don't need page_cgroup_lock about tail pages, becase they are not
 	 * accessed by any other context at this point.
 	 */
-	pc->mem_cgroup = memcg;
+	pc_set_memcg(pc, memcg);
 	/*
 	 * We access a page_cgroup asynchronously without lock_page_cgroup().
 	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
@@ -2469,7 +2469,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
-#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
 /*
  * Because tail pages are not marked as "used", set it. We're under
  * zone->lru_lock, 'splitting on pmd' and compound_lock.
@@ -2481,23 +2480,26 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 	struct page_cgroup *head_pc = lookup_page_cgroup(head);
 	struct page_cgroup *pc;
 	struct mem_cgroup_per_zone *mz;
+	struct mem_cgroup *head_memcg;
 	enum lru_list lru;
 	int i;
 
 	if (mem_cgroup_disabled())
 		return;
+	head_memcg = pc_to_memcg(head_pc);
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		pc = head_pc + i;
-		pc->mem_cgroup = head_pc->mem_cgroup;
+		pc_set_memcg(pc, head_memcg);
 		smp_wmb();/* see __commit_charge() */
-		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
+		/* this page can never be under page migration */
+		SetPageCgroupUsed(pc);
 	}
 	/* 
 	 * Tail pages will be added to LRU.
 	 * We hold lru_lock,then,reduce counter directly.
 	 */
 	lru = page_lru(head);
-	mz = page_cgroup_zoneinfo(head_pc->mem_cgroup, head);
+	mz = page_cgroup_zoneinfo(head_memcg, head);
 	MEM_CGROUP_ZSTAT(mz, lru) -= HPAGE_PMD_NR - 1;
 }
 #endif
@@ -2545,7 +2547,7 @@ static int mem_cgroup_move_account(struct page *page,
 	lock_page_cgroup(pc);
 
 	ret = -EINVAL;
-	if (!PageCgroupUsed(pc) || pc->mem_cgroup != from)
+	if (!PageCgroupUsed(pc) || pc_to_memcg(pc) != from)
 		goto unlock;
 
 	mem_cgroup_move_account_wlock(page, &flags);
@@ -2563,7 +2565,7 @@ static int mem_cgroup_move_account(struct page *page,
 		__mem_cgroup_cancel_charge(from, nr_pages);
 
 	/* caller should have done css_get */
-	pc->mem_cgroup = to;
+	pc_set_memcg(pc, to);
 	mem_cgroup_charge_statistics(to, !PageAnon(page), nr_pages);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
@@ -2928,7 +2930,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 
 	lock_page_cgroup(pc);
 
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_memcg(pc);
 
 	if (!PageCgroupUsed(pc))
 		goto unlock_out;
@@ -3109,7 +3111,7 @@ void mem_cgroup_reset_owner(struct page *newpage)
 
 	pc = lookup_page_cgroup(newpage);
 	VM_BUG_ON(PageCgroupUsed(pc));
-	pc->mem_cgroup = root_mem_cgroup;
+	pc_set_memcg(pc, root_mem_cgroup);
 }
 
 /**
@@ -3191,7 +3193,7 @@ int mem_cgroup_prepare_migration(struct page *page,
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		memcg = pc->mem_cgroup;
+		memcg = pc_to_memcg(pc);
 		css_get(&memcg->css);
 		/*
 		 * At migrating an anonymous page, its mapcount goes down
@@ -3329,7 +3331,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
 	pc = lookup_page_cgroup(oldpage);
 	/* fix accounting on old pages */
 	lock_page_cgroup(pc);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_memcg(pc);
 	mem_cgroup_charge_statistics(memcg, !PageAnon(oldpage), -1);
 	ClearPageCgroupUsed(pc);
 	unlock_page_cgroup(pc);
@@ -3376,14 +3378,15 @@ void mem_cgroup_print_bad_page(struct page *page)
 	if (pc) {
 		int ret = -1;
 		char *path;
+		struct mem_cgroup *memcg = pc_to_memcg(pc);
 
 		printk(KERN_ALERT "pc:%p pc->flags:%lx pc->mem_cgroup:%p",
-		       pc, pc->flags, pc->mem_cgroup);
+		       pc, pc->_flags, memcg);
 
 		path = kmalloc(PATH_MAX, GFP_KERNEL);
 		if (path) {
 			rcu_read_lock();
-			ret = cgroup_path(pc->mem_cgroup->css.cgroup,
+			ret = cgroup_path(memcg->css.cgroup,
 							path, PATH_MAX);
 			rcu_read_unlock();
 		}
@@ -5247,7 +5250,7 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
 		 * mem_cgroup_move_account() checks the pc is valid or not under
 		 * the lock.
 		 */
-		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+		if (PageCgroupUsed(pc) && pc_to_memcg(pc) == mc.from) {
 			ret = MC_TARGET_PAGE;
 			if (target)
 				target->page = page;



* Re: [RFC] [PATCH 1/5] memcg: simplify account moving check
  2011-12-15  6:05 ` [RFC] [PATCH 1/5] memcg: simplify account moving check KAMEZAWA Hiroyuki
@ 2011-12-15 10:04   ` Johannes Weiner
  2011-12-15 10:17     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Weiner @ 2011-12-15 10:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura

On Thu, Dec 15, 2011 at 03:05:22PM +0900, KAMEZAWA Hiroyuki wrote:
> >From 528f5f2667da17c26e40d271b24691412e1cbe81 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 15 Dec 2011 11:41:18 +0900
> Subject: [PATCH 1/5] memcg: simplify account moving check
> 
> Now, percpu variable MEM_CGROUP_ON_MOVE is used for indicating that
> a memcg is under move_account() and pc->mem_cgroup under it may be
> overwritten.
> 
> But this value is almost read only and not worth to be percpu.
> Using atomic_t instead.

I like this, but I think you can go one further.  The only place I see
where the per-cpu counter is actually read is to avoid taking the
lock, but if you make that counter an atomic anyway - why bother?

Couldn't you remove the counter completely and just take move_lock
unconditionally in the page stat updating?



* Re: [RFC] [PATCH 1/5] memcg: simplify account moving check
  2011-12-15 10:04   ` Johannes Weiner
@ 2011-12-15 10:17     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15 10:17 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, akpm, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura

On Thu, 15 Dec 2011 11:04:43 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Thu, Dec 15, 2011 at 03:05:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > >From 528f5f2667da17c26e40d271b24691412e1cbe81 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Thu, 15 Dec 2011 11:41:18 +0900
> > Subject: [PATCH 1/5] memcg: simplify account moving check
> > 
> > Now, percpu variable MEM_CGROUP_ON_MOVE is used for indicating that
> > a memcg is under move_account() and pc->mem_cgroup under it may be
> > overwritten.
> > 
> > But this value is almost read only and not worth to be percpu.
> > Using atomic_t instead.
> 
> I like this, but I think you can go one further.  The only place I see
> where the per-cpu counter is actually read is to avoid taking the
> lock, but if you make that counter an atomic anyway - why bother?
> 
> Couldn't you remove the counter completely and just take move_lock
> unconditionally in the page stat updating?
> 

Hmm, I (and Greg Thelen) were warned by Peter Zijlstra: 'please _never_ add
atomic ops to this path'. That 'moving_account' condition check was there to
avoid atomic ops on this path (in most cases.)

We'll need to gather enough performance data after implementing per-memcg
dirty ratio accounting. So, could you wait for a while?

Anyway, my patch '[PATCH 5/5] memcg: remove PCG_MOVE_LOCK' may not work well
enough without the moving_account check. I'll think about it some more.

Thank you for review.

-Kame




* Re: [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit
  2011-12-15  6:08 ` [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit KAMEZAWA Hiroyuki
@ 2011-12-15 10:24   ` Johannes Weiner
  2011-12-15 10:36     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Weiner @ 2011-12-15 10:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura

On Thu, Dec 15, 2011 at 03:08:22PM +0900, KAMEZAWA Hiroyuki wrote:
> >From 577b441ae259728d83a99baba11bf4925b4542d4 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 15 Dec 2011 12:09:03 +0900
> Subject: [PATCH 4/5] memcg: remove PCG_CACHE bit.
> 
> This bit can be replaced by PageAnon(page) check.

You need to make sure that only fully rmapped pages are
charged/uncharged, which is not the case yet.  E.g. look at the
newpage_charge() in mm/memory.c::do_anonymous_page() - if the page
table is updated by someone else, that uncharge_page() in the error
path will see !PageAnon() because page->mapping is not set yet and
uncharge it as cache.

What I think is required is to break up the charging and committing
like we do for swap cache already:

	if (!mem_cgroup_try_charge())
		goto error;
	page_add_new_anon_rmap()
	mem_cgroup_commit()

This will also allow us to even get rid of passing around the charge
type everywhere...



* Re: [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit
  2011-12-15 10:24   ` Johannes Weiner
@ 2011-12-15 10:36     ` KAMEZAWA Hiroyuki
  2011-12-15 12:04       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15 10:36 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, akpm, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura

On Thu, 15 Dec 2011 11:24:42 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Thu, Dec 15, 2011 at 03:08:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > >From 577b441ae259728d83a99baba11bf4925b4542d4 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Date: Thu, 15 Dec 2011 12:09:03 +0900
> > Subject: [PATCH 4/5] memcg: remove PCG_CACHE bit.
> > 
> > This bit can be replaced by PageAnon(page) check.
> 
> You need to make sure that only fully rmapped pages are
> charged/uncharged, which is not the case yet.  E.g. look at the
> newpage_charge() in mm/memory.c::do_anonymous_page() - if the page
> table is updated by someone else, that uncharge_page() in the error
> path will see !PageAnon() because page->mapping is not set yet and
> uncharge it as cache.
> 

Ah, nice catch. I missed the error path.

> What I think is required is to break up the charging and committing
> like we do for swap cache already:
> 
> 	if (!mem_cgroup_try_charge())
> 		goto error;
> 	page_add_new_anon_rmap()
> 	mem_cgroup_commit()
> 
> This will also allow us to even get rid of passing around the charge
> type everywhere...
> 

Thank you. I'll look into.

To be honest, I want to remove the 'rss' and 'cache' counters ;(
They don't carry much meaning now that the LRU has been split.

Thanks,
-Kame



* Re: [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit
  2011-12-15 10:36     ` KAMEZAWA Hiroyuki
@ 2011-12-15 12:04       ` KAMEZAWA Hiroyuki
  2011-12-15 14:54         ` Johannes Weiner
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-15 12:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Johannes Weiner, linux-mm, akpm, Michal Hocko, Balbir Singh,
	Hugh Dickins, Ying Han, nishimura

On Thu, 15 Dec 2011 19:36:31 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Thu, 15 Dec 2011 11:24:42 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:

> > What I think is required is to break up the charging and committing
> > like we do for swap cache already:
> > 
> > 	if (!mem_cgroup_try_charge())
> > 		goto error;
> > 	page_add_new_anon_rmap()
> > 	mem_cgroup_commit()
> > 
> > This will also allow us to even get rid of passing around the charge
> > type everywhere...
> > 
> 
> Thank you. I'll look into.
> 
> To be honest, I want to remove 'rss' and 'cache' counter ;(
> This doesn't have much meanings after lru was splitted.
> 

I'll use this version for testing. This patch sits at the bottom of a deep
stack of unmerged patches, anyway.

==
diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index e4cb1bf..86967ed 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -4,7 +4,6 @@
 enum {
 	/* flags for mem_cgroup */
 	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
-	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
 	PCG_MIGRATION, /* under page migration */
 	/* flags for mem_cgroup and file and I/O status */
@@ -63,11 +62,6 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
 	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
 
-/* Cache flag is set only once (at allocation) */
-TESTPCGFLAG(Cache, CACHE)
-CLEARPCGFLAG(Cache, CACHE)
-SETPCGFLAG(Cache, CACHE)
-
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
 SETPCGFLAG(Used, USED)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fdcf454..89c76f1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2368,6 +2368,8 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 				       struct page_cgroup *pc,
 				       enum charge_type ctype)
 {
+	bool file = false;
+
 	lock_page_cgroup(pc);
 	if (unlikely(PageCgroupUsed(pc))) {
 		unlock_page_cgroup(pc);
@@ -2390,18 +2392,17 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 	switch (ctype) {
 	case MEM_CGROUP_CHARGE_TYPE_CACHE:
 	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
-		SetPageCgroupCache(pc);
+		file = true;
 		SetPageCgroupUsed(pc);
 		break;
 	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
-		ClearPageCgroupCache(pc);
 		SetPageCgroupUsed(pc);
 		break;
 	default:
 		break;
 	}
 
-	mem_cgroup_charge_statistics(memcg, PageCgroupCache(pc), nr_pages);
+	mem_cgroup_charge_statistics(memcg, file, nr_pages);
 	unlock_page_cgroup(pc);
 	WARN_ON_ONCE(PageLRU(page));
 	/*
@@ -2474,6 +2475,7 @@ static int mem_cgroup_move_account(struct page *page,
 				   bool uncharge)
 {
 	unsigned long flags;
+	bool file = false;
 	int ret;
 
 	VM_BUG_ON(from == to);
@@ -2503,14 +2505,17 @@ static int mem_cgroup_move_account(struct page *page,
 		__this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
 		preempt_enable();
 	}
-	mem_cgroup_charge_statistics(from, PageCgroupCache(pc), -nr_pages);
+	/* Once PageAnon is set, it will not be cleared until freed. */
+	if (!PageAnon(page))
+		file = true;
+	mem_cgroup_charge_statistics(from, file, -nr_pages);
 	if (uncharge)
 		/* This is not "cancel", but cancel_charge does all we need. */
 		__mem_cgroup_cancel_charge(from, nr_pages);
 
 	/* caller should have done css_get */
 	pc->mem_cgroup = to;
-	mem_cgroup_charge_statistics(to, PageCgroupCache(pc), nr_pages);
+	mem_cgroup_charge_statistics(to, file, nr_pages);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
 	 * can be under rmdir(). But in current implementation, caller of
@@ -2854,6 +2859,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	struct mem_cgroup *memcg = NULL;
 	unsigned int nr_pages = 1;
 	struct page_cgroup *pc;
+	bool file = false;
 
 	if (mem_cgroup_disabled())
 		return NULL;
@@ -2880,6 +2886,10 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 		goto unlock_out;
 
 	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		file = true;
+		break;
 	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
 	case MEM_CGROUP_CHARGE_TYPE_DROP:
 		/* See mem_cgroup_prepare_migration() */
@@ -2897,7 +2907,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 		break;
 	}
 
-	mem_cgroup_charge_statistics(memcg, PageCgroupCache(pc), -nr_pages);
+	mem_cgroup_charge_statistics(memcg, file, -nr_pages);
 
 	ClearPageCgroupUsed(pc);
 	/*
@@ -2938,9 +2948,13 @@ void mem_cgroup_uncharge_page(struct page *page)
 
 void mem_cgroup_uncharge_cache_page(struct page *page)
 {
+	int ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
 	VM_BUG_ON(page_mapped(page));
 	VM_BUG_ON(page->mapping);
-	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
+
+	if (page_is_file_cache(page))
+		ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
+	__mem_cgroup_uncharge_common(page, ctype);
 }
 
 /*
@@ -3276,7 +3290,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
 	/* fix accounting on old pages */
 	lock_page_cgroup(pc);
 	memcg = pc->mem_cgroup;
-	mem_cgroup_charge_statistics(memcg, PageCgroupCache(pc), -1);
+	mem_cgroup_charge_statistics(memcg, true, -1);
 	ClearPageCgroupUsed(pc);
 	unlock_page_cgroup(pc);
 



* Re: [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit
  2011-12-15 12:04       ` KAMEZAWA Hiroyuki
@ 2011-12-15 14:54         ` Johannes Weiner
  0 siblings, 0 replies; 13+ messages in thread
From: Johannes Weiner @ 2011-12-15 14:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, akpm, Michal Hocko, Balbir Singh, Hugh Dickins,
	Ying Han, nishimura

On Thu, Dec 15, 2011 at 09:04:06PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 15 Dec 2011 19:36:31 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Thu, 15 Dec 2011 11:24:42 +0100
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > > What I think is required is to break up the charging and committing
> > > like we do for swap cache already:
> > > 
> > > 	if (!mem_cgroup_try_charge())
> > > 		goto error;
> > > 	page_add_new_anon_rmap()
> > > 	mem_cgroup_commit()
> > > 
> > > This will also allow us to even get rid of passing around the charge
> > > type everywhere...
> > > 
> > 
> > Thank you. I'll look into.
> > 
> > To be honest, I want to remove 'rss' and 'cache' counter ;(
> > This doesn't have much meanings after lru was splitted.
> > 
> 
> I'll use this version for test. This patch is under far deep stacks of
> unmerged patches, anyway.

Ok, makes sense.  I can do the PCG_CACHE removal, btw, I have half the
patches sitting around anyway, just need to fix up huge_memory.c.

> @@ -2938,9 +2948,13 @@ void mem_cgroup_uncharge_page(struct page *page)
>  
>  void mem_cgroup_uncharge_cache_page(struct page *page)
>  {
> +	int ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
>  	VM_BUG_ON(page_mapped(page));
>  	VM_BUG_ON(page->mapping);
> -	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
> +
> +	if (page_is_file_cache(page))
> +		ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
> +	__mem_cgroup_uncharge_common(page, ctype);

I think this is missing a negation, but it doesn't matter because the
SHMEM and CACHE charge types are treated exactly the same way.  I'll
send a patch series that removes it soon; there are more shmem-related
things...



end of thread, other threads:[~2011-12-15 14:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-15  6:00 [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki
2011-12-15  6:05 ` [RFC] [PATCH 1/5] memcg: simplify account moving check KAMEZAWA Hiroyuki
2011-12-15 10:04   ` Johannes Weiner
2011-12-15 10:17     ` KAMEZAWA Hiroyuki
2011-12-15  6:06 ` [RFC][PATCH 2/5] memcg: safer page stat updating KAMEZAWA Hiroyuki
2011-12-15  6:07 ` [RFC][PATCH 3/5] memcg: remove PCG_FILE_MAPPED KAMEZAWA Hiroyuki
2011-12-15  6:08 ` [RFC][PATCH 4/5] memcg: remove PCG_CACHE bit KAMEZAWA Hiroyuki
2011-12-15 10:24   ` Johannes Weiner
2011-12-15 10:36     ` KAMEZAWA Hiroyuki
2011-12-15 12:04       ` KAMEZAWA Hiroyuki
2011-12-15 14:54         ` Johannes Weiner
2011-12-15  6:10 ` [RFC] [PATCH 5/5] memcg: remove PCG_MOVE_LOCK KAMEZAWA Hiroyuki
2011-12-15  8:44 ` [Experimental] [PATCH 0/5] page_cgroup->flags diet KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox