From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"xemul@openvz.org" <xemul@openvz.org>,
"menage@google.com" <menage@google.com>,
nishimura@mxp.nes.nec.co.jp
Subject: Re: [RFC][PATCH 11/11] memcg: mem+swap controler core
Date: Mon, 27 Oct 2008 20:37:51 +0900
Message-ID: <20081027203751.b3b5a607.nishimura@mxp.nes.nec.co.jp>
In-Reply-To: <20081023181611.367d9f07.kamezawa.hiroyu@jp.fujitsu.com>
On Thu, 23 Oct 2008 18:16:11 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Mem+Swap controller core.
>
> This patch implements a per-cgroup limit on the usage of memory+swap.
> Because SwapCache exists, care is taken to avoid double counting of
> swap-cache and swap-entry.
>
> Mem+Swap controller works as follows.
> - memory usage is limited by memory.limit_in_bytes.
> - memory + swap usage is limited by memory.memsw.limit_in_bytes.
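
For reference, a simplified sketch (not taken from the patch; the helper
name is illustrative, and reclaim retries, css refcounting etc. are
omitted) of how the two limits interact on the charge path -- it mirrors
the __mem_cgroup_try_charge() change further below: the plain memory
counter is charged first and rolled back if the mem+swap counter is over
its limit.

	static int charge_sketch(struct mem_cgroup *mem)
	{
		/* fails if memory.limit_in_bytes would be exceeded */
		if (res_counter_charge(&mem->res, PAGE_SIZE))
			return -ENOMEM;
		if (do_swap_account &&
		    res_counter_charge(&mem->memsw, PAGE_SIZE)) {
			/* over the mem+swap limit: undo the first charge */
			res_counter_uncharge(&mem->res, PAGE_SIZE);
			return -ENOMEM;
		}
		return 0;
	}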
>
>
> This has the following benefits.
> - A user can limit the total resource usage of mem+swap.
>
> Without this, because the memory resource controller doesn't take care
> of swap usage, a process can exhaust all of the swap (e.g. via a memory
> leak). This case can now be avoided.
>
> Swap is a shared resource, but it cannot be reclaimed (brought back to
> memory) until it is actually used. This characteristic can cause trouble
> when memory is divided into parts by cpuset or memcg.
> Assume groups A and B.
> After some applications run, the system can end up as:
>
> Group A -- very large free memory space but occupies 99% of swap.
> Group B -- under memory shortage but cannot use swap... it's nearly full.
> In general, this cannot be recovered.
> The ability to set an appropriate swap limit for each group is required.
>
> Some may wonder, "why not just swap, but mem+swap?"
>
> - The global LRU (kswapd) can swap out arbitrary pages. Swap-out means
> moving the account from memory to swap... there is no change in the
> usage of mem+swap.
>
> In other words, when we want to limit the usage of swap without affecting
> the global LRU, a mem+swap limit is better than just limiting swap.
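
(A concrete illustration, with made-up numbers: suppose a group has 100MB
charged as memory and kswapd swaps 30MB of it out. Its memory usage drops
to 70MB and its swap usage rises to 30MB, but mem+swap stays at 100MB, so
a mem+swap limit is unaffected -- whereas a plain swap limit of, say, 20MB
would now be exceeded purely because of a decision the global LRU made,
not the group itself.)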
>
>
> Accounting target information is stored in swap_cgroup, which is a
> per-swap-entry record.
>
> Charging is done as follows.
> map
> - charge page and memsw.
>
> unmap
> - uncharge page/memsw if not SwapCache.
>
> swap-out (__delete_from_swap_cache)
> - uncharge page
> - record mem_cgroup information to swap_cgroup.
>
> swap-in (do_swap_page)
> - charged as page and memsw.
> The record in swap_cgroup is cleared and the recorded memcg's
> memsw accounting is decremented (fixing the double count).
>
> swap-free (swap_free())
> - if swap entry is freed, memsw is uncharged by PAGE_SIZE.
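
For reference, the accounting handoff described above, compressed into a
sketch built from the helpers this patch adds (simplified fragments, not
the actual code; the swapref reference counting via
mem_cgroup_forget_swapref() is omitted):

	/* swap-out: __delete_from_swap_cache() */
	memcg = __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_SWAPOUT);
	if (memcg)				/* "page" charge (res) dropped, memsw kept */
		swap_cgroup_record(ent, memcg);	/* remember the owner per swap entry */

	/* swap-in: commit path, after a successful try_charge of res + memsw */
	memcg = swap_cgroup_record(ent, NULL);	/* clear the record */
	if (memcg)
		res_counter_uncharge(&memcg->memsw, PAGE_SIZE);	/* fix the double count */

	/* swap-free: swap_entry_free() -> mem_cgroup_uncharge_swap() */
	memcg = swap_cgroup_record(ent, NULL);
	if (memcg)
		res_counter_uncharge(&memcg->memsw, PAGE_SIZE);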
>
>
> After this, the usual memory resource controller also handles SwapCache.
> (This was a missing (ignored) feature in the current memcg, but it must
> be handled.)
>
> There are people who work in never-swap environments and consider swap
> to be something bad. For such people, this mem+swap controller extension
> is just overhead, and that overhead can be avoided by a config or boot
> option. (See Kconfig; the details are not in this patch.)
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>
> include/linux/memcontrol.h | 3
> include/linux/swap.h | 17 ++
> mm/memcontrol.c | 356 +++++++++++++++++++++++++++++++++++++++++----
> mm/swap_state.c | 4
> mm/swapfile.c | 8 -
> 5 files changed, 359 insertions(+), 29 deletions(-)
>
> Index: mmotm-2.6.27+/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.27+.orig/mm/memcontrol.c
> +++ mmotm-2.6.27+/mm/memcontrol.c
> @@ -130,6 +130,10 @@ struct mem_cgroup {
> */
> struct res_counter res;
> /*
> + * the coutner to accoaunt for mem+swap usage.
> + */
> + struct res_counter memsw;
> + /*
> * Per cgroup active and inactive list, similar to the
> * per zone LRU lists.
> */
> @@ -140,6 +144,12 @@ struct mem_cgroup {
> * statistics.
> */
> struct mem_cgroup_stat stat;
> +
> + /*
> + * used for counting reference from swap_cgroup.
> + */
> + int obsolete;
> + atomic_t swapref;
> };
> static struct mem_cgroup init_mem_cgroup;
>
> @@ -148,6 +158,7 @@ enum charge_type {
> MEM_CGROUP_CHARGE_TYPE_MAPPED,
> MEM_CGROUP_CHARGE_TYPE_SHMEM, /* used by page migration of shmem */
> MEM_CGROUP_CHARGE_TYPE_FORCE, /* used by force_empty */
> + MEM_CGROUP_CHARGE_TYPE_SWAPOUT, /* used by force_empty */
The comment should be updated (it still says "used by force_empty") :)
> NR_CHARGE_TYPE,
> };
>
> @@ -165,6 +176,16 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
> 0, /* FORCE */
> };
>
> +
> +/* for encoding cft->private value on file */
> +#define _MEM (0)
> +#define _MEMSWAP (1)
> +#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val))
> +#define MEMFILE_TYPE(val) (((val) >> 16) & 0xffff)
> +#define MEMFILE_ATTR(val) ((val) & 0xffff)
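
A quick illustration (not from the patch) of how this encoding round-trips,
e.g. for the memsw limit file:

	int priv = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT);  /* == (1 << 16) | RES_LIMIT */

	MEMFILE_TYPE(priv);	/* -> _MEMSWAP: selects mem->memsw over mem->res */
	MEMFILE_ATTR(priv);	/* -> RES_LIMIT: the res_counter attribute to act on */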
> +
> +static void mem_cgroup_forget_swapref(struct mem_cgroup *mem);
> +
> /*
> * Always modified under lru lock. Then, not necessary to preempt_disable()
> */
> @@ -698,8 +719,19 @@ static int __mem_cgroup_try_charge(struc
> css_get(&mem->css);
> }
>
> + while (1) {
> + int ret;
>
> - while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
> + ret = res_counter_charge(&mem->res, PAGE_SIZE);
> + if (likely(!ret)) {
> + if (!do_swap_account)
> + break;
> + ret = res_counter_charge(&mem->memsw, PAGE_SIZE);
> + if (likely(!ret))
> + break;
> + /* mem+swap counter fails */
> + res_counter_uncharge(&mem->res, PAGE_SIZE);
> + }
> if (!(gfp_mask & __GFP_WAIT))
> goto nomem;
>
> @@ -712,8 +744,13 @@ static int __mem_cgroup_try_charge(struc
> * moved to swap cache or just unmapped from the cgroup.
> * Check the limit again to see if the reclaim reduced the
> * current usage of the cgroup before giving up
> + *
> */
> - if (res_counter_check_under_limit(&mem->res))
> + if (!do_swap_account &&
> + res_counter_check_under_limit(&mem->res))
> + continue;
> + if (do_swap_account &&
> + res_counter_check_under_limit(&mem->memsw))
> continue;
>
> if (!nr_retries--) {
> @@ -770,6 +807,8 @@ static void __mem_cgroup_commit_charge(s
> if (unlikely(PageCgroupUsed(pc))) {
> unlock_page_cgroup(pc);
> res_counter_uncharge(&mem->res, PAGE_SIZE);
> + if (do_swap_account)
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> css_put(&mem->css);
> return;
> }
> @@ -851,6 +890,8 @@ static int mem_cgroup_move_account(struc
> if (spin_trylock(&to_mz->lru_lock)) {
> __mem_cgroup_remove_list(from_mz, pc);
> res_counter_uncharge(&from->res, PAGE_SIZE);
> + if (do_swap_account)
> + res_counter_uncharge(&from->memsw, PAGE_SIZE);
> pc->mem_cgroup = to;
> __mem_cgroup_add_list(to_mz, pc, false);
> ret = 0;
> @@ -896,8 +937,11 @@ static int mem_cgroup_move_parent(struct
> /* drop extra refcnt */
> css_put(&parent->css);
> /* uncharge if move fails */
> - if (ret)
> + if (ret) {
> res_counter_uncharge(&parent->res, PAGE_SIZE);
> + if (do_swap_account)
> + res_counter_uncharge(&parent->memsw, PAGE_SIZE);
> + }
>
> return ret;
> }
> @@ -972,7 +1016,6 @@ int mem_cgroup_cache_charge(struct page
> if (!(gfp_mask & __GFP_WAIT)) {
> struct page_cgroup *pc;
>
> -
> pc = lookup_page_cgroup(page);
> if (!pc)
> return 0;
> @@ -998,15 +1041,74 @@ int mem_cgroup_cache_charge(struct page
> int mem_cgroup_cache_charge_swapin(struct page *page, struct mm_struct *mm,
> gfp_t gfp_mask)
> {
> + struct mem_cgroup *mem;
> + swp_entry_t ent;
> + int ret;
> +
> if (mem_cgroup_subsys.disabled)
> return 0;
> - if (unlikely(!mm))
> - mm = &init_mm;
> - return mem_cgroup_charge_common(page, mm, gfp_mask,
> +
> + ent.val = page_private(page);
> + if (do_swap_account) {
> + mem = lookup_swap_cgroup(ent);
> + if (!mem || mem->obsolete)
> + goto charge_cur_mm;
> + ret = mem_cgroup_charge_common(page, NULL, gfp_mask,
> + MEM_CGROUP_CHARGE_TYPE_SHMEM, mem);
> + } else {
> +charge_cur_mm:
> + if (unlikely(!mm))
> + mm = &init_mm;
> + ret = mem_cgroup_charge_common(page, mm, gfp_mask,
> MEM_CGROUP_CHARGE_TYPE_SHMEM, NULL);
> + }
> + if (do_swap_account && !ret) {
> + /*
> + * At this point, we successfully charge both for mem and swap.
> + * fix this double counting, here.
> + */
> + mem = swap_cgroup_record(ent, NULL);
> + if (mem) {
> + /* If memcg is obsolete, memcg can be != ptr */
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> + mem_cgroup_forget_swapref(mem);
> + }
> + }
> + return ret;
> +}
> +
> +/*
> + * look into swap_cgroup and charge against mem_cgroup which does swapout
> + * if we can. If not, charge against current mm.
> + */
> +
> +int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> + struct page *page, gfp_t mask, struct mem_cgroup **ptr)
> +{
> + struct mem_cgroup *mem;
> + swp_entry_t ent;
> +
> + if (mem_cgroup_subsys.disabled)
> + return 0;
>
> + if (!do_swap_account)
> + goto charge_cur_mm;
> +
> + ent.val = page_private(page);
> +
> + mem = lookup_swap_cgroup(ent);
> + if (!mem || mem->obsolete)
> + goto charge_cur_mm;
> + *ptr = mem;
> + return __mem_cgroup_try_charge(NULL, mask, ptr, true);
> +charge_cur_mm:
> + if (unlikely(!mm))
> + mm = &init_mm;
> + return __mem_cgroup_try_charge(mm, mask, ptr, true);
> }
>
hmm... this function is not called from anywhere.
Should do_swap_page()->mem_cgroup_try_charge() and unuse_pte()->mem_cgroup_try_charge()
be changed to mem_cgroup_try_charge_swapin()?
> +
> +
> void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
> {
> struct page_cgroup *pc;
> @@ -1017,6 +1119,23 @@ void mem_cgroup_commit_charge_swapin(str
> return;
> pc = lookup_page_cgroup(page);
> __mem_cgroup_commit_charge(ptr, pc, MEM_CGROUP_CHARGE_TYPE_MAPPED);
> + /*
> + * Now swap is on-memory. This means this page may be
> + * counted both as mem and swap....double count.
> + * Fix it by uncharging from memsw. This SwapCache is stable
> + * because we're still under lock_page().
> + */
> + if (do_swap_account) {
> + swp_entry_t ent = {.val = page_private(page)};
> + struct mem_cgroup *memcg;
> + memcg = swap_cgroup_record(ent, NULL);
> + if (memcg) {
> + /* If memcg is obsolete, memcg can be != ptr */
> + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + mem_cgroup_forget_swapref(memcg);
> + }
> +
> + }
> }
>
> void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
> @@ -1026,35 +1145,50 @@ void mem_cgroup_cancel_charge_swapin(str
> if (!mem)
> return;
> res_counter_uncharge(&mem->res, PAGE_SIZE);
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> css_put(&mem->css);
> }
>
> -
> /*
> * uncharge if !page_mapped(page)
> + * returns memcg at success.
> */
> -static void
> +static struct mem_cgroup *
> __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> {
> struct page_cgroup *pc;
> struct mem_cgroup *mem;
>
> if (mem_cgroup_subsys.disabled)
> - return;
> + return NULL;
>
> + if (PageSwapCache(page))
> + return NULL;
> /*
> * Check if our page_cgroup is valid
> */
> pc = lookup_page_cgroup(page);
> if (unlikely(!pc || !PageCgroupUsed(pc)))
> - return;
> + return NULL;
>
> lock_page_cgroup(pc);
> + if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) {
> + if (PageAnon(page)) {
> + if (page_mapped(page)) {
> + unlock_page_cgroup(pc);
> + return NULL;
> + }
> + } else if (page->mapping && !page_is_file_cache(page)) {
> + /* This is on radix-tree. */
> + unlock_page_cgroup(pc);
> + return NULL;
> + }
> + }
> if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED && page_mapped(page))
> || !PageCgroupUsed(pc)) {
Isn't a check for PCG_USED also needed in the MEM_CGROUP_CHARGE_TYPE_SWAPOUT case?
> /* This happens at race in zap_pte_range() and do_swap_page()*/
> unlock_page_cgroup(pc);
> - return;
> + return NULL;
> }
> ClearPageCgroupUsed(pc);
> mem = pc->mem_cgroup;
> @@ -1063,9 +1197,11 @@ __mem_cgroup_uncharge_common(struct page
> * unlock this.
> */
> res_counter_uncharge(&mem->res, PAGE_SIZE);
> + if (do_swap_account && ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> unlock_page_cgroup(pc);
> release_page_cgroup(pc);
> - return;
> + return mem;
> }
>
Now that anon pages are not uncharged while they are PageSwapCache,
I think the "if (unused && ctype != MEM_CGROUP_CHARGE_TYPE_MAPPED)" check in
mem_cgroup_end_migration() should be removed. Otherwise oldpage
is not uncharged if it is on swap cache, is it?
Thanks,
Daisuke Nishimura.
> void mem_cgroup_uncharge_page(struct page *page)
> @@ -1086,6 +1222,41 @@ void mem_cgroup_uncharge_cache_page(stru
> }
>
> /*
> + * called from __delete_from_swap_cache() and drop "page" account.
> + * memcg information is recorded to swap_cgroup of "ent"
> + */
> +void mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
> +{
> + struct mem_cgroup *memcg;
> +
> + memcg = __mem_cgroup_uncharge_common(page,
> + MEM_CGROUP_CHARGE_TYPE_SWAPOUT);
> + /* record memcg information */
> + if (do_swap_account && memcg) {
> + swap_cgroup_record(ent, memcg);
> + atomic_inc(&memcg->swapref);
> + }
> +}
> +
> +/*
> + * called from swap_entry_free(). remove record in swap_cgroup and
> + * uncharge "memsw" account.
> + */
> +void mem_cgroup_uncharge_swap(swp_entry_t ent)
> +{
> + struct mem_cgroup *memcg;
> +
> + if (!do_swap_account)
> + return;
> +
> + memcg = swap_cgroup_record(ent, NULL);
> + if (memcg) {
> + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + mem_cgroup_forget_swapref(memcg);
> + }
> +}
> +
> +/*
> * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
> * page belongs to.
> */
> @@ -1219,13 +1390,56 @@ int mem_cgroup_resize_limit(struct mem_c
> ret = -EBUSY;
> break;
> }
> - progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> + progress = try_to_free_mem_cgroup_pages(memcg,
> + GFP_KERNEL);
> if (!progress)
> retry_count--;
> }
> return ret;
> }
>
> +int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
> + unsigned long long val)
> +{
> + int retry_count = MEM_CGROUP_RECLAIM_RETRIES;
> + unsigned long flags;
> + u64 memlimit, oldusage, curusage;
> + int ret;
> +
> + if (!do_swap_account)
> + return -EINVAL;
> +
> + while (retry_count) {
> + if (signal_pending(current)) {
> + ret = -EINTR;
> + break;
> + }
> + /*
> + * Rather than hide all in some function, I do this in
> + * open coded mannaer. You see what this really does.
> + * We have to guarantee mem->res.limit < mem->memsw.limit.
> + */
> + spin_lock_irqsave(&memcg->res.lock, flags);
> + memlimit = memcg->res.limit;
> + if (memlimit > val) {
> + spin_unlock_irqrestore(&memcg->res.lock, flags);
> + ret = -EINVAL;
> + break;
> + }
> + ret = res_counter_set_limit(&memcg->memsw, val);
> + oldusage = memcg->memsw.usage;
> + spin_unlock_irqrestore(&memcg->res.lock, flags);
> +
> + if (!ret)
> + break;
> + try_to_free_mem_cgroup_pages(memcg, GFP_HIGHUSER_MOVABLE);
> + curusage = res_counter_read_u64(&memcg->memsw, RES_USAGE);
> + if (curusage >= oldusage)
> + retry_count--;
> + }
> + return ret;
> +}
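
(So, for example, with memory.limit_in_bytes set to 400M, writing 300M to
memory.memsw.limit_in_bytes would return -EINVAL here, since this code
requires the mem+swap limit to be at least the plain memory limit.)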
> +
>
> /*
> * This routine traverse page_cgroup in given list and drop them all.
> @@ -1353,8 +1567,25 @@ try_to_free:
>
> static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> {
> - return res_counter_read_u64(&mem_cgroup_from_cont(cont)->res,
> - cft->private);
> + struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> + u64 val = 0;
> + int type, name;
> +
> + type = MEMFILE_TYPE(cft->private);
> + name = MEMFILE_ATTR(cft->private);
> + switch (type) {
> + case _MEM:
> + val = res_counter_read_u64(&mem->res, name);
> + break;
> + case _MEMSWAP:
> + if (do_swap_account)
> + val = res_counter_read_u64(&mem->memsw, name);
> + break;
> + default:
> + BUG();
> + break;
> + }
> + return val;
> }
> /*
> * The user of this function is...
> @@ -1364,15 +1595,22 @@ static int mem_cgroup_write(struct cgrou
> const char *buffer)
> {
> struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> + int type, name;
> unsigned long long val;
> int ret;
>
> - switch (cft->private) {
> + type = MEMFILE_TYPE(cft->private);
> + name = MEMFILE_ATTR(cft->private);
> + switch (name) {
> case RES_LIMIT:
> /* This function does all necessary parse...reuse it */
> ret = res_counter_memparse_write_strategy(buffer, &val);
> - if (!ret)
> + if (ret)
> + break;
> + if (type == _MEM)
> ret = mem_cgroup_resize_limit(memcg, val);
> + else
> + ret = mem_cgroup_resize_memsw_limit(memcg, val);
> break;
> default:
> ret = -EINVAL; /* should be BUG() ? */
> @@ -1384,14 +1622,23 @@ static int mem_cgroup_write(struct cgrou
> static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> {
> struct mem_cgroup *mem;
> + int type, name;
>
> mem = mem_cgroup_from_cont(cont);
> - switch (event) {
> + type = MEMFILE_TYPE(event);
> + name = MEMFILE_ATTR(event);
> + switch (name) {
> case RES_MAX_USAGE:
> - res_counter_reset_max(&mem->res);
> + if (type == _MEM)
> + res_counter_reset_max(&mem->res);
> + else
> + res_counter_reset_max(&mem->memsw);
> break;
> case RES_FAILCNT:
> - res_counter_reset_failcnt(&mem->res);
> + if (type == _MEM)
> + res_counter_reset_failcnt(&mem->res);
> + else
> + res_counter_reset_failcnt(&mem->memsw);
> break;
> }
> return 0;
> @@ -1445,30 +1692,33 @@ static int mem_control_stat_show(struct
> cb->fill(cb, "unevictable", unevictable * PAGE_SIZE);
>
> }
> + /* showing refs from disk-swap */
> + cb->fill(cb, "swap_on_disk", atomic_read(&mem_cont->swapref)
> + * PAGE_SIZE);
> return 0;
> }
>
> static struct cftype mem_cgroup_files[] = {
> {
> .name = "usage_in_bytes",
> - .private = RES_USAGE,
> + .private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
> .read_u64 = mem_cgroup_read,
> },
> {
> .name = "max_usage_in_bytes",
> - .private = RES_MAX_USAGE,
> + .private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE),
> .trigger = mem_cgroup_reset,
> .read_u64 = mem_cgroup_read,
> },
> {
> .name = "limit_in_bytes",
> - .private = RES_LIMIT,
> + .private = MEMFILE_PRIVATE(_MEM, RES_LIMIT),
> .write_string = mem_cgroup_write,
> .read_u64 = mem_cgroup_read,
> },
> {
> .name = "failcnt",
> - .private = RES_FAILCNT,
> + .private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
> .trigger = mem_cgroup_reset,
> .read_u64 = mem_cgroup_read,
> },
> @@ -1476,6 +1726,31 @@ static struct cftype mem_cgroup_files[]
> .name = "stat",
> .read_map = mem_control_stat_show,
> },
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> + {
> + .name = "memsw.usage_in_bytes",
> + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
> + .read_u64 = mem_cgroup_read,
> + },
> + {
> + .name = "memsw.max_usage_in_bytes",
> + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE),
> + .trigger = mem_cgroup_reset,
> + .read_u64 = mem_cgroup_read,
> + },
> + {
> + .name = "memsw.limit_in_bytes",
> + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT),
> + .write_string = mem_cgroup_write,
> + .read_u64 = mem_cgroup_read,
> + },
> + {
> + .name = "failcnt",
> + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT),
> + .trigger = mem_cgroup_reset,
> + .read_u64 = mem_cgroup_read,
> + },
> +#endif
> };
>
> static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
> @@ -1529,14 +1804,43 @@ static struct mem_cgroup *mem_cgroup_all
> return mem;
> }
>
> +/*
> + * At destroying mem_cgroup, references from swap_cgroup can remain.
> + * (scanning all at force_empty is too costly...)
> + *
> + * Instead of clearing all references at force_empty, we remember
> + * the number of reference from swap_cgroup and free mem_cgroup when
> + * it goes down to 0.
> + *
> + * When mem_cgroup is destroyed, mem->obsolete will be set to 0 and
> + * entry which points to this memcg will be ignore at swapin.
> + *
> + * Removal of cgroup itself succeeds regardless of refs from swap.
> + */
> +
> static void mem_cgroup_free(struct mem_cgroup *mem)
> {
> + if (do_swap_account) {
> + if (atomic_read(&mem->swapref) > 0)
> + return;
> + }
> if (sizeof(*mem) < PAGE_SIZE)
> kfree(mem);
> else
> vfree(mem);
> }
>
> +static void mem_cgroup_forget_swapref(struct mem_cgroup *mem)
> +{
> + if (!do_swap_account)
> + return;
> + if (atomic_dec_and_test(&mem->swapref)) {
> + if (!mem->obsolete)
> + return;
> + mem_cgroup_free(mem);
> + }
> +}
> +
> static void mem_cgroup_init_pcp(int cpu)
> {
> page_cgroup_start_cache_cpu(cpu);
> @@ -1589,6 +1893,7 @@ mem_cgroup_create(struct cgroup_subsys *
> }
>
> res_counter_init(&mem->res);
> + res_counter_init(&mem->memsw);
>
> for_each_node_state(node, N_POSSIBLE)
> if (alloc_mem_cgroup_per_zone_info(mem, node))
> @@ -1607,6 +1912,7 @@ static void mem_cgroup_pre_destroy(struc
> struct cgroup *cont)
> {
> struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> + mem->obsolete = 1;
> mem_cgroup_force_empty(mem);
> }
>
> Index: mmotm-2.6.27+/mm/swapfile.c
> ===================================================================
> --- mmotm-2.6.27+.orig/mm/swapfile.c
> +++ mmotm-2.6.27+/mm/swapfile.c
> @@ -271,8 +271,9 @@ out:
> return NULL;
> }
>
> -static int swap_entry_free(struct swap_info_struct *p, unsigned long offset)
> +static int swap_entry_free(struct swap_info_struct *p, swp_entry_t ent)
> {
> + unsigned long offset = swp_offset(ent);
> int count = p->swap_map[offset];
>
> if (count < SWAP_MAP_MAX) {
> @@ -287,6 +288,7 @@ static int swap_entry_free(struct swap_i
> swap_list.next = p - swap_info;
> nr_swap_pages++;
> p->inuse_pages--;
> + mem_cgroup_uncharge_swap(ent);
> }
> }
> return count;
> @@ -302,7 +304,7 @@ void swap_free(swp_entry_t entry)
>
> p = swap_info_get(entry);
> if (p) {
> - swap_entry_free(p, swp_offset(entry));
> + swap_entry_free(p, entry);
> spin_unlock(&swap_lock);
> }
> }
> @@ -421,7 +423,7 @@ void free_swap_and_cache(swp_entry_t ent
>
> p = swap_info_get(entry);
> if (p) {
> - if (swap_entry_free(p, swp_offset(entry)) == 1) {
> + if (swap_entry_free(p, entry) == 1) {
> page = find_get_page(&swapper_space, entry.val);
> if (page && !trylock_page(page)) {
> page_cache_release(page);
> Index: mmotm-2.6.27+/mm/swap_state.c
> ===================================================================
> --- mmotm-2.6.27+.orig/mm/swap_state.c
> +++ mmotm-2.6.27+/mm/swap_state.c
> @@ -17,6 +17,7 @@
> #include <linux/backing-dev.h>
> #include <linux/pagevec.h>
> #include <linux/migrate.h>
> +#include <linux/page_cgroup.h>
>
> #include <asm/pgtable.h>
>
> @@ -108,6 +109,8 @@ int add_to_swap_cache(struct page *page,
> */
> void __delete_from_swap_cache(struct page *page)
> {
> + swp_entry_t ent = {.val = page_private(page)};
> +
> BUG_ON(!PageLocked(page));
> BUG_ON(!PageSwapCache(page));
> BUG_ON(PageWriteback(page));
> @@ -119,6 +122,7 @@ void __delete_from_swap_cache(struct pag
> total_swapcache_pages--;
> __dec_zone_page_state(page, NR_FILE_PAGES);
> INC_CACHE_INFO(del_total);
> + mem_cgroup_uncharge_swapcache(page, ent);
> }
>
> /**
> Index: mmotm-2.6.27+/include/linux/swap.h
> ===================================================================
> --- mmotm-2.6.27+.orig/include/linux/swap.h
> +++ mmotm-2.6.27+/include/linux/swap.h
> @@ -332,6 +332,23 @@ static inline void disable_swap_token(vo
> put_swap_token(swap_token_mm);
> }
>
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> +/* This function requires swp_entry_t definition. see memcontrol.c */
> +extern void mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent);
> +#else
> +static inline void
> +mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
> +{
> +}
> +#endif
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> +extern void mem_cgroup_uncharge_swap(swp_entry_t ent);
> +#else
> +static inline void mem_cgroup_uncharge_swap(swp_entry_t ent)
> +{
> +}
> +#endif
> +
> #else /* CONFIG_SWAP */
>
> #define total_swap_pages 0
> Index: mmotm-2.6.27+/include/linux/memcontrol.h
> ===================================================================
> --- mmotm-2.6.27+.orig/include/linux/memcontrol.h
> +++ mmotm-2.6.27+/include/linux/memcontrol.h
> @@ -32,6 +32,8 @@ extern int mem_cgroup_newpage_charge(str
> /* for swap handling */
> extern int mem_cgroup_try_charge(struct mm_struct *mm,
> gfp_t gfp_mask, struct mem_cgroup **ptr);
> +extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> + struct page *page, gfp_t mask, struct mem_cgroup **ptr);
> extern void mem_cgroup_commit_charge_swapin(struct page *page,
> struct mem_cgroup *ptr);
> extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *ptr);
> @@ -83,7 +85,6 @@ extern long mem_cgroup_calc_reclaim(stru
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> extern int do_swap_account;
> #endif
> -
> #else /* CONFIG_CGROUP_MEM_RES_CTLR */
> struct mem_cgroup;
>
>