From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: nishimura@mxp.nes.nec.co.jp,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"xemul@openvz.org" <xemul@openvz.org>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Dave Hansen <haveblue@us.ibm.com>,
ryov@valinux.co.jp
Subject: Re: [PATCH 9/12] memcg allocate all page_cgroup at boot
Date: Fri, 26 Sep 2008 10:00:22 +0900
Message-ID: <20080926100022.8bfb8d4d.nishimura@mxp.nes.nec.co.jp>
In-Reply-To: <20080925153206.281243dc.kamezawa.hiroyu@jp.fujitsu.com>
On Thu, 25 Sep 2008 15:32:06 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Allocate all page_cgroup at boot and remove the page_cgroup pointer
> from struct page. This patch adds an interface:
>
> struct page_cgroup *lookup_page_cgroup(struct page*)
>
> All of FLATMEM/DISCONTIGMEM/SPARSEMEM and MEMORY_HOTPLUG are supported.
>
> Removing the page_cgroup pointer reduces memory usage by
>  - 4 bytes per page on 32-bit, or
>  - 8 bytes per page on 64-bit,
> even if the memory controller is disabled (but compiled in).
> The metadata usage of this is no problem in FLATMEM/DISCONTIGMEM.
> On SPARSEMEM, this makes mem_section[] twice as large.
>
> On usual 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory.
> On my x86-64 server with 48GB of memory, this saves 96MB of memory.
> (and uses xx kbytes for mem_section.)
> I think this reduction makes sense.
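(As a quick sanity check on the quoted figures: they follow directly from
pointer size times page count, assuming 4KiB pages.)

```python
PAGE_SIZE = 4096
MiB = 2**20

# x86-32: one 4-byte page_cgroup pointer per page, 8 GiB of RAM
saved_32 = (8 * 2**30 // PAGE_SIZE) * 4

# x86-64: one 8-byte page_cgroup pointer per page, 48 GiB of RAM
saved_64 = (48 * 2**30 // PAGE_SIZE) * 8

print(saved_32 // MiB, saved_64 // MiB)  # 8 96
```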
>
> By pre-allocating, the kmalloc/kfree calls in charge/uncharge are removed.
> This means
>  - we no longer need to worry about kmalloc failure
>    (which can happen depending on the gfp_mask).
>  - we can avoid calling kmalloc/kfree.
>  - we can avoid allocating tons of small objects which can be fragmented.
>  - we know in advance how much memory will be used for this extra LRU handling.
>
> I added printk messages:
>
> "allocated %ld bytes of page_cgroup"
> "please try cgroup_disable=memory option if you don't want"
>
> which should be informative enough for users.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> include/linux/memcontrol.h | 11 -
> include/linux/mm_types.h | 4
> include/linux/mmzone.h | 9 +
> include/linux/page_cgroup.h | 90 +++++++++++++++
> mm/Makefile | 2
> mm/memcontrol.c | 258 ++++++++++++--------------------------------
> mm/page_alloc.c | 12 --
> mm/page_cgroup.c | 253 +++++++++++++++++++++++++++++++++++++++++++
> 8 files changed, 431 insertions(+), 208 deletions(-)
>
> Index: mmotm-2.6.27-rc7+/mm/page_cgroup.c
> ===================================================================
> --- /dev/null
> +++ mmotm-2.6.27-rc7+/mm/page_cgroup.c
> @@ -0,0 +1,253 @@
> +#include <linux/mm.h>
> +#include <linux/mmzone.h>
> +#include <linux/bootmem.h>
> +#include <linux/bit_spinlock.h>
> +#include <linux/page_cgroup.h>
> +#include <linux/hash.h>
> +#include <linux/memory.h>
> +
> +static void __meminit
> +__init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
> +{
> + pc->flags = 0;
> + pc->mem_cgroup = NULL;
> + pc->page = pfn_to_page(pfn);
> +}
> +static unsigned long total_usage = 0;
> +
> +#ifdef CONFIG_FLAT_NODE_MEM_MAP
> +
> +
> +void __init pgdat_page_cgroup_init(struct pglist_data *pgdat)
> +{
> + pgdat->node_page_cgroup = NULL;
> +}
> +
> +struct page_cgroup *lookup_page_cgroup(struct page *page)
> +{
> + unsigned long pfn = page_to_pfn(page);
> + unsigned long offset;
> + struct page_cgroup *base;
> +
> + base = NODE_DATA(page_to_nid(nid))->node_page_cgroup;
page_to_nid(page) :)
> + if (unlikely(!base))
> + return NULL;
> +
> + offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
> + return base + offset;
> +}
> +
> +static int __init alloc_node_page_cgroup(int nid)
> +{
> + struct page_cgroup *base, *pc;
> + unsigned long table_size;
> + unsigned long start_pfn, nr_pages, index;
> +
> + start_pfn = NODE_DATA(nid)->node_start_pfn;
> + nr_pages = NODE_DATA(nid)->node_spanned_pages;
> +
> + table_size = sizeof(struct page_cgroup) * nr_pages;
> +
> + base = __alloc_bootmem_node_nopanic(NODE_DATA(nid),
> + table_size, PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
> + if (!base)
> + return -ENOMEM;
> + for (index = 0; index < nr_pages; index++) {
> + pc = base + index;
> + __init_page_cgroup(pc, start_pfn + index);
> + }
> + NODE_DATA(nid)->node_page_cgroup = base;
> + total_usage += table_size;
> + return 0;
> +}
> +
> +void __init free_node_page_cgroup(int nid)
> +{
> + unsigned long table_size;
> + unsigned long nr_pages;
> + struct page_cgroup *base;
> +
> + base = NODE_DATA(nid)->node_page_cgroup;
> + if (!base)
> + return;
> + nr_pages = NODE_DATA(nid)->node_spanned_pages;
> +
> + table_size = sizeof(struct page_cgroup) * nr_pages;
> +
> + free_bootmem_node(NODE_DATA(nid),
> + (unsigned long)base, table_size);
> + NODE_DATA(nid)->node_page_cgroup = NULL;
> +}
> +
Hmm, who uses this function?
(snip)
> @@ -812,49 +708,41 @@ __mem_cgroup_uncharge_common(struct page
>
> if (mem_cgroup_subsys.disabled)
> return;
> + /* check the condition we can know from page */
>
> - /*
> - * Check if our page_cgroup is valid
> - */
> - lock_page_cgroup(page);
> - pc = page_get_page_cgroup(page);
> - if (unlikely(!pc))
> - goto unlock;
> -
> - VM_BUG_ON(pc->page != page);
> + pc = lookup_page_cgroup(page);
> + if (unlikely(!pc || !PageCgroupUsed(pc)))
> + return;
> + preempt_disable();
> + lock_page_cgroup(pc);
> + if (unlikely(page_mapped(page))) {
> + unlock_page_cgroup(pc);
> + preempt_enable();
> + return;
> + }
Just for clarification, in what sequence will the page be mapped here?
mem_cgroup_uncharge_page checks whether the page is mapped.
> + ClearPageCgroupUsed(pc);
> + unlock_page_cgroup(pc);
>
> - if ((ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
> - && ((PageCgroupCache(pc) || page_mapped(page))))
> - goto unlock;
> -retry:
> mem = pc->mem_cgroup;
> mz = page_cgroup_zoneinfo(pc);
> +
> spin_lock_irqsave(&mz->lru_lock, flags);
> - if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED &&
> - unlikely(mem != pc->mem_cgroup)) {
> - /* MAPPED account can be done without lock_page().
> - Check race with mem_cgroup_move_account() */
> - spin_unlock_irqrestore(&mz->lru_lock, flags);
> - goto retry;
> - }
With these changes, ctype becomes unnecessary, so it can be removed.
> __mem_cgroup_remove_list(mz, pc);
> spin_unlock_irqrestore(&mz->lru_lock, flags);
> -
> - page_assign_page_cgroup(page, NULL);
> - unlock_page_cgroup(page);
> -
> -
> - res_counter_uncharge(&mem->res, PAGE_SIZE);
> + pc->mem_cgroup = NULL;
> css_put(&mem->css);
> + preempt_enable();
> + res_counter_uncharge(&mem->res, PAGE_SIZE);
>
> - kmem_cache_free(page_cgroup_cache, pc);
> return;
> -unlock:
> - unlock_page_cgroup(page);
> }
>
Thanks,
Daisuke Nishimura.