From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org, mgorman@suse.de, dhillf@gmail.com,
aarcange@redhat.com, mhocko@suse.cz, akpm@linux-foundation.org,
hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
cgroups@vger.kernel.org
Subject: Re: [PATCH -V2 4/9] memcg: Add non reclaim resource tracking to memcg
Date: Thu, 8 Mar 2012 14:56:28 +0900 [thread overview]
Message-ID: <20120308145628.f911419d.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <87ipikdyud.fsf@linux.vnet.ibm.com>
On Sun, 04 Mar 2012 23:37:22 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Fri, 2 Mar 2012 17:38:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Thu, 1 Mar 2012 14:46:15 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >
> > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > > + help
> > > + Add non reclaim resource management to memory resource controller.
> > > + Currently only HugeTLB pages will be managed using this extension.
> > > + The controller limit is enforced during mmap(2), so that
> > > + application can fall back to allocations using smaller page size
> > > + if the memory controller limit prevented them from allocating HugeTLB
> > > + pages.
> > > +
> >
> > Hm. In other thread, KMEM accounting is discussed. There is 2 proposals and
> > - 1st is accounting only reclaimable slabs (as dcache etc.)
> > - 2nd is accounting all slab allocations.
> >
> > Here, 2nd one includes NORECLAIM kmem cache. (Discussion is not ended.)
> >
> > So, for your developments, How about MEM_RES_CTLR_HUGEPAGE ?
>
> Frankly I didn't like the noreclaim name, I also didn't want to indicate
> HUGEPAGE, because the code doesn't make any huge page assumption.
You can add this config for HUGEPAGE interfaces.
Later we can sort out other configs.
> >
> >
> > > config CGROUP_MEM_RES_CTLR_SWAP
> > > bool "Memory Resource Controller Swap Extension"
> > > depends on CGROUP_MEM_RES_CTLR && SWAP
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 6728a7a..b00d028 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -49,6 +49,7 @@
> > > #include <linux/page_cgroup.h>
> > > #include <linux/cpu.h>
> > > #include <linux/oom.h>
> > > +#include <linux/region.h>
> > > #include "internal.h"
> > > #include <net/sock.h>
> > > #include <net/tcp_memcontrol.h>
> > > @@ -214,6 +215,11 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg);
> > > static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
> > >
> > > /*
> > > + * Currently only hugetlbfs pages are tracked using no reclaim
> > > + * resource count. So we need only MAX_HSTATE res counter
> > > + */
> > > +#define MEMCG_MAX_NORECLAIM HUGE_MAX_HSTATE
> > > +/*
> > > * The memory controller data structure. The memory controller controls both
> > > * page cache and RSS per cgroup. We would eventually like to provide
> > > * statistics based on the statistics developed by Rik Van Riel for clock-pro,
> > > @@ -235,6 +241,11 @@ struct mem_cgroup {
> > > */
> > > struct res_counter memsw;
> > > /*
> > > + * the counter to account for non reclaim resources
> > > + * like hugetlb pages
> > > + */
> > > + struct res_counter no_rcl_res[MEMCG_MAX_NORECLAIM];
> >
> > struct res_counter hugepages;
> >
> > will be ok.
> >
>
> My goal was to make this patch not to mention hugepages, because
> it doesn't really have any depedency on hugepages. That is one of the reason
> for adding MEMCG_MAX_NORECLAIM. Later if we want other in memory file system
> (shmemfs) to limit the resource usage in a similar fashion, we should be
> able to use this memcg changes.
>
> May be for this patchset I can make the changes you suggested and later
> when we want to reuse the code make it more generic ?
>
yes. If there is no user interface change, internal code change will be welcomed.
>
> >
> > > + /*
> > > * Per cgroup active and inactive list, similar to the
> > > * per zone LRU lists.
> > > */
> > > @@ -4887,6 +4898,7 @@ err_cleanup:
> > > static struct cgroup_subsys_state * __ref
> > > mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > > {
> > > + int idx;
> > > struct mem_cgroup *memcg, *parent;
> > > long error = -ENOMEM;
> > > int node;
> > > @@ -4922,6 +4934,10 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > > if (parent && parent->use_hierarchy) {
> > > res_counter_init(&memcg->res, &parent->res);
> > > res_counter_init(&memcg->memsw, &parent->memsw);
> > > + for (idx = 0; idx < MEMCG_MAX_NORECLAIM; idx++) {
> > > + res_counter_init(&memcg->no_rcl_res[idx],
> > > + &parent->no_rcl_res[idx]);
> > > + }
> >
> > You can remove this kinds of loop and keep your implemenation simple.
>
>
> Can you explain this ? How can we remote the loop ?. We want to track
> each huge page size as a seperate resource.
>
Ah, sorry. I miseed it. please ignore.
> > > +long mem_cgroup_try_noreclaim_charge(struct list_head *chg_list,
> > > + unsigned long from, unsigned long to,
> > > + int idx)
> > > +{
> > > + long chg;
> > > + int ret = 0;
> > > + unsigned long csize;
> > > + struct mem_cgroup *memcg;
> > > + struct res_counter *fail_res;
> > > +
> > > + /*
> > > + * Get the task cgroup within rcu_readlock and also
> > > + * get cgroup reference to make sure cgroup destroy won't
> > > + * race with page_charge. We don't allow a cgroup destroy
> > > + * when the cgroup have some charge against it
> > > + */
> > > + rcu_read_lock();
> > > + memcg = mem_cgroup_from_task(current);
> > > + css_get(&memcg->css);
> >
> > css_tryget() ?
> >
>
>
> Why ?
>
current<->cgroup relationship isn't under any locks. So, we do speculative
access with rcu_read_lock() and css_tryget().
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-03-08 5:58 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-01 9:16 [PATCH -V2 0/9] memcg: add HugeTLB resource tracking Aneesh Kumar K.V
2012-03-01 9:16 ` [PATCH -V2 1/9] mm: move hugetlbfs region tracking function to common code Aneesh Kumar K.V
2012-03-01 22:33 ` Andrew Morton
2012-03-04 17:37 ` Aneesh Kumar K.V
2012-03-01 9:16 ` [PATCH -V2 2/9] mm: Update region function to take new data arg Aneesh Kumar K.V
2012-03-01 9:16 ` [PATCH -V2 3/9] hugetlbfs: Use the generic region API and drop local one Aneesh Kumar K.V
2012-03-01 9:16 ` [PATCH -V2 4/9] memcg: Add non reclaim resource tracking to memcg Aneesh Kumar K.V
2012-03-02 8:38 ` KAMEZAWA Hiroyuki
2012-03-04 18:07 ` Aneesh Kumar K.V
2012-03-08 5:56 ` KAMEZAWA Hiroyuki [this message]
2012-03-08 11:48 ` Aneesh Kumar K.V
2012-03-01 9:16 ` [PATCH -V2 5/9] hugetlbfs: Add memory controller support for shared mapping Aneesh Kumar K.V
2012-03-01 9:16 ` [PATCH -V2 6/9] hugetlbfs: Add memory controller support for private mapping Aneesh Kumar K.V
2012-05-17 23:16 ` Darrick J. Wong
2012-03-01 9:16 ` [PATCH -V2 7/9] memcg: track resource index in cftype private Aneesh Kumar K.V
2012-03-01 9:16 ` [PATCH -V2 8/9] hugetlbfs: Add memcg control files for hugetlbfs Aneesh Kumar K.V
2012-03-01 9:16 ` [PATCH -V2 9/9] memcg: Add memory controller documentation for hugetlb management Aneesh Kumar K.V
2012-03-01 22:40 ` [PATCH -V2 0/9] memcg: add HugeTLB resource tracking Andrew Morton
2012-03-02 3:28 ` David Gibson
2012-03-04 18:09 ` Aneesh Kumar K.V
2012-03-06 2:38 ` David Gibson
2012-03-04 19:15 ` Aneesh Kumar K.V
2012-03-05 13:56 ` Hillf Danton
2012-03-06 14:05 ` Aneesh Kumar K.V
2012-03-02 5:48 ` KAMEZAWA Hiroyuki
2012-03-04 18:14 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120308145628.f911419d.kamezawa.hiroyu@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=cgroups@vger.kernel.org \
--cc=dhillf@gmail.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox