From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: balbir@linux.vnet.ibm.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
Sudhir Kumar <skumar@linux.vnet.ibm.com>,
YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
Paul Menage <menage@google.com>,
lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, David Rientjes <rientjes@google.com>,
Pavel Emelianov <xemul@openvz.org>,
riel@redhat.com,
"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>
Subject: Re: [RFC][PATCH 0/4] Memory controller soft limit patches
Date: Thu, 8 Jan 2009 13:21:41 +0900
Message-ID: <20090108132141.30bc3ce2.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20090108035930.GB7294@balbir.in.ibm.com>
On Thu, 8 Jan 2009 09:29:30 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-01-08 09:30:40]:
>
> > On Thu, 08 Jan 2009 00:11:10 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > >
> > > Here is v1 of the new soft limit implementation. Soft limits are a new feature
> > > for the memory resource controller; something similar has existed in the
> > > group scheduler in the form of shares. We'll compare shares and soft limits
> > > below. I've had soft limit implementations earlier, but I've discarded those
> > > approaches in favour of this one.
> > >
> > > Soft limits are the most useful feature to have for environments where
> > > the administrator wants to overcommit the system, such that only on memory
> > > contention do the limits become active. The current soft limits implementation
> > > provides a soft_limit_in_bytes interface for the memory controller and not
> > > for the memory+swap controller. The implementation maintains an RB-tree of groups
> > > that exceed their soft limit and starts reclaiming from the group that
> > > exceeds this limit by the maximum amount.
> > >
> > > This is an RFC implementation and is not meant for inclusion.
> > >
The core implementation seems simple, and the feature sounds good.
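The selection policy in the quoted cover letter (track the groups that exceed their soft limit, and reclaim first from the one that exceeds it by the largest amount) can be illustrated with a minimal sketch. It assumes a hypothetical `struct sl_group` rather than the kernel's `struct mem_cgroup`, and it uses a plain linear scan in place of the RB-tree the patches maintain; the tree only makes finding the same maximum cheaper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-group bookkeeping; not the kernel's struct mem_cgroup. */
struct sl_group {
	const char *name;
	unsigned long usage;      /* bytes currently charged */
	unsigned long soft_limit; /* bytes */
};

/* Amount by which a group exceeds its soft limit (0 if within it). */
static unsigned long sl_excess(const struct sl_group *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/*
 * Return the group that exceeds its soft limit by the largest amount,
 * or NULL if no group is over its soft limit. The linear scan stands in
 * for the RB-tree lookup described in the cover letter.
 */
static const struct sl_group *sl_pick_victim(const struct sl_group *groups,
					     size_t n)
{
	const struct sl_group *victim = NULL;
	unsigned long worst = 0;
	size_t i;

	assert(groups != NULL || n == 0);
	for (i = 0; i < n; i++) {
		unsigned long e = sl_excess(&groups[i]);

		if (e > worst) {
			worst = e;
			victim = &groups[i];
		}
	}
	return victim;
}
```

With usages of 50, 300 and 180 against a soft limit of 100 each, the second group (300 bytes, 200 over) would be reclaimed from first.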
>
> Thanks!
>
> > But before reviewing the details, three points.
> >
> > 1. Please fix the current bugs in hierarchy management before adding a new feature.
> > AFAIK, OOM-kill under hierarchy is broken. (I have patches, but they are waiting
> > for the merge window to close.)
>
> I've not hit the OOM-kill issue under hierarchy so far. Is the OOM
> killer selecting a bad task to kill? I'll debug/reproduce the issue.
> I am not posting these patches for inclusion, fixing bugs is
> definitely the highest priority.
>
Assume the following hierarchy:

  group_A/          limit=100M   usage=1M
      group_01/     no limit     usage=1M
      group_02/     no limit     usage=98M  (leaks memory)

Q. What happens when a task in group_02 causes an OOM?
A. A task in group_A dies.

That is my problem. (As I said, I'll post a patch.) This has been my homework for a month.
(I'll use CSS_ID to fix this.)
It will also let me skip my logic that checks "Is this OOM from memcg?".
And it makes the system panic if vm.panic_on_oom==1.
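To make the scenario concrete, assuming the usages listed above are per-group (local) figures: with hierarchical accounting group_A holds 1M + 1M + 98M = 100M, so the next charge from group_02 fails against group_A's limit, not against anything in group_02 itself. The sketch below models this with a hypothetical `struct hgroup` array (not kernel code) and shows what a hierarchy-aware victim choice could look like: search the whole subtree under the group that hit its limit and pick its largest consumer. For simplicity it picks a group rather than a task, and it is not the CSS_ID-based patch mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define NO_LIMIT  ((unsigned long)-1)
#define NO_PARENT (-1)

/* Hypothetical model of a memcg hierarchy; sizes are in MB. */
struct hgroup {
	const char *name;
	int parent;          /* index of the parent, NO_PARENT for the root */
	unsigned long limit; /* hierarchical limit, NO_LIMIT if none */
	unsigned long local; /* usage charged directly to this group */
};

/* Is group g equal to, or a descendant of, group anc? */
static int in_subtree(const struct hgroup *t, int g, int anc)
{
	for (; g != NO_PARENT; g = t[g].parent)
		if (g == anc)
			return 1;
	return 0;
}

/* Hierarchical usage: the group's own usage plus all descendants'. */
static unsigned long hier_usage(const struct hgroup *t, size_t n, int anc)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (in_subtree(t, (int)i, anc))
			sum += t[i].local;
	return sum;
}

/*
 * Try to charge mb to group g. Returns -1 on success, otherwise the index
 * of the nearest ancestor (or g itself) whose limit would be exceeded,
 * which is where the OOM condition is raised.
 */
static int try_charge(struct hgroup *t, size_t n, int g, unsigned long mb)
{
	int a;

	for (a = g; a != NO_PARENT; a = t[a].parent)
		if (t[a].limit != NO_LIMIT && hier_usage(t, n, a) + mb > t[a].limit)
			return a;
	t[g].local += mb;
	return -1;
}

/* Hierarchy-aware choice: the largest local consumer under 'over'. */
static int pick_oom_victim(const struct hgroup *t, size_t n, int over)
{
	int victim = over;
	size_t i;

	for (i = 0; i < n; i++)
		if (in_subtree(t, (int)i, over) && t[i].local > t[victim].local)
			victim = (int)i;
	return victim;
}
```

Charging 1M more to group_02 fails at group_A (index 0), and searching group_A's subtree points at group_02 rather than at a task that merely lives in group_A.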
> > I wonder whether there are other bugs as well. Are the lockdep errors which
> > Nishimura reported all fixed now?
>
> I run all my kernels and tests with lockdep enabled, and I did not see any
> lockdep errors showing up.
>
ok.
> >
> > 2. You insert reclaim-by-soft-limit into alloc_pages(). But to do this,
> > you have to pass the zonelist to try_to_free_mem_cgroup_pages() and modify
> > try_to_free_mem_cgroup_pages() itself.
> > 2-a) Otherwise, when the memory request has gfp_mask==GFP_DMA or the allocation
> > is under a cpuset, memory reclaim will not work correctly.
>
> The idea behind adding the code in alloc_pages() is to detect
> contention and trim mem cgroups down, if they have grown beyond their
> soft limit.
>
Letting the usual direct reclaim go on and just waking up "balance_soft_limit_daemon()"
should be enough.
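The suggested split can be sketched in userspace C with POSIX threads: the allocation path only records that contention was seen and signals a condition variable, while a separate thread does the trimming. `balance_soft_limit_daemon()` is the name used in the mail; `struct sl_daemon`, `wakeup_soft_limit_daemon()` and `balance_soft_limit_pass()` are hypothetical, and a kernel version would use a kthread and a wait queue rather than pthreads.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state shared by the allocation path and the daemon. */
struct sl_daemon {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	int pending; /* wakeup requested but not yet handled */
	int stop;    /* ask the daemon thread to exit */
	int passes;  /* number of balancing passes performed */
};

/* Placeholder for trimming the groups that are above their soft limit. */
static void balance_soft_limit_pass(struct sl_daemon *d)
{
	d->passes++;
}

/* Daemon loop: sleep until woken, then run one balancing pass. */
static void *balance_soft_limit_daemon(void *arg)
{
	struct sl_daemon *d = arg;

	pthread_mutex_lock(&d->lock);
	for (;;) {
		while (!d->pending && !d->stop)
			pthread_cond_wait(&d->wait, &d->lock);
		if (d->pending) {
			d->pending = 0;
			balance_soft_limit_pass(d); /* under the lock in this sketch */
			continue;
		}
		break; /* stop requested and nothing pending */
	}
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Called from the allocation path on contention: cheap, never reclaims. */
static void wakeup_soft_limit_daemon(struct sl_daemon *d)
{
	pthread_mutex_lock(&d->lock);
	d->pending = 1;
	pthread_cond_signal(&d->wait);
	pthread_mutex_unlock(&d->lock);
}
```

Because `pending` is a flag rather than a counter, several wakeups that arrive while a pass is running coalesce into a single further pass, so the cost added to the allocation path stays constant while normal direct reclaim proceeds as usual.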
> > 2-b) try_to_free_mem_cgroup_pages() cannot work well for order > 1 allocations.
> >
> > Please try fake-numa (or real NUMA machine) and cpuset.
>
> Yes, order > 1 is documented in the patch and you can see the code as
> well. Your suggestion is to look at the gfp_mask as well, I'll do
> that.
>
And the zonelist/nodemask too.
The generic try_to_free_pages() doesn't take a nodemask argument, but it checks the cpuset
in shrink_zones():
==
	/*
	 * Take care memory controller reclaiming has small influence
	 * to global LRU.
	 */
	if (scan_global_lru(sc)) {
		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
			continue;
		note_zone_scanning_priority(zone, priority);

		if (zone_is_all_unreclaimable(zone) &&
					priority != DEF_PRIORITY)
			continue;	/* Let kswapd poll it */
		sc->all_unreclaimable = 0;
	} else {
		/*
		 * Ignore cpuset limitation here. We just want to reduce
		 * # of used pages by us regardless of memory shortage.
		 */
		sc->all_unreclaimable = 0;
		mem_cgroup_note_reclaim_priority(sc->mem_cgroup,
							priority);
	}
==
This is because "reclaim by memcg" can happen even if there is enough memory:
try_to_free_mem_cgroup_pages() is called when a group hits its limit.
So there will be some issues to fix if you want to use
try_to_free_mem_cgroup_pages() to recover from a "memory shortage".
I think the above is one of them. Some other assumptions will break as well.
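One such issue, sketched under simplifying assumptions: when reclaim runs on behalf of a failing allocation, only pages freed in zones that allocation can use (up to the highest zone its gfp_mask permits, on nodes allowed by its cpuset or nodemask) actually help it. The types below (`zone_kind`, `zone_desc`) and the plain bitmask nodemask are hypothetical; the kernel expresses the same constraints with `gfp_zone()`, zonelists and `nodemask_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone types, ordered lowest first as in the kernel. */
enum zone_kind { Z_DMA, Z_NORMAL, Z_HIGHMEM };

struct zone_desc {
	int node;            /* NUMA node the zone belongs to */
	enum zone_kind kind;
};

/*
 * Would reclaiming from zone z help an allocation that may use zones up
 * to 'highest' on the nodes set in 'allowed_nodes' (bit n = node n)?
 */
static int zone_helps_allocation(const struct zone_desc *z,
				 enum zone_kind highest,
				 unsigned long allowed_nodes)
{
	assert(z->node >= 0 && z->node < (int)(8 * sizeof(unsigned long)));
	if (z->kind > highest)
		return 0; /* e.g. a HIGHMEM page cannot satisfy GFP_DMA */
	if (!(allowed_nodes & (1UL << z->node)))
		return 0; /* node outside the cpuset/nodemask */
	return 1;
}

/* Count the zones a shortage-driven reclaim should bother scanning. */
static size_t count_useful_zones(const struct zone_desc *zones, size_t n,
				 enum zone_kind highest,
				 unsigned long allowed_nodes)
{
	size_t i, useful = 0;

	for (i = 0; i < n; i++)
		if (zone_helps_allocation(&zones[i], highest, allowed_nodes))
			useful++;
	return useful;
}
```

With two nodes each holding a few zones, a DMA-only request can be helped by just the DMA zone on node 0, and a request confined to node 1 ignores everything on node 0; limit-driven memcg reclaim, as the excerpt's else branch shows, deliberately does neither check.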
-Kame