Re: [RFC][PATCH 0/4] Memory controller soft limit patches


From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: balbir@linux.vnet.ibm.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Sudhir Kumar <skumar@linux.vnet.ibm.com>,
	YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	Paul Menage <menage@google.com>,
	lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, David Rientjes <rientjes@google.com>,
	Pavel Emelianov <xemul@openvz.org>,
	riel@redhat.com,
	"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>
Subject: Re: [RFC][PATCH 0/4] Memory controller soft limit patches
Date: Thu, 8 Jan 2009 13:21:41 +0900
Message-ID: <20090108132141.30bc3ce2.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20090108035930.GB7294@balbir.in.ibm.com>

On Thu, 8 Jan 2009 09:29:30 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-01-08 09:30:40]:
> 
> > On Thu, 08 Jan 2009 00:11:10 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > 
> > > Here is v1 of the new soft limit implementation. Soft limits is a new feature
> > > for the memory resource controller, something similar has existed in the
> > > group scheduler in the form of shares. We'll compare shares and soft limits
> > > below. I've had soft limit implementations earlier, but I've discarded those
> > > approaches in favour of this one.
> > > 
> > > Soft limits are the most useful feature to have for environments where
> > > the administrator wants to overcommit the system, such that only on memory
> > > contention do the limits become active. The current soft limits implementation
> > > provides a soft_limit_in_bytes interface for the memory controller and not
> > > for memory+swap controller. The implementation maintains an RB-Tree of groups
> > > that exceed their soft limit and starts reclaiming from the group that
> > > exceeds this limit by the maximum amount.
> > > 
> > > This is an RFC implementation and is not meant for inclusion
> > > 
> > Core implementation seems simple and the feature sounds good.
> 
> Thanks!
> 
> > But before reviewing in detail, three points.
> > 
> >   1. Please fix the current bugs in hierarchy management before adding a
> >      new feature. AFAIK, OOM-kill under hierarchy is broken. (I have patches
> >      but am waiting for the merge window to close.)
> 
> I've not hit the OOM-kill issue under hierarchy so far, is the OOM
> killer selecting a bad task to kill? I'll debug/reproduce the issue.
> I am not posting these patches for inclusion, fixing bugs is
> definitely the highest priority.
> 
Assume the following hierarchy.

   group_A/    limit=100M   usage=1M
	group_01/ no limit  usage=1M
	group_02/ no limit  usage=98M (does memory leak.)

   Q. What happens when a task in group_02 causes an OOM?
   A. A task in group_A dies.

That is the problem I mean. (As I said, I'll post a patch.) This is my homework for a month.
(I'll use CSS_ID to fix this.)
And this allows my logic that checks "Is this OOM from memcg?" to be skipped,
which makes the system panic if vm.panic_on_oom==1.
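
To make the expected behavior concrete, here is a minimal userspace sketch.
It is not the patch: the types and functions (toy_group, toy_task, is_under,
pick_victim) are hypothetical names made up for this mail, and the policy of
picking the task with the largest usage is an assumption. The point is only
that the victim should come from the whole subtree of the group whose limit
was hit, not just from tasks attached directly to that group.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a memcg hierarchy; every name here is hypothetical. */
struct toy_group {
	const char *name;
	struct toy_group *parent;	/* NULL at the top of the hierarchy */
};

struct toy_task {
	const char *comm;
	struct toy_group *group;	/* group the task is attached to */
	unsigned long usage_mb;		/* memory charged by this task */
};

/* Return 1 if @g is @root itself or one of its descendants, else 0. */
int is_under(const struct toy_group *g, const struct toy_group *root)
{
	assert(root != NULL);
	for (; g; g = g->parent)
		if (g == root)
			return 1;
	return 0;
}

/*
 * Pick a victim among the tasks in the subtree of @root, the group whose
 * limit was hit. Tasks outside the subtree are never considered.
 * Policy (an assumption for this sketch): largest usage wins.
 */
struct toy_task *pick_victim(struct toy_task *tasks, size_t n,
			     const struct toy_group *root)
{
	struct toy_task *victim = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!is_under(tasks[i].group, root))
			continue;
		if (!victim || tasks[i].usage_mb > victim->usage_mb)
			victim = &tasks[i];
	}
	return victim;
}
```

With the hierarchy above, calling pick_victim() with group_A as @root would
select the leaking task in group_02, while a task in an unrelated group is
skipped no matter how large it is.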





> >      I wonder whether there are others. Are the lockdep errors that Nishimura
> >      reported all fixed now?
> 
> I run all my kernels and tests with lockdep enabled, I did not see any
> lockdep errors showing up.
> 
ok.

> > 
> >   2. You insert reclaim-by-soft-limit into alloc_pages(). But to do this,
> >      you have to pass the zonelist to try_to_free_mem_cgroup_pages(), and
> >      you have to modify try_to_free_mem_cgroup_pages().
> >      2-a) If not, when the memory request has gfp_mask==GFP_DMA or the
> >           allocation is under a cpuset, memory reclaim will not work correctly.
> 
> The idea behind adding the code in alloc_pages() is to detect
> contention and trim mem cgroups down, if they have grown beyond their
> soft limit
> 
Letting the usual direct reclaim go on and just waking up "balance_soft_limit_daemon()"
will be enough.
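
A rough userspace sketch of what I mean, under stated assumptions: the
allocation path only sets a wakeup flag, and a separate daemon pass trims the
group that exceeds its soft limit by the largest amount. All names below
(soft_group, soft_limit_wakeup, largest_excess, soft_limit_daemon_pass) are
hypothetical, and a linear scan stands in for the RB-tree in your patches.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a group with a soft limit; names are hypothetical. */
struct soft_group {
	unsigned long usage;		/* pages currently charged */
	unsigned long soft_limit;	/* pages */
};

static int daemon_wakeup_pending;

/* Called from the allocation path on contention: no reclaim happens here. */
void soft_limit_wakeup(void)
{
	daemon_wakeup_pending = 1;
}

/* Return the group exceeding its soft limit by the most, or NULL if none. */
struct soft_group *largest_excess(struct soft_group *groups, size_t n)
{
	struct soft_group *victim = NULL;
	unsigned long best = 0;
	size_t i;

	assert(groups != NULL || n == 0);
	for (i = 0; i < n; i++) {
		unsigned long excess;

		if (groups[i].usage <= groups[i].soft_limit)
			continue;
		excess = groups[i].usage - groups[i].soft_limit;
		if (excess > best) {
			best = excess;
			victim = &groups[i];
		}
	}
	return victim;
}

/*
 * One daemon pass: if a wakeup is pending, take up to @nr pages from the
 * worst offender, never going below its soft limit. Returns pages taken.
 */
unsigned long soft_limit_daemon_pass(struct soft_group *groups, size_t n,
				     unsigned long nr)
{
	struct soft_group *victim;
	unsigned long excess;

	if (!daemon_wakeup_pending)
		return 0;
	daemon_wakeup_pending = 0;

	victim = largest_excess(groups, n);
	if (!victim)
		return 0;
	excess = victim->usage - victim->soft_limit;
	if (nr > excess)
		nr = excess;
	victim->usage -= nr;
	return nr;
}
```

The design point is that the allocating task never reclaims on behalf of soft
limits itself; it keeps doing the usual direct reclaim with its own gfp_mask
and zonelist, and soft-limit balancing happens in the background.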

> >      2-b) try_to_free_mem_cgroup_pages() does not work well for order > 1 allocations.
> >   
> >      Please try fake-numa (or a real NUMA machine) and cpuset.
> 
> Yes, order > 1 is documented in the patch and you can see the code as
> well. Your suggestion is to look at the gfp_mask as well, I'll do
> that.
> 
and zonelist/nodemask.

The generic try_to_free_pages() doesn't take a nodemask argument, but it checks the cpuset.

In shrink_zones():
==
		/*
		 * Take care memory controller reclaiming has small influence
		 * to global LRU.
		 */
		if (scan_global_lru(sc)) {
			if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
				continue;
			note_zone_scanning_priority(zone, priority);

			if (zone_is_all_unreclaimable(zone) &&
						priority != DEF_PRIORITY)
				continue;	/* Let kswapd poll it */
			sc->all_unreclaimable = 0;
		} else {
			/*
			 * Ignore cpuset limitation here. We just want to reduce
			 * # of used pages by us regardless of memory shortage.
			 */
			sc->all_unreclaimable = 0;
			mem_cgroup_note_reclaim_priority(sc->mem_cgroup,
							priority);
		}
==
This is because "reclaim by memcg" can happen even if there is enough memory.
try_to_free_mem_cgroup_pages() is called when a group hits its limit.

So some issues will need fixing if you want to use
try_to_free_mem_cgroup_pages() to recover from a "memory shortage".
I think the above is one of them. Some other assumptions will break as well.
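
As an illustration of why the gfp_mask and nodemask matter, here is a toy
model in userspace C. It is only a sketch: toy_zone, toy_request,
zone_usable and reclaimable_for_request are hypothetical names, and the
"highest usable zone type plus allowed-node bitmask" request is a
simplification I am assuming for the example. Pages freed in zones that the
allocation cannot use do nothing for the memory shortage that triggered the
reclaim.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Toy zone types, ordered from most to least restrictive. */
enum toy_zone_type { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_ZONE_HIGHMEM };

struct toy_zone {
	int node;			/* NUMA node id */
	enum toy_zone_type type;
	unsigned long reclaimable;	/* pages that could be freed */
};

/* What the allocation can use (a stand-in for gfp_mask + nodemask). */
struct toy_request {
	unsigned long allowed_nodes;	/* bit n set: node n is allowed */
	enum toy_zone_type highest_type;/* highest zone type usable */
};

/* Return 1 if pages freed in @z could satisfy @rq, else 0. */
int zone_usable(const struct toy_zone *z, const struct toy_request *rq)
{
	assert(z->node >= 0 &&
	       z->node < (int)(sizeof(unsigned long) * CHAR_BIT));
	if (z->type > rq->highest_type)
		return 0;
	return (int)((rq->allowed_nodes >> z->node) & 1UL);
}

/* Sum the reclaimable pages in zones that actually help @rq. */
unsigned long reclaimable_for_request(const struct toy_zone *zones, size_t n,
				      const struct toy_request *rq)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (zone_usable(&zones[i], rq))
			total += zones[i].reclaimable;
	return total;
}
```

A reclaim path that ignores the request, as the memcg branch above does on
purpose, may free many pages and still leave a GFP_DMA or cpuset-bound
allocation unable to make progress.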

-Kame


