linux-mm.kvack.org archive mirror
From: Glauber Costa <glommer@parallels.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Suleiman Souhlal <suleiman@google.com>,
	Suleiman Souhlal <ssouhlal@freebsd.org>,
	cgroups@vger.kernel.org, penberg@kernel.org, yinghan@google.com,
	hughd@google.com, gthelen@google.com, linux-mm@kvack.org,
	devel@openvz.org
Subject: Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT.
Date: Sat, 3 Mar 2012 11:22:08 -0300
Message-ID: <4F522910.1050402@parallels.com>
In-Reply-To: <20120301150537.8996bbf6.kamezawa.hiroyu@jp.fujitsu.com>

On 03/01/2012 03:05 AM, KAMEZAWA Hiroyuki wrote:
> On Wed, 29 Feb 2012 21:24:11 -0300
> Glauber Costa<glommer@parallels.com>  wrote:
>
>> On 02/29/2012 09:10 PM, KAMEZAWA Hiroyuki wrote:
>>> On Wed, 29 Feb 2012 11:09:50 -0800
>>> Suleiman Souhlal<suleiman@google.com>   wrote:
>>>
>>>> On Tue, Feb 28, 2012 at 10:00 PM, KAMEZAWA Hiroyuki
>>>> <kamezawa.hiroyu@jp.fujitsu.com>   wrote:
>>>>> On Mon, 27 Feb 2012 14:58:47 -0800
>>>>> Suleiman Souhlal<ssouhlal@FreeBSD.org>   wrote:
>>>>>
>>>>>> This is used to indicate that we don't want an allocation to be accounted
>>>>>> to the current cgroup.
>>>>>>
>>>>>> Signed-off-by: Suleiman Souhlal<suleiman@google.com>
>>>>>
>>>>> I don't like this.
>>>>>
>>>>> Please add
>>>>>
>>>>> ___GFP_ACCOUNT  "account this allocation to memcg"
>>>>>
>>>>> Or make this as slab's flag if this work is for slab allocation.
>>>>
>>>> We would like to account for all the slab allocations that happen in
>>>> process context.
>>>>
>>>> Manually marking every single allocation or kmem_cache with a GFP flag
>>>> really doesn't seem like the right thing to do..
>>>>
>>>> Can you explain why you don't like this flag?
>>>>
>>>
>>> For example, tcp buffer limiting has its own logic for controlling buffer size.
>>> _AND_, most of kernel pages are not reclaimable at all.
>>> I think you should start from reclaimable caches as dcache, icache etc.
>>>
>>> If you want to use this wider, you can discuss
>>>
>>> + #define GFP_KERNEL	(.....| ___GFP_ACCOUNT)
>>>
>>> in future. I'd like to see small start because memory allocation failure
>>> is always terrible and make the system unstable. Even if you notify
>>> "Ah, kernel memory allocation failed because of memory.limit? and
>>>    many unreclaimable memory usage. Please tweak the limitation or kill tasks!!"
>>>
>>> The user can't do anything because he can't create any new task because of OOM.
>>>
>>> The system will be unstable until an admin, who is not under any limit,
>>> tweaks something or reboots the system.
>>>
>>> Please start small until you provide an ecosystem that avoids the case where
>>> the admin cannot log in and all he can do is reboot.
>>>
>> Having the root cgroup to be always unlimited should already take care
>> of the most extreme cases, right?
>>
> If an admin can login into root cgroup ;)
> Anyway, if someone has a container under a cgroup via a hosting service,
> he can do nothing if the oom killer cannot recover his container. It can be
> caused by kernel memory limit. And I'm not sure he can do shutdown because
> he can't login.
>

To be fair, I think this may be unavoidable. Even if we are only dealing
with reclaimable slabs, having reclaimable slabs doesn't mean they are
always reclaimable. Unlike user memory, which we can swap at will (unless
it is mlock'd, but that is a different issue), kernel memory can have so
many objects locked that reclaim is effectively impossible. And with the
right pattern, it does not even take that many: all one needs to do is
figure out a way to pin one object per slab page, and that's it: you'll
never get rid of those pages.
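
The "one pinned object per slab page" point can be illustrated with a toy
model. This is only a sketch, not kernel code: it assumes a slab page can be
handed back to the page allocator only when none of its objects is pinned,
and the names (reclaimable_pages, NUM_PAGES, OBJS_PER_PAGE) and the sizes
are made up for the illustration.

```python
def reclaimable_pages(num_pages, pinned):
    """Count slab pages that could be handed back to the page allocator.

    pinned is a set of (page, slot) pairs for objects that cannot be freed.
    Assumption of this toy model: a page is reclaimable only if none of
    its slots is pinned.
    """
    pinned_pages = {page for page, _slot in pinned}
    return num_pages - len(pinned_pages)


NUM_PAGES = 1000       # hypothetical number of slab pages in one cache
OBJS_PER_PAGE = 32     # hypothetical number of objects per slab page

# Worst case: exactly one pinned object in every page.
spread = {(page, 0) for page in range(NUM_PAGES)}

# Same number of pinned objects, packed into as few pages as possible.
packed = {(i // OBJS_PER_PAGE, i % OBJS_PER_PAGE) for i in range(NUM_PAGES)}

print(reclaimable_pages(NUM_PAGES, spread))   # 0
print(reclaimable_pages(NUM_PAGES, packed))   # 968
```

In both cases only 1000 of the 32000 objects (about 3%) are pinned. Packed
together they block 32 pages; spread one per page they block all 1000.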

So although it would obviously be nice to make sure we did everything we
could to recover from OOM scenarios, once we start tracking kernel memory
this may not be possible. So the whole point, for me, is guaranteeing
that one container cannot destroy the others - which is the reality if
one of them can go and grab all kmem =p

That said, I gave this some extra thought. GFP flags are, in theory,
targeted at a single allocation, so I think using one here is wrong. We
either track a cache or we don't; the decision is per cache, not per
allocation. Once we decide that a cache should be tracked, it should be
tracked, end of story.

So how about using a SLAB flag instead?
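
A rough sketch of what "per cache, not per allocation" could mean, written
as a toy userspace model rather than kernel code. Everything in it is
hypothetical: the MemCgroup and Cache classes, the accounted argument
(standing in for a SLAB flag given at cache creation), and the sizes. The
point is only that the accounting decision is made once, when the cache is
created, and callers of alloc() never pass a flag.

```python
class MemCgroup:
    """Toy stand-in for a memory cgroup with a kernel-memory limit (bytes)."""

    def __init__(self, limit):
        self.limit = limit
        self.usage = 0

    def try_charge(self, size):
        """Charge size bytes; refuse if that would exceed the limit."""
        if self.usage + size > self.limit:
            return False
        self.usage += size
        return True


class Cache:
    """Toy object cache; accounted is fixed at creation (hypothetical flag)."""

    def __init__(self, name, obj_size, accounted=False):
        self.name = name
        self.obj_size = obj_size
        self.accounted = accounted

    def alloc(self, memcg):
        """Allocate one object; fail only if the cache is accounted and
        the cgroup cannot be charged."""
        if self.accounted and not memcg.try_charge(self.obj_size):
            return None
        return bytearray(self.obj_size)


memcg = MemCgroup(limit=384)
dentry_like = Cache("dentry_like", obj_size=192, accounted=True)
scratch = Cache("scratch", obj_size=64)          # never charged

results = [dentry_like.alloc(memcg) is not None for _ in range(3)]
print(results)                                   # [True, True, False]
print(scratch.alloc(memcg) is not None)          # True
print(memcg.usage)                               # 384
```

The third allocation from the accounted cache fails because the cgroup is
at its limit, while the untracked cache is unaffected.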



Thread overview: 52+ messages
2012-02-27 22:58 [PATCH 00/10] memcg: Kernel Memory Accounting Suleiman Souhlal
2012-02-27 22:58 ` [PATCH 01/10] memcg: Kernel memory accounting infrastructure Suleiman Souhlal
2012-02-28 13:10   ` Glauber Costa
2012-02-29  0:37     ` Suleiman Souhlal
2012-02-28 13:11   ` Glauber Costa
2012-02-27 22:58 ` [PATCH 02/10] memcg: Uncharge all kmem when deleting a cgroup Suleiman Souhlal
2012-02-28 19:00   ` Glauber Costa
2012-02-29  0:24     ` Suleiman Souhlal
2012-02-29 16:51       ` Glauber Costa
2012-02-29  6:22   ` KAMEZAWA Hiroyuki
2012-02-29 19:00     ` Suleiman Souhlal
2012-02-27 22:58 ` [PATCH 03/10] memcg: Reclaim when more than one page needed Suleiman Souhlal
2012-02-29  6:18   ` KAMEZAWA Hiroyuki
2012-02-27 22:58 ` [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT Suleiman Souhlal
2012-02-29  6:00   ` KAMEZAWA Hiroyuki
2012-02-29 16:53     ` Glauber Costa
2012-02-29 19:09     ` Suleiman Souhlal
2012-03-01  0:10       ` KAMEZAWA Hiroyuki
2012-03-01  0:24         ` Glauber Costa
2012-03-01  6:05           ` KAMEZAWA Hiroyuki
2012-03-03 14:22             ` Glauber Costa [this message]
2012-03-03 16:38               ` Suleiman Souhlal
2012-03-03 23:24                 ` Glauber Costa
2012-03-04  0:10                   ` Suleiman Souhlal
2012-03-06 10:36                     ` Glauber Costa
2012-03-06 16:13                       ` Suleiman Souhlal
2012-03-06 18:31                         ` Glauber Costa
2012-02-27 22:58 ` [PATCH 05/10] memcg: Slab accounting Suleiman Souhlal
2012-02-28 13:24   ` Glauber Costa
2012-02-28 23:31     ` Suleiman Souhlal
2012-02-29 17:00       ` Glauber Costa
2012-02-27 22:58 ` [PATCH 06/10] memcg: Track all the memcg children of a kmem_cache Suleiman Souhlal
2012-02-27 22:58 ` [PATCH 07/10] memcg: Stop res_counter underflows Suleiman Souhlal
2012-02-28 13:31   ` Glauber Costa
2012-02-28 23:07     ` Suleiman Souhlal
2012-02-29 17:05       ` Glauber Costa
2012-02-29 19:17         ` Suleiman Souhlal
2012-02-27 22:58 ` [PATCH 08/10] memcg: Add CONFIG_CGROUP_MEM_RES_CTLR_KMEM_ACCT_ROOT Suleiman Souhlal
2012-02-28 13:34   ` Glauber Costa
2012-02-28 23:36     ` Suleiman Souhlal
2012-02-28 23:54       ` KAMEZAWA Hiroyuki
2012-02-29 17:09       ` Glauber Costa
2012-02-29 19:24         ` Suleiman Souhlal
2012-02-27 22:58 ` [PATCH 09/10] memcg: Per-memcg memory.kmem.slabinfo file Suleiman Souhlal
2012-02-27 22:58 ` [PATCH 10/10] memcg: Document kernel memory accounting Suleiman Souhlal
2012-02-27 23:05   ` Randy Dunlap
2012-02-28  8:49 ` [PATCH 00/10] memcg: Kernel Memory Accounting Pekka Enberg
2012-02-28 22:12   ` Suleiman Souhlal
2012-02-28 13:03 ` Glauber Costa
2012-02-28 22:47   ` Suleiman Souhlal
2012-02-29 16:47     ` Glauber Costa
2012-02-29 19:28       ` Suleiman Souhlal
