From: Suleiman Souhlal <suleiman@google.com>
To: Christoph Lameter <cl@linux.com>
Cc: Glauber Costa <glommer@parallels.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, kamezawa.hiroyu@jp.fujitsu.com,
Tejun Heo <tj@kernel.org>, Li Zefan <lizefan@huawei.com>,
Greg Thelen <gthelen@google.com>, Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
devel@openvz.org, David Rientjes <rientjes@google.com>,
Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [PATCH v3 13/28] slub: create duplicate cache
Date: Tue, 29 May 2012 13:57:41 -0700
Message-ID: <CABCjUKCPoL1+qzjX85RVGpRBn_javD3JY2avstYuoM=tsJa8dA@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.00.1205291514090.2504@router.home>

Hi Christoph,

On Tue, May 29, 2012 at 1:21 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 30 May 2012, Glauber Costa wrote:
>
>> Well, I'd have to dive into the code a bit more, but the impression that
>> the documentation gives me, by saying:
>>
>> "Cpusets constrain the CPU and Memory placement of tasks to only
>> the resources within a task's current cpuset."
>>
>> is that you can't allocate from a node outside that set. Is this correct?
>
> Basically yes but there are exceptions (like slab queues etc). Look at the
> hardwall stuff too that allows more exceptions for kernel allocations to
> use memory from other nodes.
>
>> So extrapolating this to memcg, the situation is as follows:
>>
>> * You can't use more memory than what you are assigned to.
>> * In order to do that, you need to account the memory you are using
>> * and to account the memory you are using, all objects in the page
>> must belong to you.
>
> Cpusets work at the page boundary and they do not have the requirement you
> are mentioning of all objects in the page having to belong to a certain
> cpuset. Let that go and things become much easier.
>
>> With a predictable enough workload, this is a recipe for working around the
>> very protection we need to establish: one can DoS a physical box full of
>> containers, by always allocating in someone else's pages, and pinning kernel
>> memory down. Never releasing it, so the shrinkers are useless.
>
> Sure you can construct hypothetical cases like that. But then that is
> true of other container-like logic in the kernel already.
>
>> So I still believe that if a page is allocated to a cgroup, all the objects in
>> there belong to it - unless of course the sharing actually means something -
>> and identifying this is just too complicated.
>
> We have never worked container like logic like that in the kernel due to
> the complicated logic you would have to put in. The requirement that all
> objects in a page come from the same container is not necessary. If you
> drop this notion then things become very easy and the patches will become
> simple.

Back when we (Google) started using cpusets for memory isolation (fake
NUMA), we found significant isolation breakage: slab pages belonging to
one cpuset were being used by other cpusets. It was very easy for one
job to cause slab growth in another container, which would then OOM
despite being well-behaved.

Because of this, we had to add logic to prevent that from happening
(by making sure we only allocate objects from pages coming from our
allowed nodes).
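
As a rough illustration of that kind of check, here is a userspace
sketch. It is not the actual kernel change: struct slab_page, struct job,
node_allowed() and pick_slab() are hypothetical names made up for this
example, and a job's allowed nodes are assumed to fit in a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64

/* Hypothetical stand-in for a slab page: only the NUMA node matters here. */
struct slab_page {
	unsigned int node;      /* node the backing page was allocated from */
	unsigned int free_objs; /* free objects left on this slab */
};

/* Hypothetical stand-in for a job's cpuset: a bitmask of allowed nodes. */
struct job {
	uint64_t mems_allowed;
};

/* True if the job is allowed to use memory from the given node. */
static bool node_allowed(const struct job *job, unsigned int node)
{
	assert(node < MAX_NODES);
	return (job->mems_allowed >> node) & 1u;
}

/*
 * Pick a slab to allocate an object from, skipping full slabs and any
 * slab whose page lives on a node outside the job's allowed set.
 * Returns NULL when no allowed slab has room; a caller would then have
 * to get a fresh page from one of its own nodes.
 */
static struct slab_page *pick_slab(const struct job *job,
				   struct slab_page *slabs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (slabs[i].free_objs == 0)
			continue;
		if (!node_allowed(job, slabs[i].node))
			continue;
		return &slabs[i];
	}
	return NULL;
}
```

With slabs on nodes 0, 1 and 2, a job restricted to node 2 never takes an
object from the slab on node 0, even if that slab has free space.
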

Now that we're switching to doing containers with memcg, I think this
is a hard requirement for us. :-(

-- Suleiman