From: Vladimir Davydov <vdavydov@parallels.com>
To: Greg Thelen <gthelen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.cz>, Glauber Costa <glommer@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
devel@openvz.org, Christoph Lameter <cl@linux-foundation.org>,
Pekka Enberg <penberg@kernel.org>
Subject: Re: [PATCH -mm 1/4] sl[au]b: do not charge large allocations to memcg
Date: Fri, 28 Mar 2014 11:56:29 +0400
Message-ID: <53352B2D.4000306@parallels.com>
In-Reply-To: <CAHH2K0YFB9yXF_oyxhQt9EiD_kuBuK7py6ah8YEy2H70P8SC_A@mail.gmail.com>
On 03/28/2014 12:42 AM, Greg Thelen wrote:
> On Thu, Mar 27, 2014 at 12:37 AM, Vladimir Davydov
> <vdavydov@parallels.com> wrote:
>> On 03/27/2014 08:31 AM, Greg Thelen wrote:
>>> Before this change both of the following allocations are charged to
>>> memcg (assuming kmem accounting is enabled):
>>> a = kmalloc(KMALLOC_MAX_CACHE_SIZE, GFP_KERNEL)
>>> b = kmalloc(KMALLOC_MAX_CACHE_SIZE + 1, GFP_KERNEL)
>>>
>>> After this change only 'a' is charged; 'b' goes directly to page
>>> allocator which no longer does accounting.
>>
>> Why do we need to charge 'b' in the first place? Can the userspace
>> trigger such allocations massively? If there can only be one or two such
>> allocations from a cgroup, is there any point in charging them?
>
> Off the top of my head I don't know of any >8KiB kmalloc()s, so I can't
> say if they're directly triggerable by user space en masse. But we
> recently ran into some order-3 allocations in networking. The
> networking allocations used a non-generic kmem_cache (rather than
> kmalloc which started this discussion). For details, see ed98df3361f0
> ("net: use __GFP_NORETRY for high order allocations"). I can't say if
> such allocations exist in device drivers, but given the networking
> example, it's conceivable that they may (or will) exist.
Hmm, I'm also not sure about device drivers, but the sock frag pages you
mentioned are worth charging, I guess.
For such non-generic kmem allocations, we have two options: either go
with __GFP_KMEMCG, or introduce special alloc/free_kmem_pages methods,
which would be used instead of kmalloc for large allocations (e.g.
threadinfo, sock frags). I vote for the second, because I dislike having
kmemcg charging in the general allocation path.
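
Roughly something along these lines (just a sketch to illustrate the idea;
the charge/uncharge helpers below are placeholder names, not existing
functions):

/*
 * Sketch only: memcg_charge_kmem_pages()/memcg_uncharge_kmem_pages() are
 * placeholder names (assumed to return true on success) for whatever
 * charging helpers we would end up with.
 */
static inline struct page *alloc_kmem_pages(gfp_t gfp, unsigned int order)
{
	struct page *page = alloc_pages(gfp, order);

	if (page && !memcg_charge_kmem_pages(page, gfp, order)) {
		__free_pages(page, order);
		return NULL;
	}
	return page;
}

static inline void free_kmem_pages(struct page *page, unsigned int order)
{
	memcg_uncharge_kmem_pages(page, order);
	__free_pages(page, order);
}

Callers like the threadinfo allocator would then switch from
alloc_pages()/__free_pages() to these wrappers, and the generic page
allocator path would stay completely unaware of kmemcg.
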
Anyway, that brings us back to the need to reliably track arbitrary
pages in kmemcg so that we can reparent them.
> With slab this isn't a problem because slab has kmalloc kmem_caches for
> all supported allocation sizes. However, slub shows this issue for
> any kmalloc() allocations larger than 8KiB (at least on x86_64). It
> seems like a strange direction to take kmem accounting to say that
> kmalloc allocations are kmem limited, but only if they are either less
> than a threshold size or done with slab. Simply increasing the size
> of a data structure doesn't seem like it should automatically cause
> the memory to become exempt from kmem limits.
Sounds fair.
>> In fact, do we actually need to charge every random kmem allocation? I
>> guess not. For instance, filesystems often allocate data shared among
>> all the FS users. It's wrong to charge such allocations to a particular
>> memcg, IMO. That said the next step is going to be adding a per kmem
>> cache flag specifying if allocations from this cache should be charged
>> so that accounting will work only for those caches that are marked so
>> explicitly.
>
> It's a question of what direction to approach kmem slab accounting
> from: either opt-out (as the code currently is), or opt-in (with per
> kmem_cache flags as you suggest). I agree that some structures end up
> being shared (e.g. filesystem block bit map structures). In an
> opt-out system these are charged to a memcg initially and remain
> charged there until the memcg is deleted at which point the shared
> objects are reparented to a shared location. While this isn't
> perfect, it's unclear if it's better or worse than analyzing each
> class of allocation and deciding whether it should be opted in. One
> could (though I'm not) make the case that even dentries are easily
> shareable between containers and thus shouldn't be accounted to a
> single memcg. But given user space's ability to DoS a machine with
> dentries, they should be accounted.
Again, you're probably right. After a bit of thinking, I agree that
deciding which caches should be accounted and which shouldn't would be
cumbersome. Opt-out would be clearer, I guess.
>> There is one more argument for removing kmalloc_large accounting - we
>> don't have an easy way to track such allocations, which prevents us from
>> reparenting kmemcg charges on css offline. Of course, we could link
>> kmalloc_large pages in some sort of per-memcg list which would allow us
>> to find them on css offline, but I don't think such a complication is
>> justified.
>
> I assume that reparenting of such non kmem_cache allocations (e.g.
> large kmalloc) is difficult because such pages refer to the memcg,
> which we're trying to delete and the memcg has no index of such pages.
> If such zombie memcgs are undesirable, then an alternative to indexing
> the pages is to define a kmem context object which such large pages
> point to. The kmem context would be reparented without needing to
> adjust the individual large pages. But there are plenty of options.
I like the idea of a context object. For usual kmalloc'ed data, we
already have one - the kmem_cache itself. For non-generic kmem (e.g.
threadinfo pages), we could easily introduce a separate one carrying a
pointer to the owning memcg. Reparenting wouldn't be a problem at all
then.
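
Something like this, perhaps (pure sketch, all names invented for
illustration):

/*
 * A charged page would point at this context (wherever we currently
 * keep the page's memcg link) instead of at the memcg itself.
 */
struct kmem_context {
	struct mem_cgroup *memcg;	/* owning cgroup, swapped on reparent */
};

/*
 * Offlining a memcg then only has to update its contexts, not walk
 * every charged page (moving the res_counter charges is omitted here).
 */
static void kmem_context_reparent(struct kmem_context *ctx,
				  struct mem_cgroup *parent)
{
	ctx->memcg = parent;
}

One such context per memcg would be enough to cover threadinfo, sock
frags and similar non-generic allocations.
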
I guess I'll give it a try in the next iteration. Thank you!