From: Dave Chinner <david@fromorbit.com>
To: Bharata B Rao <bharata@linux.ibm.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	aneesh.kumar@linux.ibm.com
Subject: Re: High kmalloc-32 slab cache consumption with 10k containers
Date: Mon, 19 Apr 2021 11:23:56 +1000	[thread overview]
Message-ID: <20210419012356.GZ1990290@dread.disaster.area> (raw)
In-Reply-To: <20210416044439.GB1749436@in.ibm.com>

On Fri, Apr 16, 2021 at 10:14:39AM +0530, Bharata B Rao wrote:
> On Wed, Apr 07, 2021 at 08:28:07AM +1000, Dave Chinner wrote:
> > On Mon, Apr 05, 2021 at 11:18:48AM +0530, Bharata B Rao wrote:
> > 
> > > As an alternative approach, I have the hack below that does lazy
> > > list_lru creation. The memcg-specific list is created and
> > > initialized only when there is a request to add an element to that
> > > particular list. Though I am not sure about the full impact of this
> > > change on the owners of the lists, nor about its performance
> > > impact, the overall savings look good.
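
For reference, a minimal sketch of the kind of lazy creation being
described, assuming a simplified list_lru layout; memcg_lru_slot() and
init_one_lru() are stand-in names for the real memcg plumbing, and this
is not the actual patch:

bool list_lru_add(struct list_lru *lru, struct list_head *item)
{
	int nid = page_to_nid(virt_to_page(item));
	struct list_lru_node *nlru = &lru->node[nid];
	struct list_lru_one *l;

	spin_lock(&nlru->lock);
	/* look up the per-memcg list; NULL if never used before */
	l = memcg_lru_slot(nlru, item);
	if (!l) {
		/*
		 * First insertion for this memcg: allocate the list
		 * now, under nlru->lock, hence GFP_ATOMIC.
		 */
		l = kmalloc(sizeof(*l), GFP_ATOMIC);
		if (!l) {
			spin_unlock(&nlru->lock);
			return false;
		}
		init_one_lru(l);
		/* ... publish l in the per-node memcg array ... */
	}
	if (list_empty(item)) {
		list_add_tail(item, &l->list);
		l->nr_items++;
		nlru->nr_items++;
		spin_unlock(&nlru->lock);
		return true;
	}
	spin_unlock(&nlru->lock);
	return false;
}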
> > 
> > Avoiding memory allocation in list_lru_add() was one of the main
> > reasons for up-front static allocation of memcg lists. We cannot do
> > memory allocation while callers are holding multiple spinlocks in
> > core system algorithms (e.g. dentry_kill -> retain_dentry ->
> > d_lru_add -> list_lru_add), let alone while holding an internal
> > spinlock.
> > 
> > Putting a GFP_ATOMIC allocation inside 3-4 nested spinlocks in a
> > path we know might have memory demand in the *hundreds of GB* range
> > gets a NACK from me. It's a great idea, but it's just not a
> 
> I do understand that GFP_ATOMIC allocations are really not preferable,
> but I want to point out that the allocations in the range of hundreds
> of GBs get reduced to tens of MBs when we do lazy list_lru head
> allocations under GFP_ATOMIC.

That does not make GFP_ATOMIC allocations safe or desirable. In
general, using GFP_ATOMIC outside of interrupt context indicates
something is being done incorrectly, especially if it can be
triggered from userspace, which is likely in this particular case...

> As shown earlier, this is what I see in my experimental setup with
> 10k containers:
> 
> Number of kmalloc-32 allocations
> 		Before		During		After
> W/o patch	178176		3442409472	388933632
> W/  patch	190464		468992		468992
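
(Back of the envelope, assuming each count above is a 32-byte object
and ignoring slab metadata and fragmentation: the peak of 3,442,409,472
allocations during the run works out to roughly 103 GiB of kmalloc-32
objects without the patch, versus 468,992 * 32 bytes, roughly 14 MiB,
with it.)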

So now we have an additional half a million GFP_ATOMIC allocations
where we currently have none. That's not an improvement; it rings
loud alarm bells.

> This really depends on the type of the container and the number of
> mounts it does, but I suspect we are looking at GFP_ATOMIC
> allocations in the MB range. Also, the number of GFP_ATOMIC slab
> allocation requests matters, I suppose.

They are slab allocations, which means every single one of them
could require a new slab backing page (pages!) to be allocated.
Hence the memory demand is likely to be a lot higher than in the
optimal case you are considering here...
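
To put a rough number on that: kmalloc-32 packs on the order of 128
objects into each 4KiB slab page, so even the ~470k allocations in the
patched case could still require several thousand new slab pages, each
one potentially allocated at the worst possible time.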

> There are other users of list_lru, but I was just looking at the
> dentry and inode list_lru use cases. It appears to me that for both
> dentry and inode, we can tolerate a list_lru_add() failure caused by
> a GFP_ATOMIC allocation failure. The failure to add a dentry or
> inode to the lru list means that it won't be retained on the lru
> list, but will be freed immediately. Is this understanding correct?

No. Both retain_dentry() and iput_final() would currently leak
objects that fail insertion into the LRU. They don't check for
insertion success at all.
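
For illustration, the dcache side currently looks roughly like this
(an approximation of d_lru_add(), not a verbatim copy of fs/dcache.c):
the dentry is flagged as being on the LRU and the list_lru_add()
return value is ignored, so a failed insertion leaves a dentry that is
neither on the LRU nor freed.

static void d_lru_add(struct dentry *dentry)
{
	/* flag assumes the dentry is now on the LRU */
	dentry->d_flags |= DCACHE_LRU_LIST;
	this_cpu_inc(nr_dentry_unused);
	/* return value not checked - a failed add is silently ignored */
	list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru);
}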

But, really, this is irrelevant - GFP_ATOMIC usage is the problem,
and allowing it to fail doesn't avoid the problems that unbounded
GFP_ATOMIC allocation can have on the stability of the rest of the
system when low on memory. Being able to handle a GFP_ATOMIC memory
allocation failure doesn't change the fact that you should not be
doing GFP_ATOMIC allocation in the first place...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

