From: David Rientjes <rientjes@google.com>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: ktkhai@virtuozzo.com, hannes@cmpxchg.org, mhocko@suse.com,
	 kirill.shutemov@linux.intel.com, hughd@google.com,
	shakeelb@google.com,  akpm@linux-foundation.org,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] Make deferred split shrinker memcg aware
Date: Tue, 28 May 2019 18:22:49 -0700 (PDT)
Message-ID: <alpine.DEB.2.21.1905281817090.86034@chino.kir.corp.google.com>
In-Reply-To: <1559047464-59838-1-git-send-email-yang.shi@linux.alibaba.com>

On Tue, 28 May 2019, Yang Shi wrote:

> 
> I got some reports from our internal application team about memcg OOM.
> Even though the application has been killed by the OOM killer, a lot of
> THPs still remain and page reclaim doesn't reclaim them at all.
> 
> Some investigation shows they are on the deferred split queue, and memcg
> direct reclaim can't shrink them since the THP deferred split shrinker is
> not memcg aware.  This may cause premature OOM in a memcg.  The issue can
> be reproduced easily by the test below:
> 

Right, we've also encountered this.  I talked to Kirill about it a week or
so ago; the suggestion there was to split all compound pages on the
deferred split queues whenever there is memory pressure.

That breaks cgroup isolation, though, and perhaps unfairly penalizes
workloads attached to other memcg hierarchies that are not under pressure,
because their compound pages are now split as a side effect.  There is a
benefit to keeping these compound pages around while not under memory
pressure if all of their subpages are subsequently mapped again.
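
For reference, the generic shrinker interface already passes the memcg and
node under pressure through struct shrink_control, so a memcg aware
deferred split shrinker can keep the splitting scoped to the hierarchy
that is actually hitting its limit.  A simplified sketch of the shape that
takes (not the actual patches; if I remember correctly the existing
shrinker in mm/huge_memory.c only sets SHRINKER_NUMA_AWARE today):

#include <linux/shrinker.h>

/*
 * Simplified sketch only: the count/scan callbacks would look up the
 * deferred split queue for sc->memcg (falling back to the per node
 * queue) instead of always using the node's queue.
 */
static unsigned long deferred_split_count(struct shrinker *shrink,
                                          struct shrink_control *sc)
{
        /* length of the queue belonging to sc->memcg / sc->nid */
        return 0;
}

static unsigned long deferred_split_scan(struct shrinker *shrink,
                                         struct shrink_control *sc)
{
        /* split up to sc->nr_to_scan THPs from that queue */
        return 0;
}

static struct shrinker deferred_split_shrinker = {
        .count_objects  = deferred_split_count,
        .scan_objects   = deferred_split_scan,
        .seeks          = DEFAULT_SEEKS,
        .flags          = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
};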

> $ cgcreate -g memory:thp
> $ echo 4G > /sys/fs/cgroup/memory/thp/memory.limit_in_bytes
> $ cgexec -g memory:thp ./transhuge-stress 4000
> 
> transhuge-stress comes from kernel selftest.
> 
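
For anyone who hasn't looked at it: transhuge-stress essentially maps a
large anonymous region, madvises it MADV_HUGEPAGE, and keeps touching it
so the fault path backs it with THPs.  A stripped-down approximation of
the idea (not the actual selftest source, which lives under
tools/testing/selftests/vm/ and does considerably more):

/*
 * Stripped-down approximation of the allocation pattern exercised by
 * transhuge-stress; run it inside the memcg created above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define HPAGE_SIZE      (2UL << 20)     /* assumes 2M THPs */

int main(int argc, char **argv)
{
        size_t mb = argc > 1 ? strtoul(argv[1], NULL, 0) : 4000;
        size_t len = mb << 20;
        char *p;

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        madvise(p, len, MADV_HUGEPAGE);

        for (;;)        /* keep faulting until the memcg limit bites */
                for (size_t off = 0; off < len; off += HPAGE_SIZE)
                        p[off] = 1;
}
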
> It is easy to hit OOM, but there are still a lot of THPs on the deferred
> split queue; memcg direct reclaim can't touch them since the deferred
> split shrinker is not memcg aware.
> 

Yes, we have seen this on at least 4.15 as well.

> Convert the deferred split shrinker to be memcg aware by introducing a per
> memcg deferred split queue.  A THP goes on the per memcg deferred split
> queue if it belongs to a memcg, and on the per node queue otherwise.  When
> the page is migrated to another memcg, it is moved to the target memcg's
> deferred split queue too.
> 
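
If I'm reading this right, the queue is looked up from the page's memcg
when it has one and falls back to the node's queue otherwise.  Something
along these lines, I assume (the helper name, struct and field placement
are my guesses from the description above, not necessarily what the
patches do):

/*
 * Guessed sketch of the queue selection described above; the names here
 * are assumptions based on the cover letter, not the actual patches.
 */
static struct deferred_split *get_deferred_split_queue(struct page *page)
{
        struct mem_cgroup *memcg = compound_head(page)->mem_cgroup;

        if (memcg)
                return &memcg->deferred_split_queue;
        return &NODE_DATA(page_to_nid(page))->deferred_split_queue;
}
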
> Also, move the deletion of the THP from the deferred split queue to before
> the memcg uncharge in the page free path, so that the page's memcg
> information is still available.
> 
> Reuse the second tail page's deferred_list for the per memcg list, since
> the same THP can't be on multiple deferred split queues at the same time.
> 
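
Makes sense: the list node already lives in the second tail page, so the
same storage naturally serves whichever queue the THP currently sits on.
If I remember correctly, the existing helper is roughly:

static inline struct list_head *page_deferred_list(struct page *page)
{
        /*
         * The first tail page holds the compound metadata, so the
         * deferred_list node sits in the second tail page.  A single
         * list node also means a THP can only ever be on one queue.
         */
        return &page[2].deferred_list;
}
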
> Remove the THP-specific destructor since it is no longer used with the
> memcg aware THP shrinker (please see the commit log of patch 2/3 for the
> details).
> 
> Make the deferred split shrinker not depend on memcg kmem since it is not
> slab; it doesn't make sense to skip shrinking THPs just because memcg kmem
> is disabled.
> 
> With the above changes, the test demonstrated earlier no longer triggers
> OOM, even with cgroup.memory=nokmem.
> 

I'm curious whether your internal applications team is also asking for
statistics on how much memory could be freed if the deferred split queues
were shrunk.  We have applications that monitor their own memory usage
through memcg stats or usage and proactively try to reduce that usage when
it grows too large.  The deferred split queues have significantly
increased both memcg usage and rss when those applications have upgraded
kernels.

How are your applications monitoring how much memory from deferred split 
queues can be freed on memory pressure?  Any thoughts on providing it as a 
memcg stat?
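
To make that concrete: a per memcg counter bumped when a THP is queued for
deferred split and dropped when it is split or freed, surfaced through
memory.stat, would be enough for the monitoring we do.  Purely
illustrative, the stat item below is made up:

/*
 * Purely illustrative: MEMCG_DEFERRED_SPLIT is a made-up stat item, only
 * meant to show the shape of the accounting; the real hooks would sit
 * where the THP is queued and where it is split or freed.
 */
static void account_deferred_split(struct mem_cgroup *memcg, bool queued)
{
        mod_memcg_state(memcg, MEMCG_DEFERRED_SPLIT,
                        queued ? HPAGE_PMD_NR : -HPAGE_PMD_NR);
}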

Thanks!


Thread overview: 20+ messages
2019-05-28 12:44 Yang Shi
2019-05-28 12:44 ` [PATCH 1/3] mm: thp: make deferred split shrinker memcg aware Yang Shi
2019-05-28 14:42   ` Kirill Tkhai
2019-05-29  2:43     ` Yang Shi
2019-05-29  8:14       ` Kirill Tkhai
2019-05-29 11:25         ` Yang Shi
2019-06-10  8:23           ` Kirill Tkhai
2019-06-10 17:25             ` Yang Shi
2019-06-13  8:19               ` Kirill Tkhai
2019-06-13 17:53                 ` Yang Shi
2019-05-30 12:07   ` Kirill A. Shutemov
2019-05-30 13:29     ` Yang Shi
2019-05-28 12:44 ` [PATCH 2/3] mm: thp: remove THP destructor Yang Shi
2019-05-28 12:44 ` [PATCH 3/3] mm: shrinker: make shrinker not depend on memcg kmem Yang Shi
2019-05-30 12:08   ` Kirill A. Shutemov
2019-05-30 13:20     ` Yang Shi
2019-05-29  1:22 ` David Rientjes [this message]
2019-05-29  2:34   ` [RFC PATCH 0/3] Make deferred split shrinker memcg aware Yang Shi
2019-05-29 21:07     ` David Rientjes
2019-05-30  3:22       ` Yang Shi
