Re: [PATCH 5/6] mm/vmscan: Don't change pgdat state on base of a single LRU list state.

From: Andrey Ryabinin <aryabinin@virtuozzo.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Tejun Heo <tj@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH 5/6] mm/vmscan: Don't change pgdat state on base of a single LRU list state.
Date: Wed, 21 Mar 2018 18:57:39 +0300
Message-ID: <3c5d5884-44a6-0c7f-dca3-adcde718e5ea@virtuozzo.com>
In-Reply-To: <20180321113217.GG23100@dhcp22.suse.cz>

On 03/21/2018 02:32 PM, Michal Hocko wrote:
> On Wed 21-03-18 13:40:32, Andrey Ryabinin wrote:
>> On 03/20/2018 06:25 PM, Michal Hocko wrote:
>>> On Thu 15-03-18 19:45:52, Andrey Ryabinin wrote:
>>>> We have a separate LRU list for each memory cgroup. Memory reclaim iterates
>>>> over cgroups and calls shrink_inactive_list() on every inactive LRU list.
>>>> Based on the state of a single LRU, shrink_inactive_list() may flag
>>>> the whole node as dirty, congested or under writeback. This is obviously
>>>> wrong and hurtful. It's especially hurtful when we have a possibly
>>>> small congested cgroup in the system. Then *all* direct reclaims waste time
>>>> by sleeping in wait_iff_congested().
>>>
>>> I assume you have seen this in real workloads. Could you be more
>>> specific about how you noticed the problem?
>>>
>>
>> Does it matter?
> 
> Yes. Having relevant information in the changelog can help other people
> to evaluate whether they need to backport the patch. Their symptoms
> might be similar or even the same.
> 
>> One of our userspace processes has some sort of watchdog.
>> When it doesn't receive some event in time, it complains that the process is stuck.
>> In this case an in-kernel allocation was stuck in wait_iff_congested().
> 
> OK, so normally it would show up as a long stall in the page allocator.
> Anyway, I was more curious about the setup. I assume you have many memcgs,
> some of them with a very small hard limit, which triggers the
> throttling of other memcgs?

Quite some time has passed since this was observed, so I may not remember all the details by now.
I can't tell you whether there really were many memcgs or just a few, but the more memcgs we have,
the more severe the issue is, since wait_iff_congested() is called per-LRU.

What I saw was one cgroup A doing a lot of writes to NFS. It's easy to congest NFS
by generating more than nfs_congestion_kb worth of writeback pages.
Another task (the one with the watchdog) from a different cgroup B went into *global* direct reclaim
and stalled in wait_iff_congested().
The system had dozens of gigabytes of clean inactive file pages and relatively few dirty/writeback pages on NFS.

So, to trigger the issue one must have one memcg with mostly dirty pages on a congested device.
It doesn't have to be a small memcg or one with a hard limit.
Global reclaim kicks in, sees the 'congested' memcg, sets the CONGESTED bit, stalls in wait_iff_congested(),
goes to the next memcg, stalls again, and so on until the reclaim goal is satisfied.
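
To make the stall pattern above concrete, here is a rough userspace toy model, not kernel code.
struct toy_lru, stalls_per_lru() and stalls_per_node() are names made up for this sketch.
It assumes the node gets flagged when all dirty pages seen on one LRU are congested, that the flag
is never cleared during the pass, and that each stall stands in for one wait_iff_congested() call.
The per-node variant is only one possible way to aggregate the counters; the exact condition used
by the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the counters of one memcg's inactive LRU (hypothetical). */
struct toy_lru {
	unsigned long nr_dirty;     /* dirty pages seen while scanning this LRU */
	unsigned long nr_congested; /* of those, pages on a congested device */
};

/*
 * Per-LRU decision: a single LRU whose dirty pages are all congested
 * flags the whole node, and every LRU scanned while the flag is set
 * costs one stall.
 */
static unsigned int stalls_per_lru(const struct toy_lru *lrus, size_t n)
{
	bool node_congested = false;
	unsigned int stalls = 0;

	assert(lrus != NULL);
	for (size_t i = 0; i < n; i++) {
		if (lrus[i].nr_dirty && lrus[i].nr_dirty == lrus[i].nr_congested)
			node_congested = true;
		if (node_congested)
			stalls++; /* stands in for wait_iff_congested() */
	}
	return stalls;
}

/*
 * Per-node decision: sum the counters over all LRUs first, then decide
 * once, so a pass over the node stalls at most one time.
 */
static unsigned int stalls_per_node(const struct toy_lru *lrus, size_t n)
{
	unsigned long dirty = 0, congested = 0;

	assert(lrus != NULL);
	for (size_t i = 0; i < n; i++) {
		dirty += lrus[i].nr_dirty;
		congested += lrus[i].nr_congested;
	}
	return (dirty && dirty == congested) ? 1 : 0;
}
```

With cgroup A scanned first as { 100, 100 } and three clean memcgs { 0, 0 } after it,
stalls_per_lru() counts one stall per memcg while stalls_per_node() counts a single stall for the
whole pass, which is why the problem gets worse as the number of memcgs grows.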

Thread overview: 18+ messages
2018-03-15 16:45 [PATCH 1/6] mm/vmscan: Wake up flushers for legacy cgroups too Andrey Ryabinin
2018-03-15 16:45 ` [PATCH 2/6] mm/vmscan: Update stale comments Andrey Ryabinin
2018-03-20 15:00   ` Michal Hocko
2018-03-15 16:45 ` [PATCH 3/6] mm/vmscan: replace mm_vmscan_lru_shrink_inactive with shrink_page_list tracepoint Andrey Ryabinin
2018-03-15 16:45 ` [PATCH 4/6] mm/vmscan: remove redundant current_may_throttle() check Andrey Ryabinin
2018-03-20 15:11   ` Michal Hocko
2018-03-15 16:45 ` [PATCH 5/6] mm/vmscan: Don't change pgdat state on base of a single LRU list state Andrey Ryabinin
2018-03-20 15:25   ` Michal Hocko
2018-03-21 10:40     ` Andrey Ryabinin
2018-03-21 11:32       ` Michal Hocko
2018-03-21 15:57         ` Andrey Ryabinin [this message]
2018-03-15 16:45 ` [PATCH 6/6] mm/vmscan: Don't mess with pgdat->flags in memcg reclaim Andrey Ryabinin
2018-03-20 15:29   ` Michal Hocko
2018-03-21 11:14     ` Andrey Ryabinin
2018-03-21 11:43       ` Michal Hocko
2018-03-21 17:01         ` Andrey Ryabinin
2018-03-15 18:57 ` [PATCH 1/6] mm/vmscan: Wake up flushers for legacy cgroups too Shakeel Butt
2018-03-20 15:00 ` Michal Hocko
