linux-mm.kvack.org archive mirror
From: Ivan Kalvachev <ikalvachev@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>, Greg Thelen <gthelen@google.com>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>, Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH] mm: fix kswap excessive pressure after wrong condition transfer
Date: Mon, 11 Jun 2018 18:38:54 +0300	[thread overview]
Message-ID: <CABA=pqf81WiOEhX-_O8EJ-cr_QMTFML3vvRzMrcEkbiXD4ogiA@mail.gmail.com> (raw)
In-Reply-To: <112df846-76d6-140f-8fdb-44dd0437c859@suse.cz>

On 6/1/18, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 06/01/2018 01:30 AM, Hugh Dickins wrote:
>> On Fri, 1 Jun 2018, Ivan Kalvachev wrote:
>>> On 5/31/18, Greg Thelen <gthelen@google.com> wrote:
>>>>
>>>> This looks like yesterday's https://lkml.org/lkml/2018/5/30/1158
>>>>
>>>
>>> Yes, it seems to be the same problem.
>>> It also has a better technical description.
>>
>> Well, your paragraph above on "Big memory consumers" gives a much
>> better user viewpoint, and a more urgent case for the patch to go in,
>> to stable if it does not make 4.17.0.
>>
>> But I am surprised: the change is in a block of code only used in
>> one of the modes of compaction (not in reclaim itself), and I thought
>> it was a mode which gives up quite easily, rather than visibly blocking.
>>
>> So I wonder if there's another issue to be improved here,
>> and the mistreatment of the ex-swap pages just exposed it somehow.
>> Cc'ing Vlastimil and David in case it triggers any insight from them.
>
> My guess is that the problem is compaction fails because of the
> isolation failures, causing further reclaim/compaction attempts with
> higher priority, in the context of non-costly thus non-failing
> allocations. Initially I thought that increased priority of compaction
> would eventually become synchronous and thus not go via this block of code
> anymore. But (see isolate_migratepages()) only MIGRATE_SYNC compaction
> mode drops the ISOLATE_ASYNC_MIGRATE isolate_mode flag. And MIGRATE_SYNC
> is only used for compaction triggered via /proc - direct compaction
> stops at MIGRATE_SYNC_LIGHT. Maybe that could be changed? Mel had
> reasons to limit to SYNC_LIGHT, I guess...
>
> If the above is correct, it means that even with gigabytes of free
> memory you can fail order-3 (max non-costly order) allocation if
> compaction doesn't work properly. That's a bit surprising, but not
> impossible I guess...

Is somebody working on testing this guess?

I don't fully understand this explanation; however, I cannot imagine
a non-costly allocation failing when there are gigabytes of free
(unused) memory.

That's why I still think it is possible that this bug is triggering
some underlying issue, so I did a little more poking around.

For clarity, I'll refer to the commits as:
- the bug: 69d763fc6d3a ("mm: pin address_space before dereferencing
  it while isolating an LRU page")
- the fix: 145e1a71e090 ("mm: fix the NULL mapping case in __isolate_lru_page()")

The following results might be interesting to you:

1. I've discovered that 4.14.41 does not exhibit any problems, despite
having "the bug" backported into it. I used it again for a while to
make sure I hadn't overlooked anything. No issues at all.

2. The 4.15 kernels were supported only briefly, so I backported "the bug"
myself and ran that kernel (first 4.15.18, later 4.15.0). At first
I thought they were not affected, because I did not experience any
blocking during use. However, `top` showed that they also tend to
accumulate gigabytes of "free" RAM. Likely they were just better at
swapping out unused pages.

3. I tried the original 4.16.13, which has "the bug" but not "the
fix"; this time, however, I disabled "Transparent Hugepage Support"
in `make menuconfig`.
I ran that kernel for a while without any sign of issues.

So, before I start another round of bisecting: does anybody have an
educated guess as to which commit might have introduced this behavior?

Do you think it is unintended behavior that should be investigated?

Any other hints?

Best Regards
   Ivan Kalvachev


>>>
>>> Such a letdown.
>>> It took me so much time to bisect the issue...
>>
>> Thank you for all your work on it; odd how we found it at the same
>> time: I was just porting Mel's patch into another tree, had to make
>> a change near there, and suddenly noticed that the test was wrong.
>>
>> Hugh
>>
>>>
>>> Well, I hope that the fix will get into 4.17 release in time.
>>
>
>

Thread overview: 6+ messages
2018-05-31 19:34 Ivan Kalvachev
2018-05-31 19:51 ` Greg Thelen
2018-05-31 21:39   ` Ivan Kalvachev
2018-05-31 23:30     ` Hugh Dickins
2018-06-01  8:49       ` Vlastimil Babka
2018-06-11 15:38         ` Ivan Kalvachev [this message]
