Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes <rientjes@google.com>, Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages
Date: Mon, 25 Nov 2019 22:34:11 +0100	[thread overview]
Message-ID: <8634664b-5e3a-5875-1344-bddd94a48716@suse.cz> (raw)
In-Reply-To: <alpine.DEB.2.21.1911251233020.245192@chino.kir.corp.google.com>

On 11/25/19 9:38 PM, David Rientjes wrote:
> On Mon, 25 Nov 2019, Michal Hocko wrote:
> 
>>> So my question would be: if we know the previous behavior that allowed 
>>> excessive swap and recalling into compaction was deemed harmful for the 
>>> local node, why do we now believe it cannot be harmful if done for all 
>>> system memory?
>>
>> I have to say that I got lost in your explanation. I have already
>> pointed this out in a previous email you didn't reply to. But the main
>> difference to previous __GFP_THISNODE behavior is that it is used along
>> with __GFP_NORETRY and that reduces the overall effort of the reclaim
>> AFAIU. If that is not the case then please be _explicit_ why.
>>
> 
> I'm referring to the second allocation in alloc_pages_vma() after the 
> patch:
> 
>  			/*
>  			 * If hugepage allocations are configured to always
>  			 * synchronous compact or the vma has been madvised
>  			 * to prefer hugepage backing, retry allowing remote
> -			 * memory as well.
> +			 * memory with both reclaim and compact as well.
>  			 */
>  			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
>  				page = __alloc_pages_node(hpage_node,
> - 						gfp | __GFP_NORETRY, order);
> +							gfp, order);
> 
> So we now do not have __GFP_NORETRY nor __GFP_THISNODE so this bypasses 
> all the precautionary logic in the page allocator that avoids excessive 
> swap: it is free to continue looping, swapping, and thrashing, trying to 
> allocate hugepages if all memory is fragmented.

Well, it's not completely free to do all that, there's other
reclaim/compaction logic that terminates the attempts. If it's
insufficient for some workloads, I would like to know, with hard data. I
think the past observations of thrashing with __GFP_THISNODE do not
prove the reclaim/compaction logic is wrong, see below.

> Qemu uses MADV_HUGEPAGE so this allocation *will* be attempted for 
> Andrea's workload.  The swap storms were reported for the same allocation 
> but with __GFP_THISNODE so it only occurred for local fragmentation and 
> low-on-memory conditions for the local node in the past.  This is now 
> opened up for all nodes.

I think it's also possible to explain the thrashing with __GFP_THISNODE
without the premise of local fragmentation and suboptimal
reclaim/compact decisions. THP __GFP_THISNODE allocations might simply
put too much pressure on the local node with a workload that might be
sized for the whole system. It's the same problem as with the node
reclaim mode. Even if there's no bad fragmentation and THP allocations
succeed without much trouble, they fill the node and cause reclaim (both
kswapd and direct). Fallback order-0 allocations are not restricted, so
they will use all nodes. The new THP retry without __GFP_THISNODE is
also not restricted, so it's very much possible it will not cause this
thrashing with a properly sized workload for the whole system.

> So the question is: what prevents the exact same issue from happening 
> again for Andrea's usecase if all memory on the system is fragmented?  I'm 
> assuming that if this were tested under such conditions that the swap 
> storms would be much worse.

We will only know if Andrea tests it. But if his workloads were fine
with just the initial __GFP_THISNODE reverts, they should be still fine
after this patch?

next prev parent reply	other threads:[~2019-11-25 21:38 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04 19:54 David Rientjes
2019-09-04 19:54 ` [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction may not succeed David Rientjes
2019-09-05  9:00   ` Michal Hocko
2019-09-05 11:22     ` Vlastimil Babka
2019-09-05 20:53       ` Mike Kravetz
2019-09-06 20:16         ` David Rientjes
2019-09-06 20:49       ` David Rientjes
2019-09-04 20:43 ` [patch for-5.3 0/4] revert immediate fallback to remote hugepages Linus Torvalds
2019-09-05 20:54   ` David Rientjes
2019-09-07 19:51     ` David Rientjes
2019-09-07 19:55       ` Linus Torvalds
2019-09-08  1:50         ` David Rientjes
2019-09-08 12:47           ` Vlastimil Babka
2019-09-08 20:45             ` David Rientjes
2019-09-09  8:37               ` Michal Hocko
2019-09-04 20:55 ` Andrea Arcangeli
2019-09-05 21:06   ` David Rientjes
2019-09-09 19:30     ` Michal Hocko
2019-09-25  7:08       ` Michal Hocko
2019-09-26 19:03         ` David Rientjes
2019-09-27  7:48           ` Michal Hocko
2019-09-28 20:59             ` Linus Torvalds
2019-09-30 11:28               ` Michal Hocko
2019-10-01  5:43                 ` Michal Hocko
2019-10-01  8:37                   ` Michal Hocko
2019-10-18 14:15                     ` Michal Hocko
2019-10-23 11:03                       ` Vlastimil Babka
2019-10-24 18:59                         ` David Rientjes
2019-10-29 14:14                           ` Vlastimil Babka
2019-10-29 15:15                             ` Michal Hocko
2019-10-29 21:33                               ` Andrew Morton
2019-10-29 21:45                                 ` Vlastimil Babka
2019-10-29 23:25                                 ` David Rientjes
2019-11-05 13:02                                   ` Michal Hocko
2019-11-06  1:01                                     ` David Rientjes
2019-11-06  7:35                                       ` Michal Hocko
2019-11-06 21:32                                         ` David Rientjes
2019-11-13 11:20                                           ` Mel Gorman
2019-11-25  0:10                                             ` David Rientjes
2019-11-25 11:47                                               ` Michal Hocko
2019-11-25 20:38                                                 ` David Rientjes
2019-11-25 21:34                                                   ` Vlastimil Babka [this message]
2019-10-01 13:50                   ` Vlastimil Babka
2019-10-01 20:31                     ` David Rientjes
2019-10-01 21:54                       ` Vlastimil Babka
2019-10-02 10:34                         ` Michal Hocko
2019-10-02 22:32                           ` David Rientjes
2019-10-03  8:00                             ` Vlastimil Babka
2019-10-04 12:18                               ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8634664b-5e3a-5875-1344-bddd94a48716@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox