Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	 Andrea Arcangeli <aarcange@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,  Zi Yan <zi.yan@cs.rutgers.edu>,
	 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>,
	 "Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"
Date: Thu, 13 Jun 2019 13:17:49 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1906131300220.27665@chino.kir.corp.google.com> (raw)
In-Reply-To: <20190607083255.GA18435@dhcp22.suse.cz>

On Fri, 7 Jun 2019, Michal Hocko wrote:

> > So my proposed change would be:
> >  - give the page allocator a consistent indicator that compaction failed
> >    because we are low on memory (make COMPACT_SKIPPED really mean this),
> >  - if we get this in the page allocator and we are allocating thp, fail,
> >    reclaim is unlikely to help here and is much more likely to be
> >    disruptive
> >      - we could retry compaction if we haven't scanned all memory and
> >        were contended,
> >  - if the hugepage allocation fails, have thp check watermarks for order-0 
> >    pages without any padding,
> >  - if watermarks succeed, fail the thp allocation: we can't allocate
> >    because of fragmentation and it's better to return node local memory,
> 
> Doesn't this lead to the same THP low success rate we have seen with one
> of the previous patches though?
> 

From my recollection, the only other patch that was tested involved 
__GFP_NORETRY and avoiding reclaim entirely for thp allocations when 
checking for high-order allocations.

This in the page allocator:

                /*
                 * Checks for costly allocations with __GFP_NORETRY, which
                 * includes THP page fault allocations
                 */
                if (costly_order && (gfp_mask & __GFP_NORETRY)) {
			...
			if (compact_result == COMPACT_DEFERRED)
				goto nopage;

Yet there is no handling for COMPACT_SKIPPED (or what my plan above 
defines COMPACT_SKIPPED to be).  I don't think anything was tried that 
tests why compaction failed, i.e. was it because the two scanners met, 
because hugepage-order memory was found available, because the zone lock 
was contended or we hit need_resched(), we're failing even order-0 
watermarks, etc.  I don't think the above plan has been tried, if someone 
has tried it, please let me know.

I haven't seen any objection to disabling reclaim entirely when order-0 
watermarks are failing in compaction.  We simply can't guarantee that it 
is useful work with the current implementation of compaction.  There are 
several reasons that I've enumerated why compaction can still fail even 
after successful reclaim.

The point is that removing __GFP_THISNODE is not a fix for this if the 
remote memory is fragmented as well: it assumes that hugepages are 
available remotely when they aren't available locally otherwise we seem 
swap storms both locally and remotely.  Relying on that is not in the best 
interest of any user of transparent hugepages.

> Let me remind you of the previous semantic I was proposing
> http://lkml.kernel.org/r/20181206091405.GD1286@dhcp22.suse.cz and that
> didn't get shot down. Linus had some follow up ideas on how exactly
> the fallback order should look like and that is fine. We should just
> measure differences between local node cheep base page vs. remote THP on
> _real_ workloads. Any microbenchmark which just measures a latency is
> inherently misleading.
> 

I think both seek to add the possibility of allocating hugepages remotely 
in certain circumstances and that can be influenced by MADV_HUGEPAGE.  I 
don't think we need to try hugepage specific mempolicies unless it is 
shown to be absolutely necessary although a usecase for this could be made 
separate to this discussion.

There's a benefit to faulting remote hugepages over remote pages for 
everybody involved.  My argument is that we can determine the need for 
that based on failed order-0 watermark checks in compaction: if the node 
would require reclaim to even fault a page, it is likely better over the 
long term to fault a remote hugepage.

I think this can be made to work and is not even difficult to do.  If 
anybody has any objection please let me know.

next prev parent reply	other threads:[~2019-06-13 20:17 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-03 22:31 [PATCH 0/2] reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings Andrea Arcangeli
2019-05-03 22:31 ` [PATCH 1/2] Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" Andrea Arcangeli
2019-05-04 12:03   ` Michal Hocko
2019-05-03 22:31 ` [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations" Andrea Arcangeli
2019-05-04 12:11   ` Michal Hocko
2019-05-09  8:38   ` Mel Gorman
2019-05-15 20:26   ` David Rientjes
2019-05-20 15:36     ` Mel Gorman
2019-05-20 17:54       ` David Rientjes
2019-05-24  0:57         ` Andrew Morton
2019-05-24 20:29           ` Andrea Arcangeli
2019-05-29  2:06             ` David Rientjes
2019-05-29 21:24           ` David Rientjes
2019-05-31  9:22             ` Michal Hocko
2019-05-31 21:53               ` David Rientjes
2019-06-05  9:32                 ` Michal Hocko
2019-06-06 22:12                   ` David Rientjes
2019-06-07  8:32                     ` Michal Hocko
2019-06-13 20:17                       ` David Rientjes [this message]
2019-06-21 21:16                     ` David Rientjes
2019-07-30 13:11           ` Michal Hocko
2019-07-30 18:05             ` Andrew Morton
2019-07-31  8:17               ` Michal Hocko
2019-05-24 10:07         ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1906131300220.27665@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=s.priebe@profihost.ag \
    --cc=vbabka@suse.cz \
    --cc=zi.yan@cs.rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox