From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages
Date: Mon, 9 Sep 2019 21:30:20 +0200
Message-ID: <20190909193020.GD2063@dhcp22.suse.cz>
In-Reply-To: <alpine.DEB.2.21.1909051400380.217933@chino.kir.corp.google.com>
On Thu 05-09-19 14:06:28, David Rientjes wrote:
> On Wed, 4 Sep 2019, Andrea Arcangeli wrote:
>
> > > This is an admittedly hacky solution that shouldn't cause anybody to
> > > regress based on NUMA and the semantics of MADV_HUGEPAGE for the past
> > > 4 1/2 years for users whose workload does fit within a socket.
> >
> > How can you live with the below if you can't live with 5.3-rc6? Here
> > you allocate remote THP if the local THP allocation fails.
> >
> > > page = __alloc_pages_node(hpage_node,
> > > gfp | __GFP_THISNODE, order);
> > > +
> > > + /*
> > > + * If hugepage allocations are configured to always
> > > + * synchronous compact or the vma has been madvised
> > > + * to prefer hugepage backing, retry allowing remote
> > > + * memory as well.
> > > + */
> > > + if (!page && (gfp & __GFP_DIRECT_RECLAIM))
> > > + page = __alloc_pages_node(hpage_node,
> > > + gfp | __GFP_NORETRY, order);
> > > +
> >
> > You're still going to get THP allocate remote _before_ you have a
> > chance to allocate 4k local this way. __GFP_NORETRY won't make any
> > difference when there's THP immediately available in the remote nodes.
> >
>
> This is incorrect: the fallback allocation here is only if the initial
> allocation with __GFP_THISNODE fails. In that case, we were able to
> compact memory to make a local hugepage available without incurring
> excessive swap based on the RFC patch that appears as patch 3 in this
> series.

That patch is quite obscure: it is specific to pageblock_order+ sizes
and, for some reason, requires __GFP_IO without any explanation of why.
The problem is not THP specific, right? Any other high-order request has
the same problem AFAICS. So it is just a hack, and that is why it is hard
to reason about.

I believe it would be best to start by explaining why we do not see
the same problem with order-0 requests. We do not enter the slow path,
and thus memory reclaim, if there is any other node that passes the
watermark check, right? So essentially we are relying on kswapd to keep
nodes balanced so that allocation requests can be satisfied from the
local node. We do have kcompactd to do background compaction. Why do we
want to rely on direct compaction instead? What is the fundamental
difference?
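
To make the order-0 argument concrete, here is a toy userspace model of
the fast path. It is only a sketch and not kernel code: struct toy_node,
toy_watermark_ok() and toy_fast_path() are hypothetical names, the
watermark check is simplified, and nodes are walked round-robin from the
local one rather than in real zonelist (distance) order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_node {
	long free_pages;	/* pages currently free on this node */
	long low_wmark;		/* simplified per-node low watermark */
};

/* An order-0 request passes if one page less still meets the watermark. */
static bool toy_watermark_ok(const struct toy_node *n)
{
	return n->free_pages - 1 >= n->low_wmark;
}

/*
 * Fast path: start at the local node and take the first node that
 * passes the watermark check. Returns the chosen node index, or -1
 * meaning "no node passed, enter the slow path" (where direct reclaim
 * and compaction would happen).
 */
static int toy_fast_path(const struct toy_node *nodes, size_t nr, size_t local)
{
	for (size_t i = 0; i < nr; i++) {
		size_t nid = (local + i) % nr;

		if (toy_watermark_ok(&nodes[nid]))
			return (int)nid;
	}
	return -1;
}
```

In this model a request only reaches the slow path when every node is
below its watermark, which is why the order-0 case leans on background
balancing rather than on direct reclaim.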

Your changelog goes on at length about some problems in compaction, but
I really do not see the underlying problem description. We cannot do any
sensible fix/heuristic without capturing that IMHO. Either there is
some fundamental difference between direct and background compaction,
so that doing the former is necessary and we should be doing it by
default for all higher-order requests that are sleepable (aka
__GFP_DIRECT_RECLAIM), or there is something to fix in background
compaction to make it more proactive.

> > I said one good thing about this patch series, that it fixes the swap
> > storms. But upstream 5.3 fixes the swap storms too and what you sent
> > is not nearly equivalent to the mempolicy that Michal was willing
> > to provide you and that we thought you needed to get bigger guarantees
> > of getting only local 2m or local 4k pages.
> >
>
> I haven't seen such a patch series, is there a link?

Not yet, unfortunately. So far I haven't heard that you are even
interested in that policy; you have never commented on it, IIRC.
--
Michal Hocko
SUSE Labs