linux-mm.kvack.org archive mirror
From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Zi Yan <zi.yan@cs.rutgers.edu>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Stefan Priebe <s.priebe@profihost.ag>
Subject: Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings
Date: Wed, 12 Sep 2018 09:54:17 -0400
Message-ID: <20180912135417.GA15194@redhat.com>
In-Reply-To: <20180911115613.GR10951@dhcp22.suse.cz>

Hello,

On Tue, Sep 11, 2018 at 01:56:13PM +0200, Michal Hocko wrote:
> Well, it seems that expectations differ for users. It seems that kvm
> users do not really agree with your interpretation.

Like David also mentioned here:

lkml.kernel.org/r/alpine.DEB.2.21.1808211021110.258924@chino.kir.corp.google.com

what is a win depends on the hardware, so there is no one-size-fits-all
answer.

On two-socket systems, providing remote THP to KVM is likely a win, but
changing the defaults depending on boot-time NUMA topology makes
things less deterministic, and it's also impossible to define an exact
break-even point.

> I do realize that this is a gray zone because nobody bothered to define
> the semantic since the MADV_HUGEPAGE has been introduced (a826e422420b4
> is exceptionally short of information). So we are left with more or less
> undefined behavior and define it properly now. As we can see this might
> regress in some workloads but I strongly suspect that an explicit
> binding sounds more logical approach than a thp specific mpol mode. If
> anything this should be a more generic memory policy basically saying
> that a zone/node reclaim mode should be enabled for the particular
> allocation.

MADV_HUGEPAGE means the allocation is long lived, so the cost of
compaction is worth it in direct reclaim. Not much else. That is not
the problem.

The problem is that, even if you ignore the breakage and regression to
real-life workloads, what is happening right now is something that
obviously should require root privilege, yet MADV_HUGEPAGE requires
none.
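For illustration, a minimal user-space sketch of how an unprivileged
process issues the hint. mmap(2), madvise(2) and MADV_HUGEPAGE are real
interfaces; the helper name sketch_map_with_thp_hint, the
SKETCH_THP_SIZE constant, and the 2 MiB huge page size are assumptions
made only for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed huge page size for the example (2 MiB, typical on x86-64). */
#define SKETCH_THP_SIZE (2UL << 20)

/*
 * Map `len` bytes of anonymous memory and hint that the range should be
 * backed by transparent huge pages. No capability is checked anywhere.
 *
 * Returns the mapping, or NULL if mmap() itself failed. *advice_errno is
 * 0 if the hint was accepted, otherwise the errno from madvise() (for
 * example when the running kernel has no THP support).
 */
static void *sketch_map_with_thp_hint(size_t len, int *advice_errno)
{
    assert(advice_errno != NULL);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    *advice_errno = 0;
#ifdef MADV_HUGEPAGE
    if (madvise(p, len, MADV_HUGEPAGE) != 0)
        *advice_errno = errno;
#else
    *advice_errno = ENOSYS; /* headers too old to know the flag */
#endif
    return p;
}
```

Any process can make this call on its own mappings, which is why
whatever the hint triggers has to be safe for unprivileged users.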

Swapping heavily because of MADV_HUGEPAGE, when there are gigabytes
free on other nodes and not even 4k would be swapped out with THP
turned off in sysfs, simply cannot be what MADV_HUGEPAGE was meant to
be about. It's a kernel regression that never existed until the commit
that added __GFP_THISNODE to the default THP heuristic in mempolicy.
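The scenario can be made concrete with a toy model. This is not the
kernel's allocator code, only a hypothetical simplification: every name
below (sketch_thp_alloc, the sketch_outcome values, the free_huge
array) is invented for the example, and real allocation involves
watermarks, compaction and zonelists that are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible results of one THP allocation attempt in the toy model. */
enum sketch_outcome {
    SKETCH_LOCAL_THP,   /* huge page taken from the faulting node */
    SKETCH_REMOTE_THP,  /* huge page taken from another node */
    SKETCH_RECLAIM,     /* reclaim (possibly swap) to make room */
    SKETCH_FALLBACK_4K  /* give up on THP and use base pages */
};

/*
 * free_huge[n] is the number of huge-page-sized free blocks on node n.
 * this_node_only models a __GFP_THISNODE-style restriction; may_reclaim
 * models a mapping and defrag setting that allow direct reclaim.
 */
static enum sketch_outcome
sketch_thp_alloc(const unsigned long *free_huge, size_t nr_nodes,
                 size_t local, bool this_node_only, bool may_reclaim)
{
    assert(local < nr_nodes);

    if (free_huge[local] > 0)
        return SKETCH_LOCAL_THP;

    if (!this_node_only) {
        for (size_t n = 0; n < nr_nodes; n++)
            if (n != local && free_huge[n] > 0)
                return SKETCH_REMOTE_THP;
    }

    /* Nothing usable was found among the nodes we may look at. */
    return may_reclaim ? SKETCH_RECLAIM : SKETCH_FALLBACK_4K;
}
```

With a full local node and plenty of free memory on the other node, the
node-restricted variant ends up reclaiming while the relaxed one simply
takes a remote huge page.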

I think we should defer the question of which is the better default, 4k
NUMA-local or remote THP, for later. I provided two options myself
because it didn't matter so much which one we picked in the short
term, as long as the bug was fixed.

I wasn't particularly happy about your patch because it still swaps
with certain defrag settings, which still allows things that shouldn't
happen without some kind of privileged capability.
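To check which defrag setting a given machine is running with, the
active value can be read from sysfs. The path below is the standard THP
knob, where the kernel marks the active choice with square brackets;
the helper names sketch_active_choice and sketch_read_defrag are
hypothetical, and this is only a sketch of one way to parse the line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_THP_DEFRAG "/sys/kernel/mm/transparent_hugepage/defrag"

/*
 * Copy the bracketed (active) token of a sysfs choice line such as
 * "always defer defer+madvise [madvise] never" into `out`.
 * Returns 0 on success, -1 if there is no bracketed token or it does
 * not fit into `outlen` bytes including the terminating NUL.
 */
static int sketch_active_choice(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (close == NULL)
        return -1;

    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen)
        return -1;

    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the current defrag mode; -1 if the file is missing or unparsable. */
static int sketch_read_defrag(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen(SKETCH_THP_DEFRAG, "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? sketch_active_choice(line, out, outlen) : -1;
}
```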

If you can update the patch to prevent swapping in all cases, it's a go
as far as I'm concerned. The main difference is that you're dropping
the THP logic in the mempolicy code, which will make things worse in
some cases. I was trying to retain it for all cases where swapping
wouldn't happen anyway, where that logic would still have provided
the behavior David prefers.

Adding a new feature to create a THP-specific mempolicy can be done
later. In the meantime, the current mempolicy code can always override
whatever default THP behavior comes out of this; it will just require
the admin to set up a mempolicy that enforces the preferred behavior
for 4k and THP allocations alike.
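As a rough sketch of such an override, a task can install a policy for
itself with set_mempolicy(2). The raw syscall is used here to avoid a
libnuma dependency. The mode values are the ones from the kernel's uapi
header but are renamed with a SKETCH_ prefix; the helper names and the
single-word nodemask are assumptions for the example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mode values as defined in linux/mempolicy.h, renamed for the sketch. */
enum { SKETCH_MPOL_PREFERRED = 1, SKETCH_MPOL_BIND = 2 };

/* Number of node bits that fit in the one-word mask used below. */
#define SKETCH_MASK_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Build a one-word nodemask with only `node` set. */
static unsigned long sketch_nodemask(unsigned node)
{
    assert(node < SKETCH_MASK_BITS);
    return 1UL << node;
}

/*
 * Apply `mode` restricted to `node` for the calling thread. The policy
 * then covers 4k and THP allocations alike. Returns 0 on success or -1
 * with errno set (ENOSYS if the syscall number is unknown at build time).
 */
static long sketch_set_policy(int mode, unsigned node)
{
    unsigned long mask = sketch_nodemask(node);
#ifdef SYS_set_mempolicy
    /* Pass one more than the bit count, the same convention libnuma uses. */
    return syscall(SYS_set_mempolicy, mode, &mask, SKETCH_MASK_BITS + 1);
#else
    (void)mask;
    (void)mode;
    errno = ENOSYS;
    return -1;
#endif
}
```

SKETCH_MPOL_BIND keeps every allocation on the given node, while
SKETCH_MPOL_PREFERRED only prefers it and lets the allocator fall back
to other nodes.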

Thanks,
Andrea
