linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@linux.ibm.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	David Rientjes <rientjes@google.com>,
	kirill.shutemov@linux.intel.com, adobriyan@gmail.com,
	Linux API <linux-api@vger.kernel.org>,
	Andrei Vagin <avagin@gmail.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Linux-MM layout <linux-mm@kvack.org>
Subject: Re: + mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch added to -mm tree
Date: Sun, 30 Dec 2018 19:55:56 +0200	[thread overview]
Message-ID: <20181230175555.GA31291@rapoport-lnx> (raw)
In-Reply-To: <00ec4644-70c2-4bd1-ec3f-b994fa0669e8@suse.cz>

On Fri, Dec 28, 2018 at 11:54:17AM +0100, Vlastimil Babka wrote:
> On 12/28/18 9:18 AM, Michal Hocko wrote:
> > On Thu 27-12-18 21:31:00, Andrew Morton wrote:
> >> On Thu, 27 Dec 2018 14:11:14 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >>
> >>> On Mon, Dec 24, 2018 at 10:17:31AM +0100, Michal Hocko wrote:
> >>>> On Mon 24-12-18 01:05:57, David Rientjes wrote:
> >>>> [...]
> >>>>> I'm not interested in having a 100 email thread about this when a clear 
> >>>>> and simple fix exists that actually doesn't break user code.
> >>>>
> >>>> You are breaking everybody who really wants to query MADV_NOHUGEPAGE
> >>>> status by this flag. Is there anybody doing that?
> >>>
> >>> Yes.
> >>>
> >>> https://github.com/checkpoint-restore/criu/blob/v3.11/criu/proc_parse.c#L143
> >>
> >> Ugh.  So the regression fix causes a regression?
> > 
> > Yes. The patch from David will hardcode the nohugepage vm flag if the
> > THP was disabled by the prctl at the time of the snapshot. And if the
> > application later enables THP by the prctl then existing mappings would
> > never get the THP enabled status back.
> > 
> > This is the kind of a potential regression I was poiting out earlier
> > when explaining that the patch encodes the logic into the flag exporting
> > and that means that whoever wants to get the raw value of the flag will
> > not be able to do so. Please note that the raw value is exactly what
> > this interface is documented and supposed to export. And as the
> > documentation explains it is implementation specific and anybody to use
> > it should be careful.
> 
> Let's add some CRIU guys in the loop (dunno if the right ones). We're
> discussing David's patch [1] that makes 'nh' and 'hg' flags reported in
> /proc/pid/smaps (and set by madvise) overridable by
> prctl(PR_SET_THP_DISABLE). This was sort of accidental behavior (but
> only for mappings created after the prctl call) before 4.13 commit
> 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active").
> 
> For David's userspace that commit is a regression as there are false
> positives when checking for vma's that are eligible for THP (=don't have
> the 'nh' flag in smaps) but didn't really obtain THP's. The userspace
> assumes it's due to fragmentation (=bad) and cannot know that it's due
> to the prctl(). But we fear that making prctl() affect smaps vma flags
> means that CRIU can't query them accurately anymore, and thus it's a
> regression for CRIU. Can you comment on that?

CRIU parses VmFlags from /proc/pid/smaps during the checkpoint and then
uses their raw values to recreate exactly the same mappings during restore.
We use appropriate madvise() calls to re-establish VMA flags.

Implicitly setting the 'nh' flag in /proc/pid/smaps when THP usage was
restricted with prctl() will make CRIU to call madvise(MADV_NOHUPEPAGE) for
all the mappings with 'nh' set as CRIU cannot detect the actual reason for
VMA being marked as VM_NOHUGEPAGE.

As Andrew pointed out, if the restored process will re-enable THP with
prtcl(), the VMAs will retail VM_NOHUGEPAGE flag. And, I don't see how can
we work around this in CRIU.

> Michal has a patch [2] that reports the prctl() status separately, but
> that doesn't help David's existing userspace. For CRIU this also won't
> help as long the smaps vma flags still silently included the prctl()
> status. Do you see some solution that would work for everybody?

Nothing comes to mind at the moment, maybe next year ;-)

> [1]
> https://www.ozlabs.org/~akpm/mmots/broken-out/mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch
> [2]
> https://www.ozlabs.org/~akpm/mmots/broken-out/mm-proc-report-pr_set_thp_disable-in-proc.patch
> 

-- 
Sincerely yours,
Mike.

  parent reply	other threads:[~2018-12-30 17:56 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <alpine.DEB.2.21.1812210123210.232416@chino.kir.corp.google.com>
     [not found] ` <14e15543-c18b-6fa0-e107-194216ef3ada@suse.cz>
     [not found]   ` <20181221151256.GA6410@dhcp22.suse.cz>
     [not found]     ` <20181221140301.0e87b79b923ceb6d0f683749@linux-foundation.org>
     [not found]       ` <alpine.DEB.2.21.1812211419320.219499@chino.kir.corp.google.com>
     [not found]         ` <20181224080426.GC9063@dhcp22.suse.cz>
     [not found]           ` <alpine.DEB.2.21.1812240058060.114867@chino.kir.corp.google.com>
     [not found]             ` <20181224091731.GB16738@dhcp22.suse.cz>
     [not found]               ` <20181227111114.5tvvkddyp7cytzeb@kshutemo-mobl1>
     [not found]                 ` <20181227213100.aeee730c1f9ec5cb11de39a3@linux-foundation.org>
     [not found]                   ` <20181228081847.GP16738@dhcp22.suse.cz>
2018-12-28 10:54                     ` Vlastimil Babka
2018-12-28 12:19                       ` Michal Hocko
2018-12-28 12:35                       ` Cyrill Gorcunov
2018-12-30 17:55                       ` Mike Rapoport [this message]
2019-01-15  6:32                       ` Mike Rapoport
2019-01-21 10:21                         ` Michal Hocko
2019-01-21 18:00                           ` Cyrill Gorcunov
2019-01-21 18:18                             ` Michal Hocko
2019-01-21 18:24                               ` Cyrill Gorcunov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181230175555.GA31291@rapoport-lnx \
    --to=rppt@linux.ibm.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@gmail.com \
    --cc=gorcunov@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=vbabka@suse.cz \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox