From: Mike Rapoport <rppt@linux.ibm.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
David Rientjes <rientjes@google.com>,
kirill.shutemov@linux.intel.com, adobriyan@gmail.com,
Linux API <linux-api@vger.kernel.org>,
Andrei Vagin <avagin@gmail.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Cyrill Gorcunov <gorcunov@gmail.com>,
Pavel Emelyanov <xemul@virtuozzo.com>,
Linux-MM layout <linux-mm@kvack.org>
Subject: Re: + mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch added to -mm tree
Date: Sun, 30 Dec 2018 19:55:56 +0200 [thread overview]
Message-ID: <20181230175555.GA31291@rapoport-lnx> (raw)
In-Reply-To: <00ec4644-70c2-4bd1-ec3f-b994fa0669e8@suse.cz>
On Fri, Dec 28, 2018 at 11:54:17AM +0100, Vlastimil Babka wrote:
> On 12/28/18 9:18 AM, Michal Hocko wrote:
> > On Thu 27-12-18 21:31:00, Andrew Morton wrote:
> >> On Thu, 27 Dec 2018 14:11:14 +0300 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >>
> >>> On Mon, Dec 24, 2018 at 10:17:31AM +0100, Michal Hocko wrote:
> >>>> On Mon 24-12-18 01:05:57, David Rientjes wrote:
> >>>> [...]
> >>>>> I'm not interested in having a 100 email thread about this when a clear
> >>>>> and simple fix exists that actually doesn't break user code.
> >>>>
> >>>> You are breaking everybody who really wants to query MADV_NOHUGEPAGE
> >>>> status by this flag. Is there anybody doing that?
> >>>
> >>> Yes.
> >>>
> >>> https://github.com/checkpoint-restore/criu/blob/v3.11/criu/proc_parse.c#L143
> >>
> >> Ugh. So the regression fix causes a regression?
> >
> > Yes. The patch from David will hardcode the nohugepage vm flag if the
> > THP was disabled by the prctl at the time of the snapshot. And if the
> > application later enables THP by the prctl then existing mappings would
> > never get the THP enabled status back.
> >
> > This is the kind of a potential regression I was poiting out earlier
> > when explaining that the patch encodes the logic into the flag exporting
> > and that means that whoever wants to get the raw value of the flag will
> > not be able to do so. Please note that the raw value is exactly what
> > this interface is documented and supposed to export. And as the
> > documentation explains it is implementation specific and anybody to use
> > it should be careful.
>
> Let's add some CRIU guys in the loop (dunno if the right ones). We're
> discussing David's patch [1] that makes 'nh' and 'hg' flags reported in
> /proc/pid/smaps (and set by madvise) overridable by
> prctl(PR_SET_THP_DISABLE). This was sort of accidental behavior (but
> only for mappings created after the prctl call) before 4.13 commit
> 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active").
>
> For David's userspace that commit is a regression as there are false
> positives when checking for vma's that are eligible for THP (=don't have
> the 'nh' flag in smaps) but didn't really obtain THP's. The userspace
> assumes it's due to fragmentation (=bad) and cannot know that it's due
> to the prctl(). But we fear that making prctl() affect smaps vma flags
> means that CRIU can't query them accurately anymore, and thus it's a
> regression for CRIU. Can you comment on that?
CRIU parses VmFlags from /proc/pid/smaps during the checkpoint and then
uses their raw values to recreate exactly the same mappings during restore.
We use appropriate madvise() calls to re-establish VMA flags.
Implicitly setting the 'nh' flag in /proc/pid/smaps when THP usage was
restricted with prctl() will make CRIU to call madvise(MADV_NOHUPEPAGE) for
all the mappings with 'nh' set as CRIU cannot detect the actual reason for
VMA being marked as VM_NOHUGEPAGE.
As Andrew pointed out, if the restored process will re-enable THP with
prtcl(), the VMAs will retail VM_NOHUGEPAGE flag. And, I don't see how can
we work around this in CRIU.
> Michal has a patch [2] that reports the prctl() status separately, but
> that doesn't help David's existing userspace. For CRIU this also won't
> help as long the smaps vma flags still silently included the prctl()
> status. Do you see some solution that would work for everybody?
Nothing comes to mind at the moment, maybe next year ;-)
> [1]
> https://www.ozlabs.org/~akpm/mmots/broken-out/mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch
> [2]
> https://www.ozlabs.org/~akpm/mmots/broken-out/mm-proc-report-pr_set_thp_disable-in-proc.patch
>
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2018-12-30 17:56 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <alpine.DEB.2.21.1812210123210.232416@chino.kir.corp.google.com>
[not found] ` <14e15543-c18b-6fa0-e107-194216ef3ada@suse.cz>
[not found] ` <20181221151256.GA6410@dhcp22.suse.cz>
[not found] ` <20181221140301.0e87b79b923ceb6d0f683749@linux-foundation.org>
[not found] ` <alpine.DEB.2.21.1812211419320.219499@chino.kir.corp.google.com>
[not found] ` <20181224080426.GC9063@dhcp22.suse.cz>
[not found] ` <alpine.DEB.2.21.1812240058060.114867@chino.kir.corp.google.com>
[not found] ` <20181224091731.GB16738@dhcp22.suse.cz>
[not found] ` <20181227111114.5tvvkddyp7cytzeb@kshutemo-mobl1>
[not found] ` <20181227213100.aeee730c1f9ec5cb11de39a3@linux-foundation.org>
[not found] ` <20181228081847.GP16738@dhcp22.suse.cz>
2018-12-28 10:54 ` Vlastimil Babka
2018-12-28 12:19 ` Michal Hocko
2018-12-28 12:35 ` Cyrill Gorcunov
2018-12-30 17:55 ` Mike Rapoport [this message]
2019-01-15 6:32 ` Mike Rapoport
2019-01-21 10:21 ` Michal Hocko
2019-01-21 18:00 ` Cyrill Gorcunov
2019-01-21 18:18 ` Michal Hocko
2019-01-21 18:24 ` Cyrill Gorcunov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181230175555.GA31291@rapoport-lnx \
--to=rppt@linux.ibm.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=avagin@gmail.com \
--cc=gorcunov@gmail.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kirill@shutemov.name \
--cc=linux-api@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=rientjes@google.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=vbabka@suse.cz \
--cc=xemul@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox