From: "Zach O'Keefe" <zokeefe@google.com>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>,
linux-mm@kvack.org, linux-man@vger.kernel.org,
Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: [PATCH man-pages v4] madvise.2: add documentation for MADV_COLLAPSE
Date: Mon, 31 Oct 2022 17:38:01 -0700
Message-ID: <CAAa6QmQN1u5ynyE7Lce9xEKwRQpG6OU8ZOcgFk5nc1h-AN4YgQ@mail.gmail.com>
In-Reply-To: <4b4a42ee-9243-96aa-b581-d56ae420f84a@gmail.com>
Hey Alex,
On Mon, Oct 31, 2022 at 4:37 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hey Zach!
>
> On 10/31/22 23:55, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> > ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> > upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> > MADV_COLLAPSE"). Update the man-pages for madvise(2) and
> > process_madvise(2).
> >
> > Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> > Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> Okay, now I have some more comments:
>
Thank you :)
> - A few changes about semantic newlines. See a diff at the bottom of this email
> that you can apply.
>
> - An accident.
>
> - Some paragraph I don't really understand.
>
> Cheers,
>
> Alex
>
> > ---
> >
> > v3[1] -> v4
> > - Rebased to latest master
> > - (Alejandro Colomar) Fixed weird, non-ascii chars: e2 80 99 -> "'"
> > - (Alejandro Colomar) Replaced .BR with .B directive when the entire
> > line was bold (no non-bold part)
> >
> > [1] https://lore.kernel.org/linux-man/bb3b5c3c-3966-ea1a-6d84-4f7f3afa37ca@gmail.com/T/#u
> >
> > man2/madvise | 0
> > man2/madvise.2 | 90 +++++++++++++++++++++++++++++++++++++++++-
> > man2/process_madvise.2 | 10 +++++
> > 3 files changed, 98 insertions(+), 2 deletions(-)
> > create mode 100644 man2/madvise
> >
> > diff --git a/man2/madvise b/man2/madvise
> > new file mode 100644
> > index 000000000..e69de29bb
>
> Heh! This was a funny accident. I realized because autocomplete showed it as a
> possibility. :)
>
> The diff at the bottom removes it.
>
Sorry about that - thanks for noticing!
> > diff --git a/man2/madvise.2 b/man2/madvise.2
> > index edf805740..dca42c7d6 100644
> > --- a/man2/madvise.2
> > +++ b/man2/madvise.2
> > @@ -386,9 +386,10 @@ set (see
> > .BR prctl (2)).
> > .IP
> > The
> > -.B MADV_HUGEPAGE
> > +.BR MADV_HUGEPAGE ,
> > +.BR MADV_NOHUGEPAGE ,
> > and
> > -.B MADV_NOHUGEPAGE
> > +.B MADV_COLLAPSE
> > operations are available only if the kernel was configured with
> > .B CONFIG_TRANSPARENT_HUGEPAGE
> > and file/shmem memory is only supported if the kernel was configured with
> > @@ -401,6 +402,81 @@ and
> > .I length
> > will not be backed by transparent hugepages.
> > .TP
> > +.BR MADV_COLLAPSE " (since Linux 6.1)"
> > +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> > +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> > +Perform a best-effort synchronous collapse of the native pages mapped by the
> > +memory range into Transparent Huge Pages (THPs).
> > +.B MADV_COLLAPSE
> > +operates on the current state of memory of the calling process and makes no
> > +persistent changes or guarantees on how pages will be mapped,
> > +constructed,
> > +or faulted in the future.
> > +.IP
> > +.B MADV_COLLAPSE
> > +supports private anonymous pages (see
> > +.BR mmap (2)),
> > +shmem pages,
> > +and file-backed pages.
> > +See
> > +.B MADV_HUGEPAGE
> > +for general information on memory requirements for THP.
> > +If the range provided spans multiple VMAs,
> > +the semantics of the collapse over each VMA is independent from the others.
> > +If collapse of a given huge page-aligned/sized region fails,
> > +the operation may continue to attempt collapsing the remainder of the
> > +specified memory.
> > +.B MADV_COLLAPSE
> > +will automatically clamp the provided range to be hugepage-aligned.
> > +.IP
> > +All non-resident pages covered by the range will first be
> > +swapped/faulted-in,
> > +before being copied onto a freshly allocated hugepage.
> > +If the native pages compose the same PTE-mapped hugepage,
> > +and are suitably aligned,
> > +allocation of a new hugepage may be elided and collapse may happen
> > +in-place.
> > +Unmapped pages will have their data directly initialized to 0 in the new
> > +hugepage.
> > +However,
> > +for every eligible hugepage-aligned/sized region to be collapsed,
> > +at least one page must currently be backed by physical memory.
> > +.IP
> > +.B MADV_COLLAPSE
> > +is independent of any sysfs
> > +(see
> > +.BR sysfs (5))
> > +setting under
> > +.IR /sys/kernel/mm/transparent_hugepage ,
> > +both in terms of determining THP eligibility,
> > +and allocation semantics.
> > +See Linux kernel source file
> > +.I Documentation/admin\-guide/mm/transhuge.rst
> > +for more information.
> > +.B MADV_COLLAPSE
> > +also ignores
> > +.B huge=
> > +tmpfs mount when operating on tmpfs files.
> > +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> > +regardless of VMA flags
> > +(though
> > +.B VM_NOHUGEPAGE
> > +is still respected).
> > +.IP
> > +When the system has multiple NUMA nodes,
> > +the hugepage will be allocated from the node providing the most native
> > +pages.
> > +.IP
> > +If all hugepage-sized/aligned regions covered by the provided range were
> > +either successfully collapsed,
> > +or were already PMD-mapped THPs,
> > +this operation will be deemed successful.
> > +Note that this doesn't guarantee anything about other possible mappings of
> > +the memory.
> > +Also note that many failures might have occurred since the operation may
> > +continue to collapse in the event collapse of a single hugepage-sized/aligned
> > +region fails.
>
> I don't understand this last paragraph (since "Also note ..."). Could you
> please reword it a little bit?
>
Sure - I can see that it's hard to parse.
Further up I note that, "If collapse of a given huge
page-aligned/sized region fails, the operation may continue to attempt
collapsing the remainder of the specified memory."
Then perhaps it's enough to just state: "In the event that multiple
hugepage-aligned/sized areas fail to collapse, only the most
recently failed code will be set in errno."
The idea here being: errno only communicates the reason for 1 of N
failures that might have occurred.
However -- on second thought -- perhaps this isn't particularly
useful, as it's already implied. So my new suggestion would be to
drop it. What do you think?
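To make the 1-of-N point concrete: a caller that wants a reason for every failed region could, as a rough sketch, issue MADV_COLLAPSE one hugepage-aligned region at a time instead of once over the whole range. The sketch assumes a 2 MiB PMD size (as on x86-64 with 4 KiB base pages), falls back to the asm-generic value 25 for MADV_COLLAPSE when the libc headers predate it, and mirrors the "clamp the provided range to be hugepage-aligned" behaviour by rounding the start up and the end down. The helpers clamp_to_hugepages() and collapse_each() are hypothetical names for illustration, not part of any existing API.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* expose madvise() and MADV_* in glibc headers */
#endif
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* assumption: value from the asm-generic uapi header */
#endif

/* Assumption: PMD-sized hugepages are 2 MiB (e.g. x86-64, 4 KiB base pages). */
#define HPAGE_SIZE ((uintptr_t)2 << 20)

/* Hypothetical helper: shrink [start, start + len) to its hugepage-aligned
 * subrange, rounding the start up and the end down. Stores the aligned
 * start in *out_start and returns the aligned length (0 if no whole
 * hugepage fits). */
static size_t clamp_to_hugepages(uintptr_t start, size_t len,
                                 uintptr_t *out_start)
{
    uintptr_t hstart = (start + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
    uintptr_t hend = (start + len) & ~(HPAGE_SIZE - 1);

    *out_start = hstart;
    return hend > hstart ? (size_t)(hend - hstart) : 0;
}

/* Hypothetical helper: issue MADV_COLLAPSE one hugepage at a time so that
 * each region gets its own result. errs[i] receives 0 on success or the
 * errno reported for region i. Returns the number of regions attempted,
 * capped at max_regions. */
static size_t collapse_each(void *addr, size_t len, int *errs,
                            size_t max_regions)
{
    uintptr_t hstart;
    size_t hlen = clamp_to_hugepages((uintptr_t)addr, len, &hstart);
    size_t n = hlen / HPAGE_SIZE;

    if (n > max_regions)
        n = max_regions;
    for (size_t i = 0; i < n; i++) {
        void *region = (void *)(hstart + i * HPAGE_SIZE);

        errs[i] = madvise(region, HPAGE_SIZE, MADV_COLLAPSE) == 0 ? 0 : errno;
    }
    return n;
}
```

Usage, roughly: mmap(2) a private anonymous region, touch at least one page in each 2 MiB region (since at least one page per region must be backed by physical memory), then call collapse_each(p, len, errs, n) and inspect errs[i]. The trade-off is one system call per hugepage in exchange for a distinct error code per region.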
> > +.TP
> > .BR MADV_DONTDUMP " (since Linux 3.4)"
> > .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
> > .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> > @@ -620,6 +696,11 @@ A kernel resource was temporarily unavailable.
> > .B EBADF
> > The map exists, but the area maps something that isn't a file.
> > .TP
> > +.B EBUSY
> > +(for
> > +.BR MADV_COLLAPSE )
> > +Could not charge hugepage to cgroup: cgroup limit exceeded.
> > +.TP
> > .B EFAULT
> > .I advice
> > is
> > @@ -717,6 +798,11 @@ maximum resident set size.
> > Not enough memory: paging in failed.
> > .TP
> > .B ENOMEM
> > +(for
> > +.BR MADV_COLLAPSE )
> > +Not enough memory: could not allocate hugepage.
> > +.TP
> > +.B ENOMEM
> > Addresses in the specified range are not currently
> > mapped, or are outside the address space of the process.
> > .TP
> > diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> > index ac98850a9..92878286b 100644
> > --- a/man2/process_madvise.2
> > +++ b/man2/process_madvise.2
> > @@ -73,6 +73,10 @@ argument is one of the following values:
> > See
> > .BR madvise (2).
> > .TP
> > +.B MADV_COLLAPSE
> > +See
> > +.BR madvise (2).
> > +.TP
> > .B MADV_PAGEOUT
> > See
> > .BR madvise (2).
> > @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
> > .TP
> > .B ESRCH
> > The target process does not exist (i.e., it has terminated and been waited on).
> > +.PP
> > +See
> > +.BR madvise (2)
> > +for
> > +.IR advice -specific
> > +errors.
> > .SH VERSIONS
> > This system call first appeared in Linux 5.10.
> > .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
>
> Diff for changing a few line breaks (and removing the spurious file):
>
Thank you so much for this! :)
> diff --git a/man2/madvise b/man2/madvise
> deleted file mode 100644
> index e69de29bb..000000000
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index dca42c7d6..7f34301d3 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -405,11 +405,12 @@ .SS Linux-specific advice values
> .BR MADV_COLLAPSE " (since Linux 6.1)"
> .\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> .\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> -Perform a best-effort synchronous collapse of the native pages mapped by the
> -memory range into Transparent Huge Pages (THPs).
> +Perform a best-effort synchronous collapse of
> +the native pages mapped by the memory range
> +into Transparent Huge Pages (THPs).
> .B MADV_COLLAPSE
> -operates on the current state of memory of the calling process and makes no
> -persistent changes or guarantees on how pages will be mapped,
> +operates on the current state of memory of the calling process and
> +makes no persistent changes or guarantees on how pages will be mapped,
> constructed,
> or faulted in the future.
> .IP
> @@ -424,20 +425,20 @@ .SS Linux-specific advice values
> If the range provided spans multiple VMAs,
> the semantics of the collapse over each VMA is independent from the others.
> If collapse of a given huge page-aligned/sized region fails,
> -the operation may continue to attempt collapsing the remainder of the
> -specified memory.
> +the operation may continue to attempt collapsing
> +the remainder of the specified memory.
> .B MADV_COLLAPSE
> will automatically clamp the provided range to be hugepage-aligned.
> .IP
> -All non-resident pages covered by the range will first be
> -swapped/faulted-in,
> +All non-resident pages covered by the range
> +will first be swapped/faulted-in,
> before being copied onto a freshly allocated hugepage.
> If the native pages compose the same PTE-mapped hugepage,
> and are suitably aligned,
> -allocation of a new hugepage may be elided and collapse may happen
> -in-place.
> -Unmapped pages will have their data directly initialized to 0 in the new
> -hugepage.
> +allocation of a new hugepage may be elided and
> +collapse may happen in-place.
> +Unmapped pages will have their data directly initialized to 0
> +in the new hugepage.
> However,
> for every eligible hugepage-aligned/sized region to be collapsed,
> at least one page must currently be backed by physical memory.
> @@ -464,15 +465,15 @@ .SS Linux-specific advice values
> is still respected).
> .IP
> When the system has multiple NUMA nodes,
> -the hugepage will be allocated from the node providing the most native
> -pages.
> +the hugepage will be allocated from
> +the node providing the most native pages.
> .IP
> If all hugepage-sized/aligned regions covered by the provided range were
> either successfully collapsed,
> or were already PMD-mapped THPs,
> this operation will be deemed successful.
> -Note that this doesn't guarantee anything about other possible mappings of
> -the memory.
> +Note that this doesn't guarantee anything about
> +other possible mappings of the memory.
> Also note that many failures might have occurred since the operation may
> continue to collapse in the event collapse of a single hugepage-sized/aligned
> region fails.
>
>
> --
> <http://www.alejandro-colomar.es/>
Best,
Zach
Thread overview: 5+ messages
2022-10-31 22:55 Zach OKeefe
2022-10-31 23:36 ` Alejandro Colomar
2022-11-01 0:38 ` Zach O'Keefe [this message]
2022-11-01 11:38 ` Alejandro Colomar
2022-11-01 15:04 ` Zach O'Keefe