Hey Zach!

On 10/31/22 23:55, Zach OKeefe wrote:
> From: Zach O'Keefe
>
> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> MADV_COLLAPSE"). Update the man-pages for madvise(2) and
> process_madvise(2).
>
> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> Signed-off-by: Zach O'Keefe

Okay, now I have some more comments:

- A few changes about semantic newlines.
  See a diff at the bottom of this email that you can apply.
- An accident.
- Some paragraph I don't really understand.

Cheers,

Alex

> ---
>
> v3[1] -> v4
> - Rebased to latest master
> - (Alejandro Colomar) Fixed weird, non-ascii chars: e2 80 99 -> "'"
> - (Alejandro Colomar) Replaced .BR with .B directive when the entire
>   line was bold (no non-bold part)
>
> [1] https://lore.kernel.org/linux-man/bb3b5c3c-3966-ea1a-6d84-4f7f3afa37ca@gmail.com/T/#u
>
>  man2/madvise           |  0
>  man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
>  man2/process_madvise.2 | 10 +++++
>  3 files changed, 98 insertions(+), 2 deletions(-)
>  create mode 100644 man2/madvise
>
> diff --git a/man2/madvise b/man2/madvise
> new file mode 100644
> index 000000000..e69de29bb

Heh! This was a funny accident.
I noticed it because autocomplete showed it as a possibility. :)
The diff at the bottom removes it.

> diff --git a/man2/madvise.2 b/man2/madvise.2
> index edf805740..dca42c7d6 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -386,9 +386,10 @@ set (see
>  .BR prctl (2)).
>  .IP
>  The
> -.B MADV_HUGEPAGE
> +.BR MADV_HUGEPAGE ,
> +.BR MADV_NOHUGEPAGE ,
>  and
> -.B MADV_NOHUGEPAGE
> +.B MADV_COLLAPSE
>  operations are available only if the kernel was configured with
>  .B CONFIG_TRANSPARENT_HUGEPAGE
>  and file/shmem memory is only supported if the kernel was configured with
> @@ -401,6 +402,81 @@ and
>  .I length
>  will not be backed by transparent hugepages.
>  .TP
> +.BR MADV_COLLAPSE " (since Linux 6.1)"
> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> +Perform a best-effort synchronous collapse of the native pages mapped by the
> +memory range into Transparent Huge Pages (THPs).
> +.B MADV_COLLAPSE
> +operates on the current state of memory of the calling process and makes no
> +persistent changes or guarantees on how pages will be mapped,
> +constructed,
> +or faulted in the future.
> +.IP
> +.B MADV_COLLAPSE
> +supports private anonymous pages (see
> +.BR mmap (2)),
> +shmem pages,
> +and file-backed pages.
> +See
> +.B MADV_HUGEPAGE
> +for general information on memory requirements for THP.
> +If the range provided spans multiple VMAs,
> +the semantics of the collapse over each VMA is independent from the others.
> +If collapse of a given huge page-aligned/sized region fails,
> +the operation may continue to attempt collapsing the remainder of the
> +specified memory.
> +.B MADV_COLLAPSE
> +will automatically clamp the provided range to be hugepage-aligned.
> +.IP
> +All non-resident pages covered by the range will first be
> +swapped/faulted-in,
> +before being copied onto a freshly allocated hugepage.
> +If the native pages compose the same PTE-mapped hugepage,
> +and are suitably aligned,
> +allocation of a new hugepage may be elided and collapse may happen
> +in-place.
> +Unmapped pages will have their data directly initialized to 0 in the new
> +hugepage.
> +However,
> +for every eligible hugepage-aligned/sized region to be collapsed,
> +at least one page must currently be backed by physical memory.
> +.IP
> +.B MADV_COLLAPSE
> +is independent of any sysfs
> +(see
> +.BR sysfs (5))
> +setting under
> +.IR /sys/kernel/mm/transparent_hugepage ,
> +both in terms of determining THP eligibility,
> +and allocation semantics.
> +See Linux kernel source file
> +.I Documentation/admin\-guide/mm/transhuge.rst
> +for more information.
> +.B MADV_COLLAPSE
> +also ignores
> +.B huge=
> +tmpfs mount when operating on tmpfs files.
> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> +regardless of VMA flags
> +(though
> +.B VM_NOHUGEPAGE
> +is still respected).
> +.IP
> +When the system has multiple NUMA nodes,
> +the hugepage will be allocated from the node providing the most native
> +pages.
> +.IP
> +If all hugepage-sized/aligned regions covered by the provided range were
> +either successfully collapsed,
> +or were already PMD-mapped THPs,
> +this operation will be deemed successful.
> +Note that this doesn't guarantee anything about other possible mappings of
> +the memory.
> +Also note that many failures might have occurred since the operation may
> +continue to collapse in the event collapse of a single hugepage-sized/aligned
> +region fails.

I don't understand the end of this last paragraph
(starting at "Also note ...").
Could you please reword it a little bit?

> +.TP
>  .BR MADV_DONTDUMP " (since Linux 3.4)"
>  .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>  .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> @@ -620,6 +696,11 @@ A kernel resource was temporarily unavailable.
>  .B EBADF
>  The map exists, but the area maps something that isn't a file.
>  .TP
> +.B EBUSY
> +(for
> +.BR MADV_COLLAPSE )
> +Could not charge hugepage to cgroup: cgroup limit exceeded.
> +.TP
>  .B EFAULT
>  .I advice
>  is
> @@ -717,6 +798,11 @@ maximum resident set size.
>  Not enough memory: paging in failed.
>  .TP
>  .B ENOMEM
> +(for
> +.BR MADV_COLLAPSE )
> +Not enough memory: could not allocate hugepage.
> +.TP
> +.B ENOMEM
>  Addresses in the specified range are not currently
>  mapped, or are outside the address space of the process.
>  .TP
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index ac98850a9..92878286b 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -73,6 +73,10 @@ argument is one of the following values:
>  See
>  .BR madvise (2).
>  .TP
> +.B MADV_COLLAPSE
> +See
> +.BR madvise (2).
> +.TP
>  .B MADV_PAGEOUT
>  See
>  .BR madvise (2).
> @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
>  .TP
>  .B ESRCH
>  The target process does not exist (i.e., it has terminated and been waited on).
> +.PP
> +See
> +.BR madvise (2)
> +for
> +.IR advice -specific
> +errors.
>  .SH VERSIONS
>  This system call first appeared in Linux 5.10.
>  .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc

Diff for changing a few line breaks (and removing the spurious file):

diff --git a/man2/madvise b/man2/madvise
deleted file mode 100644
index e69de29bb..000000000
diff --git a/man2/madvise.2 b/man2/madvise.2
index dca42c7d6..7f34301d3 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -405,11 +405,12 @@ .SS Linux-specific advice values
 .BR MADV_COLLAPSE " (since Linux 6.1)"
 .\" commit 7d8faaf155454f8798ec56404faca29a82689c77
 .\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
-Perform a best-effort synchronous collapse of the native pages mapped by the
-memory range into Transparent Huge Pages (THPs).
+Perform a best-effort synchronous collapse of
+the native pages mapped by the memory range
+into Transparent Huge Pages (THPs).
 .B MADV_COLLAPSE
-operates on the current state of memory of the calling process and makes no
-persistent changes or guarantees on how pages will be mapped,
+operates on the current state of memory of the calling process and
+makes no persistent changes or guarantees on how pages will be mapped,
 constructed,
 or faulted in the future.
 .IP
@@ -424,20 +425,20 @@ .SS Linux-specific advice values
 If the range provided spans multiple VMAs,
 the semantics of the collapse over each VMA is independent from the others.
 If collapse of a given huge page-aligned/sized region fails,
-the operation may continue to attempt collapsing the remainder of the
-specified memory.
+the operation may continue to attempt collapsing
+the remainder of the specified memory.
 .B MADV_COLLAPSE
 will automatically clamp the provided range to be hugepage-aligned.
 .IP
-All non-resident pages covered by the range will first be
-swapped/faulted-in,
+All non-resident pages covered by the range
+will first be swapped/faulted-in,
 before being copied onto a freshly allocated hugepage.
 If the native pages compose the same PTE-mapped hugepage,
 and are suitably aligned,
-allocation of a new hugepage may be elided and collapse may happen
-in-place.
-Unmapped pages will have their data directly initialized to 0 in the new
-hugepage.
+allocation of a new hugepage may be elided and
+collapse may happen in-place.
+Unmapped pages will have their data directly initialized to 0
+in the new hugepage.
 However,
 for every eligible hugepage-aligned/sized region to be collapsed,
 at least one page must currently be backed by physical memory.
@@ -464,15 +465,15 @@ .SS Linux-specific advice values
 is still respected).
 .IP
 When the system has multiple NUMA nodes,
-the hugepage will be allocated from the node providing the most native
-pages.
+the hugepage will be allocated from
+the node providing the most native pages.
 .IP
 If all hugepage-sized/aligned regions covered by the provided range were
 either successfully collapsed,
 or were already PMD-mapped THPs,
 this operation will be deemed successful.
-Note that this doesn't guarantee anything about other possible mappings of
-the memory.
+Note that this doesn't guarantee anything about
+other possible mappings of the memory.
 Also note that many failures might have occurred since the operation may
 continue to collapse in the event collapse of a single hugepage-sized/aligned
 region fails.

--
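
For reference while reading the new text, here is a minimal sketch of how a caller might use the advice described in the patch. It is only an illustration, not taken from the patch or the kernel selftests. The helper names (round_up_pow2, collapse_one_hugepage) are made up for this example, the 2 MiB hugepage size is an assumption that holds on x86-64 with 4 KiB base pages, and the fallback value 25 for MADV_COLLAPSE is the generic uapi value for systems whose libc headers predate Linux 6.1. The helper touches one page first, since the page says at least one page of each region must be backed by physical memory.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25   /* generic uapi value, for pre-6.1 libc headers */
#endif

/* Assumption: PMD-sized hugepages are 2 MiB (x86-64, 4 KiB base pages). */
#define HPAGE_SIZE ((size_t)2 << 20)

/* Round addr up to a multiple of align; align must be a power of two. */
static uintptr_t round_up_pow2(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: map anonymous memory, make one page resident,
 * and ask the kernel to collapse one hugepage-aligned region.
 * Returns 0 on success, or a negative errno value on failure
 * (e.g. -EINVAL on kernels without MADV_COLLAPSE or without THP).
 */
static int collapse_one_hugepage(void)
{
    /* Over-allocate so that an aligned 2 MiB region fits inside. */
    size_t map_len = 2 * HPAGE_SIZE;
    char *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -errno;

    char *huge = (char *)round_up_pow2((uintptr_t)map, HPAGE_SIZE);

    /* At least one page in the region must be backed by physical memory. */
    huge[0] = 1;

    int ret = 0;
    if (madvise(huge, HPAGE_SIZE, MADV_COLLAPSE) == -1) {
        ret = -errno;
        fprintf(stderr, "MADV_COLLAPSE: %s\n", strerror(-ret));
    }

    munmap(map, map_len);
    return ret;
}
```

A caller would simply check the return value of collapse_one_hugepage(); a failure here is not fatal, since the collapse is best-effort and the memory stays usable as native pages.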