Hey Zach,

On 12/11/22 22:51, Zach O'Keefe wrote:
> On Sun, Dec 11, 2022 at 9:59 AM Alejandro Colomar wrote:
>>
>> Hi Zach,
>
> Hey Alex,
>
>> On 10/22/22 00:33, Zach OKeefe wrote:
>>> From: Zach O'Keefe
>>>
>>> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
>>> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
>>> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
>>> MADV_COLLAPSE"). Update the man-pages for madvise(2) and
>>> process_madvise(2).
>>>
>>> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
>>> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
>>> Signed-off-by: Zach O'Keefe
>>
>> Please see a few comments below.
>>
>
> Thanks for the mail. So, this patch was taken as commit b106cd5bf
> ("madvise.2: add documentation for MADV_COLLAPSE"). Some of your
> comments below were applied (I think, by you) as fixes pre-commit.
> However, there are some new comments (or ones that address the same
> lines, but in different ways). Is this mail to log ~ what changes
> were done, or is there anything actionable here on my side?

Ah no, it's just that I had it marked as unread for some reason,
so I thought I had forgotten to respond
(and I forgot that I had applied it). :-)

So, no action required. Regarding different suggestions, heh,
it demonstrates that it's not exactly deterministic :P

Cheers,

Alex

P.S.: Do you know if I have anything missing from you or any of your colleagues?

>
> Best,
> Zach
>
> Thanks for this.
>> Cheers,
>>
>> Alex
>>
>>> ---
>>> man2/madvise.2 | 90 +++++++++++++++++++++++++++++++++++++++++-
>>> man2/process_madvise.2 | 10 +++++
>>> 2 files changed, 98 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/man2/madvise.2 b/man2/madvise.2
>>> index df3413cc8..b03fc731d 100644
>>> --- a/man2/madvise.2
>>> +++ b/man2/madvise.2
>>> @@ -385,9 +385,10 @@ set (see
>>> .BR prctl (2) ).
>>> .IP
>>> The
>>> -.B MADV_HUGEPAGE
>>> +.BR MADV_HUGEPAGE ,
>>> +.BR MADV_NOHUGEPAGE ,
>>> and
>>> -.B MADV_NOHUGEPAGE
>>> +.B MADV_COLLAPSE
>>> operations are available only if the kernel was configured with
>>> .B CONFIG_TRANSPARENT_HUGEPAGE
>>> and file/shmem memory is only supported if the kernel was configured with
>>> @@ -400,6 +401,81 @@ and
>>> .I length
>>> will not be backed by transparent hugepages.
>>> .TP
>>> +.BR MADV_COLLAPSE " (since Linux 6.1)"
>>> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
>>> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
>>> +Perform a best-effort synchronous collapse of the native pages mapped by the
>>
>> Please use semantic line breaks. In this case, I'd break after "pages".
>>
>> man-pages(7):
>>    Use semantic newlines
>>        In the source of a manual page, new sentences should be started on
>>        new lines, long sentences should be split into lines at clause
>>        breaks (commas, semicolons, colons, and so on), and long clauses
>>        should be split at phrase boundaries. This convention, sometimes
>>        known as "semantic newlines", makes it easier to see the effect of
>>        patches, which often operate at the level of individual sentences,
>>        clauses, or phrases.
>>
>>> +memory range into Transparent Huge Pages (THPs).
>>> +.B MADV_COLLAPSE
>>> +operates on the current state of memory of the calling process and makes no
>>
>> Here I'd break after "and".
>>
>>> +persistent changes or guarantees on how pages will be mapped,
>>> +constructed,
>>> +or faulted in the future.
>>> +.IP
>>> +.B MADV_COLLAPSE
>>> +supports private anonymous pages (see
>>> +.BR mmap (2)),
>>> +shmem pages,
>>> +and file-backed pages.
>>> +See
>>> +.B MADV_HUGEPAGE
>>> +for general information on memory requirements for THP.
>>> +If the range provided spans multiple VMAs,
>>> +the semantics of the collapse over each VMA is independent from the others.
>>> +If collapse of a given huge page-aligned/sized region fails,
>>> +the operation may continue to attempt collapsing the remainder of the
>>
>> Break after "collapsing".
>>
>>> +specified memory.
>>> +.B MADV_COLLAPSE
>>> +will automatically clamp the provided range to be hugepage-aligned.
>>> +.IP
>>> +All non-resident pages covered by the range will first be
>>
>> Break after "range".
>>
>>> +swapped/faulted-in,
>>> +before being copied onto a freshly allocated hugepage.
>>> +If the native pages compose the same PTE-mapped hugepage,
>>> +and are suitably aligned,
>>> +allocation of a new hugepage may be elided and collapse may happen
>>
>> Break before or after "and".
>>
>>> +in-place.
>>> +Unmapped pages will have their data directly initialized to 0 in the new
>>
>> Break after "0".
>>
>>> +hugepage.
>>> +However,
>>> +for every eligible hugepage-aligned/sized region to be collapsed,
>>> +at least one page must currently be backed by physical memory.
>>> +.IP
>>> +.BR MADV_COLLAPSE
>>
>> s/BR/B/
>>
>>> +is independent of any sysfs
>>> +(see
>>> +.BR sysfs (5))
>>> +setting under
>>> +.IR /sys/kernel/mm/transparent_hugepage ,
>>> +both in terms of determining THP eligibility,
>>> +and allocation semantics.
>>> +See Linux kernel source file
>>> +.I Documentation/admin\-guide/mm/transhuge.rst
>>> +for more information.
>>> +.BR MADV_COLLAPSE
>>
>> s/BR/B/
>>
>>> +also ignores
>>> +.B huge=
>>> +tmpfs mount when operating on tmpfs files.
>>> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
>>> +regardless of VMA flags
>>> +(though
>>> +.BR VM_NOHUGEPAGE
>>
>> s/BR/B/
>>
>>> +is still respected).
>>> +.IP
>>> +When the system has multiple NUMA nodes,
>>> +the hugepage will be allocated from the node providing the most native
>>
>> Break after "from".
>>
>>> +pages.
>>> +.IP
>>> +If all hugepage-sized/aligned regions covered by the provided range were
>>
>> Prefer English rather than "/".
>>
>>> +either successfully collapsed,
>>> +or were already PMD-mapped THPs,
>>> +this operation will be deemed successful.
>>> +Note that this doesn’t guarantee anything about other possible mappings of
>>
>> Break after "about".
>>
>>> +the memory.
>>> +Also note that many failures might have occurred since the operation may
>>> +continue to collapse in the event collapse of a single hugepage-sized/aligned
>>
>> Add some omitted "that" or something that will help readability to
>> non-native-English readers.
>>
>> And break at a better place.
>>
>>> +region fails.
>>> +.TP
>>> .BR MADV_DONTDUMP " (since Linux 3.4)"
>>> .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>>> .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
>>> @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
>>> .B EBADF
>>> The map exists, but the area maps something that isn't a file.
>>> .TP
>>> +.B EBUSY
>>> +(for
>>> +.BR MADV_COLLAPSE )
>>> +Could not charge hugepage to cgroup: cgroup limit exceeded.
>>> +.TP
>>> .B EFAULT
>>> .I advice
>>> is
>>> @@ -716,6 +797,11 @@ maximum resident set size.
>>> Not enough memory: paging in failed.
>>> .TP
>>> .B ENOMEM
>>> +(for
>>> +.BR MADV_COLLAPSE )
>>> +Not enough memory: could not allocate hugepage.
>>> +.TP
>>> +.B ENOMEM
>>> Addresses in the specified range are not currently
>>> mapped, or are outside the address space of the process.
>>> .TP
>>> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
>>> index 44d3b94e8..8b0ddccdd 100644
>>> --- a/man2/process_madvise.2
>>> +++ b/man2/process_madvise.2
>>> @@ -73,6 +73,10 @@ argument is one of the following values:
>>> See
>>> .BR madvise (2).
>>> .TP
>>> +.B MADV_COLLAPSE
>>> +See
>>> +.BR madvise (2).
>>> +.TP
>>> .B MADV_PAGEOUT
>>> See
>>> .BR madvise (2).
>>> @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
>>> .TP
>>> .B ESRCH
>>> The target process does not exist (i.e., it has terminated and been waited on).
>>> +.PP
>>> +See
>>> +.BR madvise (2)
>>> +for
>>> +.IR advice -specific
>>> +errors.
>>> .SH VERSIONS
>>> This system call first appeared in Linux 5.10.
>>> .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
>>
>> --
>> --
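
For anyone reading the quoted text without the kernel side at hand, here is a rough C sketch of how a caller might use the new advice value. It is only an illustration under stated assumptions, not code from the patch or from the kernel: the helper names clamp_to_hugepages() and try_collapse() are made up for this mail, HPAGE_SIZE assumes 2 MiB PMD-sized hugepages (typical on x86-64, but not guaranteed elsewhere), and the fallback definition of MADV_COLLAPSE is for libc headers that predate Linux 6.1. The quoted page only says that the range is clamped to be hugepage-aligned; the inward direction used below (start rounded up, end rounded down) is my reading of that sentence.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Fallback for libc headers older than Linux 6.1;
 * 25 is the value in the kernel's generic UAPI header. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/* ASSUMPTION: 2 MiB PMD-sized hugepages (typical on x86-64). */
#define HPAGE_SIZE ((uintptr_t)2 << 20)

/*
 * Hypothetical helper: shrink [start, start + len) to the
 * hugepage-aligned sub-range it fully contains.
 * Stores the aligned start in *hstart and returns the aligned length,
 * or 0 if the range contains no whole hugepage-sized region.
 */
size_t clamp_to_hugepages(uintptr_t start, size_t len, uintptr_t *hstart)
{
    assert(hstart != NULL);

    uintptr_t end = start + len;
    uintptr_t s = (start + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1); /* round up */
    uintptr_t e = end & ~(HPAGE_SIZE - 1);                      /* round down */

    *hstart = s;
    return e > s ? (size_t)(e - s) : 0;
}

/*
 * Hypothetical helper: request a best-effort synchronous collapse.
 * Returns 0 on success, otherwise the errno value set by madvise(2),
 * for example EINVAL on kernels that do not know MADV_COLLAPSE, or the
 * EBUSY and ENOMEM cases described in the quoted patch.
 */
int try_collapse(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_COLLAPSE) == 0)
        return 0;
    return errno;
}
```

A caller would typically pass the address and length of an existing mapping to try_collapse() and treat any nonzero return as "not collapsed this time", since the operation is best-effort and makes no persistent guarantees. clamp_to_hugepages() is only useful for predicting which part of a range can be affected at all.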