linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH man-pages v3 0/4] Add MADV_COLLAPSE documentation
@ 2022-10-21 22:32 Zach OKeefe
  2022-10-21 22:32 ` [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Zach OKeefe @ 2022-10-21 22:32 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk
  Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe

From: Zach O'Keefe <zokeefe@google.com>

v3 Forward

Changes from v2[1]:

- Patch 3 adds a NOTES section that records the security
  requirements when process_madvise(2) syscall was added in Linux 5.10
  (Alex Colomar).
- Patch 3 also corrects documented
  PTRACE_MODE_READ_REALCREDS ptrace mode for process_madvise(2).

[1] https://lore.kernel.org/linux-man/20221018235051.152548-1-zokeefe@google.com/

----------------------------------------------

v2 Forward

Changes from v1[2]:

- All patches have seen some reformatting and/or typo corrections
  (Alex Colomar).
- Patch 3 has had the commit description reworded to reflect that this commit
  is a "fix" rather than an "update" (Alex Colomar).
- Rebased on man-pages-6.01

[2] https://lore.kernel.org/linux-man/20221017175523.2048887-1-zokeefe@google.com/T/#m8e9e94ed52c99e7cf4969bd992492359c59a0faa

----------------------------------------------

Hey Alex / Michael,

This series adds MADV_COLLAPSE (expected for Linux 6.1) documentation to
madvise(2) and process_madvise(2).  A few prerequisite patches are included to
fix up existing MADV_HUGEPAGE and process_madvise(2) documentation, as well as
add some additional clarity for madvise(2) "advice" probing.

Series applies on man-pages-6.00, and I've attempted to use semantic newlines,
though I can't claim I've made the right choices everywhere for long clauses.

Thanks,
Zach



Zach O'Keefe (4):
  madvise.2: update THP file/shmem documentation for +5.4
  madvise.2: document reliable probe for advice support
  process_madvise.2: fix capability and ptrace requirements
  madvise.2: add documentation for MADV_COLLAPSE

 man2/madvise.2         | 133 +++++++++++++++++++++++++++++++++++++++--
 man2/process_madvise.2 |  31 ++++++++--
 2 files changed, 155 insertions(+), 9 deletions(-)

-- 
2.38.0.135.g90850a2211-goog



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4
  2022-10-21 22:32 [PATCH man-pages v3 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
@ 2022-10-21 22:32 ` Zach OKeefe
  2022-10-30 11:41   ` Alejandro Colomar
  2022-10-21 22:32 ` [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Zach OKeefe @ 2022-10-21 22:32 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk
  Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe

From: Zach O'Keefe <zokeefe@google.com>

Since Linux 5.4, Transparent Huge Pages now support both file-backed
memory and shmem memory. Update MADV_HUGEPAGE advice description to
reflect this.

Additionally, expand the description of requirements for memory to be
considered eligible for THP: alignment / mapping requirements, VMA
flags, prctl(2) settings, inode status, etc.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/madvise.2 | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/man2/madvise.2 b/man2/madvise.2
index 81cce56af..64f788ace 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -320,8 +320,6 @@ Enable Transparent Huge Pages (THP) for pages in the range specified by
 .I addr
 and
 .IR length .
-Currently, Transparent Huge Pages work only with private anonymous pages (see
-.BR mmap (2)).
 The kernel will regularly scan the areas marked as huge page candidates
 to replace them with huge pages.
 The kernel will also allocate huge pages directly when the region is
@@ -354,12 +352,46 @@ an access pattern that the developer knows in advance won't risk
 to increase the memory footprint of the application when transparent
 hugepages are enabled.
 .IP
+.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
+Since Linux 5.4,
+automatic scan of eligible areas and replacement by huge pages works with
+private anonymous pages (see
+.BR mmap (2)),
+shmem pages,
+and file-backed pages.
+For all memory types,
+memory may only be replaced by huge pages on hugepage-aligned boundaries.
+For file-mapped memory \(em including tmpfs (see
+.BR tmpfs (2))
+\(em the mapping must also be naturally hugepage-aligned within the file.
+Additionally,
+for file-backed,
+non-tmpfs memory,
+the file must not be open for write and the mapping must be executable.
+.IP
+The VMA must not be marked
+.BR VM_NOHUGEPAGE ,
+.BR VM_HUGETLB ,
+.BR VM_IO ,
+.BR VM_DONTEXPAND ,
+.BR VM_MIXEDMAP ,
+or
+.BR VM_PFNMAP ,
+nor can it be stack memory or backed by a DAX-enabled device
+(unless the DAX device is hot-plugged as System RAM).
+The process must also not have
+.B PR_SET_THP_DISABLE
+set (see
+.BR prctl (2) ).
+.IP
 The
 .B MADV_HUGEPAGE
 and
 .B MADV_NOHUGEPAGE
 operations are available only if the kernel was configured with
-.BR CONFIG_TRANSPARENT_HUGEPAGE .
+.B CONFIG_TRANSPARENT_HUGEPAGE
+and file/shmem memory is only supported if the kernel was configured with
+.BR CONFIG_READ_ONLY_THP_FOR_FS .
 .TP
 .BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
 Ensures that memory in the address range specified by
-- 
2.38.0.135.g90850a2211-goog



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support
  2022-10-21 22:32 [PATCH man-pages v3 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
  2022-10-21 22:32 ` [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
@ 2022-10-21 22:32 ` Zach OKeefe
  2022-10-30 11:44   ` Alejandro Colomar
  2022-10-21 22:32 ` [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements Zach OKeefe
  2022-10-21 22:33 ` [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
  3 siblings, 1 reply; 25+ messages in thread
From: Zach OKeefe @ 2022-10-21 22:32 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk
  Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe

From: Zach O'Keefe <zokeefe@google.com>

EINVAL is an overloaded error code for madvise(2) and it's not clear
under what context it means "advice is not valid" vs another error.

Explicitly document that madvise(0, 0, advice) can reliably be used to
probe for kernel support for "advice", returning zero iff "advice" is
supported by the kernel.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/madvise.2 | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/man2/madvise.2 b/man2/madvise.2
index 64f788ace..df3413cc8 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -790,6 +790,11 @@ that are not mapped, the Linux version of
 ignores them and applies the call to the rest (but returns
 .B ENOMEM
 from the system call, as it should).
+.PP
+.IR madvise(0,\ 0,\ advice)
+will return zero iff
+.I advice
+is supported by the kernel and can be relied on to probe for support.
 .\" .SH HISTORY
 .\" The
 .\" .BR madvise ()
-- 
2.38.0.135.g90850a2211-goog



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements
  2022-10-21 22:32 [PATCH man-pages v3 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
  2022-10-21 22:32 ` [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
  2022-10-21 22:32 ` [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
@ 2022-10-21 22:32 ` Zach OKeefe
  2022-10-30 11:50   ` Alejandro Colomar
  2022-10-21 22:33 ` [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
  3 siblings, 1 reply; 25+ messages in thread
From: Zach OKeefe @ 2022-10-21 22:32 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk
  Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe,
	Suren Baghdasaryan, Minchan Kim

From: Zach O'Keefe <zokeefe@google.com>

The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
("mm/madvise: introduce process_madvise() syscall: an external memory
hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
attach requirement for process_madvise") which replaced this with a
combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
performance).

The initial commit of process_madvise(2) to man-pages project, made
after the second patch, included two errors:

1) CAP_SYS_ADMIN instead of CAP_SYS_NICE
2) PTRACE_MODE_READ_REALCREDS instead of PTRACE_MODE_READ_FSCREDS

Correct this in the man-page for process_madvise(2).

Fixes: a144f458b ("process_madvise.2: Document process_madvise(2)")
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/process_madvise.2 | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
index 6208206e4..44d3b94e8 100644
--- a/man2/process_madvise.2
+++ b/man2/process_madvise.2
@@ -105,16 +105,20 @@ remote process.
 No further elements will be processed beyond that point.
 (See the discussion regarding partial advice in RETURN VALUE.)
 .PP
-Permission to apply advice to another process is governed by a
+.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
+Starting in Linux 5.12,
+permission to apply advice to another process is governed by
 ptrace access mode
-.B PTRACE_MODE_READ_REALCREDS
+.B PTRACE_MODE_READ_FSCREDS
 check (see
 .BR ptrace (2));
 in addition,
 because of the performance implications of applying the advice,
 the caller must have the
-.B CAP_SYS_ADMIN
-capability.
+.B CAP_SYS_NICE
+capability
+(see
+.BR capabilities (7)).
 .SH RETURN VALUE
 On success,
 .BR process_madvise ()
@@ -180,6 +184,15 @@ configuration option.
 The
 .BR process_madvise ()
 system call is Linux-specific.
+.SH NOTES
+When this system call first appeared in Linux 5.10,
+permission to apply advice to another process was entirely governed by
+ptrace access mode
+.B PTRACE_MODE_ATTACH_FSCREDS
+check (see
+.BR ptrace (2)).
+This requirement was relaxed in Linux 5.12 so that the caller didn't require
+full control over the target process.
 .SH SEE ALSO
 .BR madvise (2),
 .BR pidfd_open (2),
-- 
2.38.0.135.g90850a2211-goog



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-10-21 22:32 [PATCH man-pages v3 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
                   ` (2 preceding siblings ...)
  2022-10-21 22:32 ` [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements Zach OKeefe
@ 2022-10-21 22:33 ` Zach OKeefe
  2022-10-31 21:15   ` Alejandro Colomar
  2022-12-11 17:59   ` Alejandro Colomar
  3 siblings, 2 replies; 25+ messages in thread
From: Zach OKeefe @ 2022-10-21 22:33 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk
  Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe

From: Zach O'Keefe <zokeefe@google.com>

Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE").  Update the man-pages for madvise(2) and
process_madvise(2).

Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
 man2/process_madvise.2 | 10 +++++
 2 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/man2/madvise.2 b/man2/madvise.2
index df3413cc8..b03fc731d 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -385,9 +385,10 @@ set (see
 .BR prctl (2) ).
 .IP
 The
-.B MADV_HUGEPAGE
+.BR MADV_HUGEPAGE ,
+.BR MADV_NOHUGEPAGE ,
 and
-.B MADV_NOHUGEPAGE
+.B MADV_COLLAPSE
 operations are available only if the kernel was configured with
 .B CONFIG_TRANSPARENT_HUGEPAGE
 and file/shmem memory is only supported if the kernel was configured with
@@ -400,6 +401,81 @@ and
 .I length
 will not be backed by transparent hugepages.
 .TP
+.BR MADV_COLLAPSE " (since Linux 6.1)"
+.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
+.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
+Perform a best-effort synchronous collapse of the native pages mapped by the
+memory range into Transparent Huge Pages (THPs).
+.B MADV_COLLAPSE
+operates on the current state of memory of the calling process and makes no
+persistent changes or guarantees on how pages will be mapped,
+constructed,
+or faulted in the future.
+.IP
+.B MADV_COLLAPSE
+supports private anonymous pages (see
+.BR mmap (2)),
+shmem pages,
+and file-backed pages.
+See
+.B MADV_HUGEPAGE
+for general information on memory requirements for THP.
+If the range provided spans multiple VMAs,
+the semantics of the collapse over each VMA is independent from the others.
+If collapse of a given huge page-aligned/sized region fails,
+the operation may continue to attempt collapsing the remainder of the
+specified memory.
+.B MADV_COLLAPSE
+will automatically clamp the provided range to be hugepage-aligned.
+.IP
+All non-resident pages covered by the range will first be
+swapped/faulted-in,
+before being copied onto a freshly allocated hugepage.
+If the native pages compose the same PTE-mapped hugepage,
+and are suitably aligned,
+allocation of a new hugepage may be elided and collapse may happen
+in-place.
+Unmapped pages will have their data directly initialized to 0 in the new
+hugepage.
+However,
+for every eligible hugepage-aligned/sized region to be collapsed,
+at least one page must currently be backed by physical memory.
+.IP
+.BR MADV_COLLAPSE
+is independent of any sysfs
+(see
+.BR sysfs (5))
+setting under
+.IR /sys/kernel/mm/transparent_hugepage ,
+both in terms of determining THP eligibility,
+and allocation semantics.
+See Linux kernel source file
+.I Documentation/admin\-guide/mm/transhuge.rst
+for more information.
+.BR MADV_COLLAPSE
+also ignores
+.B huge=
+tmpfs mount when operating on tmpfs files.
+Allocation for the new hugepage may enter direct reclaim and/or compaction,
+regardless of VMA flags
+(though
+.BR VM_NOHUGEPAGE
+is still respected).
+.IP
+When the system has multiple NUMA nodes,
+the hugepage will be allocated from the node providing the most native
+pages.
+.IP
+If all hugepage-sized/aligned regions covered by the provided range were
+either successfully collapsed,
+or were already PMD-mapped THPs,
+this operation will be deemed successful.
+Note that this doesn’t guarantee anything about other possible mappings of
+the memory.
+Also note that many failures might have occurred since the operation may
+continue to collapse in the event collapse of a single hugepage-sized/aligned
+region fails.
+.TP
 .BR MADV_DONTDUMP " (since Linux 3.4)"
 .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
 .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
@@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
 .B EBADF
 The map exists, but the area maps something that isn't a file.
 .TP
+.B EBUSY
+(for
+.BR MADV_COLLAPSE )
+Could not charge hugepage to cgroup: cgroup limit exceeded.
+.TP
 .B EFAULT
 .I advice
 is
@@ -716,6 +797,11 @@ maximum resident set size.
 Not enough memory: paging in failed.
 .TP
 .B ENOMEM
+(for
+.BR MADV_COLLAPSE )
+Not enough memory: could not allocate hugepage.
+.TP
+.B ENOMEM
 Addresses in the specified range are not currently
 mapped, or are outside the address space of the process.
 .TP
diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
index 44d3b94e8..8b0ddccdd 100644
--- a/man2/process_madvise.2
+++ b/man2/process_madvise.2
@@ -73,6 +73,10 @@ argument is one of the following values:
 See
 .BR madvise (2).
 .TP
+.B MADV_COLLAPSE
+See
+.BR madvise (2).
+.TP
 .B MADV_PAGEOUT
 See
 .BR madvise (2).
@@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
 .TP
 .B ESRCH
 The target process does not exist (i.e., it has terminated and been waited on).
+.PP
+See
+.BR madvise (2)
+for
+.IR advice -specific
+errors.
 .SH VERSIONS
 This system call first appeared in Linux 5.10.
 .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
-- 
2.38.0.135.g90850a2211-goog



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4
  2022-10-21 22:32 ` [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
@ 2022-10-30 11:41   ` Alejandro Colomar
  2022-10-31 16:33     ` Zach O'Keefe
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-10-30 11:41 UTC (permalink / raw)
  To: Zach OKeefe; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk


[-- Attachment #1.1: Type: text/plain, Size: 4423 bytes --]

Hi Zach!

On 10/22/22 00:32, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> Since Linux 5.4, Transparent Huge Pages now support both file-backed
> memory and shmem memory. Update MADV_HUGEPAGE advice description to
> reflect this.
> 
> Additionally, expand the description of requirements for memory to be
> considered eligible for THP: alignment / mapping requirements, VMA
> flags, prctl(2) settings, inode status, etc.
> 
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Patch applied!  Thanks.
Since you were interested, I amended it with the following diff:


diff --git a/man2/madvise.2 b/man2/madvise.2
index 64f788ace..48bda703c 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -361,9 +361,10 @@ .SS Linux-specific advice values
  and file-backed pages.
  For all memory types,
  memory may only be replaced by huge pages on hugepage-aligned boundaries.
-For file-mapped memory \(em including tmpfs (see
-.BR tmpfs (2))
-\(em the mapping must also be naturally hugepage-aligned within the file.
+For file-mapped memory
+\(emincluding tmpfs (see
+.BR tmpfs (2))\(em
+the mapping must also be naturally hugepage-aligned within the file.
  Additionally,
  for file-backed,
  non-tmpfs memory,
@@ -382,7 +383,7 @@ .SS Linux-specific advice values
  The process must also not have
  .B PR_SET_THP_DISABLE
  set (see
-.BR prctl (2) ).
+.BR prctl (2)).
  .IP
  The
  .B MADV_HUGEPAGE


- The em dashes you used were correct with spaces in both sides; there are those 
who use them with spaces in both sides and those who use them with no spaces at 
all.  And then there are those who put spaces as if they were parentheses 
(admittedly this is much less common).  I prefer this latter case, since it 
makes technical texts more parseable.

- I used a semantic newline break for the em dashes, since they are similar to a 
coma.

- Removed a spurious space in BR that would have made the formatting weird.

Cheers,

Alex


> ---
>   man2/madvise.2 | 38 +++++++++++++++++++++++++++++++++++---
>   1 file changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index 81cce56af..64f788ace 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -320,8 +320,6 @@ Enable Transparent Huge Pages (THP) for pages in the range specified by
>   .I addr
>   and
>   .IR length .
> -Currently, Transparent Huge Pages work only with private anonymous pages (see
> -.BR mmap (2)).
>   The kernel will regularly scan the areas marked as huge page candidates
>   to replace them with huge pages.
>   The kernel will also allocate huge pages directly when the region is
> @@ -354,12 +352,46 @@ an access pattern that the developer knows in advance won't risk
>   to increase the memory footprint of the application when transparent
>   hugepages are enabled.
>   .IP
> +.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
> +Since Linux 5.4,
> +automatic scan of eligible areas and replacement by huge pages works with
> +private anonymous pages (see
> +.BR mmap (2)),
> +shmem pages,
> +and file-backed pages.
> +For all memory types,
> +memory may only be replaced by huge pages on hugepage-aligned boundaries.
> +For file-mapped memory \(em including tmpfs (see
> +.BR tmpfs (2))
> +\(em the mapping must also be naturally hugepage-aligned within the file.
> +Additionally,
> +for file-backed,
> +non-tmpfs memory,
> +the file must not be open for write and the mapping must be executable.
> +.IP
> +The VMA must not be marked
> +.BR VM_NOHUGEPAGE ,
> +.BR VM_HUGETLB ,
> +.BR VM_IO ,
> +.BR VM_DONTEXPAND ,
> +.BR VM_MIXEDMAP ,
> +or
> +.BR VM_PFNMAP ,
> +nor can it be stack memory or backed by a DAX-enabled device
> +(unless the DAX device is hot-plugged as System RAM).
> +The process must also not have
> +.B PR_SET_THP_DISABLE
> +set (see
> +.BR prctl (2) ).
> +.IP
>   The
>   .B MADV_HUGEPAGE
>   and
>   .B MADV_NOHUGEPAGE
>   operations are available only if the kernel was configured with
> -.BR CONFIG_TRANSPARENT_HUGEPAGE .
> +.B CONFIG_TRANSPARENT_HUGEPAGE
> +and file/shmem memory is only supported if the kernel was configured with
> +.BR CONFIG_READ_ONLY_THP_FOR_FS .
>   .TP
>   .BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
>   Ensures that memory in the address range specified by

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support
  2022-10-21 22:32 ` [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
@ 2022-10-30 11:44   ` Alejandro Colomar
  2022-10-31 16:33     ` Zach O'Keefe
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-10-30 11:44 UTC (permalink / raw)
  To: Zach OKeefe; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk


[-- Attachment #1.1: Type: text/plain, Size: 1167 bytes --]

Hi Zach,

On 10/22/22 00:32, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> EINVAL is an overloaded error code for madvise(2) and it's not clear
> under what context it means "advice is not valid" vs another error.
> 
> Explicitly document that madvise(0, 0, advice) can reliably be used to
> probe for kernel support for "advice", returning zero iff "advice" is
> supported by the kernel.
> 
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Patch applied.

Thanks,

Alex

> ---
>   man2/madvise.2 | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index 64f788ace..df3413cc8 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -790,6 +790,11 @@ that are not mapped, the Linux version of
>   ignores them and applies the call to the rest (but returns
>   .B ENOMEM
>   from the system call, as it should).
> +.PP
> +.IR madvise(0,\ 0,\ advice)
> +will return zero iff
> +.I advice
> +is supported by the kernel and can be relied on to probe for support.
>   .\" .SH HISTORY
>   .\" The
>   .\" .BR madvise ()

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements
  2022-10-21 22:32 ` [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements Zach OKeefe
@ 2022-10-30 11:50   ` Alejandro Colomar
  2022-10-31 19:13     ` Suren Baghdasaryan
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-10-30 11:50 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Yang Shi, linux-mm, linux-man, Minchan Kim, Zach OKeefe, Michael Kerrisk


[-- Attachment #1.1: Type: text/plain, Size: 3007 bytes --]

Hi Suren,

On 10/22/22 00:32, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
> ("mm/madvise: introduce process_madvise() syscall: an external memory
> hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
> but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
> attach requirement for process_madvise") which replaced this with a
> combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
> prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
> performance).
> 
> The initial commit of process_madvise(2) to man-pages project, made
> after the second patch, included two errors:
> 
> 1) CAP_SYS_ADMIN instead of CAP_SYS_NICE
> 2) PTRACE_MODE_READ_REALCREDS instead of PTRACE_MODE_READ_FSCREDS
> 
> Correct this in the man-page for process_madvise(2).
> 
> Fixes: a144f458b ("process_madvise.2: Document process_madvise(2)")
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

You added your Reviewed-by tag to v2 of this patch.  I guess you'd like to put 
it in this one too, but since it changed slightly, I'd like you to confirm.

Thanks,

Alex

> ---
>   man2/process_madvise.2 | 21 +++++++++++++++++----
>   1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 6208206e4..44d3b94e8 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -105,16 +105,20 @@ remote process.
>   No further elements will be processed beyond that point.
>   (See the discussion regarding partial advice in RETURN VALUE.)
>   .PP
> -Permission to apply advice to another process is governed by a
> +.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
> +Starting in Linux 5.12,
> +permission to apply advice to another process is governed by
>   ptrace access mode
> -.B PTRACE_MODE_READ_REALCREDS
> +.B PTRACE_MODE_READ_FSCREDS
>   check (see
>   .BR ptrace (2));
>   in addition,
>   because of the performance implications of applying the advice,
>   the caller must have the
> -.B CAP_SYS_ADMIN
> -capability.
> +.B CAP_SYS_NICE
> +capability
> +(see
> +.BR capabilities (7)).
>   .SH RETURN VALUE
>   On success,
>   .BR process_madvise ()
> @@ -180,6 +184,15 @@ configuration option.
>   The
>   .BR process_madvise ()
>   system call is Linux-specific.
> +.SH NOTES
> +When this system call first appeared in Linux 5.10,
> +permission to apply advice to another process was entirely governed by
> +ptrace access mode
> +.B PTRACE_MODE_ATTACH_FSCREDS
> +check (see
> +.BR ptrace (2)).
> +This requirement was relaxed in Linux 5.12 so that the caller didn't require
> +full control over the target process.
>   .SH SEE ALSO
>   .BR madvise (2),
>   .BR pidfd_open (2),

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4
  2022-10-30 11:41   ` Alejandro Colomar
@ 2022-10-31 16:33     ` Zach O'Keefe
  2022-10-31 20:11       ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Zach O'Keefe @ 2022-10-31 16:33 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk

On Sun, Oct 30, 2022 at 4:42 AM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Zach!
>
> On 10/22/22 00:32, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Since Linux 5.4, Transparent Huge Pages now support both file-backed
> > memory and shmem memory. Update MADV_HUGEPAGE advice description to
> > reflect this.
> >
> > Additionally, expand the description of requirements for memory to be
> > considered eligible for THP: alignment / mapping requirements, VMA
> > flags, prctl(2) settings, inode status, etc.
> >
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> Patch applied!  Thanks.
> Since you were interested, I amended it with the following diff:
>
>
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index 64f788ace..48bda703c 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -361,9 +361,10 @@ .SS Linux-specific advice values
>   and file-backed pages.
>   For all memory types,
>   memory may only be replaced by huge pages on hugepage-aligned boundaries.
> -For file-mapped memory \(em including tmpfs (see
> -.BR tmpfs (2))
> -\(em the mapping must also be naturally hugepage-aligned within the file.
> +For file-mapped memory
> +\(emincluding tmpfs (see
> +.BR tmpfs (2))\(em
> +the mapping must also be naturally hugepage-aligned within the file.
>   Additionally,
>   for file-backed,
>   non-tmpfs memory,
> @@ -382,7 +383,7 @@ .SS Linux-specific advice values
>   The process must also not have
>   .B PR_SET_THP_DISABLE
>   set (see
> -.BR prctl (2) ).
> +.BR prctl (2)).
>   .IP
>   The
>   .B MADV_HUGEPAGE
>
>
> - The em dashes you used were correct with spaces in both sides; there are those
> who use them with spaces in both sides and those who use them with no spaces at
> all.  And then there are those who put spaces as if they were parentheses
> (admittedly this is much less common).  I prefer this latter case, since it
> makes technical texts more parseable.
>
> - I used a semantic newline break for the em dashes, since they are similar to a
> coma.
>
> - Removed a spurious space in BR that would have made the formatting weird.
>
> Cheers,
>

Thanks for taking the patch, and thanks for taking the time with the
explanation! Glad I wasn't *too* far off :)

Changes look good to me - thank you!

Best,
Zach

> Alex
>
>
> > ---
> >   man2/madvise.2 | 38 +++++++++++++++++++++++++++++++++++---
> >   1 file changed, 35 insertions(+), 3 deletions(-)
> >
> > diff --git a/man2/madvise.2 b/man2/madvise.2
> > index 81cce56af..64f788ace 100644
> > --- a/man2/madvise.2
> > +++ b/man2/madvise.2
> > @@ -320,8 +320,6 @@ Enable Transparent Huge Pages (THP) for pages in the range specified by
> >   .I addr
> >   and
> >   .IR length .
> > -Currently, Transparent Huge Pages work only with private anonymous pages (see
> > -.BR mmap (2)).
> >   The kernel will regularly scan the areas marked as huge page candidates
> >   to replace them with huge pages.
> >   The kernel will also allocate huge pages directly when the region is
> > @@ -354,12 +352,46 @@ an access pattern that the developer knows in advance won't risk
> >   to increase the memory footprint of the application when transparent
> >   hugepages are enabled.
> >   .IP
> > +.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
> > +Since Linux 5.4,
> > +automatic scan of eligible areas and replacement by huge pages works with
> > +private anonymous pages (see
> > +.BR mmap (2)),
> > +shmem pages,
> > +and file-backed pages.
> > +For all memory types,
> > +memory may only be replaced by huge pages on hugepage-aligned boundaries.
> > +For file-mapped memory \(em including tmpfs (see
> > +.BR tmpfs (2))
> > +\(em the mapping must also be naturally hugepage-aligned within the file.
> > +Additionally,
> > +for file-backed,
> > +non-tmpfs memory,
> > +the file must not be open for write and the mapping must be executable.
> > +.IP
> > +The VMA must not be marked
> > +.BR VM_NOHUGEPAGE ,
> > +.BR VM_HUGETLB ,
> > +.BR VM_IO ,
> > +.BR VM_DONTEXPAND ,
> > +.BR VM_MIXEDMAP ,
> > +or
> > +.BR VM_PFNMAP ,
> > +nor can it be stack memory or backed by a DAX-enabled device
> > +(unless the DAX device is hot-plugged as System RAM).
> > +The process must also not have
> > +.B PR_SET_THP_DISABLE
> > +set (see
> > +.BR prctl (2) ).
> > +.IP
> >   The
> >   .B MADV_HUGEPAGE
> >   and
> >   .B MADV_NOHUGEPAGE
> >   operations are available only if the kernel was configured with
> > -.BR CONFIG_TRANSPARENT_HUGEPAGE .
> > +.B CONFIG_TRANSPARENT_HUGEPAGE
> > +and file/shmem memory is only supported if the kernel was configured with
> > +.BR CONFIG_READ_ONLY_THP_FOR_FS .
> >   .TP
> >   .BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
> >   Ensures that memory in the address range specified by
>
> --
> <http://www.alejandro-colomar.es/>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support
  2022-10-30 11:44   ` Alejandro Colomar
@ 2022-10-31 16:33     ` Zach O'Keefe
  2022-10-31 20:21       ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Zach O'Keefe @ 2022-10-31 16:33 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk

On Sun, Oct 30, 2022 at 4:44 AM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Zach,
>
> On 10/22/22 00:32, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > EINVAL is an overloaded error code for madvise(2) and it's not clear
> > under what context it means "advice is not valid" vs another error.
> >
> > Explicitly document that madvise(0, 0, advice) can reliably be used to
> > probe for kernel support for "advice", returning zero iff "advice" is
> > supported by the kernel.
> >
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> Patch applied.

Thank you!

Best,
Zach

> Thanks,
>
> Alex
>
> > ---
> >   man2/madvise.2 | 5 +++++
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/man2/madvise.2 b/man2/madvise.2
> > index 64f788ace..df3413cc8 100644
> > --- a/man2/madvise.2
> > +++ b/man2/madvise.2
> > @@ -790,6 +790,11 @@ that are not mapped, the Linux version of
> >   ignores them and applies the call to the rest (but returns
> >   .B ENOMEM
> >   from the system call, as it should).
> > +.PP
> > +.IR madvise(0,\ 0,\ advice)
> > +will return zero iff
> > +.I advice
> > +is supported by the kernel and can be relied on to probe for support.
> >   .\" .SH HISTORY
> >   .\" The
> >   .\" .BR madvise ()
>
> --
> <http://www.alejandro-colomar.es/>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements
  2022-10-30 11:50   ` Alejandro Colomar
@ 2022-10-31 19:13     ` Suren Baghdasaryan
  2022-10-31 20:24       ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Suren Baghdasaryan @ 2022-10-31 19:13 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Yang Shi, linux-mm, linux-man, Minchan Kim, Zach OKeefe, Michael Kerrisk

Hi Alex,

On Sun, Oct 30, 2022 at 4:50 AM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Suren,
>
> On 10/22/22 00:32, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
> > ("mm/madvise: introduce process_madvise() syscall: an external memory
> > hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
> > but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
> > attach requirement for process_madvise") which replaced this with a
> > combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
> > prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
> > performance).
> >
> > The initial commit of process_madvise(2) to man-pages project, made
> > after the second patch, included two errors:
> >
> > 1) CAP_SYS_ADMIN instead of CAP_SYS_NICE
> > 2) PTRACE_MODE_READ_REALCREDS instead of PTRACE_MODE_READ_FSCREDS
> >
> > Correct this in the man-page for process_madvise(2).
> >
> > Fixes: a144f458b ("process_madvise.2: Document process_madvise(2)")
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

>
> You added your Reviewed-by tag to v2 of this patch.  I guess you'd like to put
> it in this one too, but since it changed slightly, I'd like you to confirm.

Thanks for the reminder!
Suren.

>
> Thanks,
>
> Alex
>
> > ---
> >   man2/process_madvise.2 | 21 +++++++++++++++++----
> >   1 file changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> > index 6208206e4..44d3b94e8 100644
> > --- a/man2/process_madvise.2
> > +++ b/man2/process_madvise.2
> > @@ -105,16 +105,20 @@ remote process.
> >   No further elements will be processed beyond that point.
> >   (See the discussion regarding partial advice in RETURN VALUE.)
> >   .PP
> > -Permission to apply advice to another process is governed by a
> > +.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
> > +Starting in Linux 5.12,
> > +permission to apply advice to another process is governed by
> >   ptrace access mode
> > -.B PTRACE_MODE_READ_REALCREDS
> > +.B PTRACE_MODE_READ_FSCREDS
> >   check (see
> >   .BR ptrace (2));
> >   in addition,
> >   because of the performance implications of applying the advice,
> >   the caller must have the
> > -.B CAP_SYS_ADMIN
> > -capability.
> > +.B CAP_SYS_NICE
> > +capability
> > +(see
> > +.BR capabilities (7)).
> >   .SH RETURN VALUE
> >   On success,
> >   .BR process_madvise ()
> > @@ -180,6 +184,15 @@ configuration option.
> >   The
> >   .BR process_madvise ()
> >   system call is Linux-specific.
> > +.SH NOTES
> > +When this system call first appeared in Linux 5.10,
> > +permission to apply advice to another process was entirely governed by
> > +ptrace access mode
> > +.B PTRACE_MODE_ATTACH_FSCREDS
> > +check (see
> > +.BR ptrace (2)).
> > +This requirement was relaxed in Linux 5.12 so that the caller didn't require
> > +full control over the target process.
> >   .SH SEE ALSO
> >   .BR madvise (2),
> >   .BR pidfd_open (2),
>
> --
> <http://www.alejandro-colomar.es/>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4
  2022-10-31 16:33     ` Zach O'Keefe
@ 2022-10-31 20:11       ` Alejandro Colomar
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-10-31 20:11 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk


[-- Attachment #1.1: Type: text/plain, Size: 657 bytes --]

Hi Zach!

On 10/31/22 17:33, Zach O'Keefe wrote:
> 
> Thanks for taking the patch, and thanks for taking the time with the
> explanation! Glad I wasn't *too* far off :)

:)

BTW, I just documented the linters that are available in the build system.
It would have caught an issue an patch 2/4 that I didn't (although it requires a 
yet-unreleased groff version; so it won't be available to general public until 
it is released in a few months).
You may want to run it in future patches ;)

I'll tell more about it in the thread for 2/4

> 
> Changes look good to me - thank you!

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support
  2022-10-31 16:33     ` Zach O'Keefe
@ 2022-10-31 20:21       ` Alejandro Colomar
  2022-10-31 21:24         ` Zach O'Keefe
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-10-31 20:21 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk


[-- Attachment #1.1: Type: text/plain, Size: 2789 bytes --]

Hi Zach,

On 10/31/22 17:33, Zach O'Keefe wrote:
>> Patch applied.
> 
> Thank you!
> 
> Best,
> Zach
> 
>> Thanks,
>>
>> Alex
>>

I just caught a small formatting issue by running some linters through that file:

$ make -ij lint >/dev/null 2>&1
$ touch man2/madvise.2
$ make -k lint-man
LINT (groff)	tmp/lint/man2/madvise.2.lint-man.groff.touch
an.tmac:man2/madvise.2:795: style: .IR expects at least 2 arguments, got 1
found style problems; aborting
make: *** [lib/lint-man.mk:77: tmp/lint/man2/madvise.2.lint-man.groff.touch] Error 1
LINT (mandoc)	tmp/lint/man2/madvise.2.lint-man.mandoc.touch
mandoc: man2/madvise.2:15:5: STYLE: lower case character in document title: TH 
madvise
mandoc: man2/madvise.2:15:15: WARNING: cannot parse date, using it verbatim: TH 
(date)
make: *** [lib/lint-man.mk:87: tmp/lint/man2/madvise.2.lint-man.mandoc.touch] 
Error 2
make: Target 'lint-man' not remade because of errors.


The issues reported by mandoc(1) are expected; but the issue reported by 
groff(1) is legit.  The offending line is:

$ sed -n 795p man2/madvise.2
.IR madvise(0,\ 0,\ advice)

Since all of the line is formatted in italics (no roman part), it should be .I 
and not .IR.
After fixing that line, the linter now says:

$ make lint-man -k
LINT (groff)	tmp/lint/man2/madvise.2.lint-man.groff.touch
LINT (mandoc)	tmp/lint/man2/madvise.2.lint-man.mandoc.touch
mandoc: man2/madvise.2:15:5: STYLE: lower case character in document title: TH 
madvise
mandoc: man2/madvise.2:15:15: WARNING: cannot parse date, using it verbatim: TH 
(date)
make: *** [lib/lint-man.mk:87: tmp/lint/man2/madvise.2.lint-man.mandoc.touch] 
Error 2
make: Target 'lint-man' not remade because of errors.


Ignoring the spurious mandoc(1) warnings, it's all good.

Documentation about how to use this feature is here (written today, so no way 
you could have run it; I should have, though :D):
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/CONTRIBUTING#n133>

Cheers,

Alex

>>> ---
>>>    man2/madvise.2 | 5 +++++
>>>    1 file changed, 5 insertions(+)
>>>
>>> diff --git a/man2/madvise.2 b/man2/madvise.2
>>> index 64f788ace..df3413cc8 100644
>>> --- a/man2/madvise.2
>>> +++ b/man2/madvise.2
>>> @@ -790,6 +790,11 @@ that are not mapped, the Linux version of
>>>    ignores them and applies the call to the rest (but returns
>>>    .B ENOMEM
>>>    from the system call, as it should).
>>> +.PP
>>> +.IR madvise(0,\ 0,\ advice)
>>> +will return zero iff
>>> +.I advice
>>> +is supported by the kernel and can be relied on to probe for support.
>>>    .\" .SH HISTORY
>>>    .\" The
>>>    .\" .BR madvise ()
>>
>> --
>> <http://www.alejandro-colomar.es/>

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements
  2022-10-31 19:13     ` Suren Baghdasaryan
@ 2022-10-31 20:24       ` Alejandro Colomar
  2022-10-31 21:25         ` Zach O'Keefe
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-10-31 20:24 UTC (permalink / raw)
  To: Suren Baghdasaryan, Zach OKeefe
  Cc: Yang Shi, linux-mm, linux-man, Minchan Kim, Michael Kerrisk


[-- Attachment #1.1: Type: text/plain, Size: 3580 bytes --]

Hi Suren, Zach,

On 10/31/22 20:13, Suren Baghdasaryan wrote:
> Hi Alex,
> 
> On Sun, Oct 30, 2022 at 4:50 AM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> Hi Suren,
>>
>> On 10/22/22 00:32, Zach OKeefe wrote:
>>> From: Zach O'Keefe <zokeefe@google.com>
>>>
>>> The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
>>> ("mm/madvise: introduce process_madvise() syscall: an external memory
>>> hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
>>> but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
>>> attach requirement for process_madvise") which replaced this with a
>>> combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
>>> prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
>>> performance).
>>>
>>> The initial commit of process_madvise(2) to man-pages project, made
>>> after the second patch, included two errors:
>>>
>>> 1) CAP_SYS_ADMIN instead of CAP_SYS_NICE
>>> 2) PTRACE_MODE_READ_REALCREDS instead of PTRACE_MODE_READ_FSCREDS
>>>
>>> Correct this in the man-page for process_madvise(2).
>>>
>>> Fixes: a144f458b ("process_madvise.2: Document process_madvise(2)")
>>> Cc: Suren Baghdasaryan <surenb@google.com>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> 
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Thanks!  Patch applied.

> 
>>
>> You added your Reviewed-by tag to v2 of this patch.  I guess you'd like to put
>> it in this one too, but since it changed slightly, I'd like you to confirm.
> 
> Thanks for the reminder!

:)

Cheers,

Alex

> Suren.
> 
>>
>> Thanks,
>>
>> Alex
>>
>>> ---
>>>    man2/process_madvise.2 | 21 +++++++++++++++++----
>>>    1 file changed, 17 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
>>> index 6208206e4..44d3b94e8 100644
>>> --- a/man2/process_madvise.2
>>> +++ b/man2/process_madvise.2
>>> @@ -105,16 +105,20 @@ remote process.
>>>    No further elements will be processed beyond that point.
>>>    (See the discussion regarding partial advice in RETURN VALUE.)
>>>    .PP
>>> -Permission to apply advice to another process is governed by a
>>> +.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
>>> +Starting in Linux 5.12,
>>> +permission to apply advice to another process is governed by
>>>    ptrace access mode
>>> -.B PTRACE_MODE_READ_REALCREDS
>>> +.B PTRACE_MODE_READ_FSCREDS
>>>    check (see
>>>    .BR ptrace (2));
>>>    in addition,
>>>    because of the performance implications of applying the advice,
>>>    the caller must have the
>>> -.B CAP_SYS_ADMIN
>>> -capability.
>>> +.B CAP_SYS_NICE
>>> +capability
>>> +(see
>>> +.BR capabilities (7)).
>>>    .SH RETURN VALUE
>>>    On success,
>>>    .BR process_madvise ()
>>> @@ -180,6 +184,15 @@ configuration option.
>>>    The
>>>    .BR process_madvise ()
>>>    system call is Linux-specific.
>>> +.SH NOTES
>>> +When this system call first appeared in Linux 5.10,
>>> +permission to apply advice to another process was entirely governed by
>>> +ptrace access mode
>>> +.B PTRACE_MODE_ATTACH_FSCREDS
>>> +check (see
>>> +.BR ptrace (2)).
>>> +This requirement was relaxed in Linux 5.12 so that the caller didn't require
>>> +full control over the target process.
>>>    .SH SEE ALSO
>>>    .BR madvise (2),
>>>    .BR pidfd_open (2),
>>
>> --
>> <http://www.alejandro-colomar.es/>

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-10-21 22:33 ` [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
@ 2022-10-31 21:15   ` Alejandro Colomar
  2022-10-31 23:00     ` Zach O'Keefe
  2022-11-01  1:51     ` G. Branden Robinson
  2022-12-11 17:59   ` Alejandro Colomar
  1 sibling, 2 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-10-31 21:15 UTC (permalink / raw)
  To: Zach OKeefe; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk


[-- Attachment #1.1: Type: text/plain, Size: 8043 bytes --]

Hi Zach!

On 10/22/22 00:33, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> process_madvise(2).
> 
> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

There are a few issues with this patch:

alx@asus5775:~/src/linux/man-pages/man-pages$ make lint-man-groff
LINT (groff)	tmp/lint/man2/madvise.2.lint-man.groff.touch
eqn:man2/madvise.2:473: error: invalid input character code '128'
eqn:man2/madvise.2:473: error: invalid input character code '153'
an.tmac:man2/madvise.2:445: style: .BR expects at least 2 arguments, got 1
an.tmac:man2/madvise.2:456: style: .BR expects at least 2 arguments, got 1
an.tmac:man2/madvise.2:463: style: .BR expects at least 2 arguments, got 1
found style problems; aborting
make: *** [lib/lint-man.mk:77: tmp/lint/man2/madvise.2.lint-man.groff.touch] Error 1


Let's investigate them:

alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 473p man2/madvise.2
this operation will be deemed successful.

This one was a bit difficult to track, since the line count seems to be off by one:

alx@asus5775:~/src/linux/man-pages/man-pages$ tbl man2/madvise.2 | hd | grep -C1 
' 80 '
00003d40  63 65 73 73 66 75 6c 2e  0a 4e 6f 74 65 20 74 68  |cessful..Note th|
00003d50  61 74 20 74 68 69 73 20  64 6f 65 73 6e e2 80 99  |at this doesn...|
00003d60  74 20 67 75 61 72 61 6e  74 65 65 20 61 6e 79 74  |t guarantee anyt|
alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 474p man2/madvise.2
Note that this doesn’t guarantee anything about other possible mappings of

The issue was in line 474, and the issue is that it uses a weird single quote. 
Please use the foillowing ASCII character for the single quote (see ascii(7)):
047   39    27    '

The rest of issues seems trivial:
Use .B instead of .BR because there's no "roman" (i.e., non-bold) part.

alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 445p man2/madvise.2
.BR MADV_COLLAPSE
alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 456p man2/madvise.2
.BR MADV_COLLAPSE
alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 463p man2/madvise.2
.BR VM_NOHUGEPAGE


I'll report a bug to groff(1) about the issue with the line count.

Cheers,

Alex

> ---
>   man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
>   man2/process_madvise.2 | 10 +++++
>   2 files changed, 98 insertions(+), 2 deletions(-)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index df3413cc8..b03fc731d 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -385,9 +385,10 @@ set (see
>   .BR prctl (2) ).
>   .IP
>   The
> -.B MADV_HUGEPAGE
> +.BR MADV_HUGEPAGE ,
> +.BR MADV_NOHUGEPAGE ,
>   and
> -.B MADV_NOHUGEPAGE
> +.B MADV_COLLAPSE
>   operations are available only if the kernel was configured with
>   .B CONFIG_TRANSPARENT_HUGEPAGE
>   and file/shmem memory is only supported if the kernel was configured with
> @@ -400,6 +401,81 @@ and
>   .I length
>   will not be backed by transparent hugepages.
>   .TP
> +.BR MADV_COLLAPSE " (since Linux 6.1)"
> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> +Perform a best-effort synchronous collapse of the native pages mapped by the
> +memory range into Transparent Huge Pages (THPs).
> +.B MADV_COLLAPSE
> +operates on the current state of memory of the calling process and makes no
> +persistent changes or guarantees on how pages will be mapped,
> +constructed,
> +or faulted in the future.
> +.IP
> +.B MADV_COLLAPSE
> +supports private anonymous pages (see
> +.BR mmap (2)),
> +shmem pages,
> +and file-backed pages.
> +See
> +.B MADV_HUGEPAGE
> +for general information on memory requirements for THP.
> +If the range provided spans multiple VMAs,
> +the semantics of the collapse over each VMA is independent from the others.
> +If collapse of a given huge page-aligned/sized region fails,
> +the operation may continue to attempt collapsing the remainder of the
> +specified memory.
> +.B MADV_COLLAPSE
> +will automatically clamp the provided range to be hugepage-aligned.
> +.IP
> +All non-resident pages covered by the range will first be
> +swapped/faulted-in,
> +before being copied onto a freshly allocated hugepage.
> +If the native pages compose the same PTE-mapped hugepage,
> +and are suitably aligned,
> +allocation of a new hugepage may be elided and collapse may happen
> +in-place.
> +Unmapped pages will have their data directly initialized to 0 in the new
> +hugepage.
> +However,
> +for every eligible hugepage-aligned/sized region to be collapsed,
> +at least one page must currently be backed by physical memory.
> +.IP
> +.BR MADV_COLLAPSE
> +is independent of any sysfs
> +(see
> +.BR sysfs (5))
> +setting under
> +.IR /sys/kernel/mm/transparent_hugepage ,
> +both in terms of determining THP eligibility,
> +and allocation semantics.
> +See Linux kernel source file
> +.I Documentation/admin\-guide/mm/transhuge.rst
> +for more information.
> +.BR MADV_COLLAPSE
> +also ignores
> +.B huge=
> +tmpfs mount when operating on tmpfs files.
> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> +regardless of VMA flags
> +(though
> +.BR VM_NOHUGEPAGE
> +is still respected).
> +.IP
> +When the system has multiple NUMA nodes,
> +the hugepage will be allocated from the node providing the most native
> +pages.
> +.IP
> +If all hugepage-sized/aligned regions covered by the provided range were
> +either successfully collapsed,
> +or were already PMD-mapped THPs,
> +this operation will be deemed successful.
> +Note that this doesn’t guarantee anything about other possible mappings of
> +the memory.
> +Also note that many failures might have occurred since the operation may
> +continue to collapse in the event collapse of a single hugepage-sized/aligned
> +region fails.
> +.TP
>   .BR MADV_DONTDUMP " (since Linux 3.4)"
>   .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>   .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
>   .B EBADF
>   The map exists, but the area maps something that isn't a file.
>   .TP
> +.B EBUSY
> +(for
> +.BR MADV_COLLAPSE )
> +Could not charge hugepage to cgroup: cgroup limit exceeded.
> +.TP
>   .B EFAULT
>   .I advice
>   is
> @@ -716,6 +797,11 @@ maximum resident set size.
>   Not enough memory: paging in failed.
>   .TP
>   .B ENOMEM
> +(for
> +.BR MADV_COLLAPSE )
> +Not enough memory: could not allocate hugepage.
> +.TP
> +.B ENOMEM
>   Addresses in the specified range are not currently
>   mapped, or are outside the address space of the process.
>   .TP
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 44d3b94e8..8b0ddccdd 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -73,6 +73,10 @@ argument is one of the following values:
>   See
>   .BR madvise (2).
>   .TP
> +.B MADV_COLLAPSE
> +See
> +.BR madvise (2).
> +.TP
>   .B MADV_PAGEOUT
>   See
>   .BR madvise (2).
> @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
>   .TP
>   .B ESRCH
>   The target process does not exist (i.e., it has terminated and been waited on).
> +.PP
> +See
> +.BR madvise (2)
> +for
> +.IR advice -specific
> +errors.
>   .SH VERSIONS
>   This system call first appeared in Linux 5.10.
>   .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support
  2022-10-31 20:21       ` Alejandro Colomar
@ 2022-10-31 21:24         ` Zach O'Keefe
  0 siblings, 0 replies; 25+ messages in thread
From: Zach O'Keefe @ 2022-10-31 21:24 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk

Hey Alex,

On Mon, Oct 31, 2022 at 1:21 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Zach,
>
> On 10/31/22 17:33, Zach O'Keefe wrote:
> >> Patch applied.
> >
> > Thank you!
> >
> > Best,
> > Zach
> >
> >> Thanks,
> >>
> >> Alex
> >>
>
> I just caught a small formatting issue by running some linters through that file:
>
> $ make -ij lint >/dev/null 2>&1
> $ touch man2/madvise.2
> $ make -k lint-man
> LINT (groff)    tmp/lint/man2/madvise.2.lint-man.groff.touch
> an.tmac:man2/madvise.2:795: style: .IR expects at least 2 arguments, got 1
> found style problems; aborting
> make: *** [lib/lint-man.mk:77: tmp/lint/man2/madvise.2.lint-man.groff.touch] Error 1
> LINT (mandoc)   tmp/lint/man2/madvise.2.lint-man.mandoc.touch
> mandoc: man2/madvise.2:15:5: STYLE: lower case character in document title: TH
> madvise
> mandoc: man2/madvise.2:15:15: WARNING: cannot parse date, using it verbatim: TH
> (date)
> make: *** [lib/lint-man.mk:87: tmp/lint/man2/madvise.2.lint-man.mandoc.touch]
> Error 2
> make: Target 'lint-man' not remade because of errors.
>
>
> The issues reported by mandoc(1) are expected; but the issue reported by
> groff(1) is legit.  The offending line is:
>
> $ sed -n 795p man2/madvise.2
> .IR madvise(0,\ 0,\ advice)
>
> Since all of the line is formatted in italics (no roman part), it should be .I
> and not .IR.
> After fixing that line, the linter now says:
>
> $ make lint-man -k
> LINT (groff)    tmp/lint/man2/madvise.2.lint-man.groff.touch
> LINT (mandoc)   tmp/lint/man2/madvise.2.lint-man.mandoc.touch
> mandoc: man2/madvise.2:15:5: STYLE: lower case character in document title: TH
> madvise
> mandoc: man2/madvise.2:15:15: WARNING: cannot parse date, using it verbatim: TH
> (date)
> make: *** [lib/lint-man.mk:87: tmp/lint/man2/madvise.2.lint-man.mandoc.touch]
> Error 2
> make: Target 'lint-man' not remade because of errors.
>
>
> Ignoring the spurious mandoc(1) warnings, it's all good.
>
> Documentation about how to use this feature is here (written today, so no way
> you could have run it; I should have, though :D):
> <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/CONTRIBUTING#n133>
>

Thanks for taking a look here and thanks for adding the documentation
to CONTRIBUTING -- was easy to follow.

As you mentioned in 1/4, I don't have the groff(1) version (GNU groff
version 1.22.4) to see the error, but it seems like a real fix (& I've
taken a look at your fix patch - thank you :) )

Thanks again,

Best,
Zach


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements
  2022-10-31 20:24       ` Alejandro Colomar
@ 2022-10-31 21:25         ` Zach O'Keefe
  0 siblings, 0 replies; 25+ messages in thread
From: Zach O'Keefe @ 2022-10-31 21:25 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Suren Baghdasaryan, Yang Shi, linux-mm, linux-man, Minchan Kim,
	Michael Kerrisk

On Mon, Oct 31, 2022 at 1:24 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Suren, Zach,
>
> On 10/31/22 20:13, Suren Baghdasaryan wrote:
> > Hi Alex,
> >
> > On Sun, Oct 30, 2022 at 4:50 AM Alejandro Colomar
> > <alx.manpages@gmail.com> wrote:
> >>
> >> Hi Suren,
> >>
> >> On 10/22/22 00:32, Zach OKeefe wrote:
> >>> From: Zach O'Keefe <zokeefe@google.com>
> >>>
> >>> The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
> >>> ("mm/madvise: introduce process_madvise() syscall: an external memory
> >>> hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
> >>> but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
> >>> attach requirement for process_madvise") which replaced this with a
> >>> combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
> >>> prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
> >>> performance).
> >>>
> >>> The initial commit of process_madvise(2) to man-pages project, made
> >>> after the second patch, included two errors:
> >>>
> >>> 1) CAP_SYS_ADMIN instead of CAP_SYS_NICE
> >>> 2) PTRACE_MODE_READ_REALCREDS instead of PTRACE_MODE_READ_FSCREDS
> >>>
> >>> Correct this in the man-page for process_madvise(2).
> >>>
> >>> Fixes: a144f458b ("process_madvise.2: Document process_madvise(2)")
> >>> Cc: Suren Baghdasaryan <surenb@google.com>
> >>> Cc: Minchan Kim <minchan@kernel.org>
> >>> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> >
> > Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>
> Thanks!  Patch applied.

Thanks Suren & Alex!

Best,
Zach

> >
> >>
> >> You added your Reviewed-by tag to v2 of this patch.  I guess you'd like to put
> >> it in this one too, but since it changed slightly, I'd like you to confirm.
> >
> > Thanks for the reminder!
>
> :)
>
> Cheers,
>
> Alex
>
> > Suren.
> >
> >>
> >> Thanks,
> >>
> >> Alex
> >>
> >>> ---
> >>>    man2/process_madvise.2 | 21 +++++++++++++++++----
> >>>    1 file changed, 17 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> >>> index 6208206e4..44d3b94e8 100644
> >>> --- a/man2/process_madvise.2
> >>> +++ b/man2/process_madvise.2
> >>> @@ -105,16 +105,20 @@ remote process.
> >>>    No further elements will be processed beyond that point.
> >>>    (See the discussion regarding partial advice in RETURN VALUE.)
> >>>    .PP
> >>> -Permission to apply advice to another process is governed by a
> >>> +.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
> >>> +Starting in Linux 5.12,
> >>> +permission to apply advice to another process is governed by
> >>>    ptrace access mode
> >>> -.B PTRACE_MODE_READ_REALCREDS
> >>> +.B PTRACE_MODE_READ_FSCREDS
> >>>    check (see
> >>>    .BR ptrace (2));
> >>>    in addition,
> >>>    because of the performance implications of applying the advice,
> >>>    the caller must have the
> >>> -.B CAP_SYS_ADMIN
> >>> -capability.
> >>> +.B CAP_SYS_NICE
> >>> +capability
> >>> +(see
> >>> +.BR capabilities (7)).
> >>>    .SH RETURN VALUE
> >>>    On success,
> >>>    .BR process_madvise ()
> >>> @@ -180,6 +184,15 @@ configuration option.
> >>>    The
> >>>    .BR process_madvise ()
> >>>    system call is Linux-specific.
> >>> +.SH NOTES
> >>> +When this system call first appeared in Linux 5.10,
> >>> +permission to apply advice to another process was entirely governed by
> >>> +ptrace access mode
> >>> +.B PTRACE_MODE_ATTACH_FSCREDS
> >>> +check (see
> >>> +.BR ptrace (2)).
> >>> +This requirement was relaxed in Linux 5.12 so that the caller didn't require
> >>> +full control over the target process.
> >>>    .SH SEE ALSO
> >>>    .BR madvise (2),
> >>>    .BR pidfd_open (2),
> >>
> >> --
> >> <http://www.alejandro-colomar.es/>
>
> --
> <http://www.alejandro-colomar.es/>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-10-31 21:15   ` Alejandro Colomar
@ 2022-10-31 23:00     ` Zach O'Keefe
  2022-11-01  1:51     ` G. Branden Robinson
  1 sibling, 0 replies; 25+ messages in thread
From: Zach O'Keefe @ 2022-10-31 23:00 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Yang Shi, linux-mm, linux-man, Michael Kerrisk

Hey Alex!

On Mon, Oct 31, 2022 at 2:15 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Zach!
>
> On 10/22/22 00:33, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> > ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> > upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> > MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> > process_madvise(2).
> >
> > Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> > Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> There are a few issues with this patch:
>
> alx@asus5775:~/src/linux/man-pages/man-pages$ make lint-man-groff
> LINT (groff)    tmp/lint/man2/madvise.2.lint-man.groff.touch
> eqn:man2/madvise.2:473: error: invalid input character code '128'
> eqn:man2/madvise.2:473: error: invalid input character code '153'
> an.tmac:man2/madvise.2:445: style: .BR expects at least 2 arguments, got 1
> an.tmac:man2/madvise.2:456: style: .BR expects at least 2 arguments, got 1
> an.tmac:man2/madvise.2:463: style: .BR expects at least 2 arguments, got 1
> found style problems; aborting
> make: *** [lib/lint-man.mk:77: tmp/lint/man2/madvise.2.lint-man.groff.touch] Error 1
>
>
> Let's investigate them:
>

Thank you :)

> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 473p man2/madvise.2
> this operation will be deemed successful.
>
> This one was a bit difficult to track, since the line count seems to be off by one:
>
> alx@asus5775:~/src/linux/man-pages/man-pages$ tbl man2/madvise.2 | hd | grep -C1
> ' 80 '
> 00003d40  63 65 73 73 66 75 6c 2e  0a 4e 6f 74 65 20 74 68  |cessful..Note th|
> 00003d50  61 74 20 74 68 69 73 20  64 6f 65 73 6e e2 80 99  |at this doesn...|
> 00003d60  74 20 67 75 61 72 61 6e  74 65 65 20 61 6e 79 74  |t guarantee anyt|
> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 474p man2/madvise.2
> Note that this doesn’t guarantee anything about other possible mappings of
>
> The issue was in line 474, and the issue is that it uses a weird single quote.
> Please use the foillowing ASCII character for the single quote (see ascii(7)):
> 047   39    27    '
>

Very weird and good find! Honestly, I had prototyped this in Google
Docs and copy-pasta'd this over as the basis. I tried testing this
again - and same thing - Google Docs uses some other character.
Anyways - glad you caught this.

> The rest of issues seems trivial:
> Use .B instead of .BR because there's no "roman" (i.e., non-bold) part.
>

This was the first time it clicked what ".BR" meant: "bold followed by roman".

> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 445p man2/madvise.2
> .BR MADV_COLLAPSE
> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 456p man2/madvise.2
> .BR MADV_COLLAPSE
> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 463p man2/madvise.2
> .BR VM_NOHUGEPAGE
>

These didn't show up with my version of groff (as in 1/2), but I've
applied the fixes and sent out a v4 for this patch. Again, thank you
for all your help here!

Best,
Zach

>
> I'll report a bug to groff(1) about the issue with the line count.
>

Ya that's an odd one. Sorry for having to encounter this - must have
been quite confusing. Thank you!

> Cheers,
>
> Alex
>
> > ---
> >   man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
> >   man2/process_madvise.2 | 10 +++++
> >   2 files changed, 98 insertions(+), 2 deletions(-)
> >
> > diff --git a/man2/madvise.2 b/man2/madvise.2
> > index df3413cc8..b03fc731d 100644
> > --- a/man2/madvise.2
> > +++ b/man2/madvise.2
> > @@ -385,9 +385,10 @@ set (see
> >   .BR prctl (2) ).
> >   .IP
> >   The
> > -.B MADV_HUGEPAGE
> > +.BR MADV_HUGEPAGE ,
> > +.BR MADV_NOHUGEPAGE ,
> >   and
> > -.B MADV_NOHUGEPAGE
> > +.B MADV_COLLAPSE
> >   operations are available only if the kernel was configured with
> >   .B CONFIG_TRANSPARENT_HUGEPAGE
> >   and file/shmem memory is only supported if the kernel was configured with
> > @@ -400,6 +401,81 @@ and
> >   .I length
> >   will not be backed by transparent hugepages.
> >   .TP
> > +.BR MADV_COLLAPSE " (since Linux 6.1)"
> > +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> > +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> > +Perform a best-effort synchronous collapse of the native pages mapped by the
> > +memory range into Transparent Huge Pages (THPs).
> > +.B MADV_COLLAPSE
> > +operates on the current state of memory of the calling process and makes no
> > +persistent changes or guarantees on how pages will be mapped,
> > +constructed,
> > +or faulted in the future.
> > +.IP
> > +.B MADV_COLLAPSE
> > +supports private anonymous pages (see
> > +.BR mmap (2)),
> > +shmem pages,
> > +and file-backed pages.
> > +See
> > +.B MADV_HUGEPAGE
> > +for general information on memory requirements for THP.
> > +If the range provided spans multiple VMAs,
> > +the semantics of the collapse over each VMA is independent from the others.
> > +If collapse of a given huge page-aligned/sized region fails,
> > +the operation may continue to attempt collapsing the remainder of the
> > +specified memory.
> > +.B MADV_COLLAPSE
> > +will automatically clamp the provided range to be hugepage-aligned.
> > +.IP
> > +All non-resident pages covered by the range will first be
> > +swapped/faulted-in,
> > +before being copied onto a freshly allocated hugepage.
> > +If the native pages compose the same PTE-mapped hugepage,
> > +and are suitably aligned,
> > +allocation of a new hugepage may be elided and collapse may happen
> > +in-place.
> > +Unmapped pages will have their data directly initialized to 0 in the new
> > +hugepage.
> > +However,
> > +for every eligible hugepage-aligned/sized region to be collapsed,
> > +at least one page must currently be backed by physical memory.
> > +.IP
> > +.BR MADV_COLLAPSE
> > +is independent of any sysfs
> > +(see
> > +.BR sysfs (5))
> > +setting under
> > +.IR /sys/kernel/mm/transparent_hugepage ,
> > +both in terms of determining THP eligibility,
> > +and allocation semantics.
> > +See Linux kernel source file
> > +.I Documentation/admin\-guide/mm/transhuge.rst
> > +for more information.
> > +.BR MADV_COLLAPSE
> > +also ignores
> > +.B huge=
> > +tmpfs mount when operating on tmpfs files.
> > +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> > +regardless of VMA flags
> > +(though
> > +.BR VM_NOHUGEPAGE
> > +is still respected).
> > +.IP
> > +When the system has multiple NUMA nodes,
> > +the hugepage will be allocated from the node providing the most native
> > +pages.
> > +.IP
> > +If all hugepage-sized/aligned regions covered by the provided range were
> > +either successfully collapsed,
> > +or were already PMD-mapped THPs,
> > +this operation will be deemed successful.
> > +Note that this doesn’t guarantee anything about other possible mappings of
> > +the memory.
> > +Also note that many failures might have occurred since the operation may
> > +continue to collapse in the event collapse of a single hugepage-sized/aligned
> > +region fails.
> > +.TP
> >   .BR MADV_DONTDUMP " (since Linux 3.4)"
> >   .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
> >   .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> > @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
> >   .B EBADF
> >   The map exists, but the area maps something that isn't a file.
> >   .TP
> > +.B EBUSY
> > +(for
> > +.BR MADV_COLLAPSE )
> > +Could not charge hugepage to cgroup: cgroup limit exceeded.
> > +.TP
> >   .B EFAULT
> >   .I advice
> >   is
> > @@ -716,6 +797,11 @@ maximum resident set size.
> >   Not enough memory: paging in failed.
> >   .TP
> >   .B ENOMEM
> > +(for
> > +.BR MADV_COLLAPSE )
> > +Not enough memory: could not allocate hugepage.
> > +.TP
> > +.B ENOMEM
> >   Addresses in the specified range are not currently
> >   mapped, or are outside the address space of the process.
> >   .TP
> > diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> > index 44d3b94e8..8b0ddccdd 100644
> > --- a/man2/process_madvise.2
> > +++ b/man2/process_madvise.2
> > @@ -73,6 +73,10 @@ argument is one of the following values:
> >   See
> >   .BR madvise (2).
> >   .TP
> > +.B MADV_COLLAPSE
> > +See
> > +.BR madvise (2).
> > +.TP
> >   .B MADV_PAGEOUT
> >   See
> >   .BR madvise (2).
> > @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
> >   .TP
> >   .B ESRCH
> >   The target process does not exist (i.e., it has terminated and been waited on).
> > +.PP
> > +See
> > +.BR madvise (2)
> > +for
> > +.IR advice -specific
> > +errors.
> >   .SH VERSIONS
> >   This system call first appeared in Linux 5.10.
> >   .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
>
> --
> <http://www.alejandro-colomar.es/>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-10-31 21:15   ` Alejandro Colomar
  2022-10-31 23:00     ` Zach O'Keefe
@ 2022-11-01  1:51     ` G. Branden Robinson
  2022-11-01 12:12       ` Alejandro Colomar
  1 sibling, 1 reply; 25+ messages in thread
From: G. Branden Robinson @ 2022-11-01  1:51 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Zach OKeefe, Yang Shi, linux-mm, linux-man, Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

At 2022-10-31T22:15:09+0100, Alejandro Colomar wrote:
> The issue was in line 474, and the issue is that it uses a weird single
> quote. Please use the foillowing ASCII character for the single quote (see
> ascii(7)):
> 047   39    27    '
[...]
> I'll report a bug to groff(1) about the issue with the line count.

Thanks, Alex.  There appear to be some very old bugs around input line
number tracking in GNU eqn, possibly going back 30+ years.

I've committed a regression test[1] and fix.[2]  The fix can be expected
(along with literally hundreds of others) in groff 1.23.

And now I see I managed to sneak a cosmetic indentation error into the
commit message for the second (but not the ChangeLog), where I'm stuck
with it forever.  Oh well.

Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=7e23e1342077a6d7c0b02c3d666f131d95f2b510
[2] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=dc98a8b09e7f3dcfe968b978eb210f468db78cc9

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-11-01  1:51     ` G. Branden Robinson
@ 2022-11-01 12:12       ` Alejandro Colomar
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-11-01 12:12 UTC (permalink / raw)
  To: G. Branden Robinson
  Cc: Zach OKeefe, Yang Shi, linux-mm, linux-man, Michael Kerrisk


[-- Attachment #1.1: Type: text/plain, Size: 1517 bytes --]

Hey Branden!

On 11/1/22 02:51, G. Branden Robinson wrote:
> At 2022-10-31T22:15:09+0100, Alejandro Colomar wrote:
>> The issue was in line 474, and the issue is that it uses a weird single
>> quote. Please use the foillowing ASCII character for the single quote (see
>> ascii(7)):
>> 047   39    27    '
> [...]
>> I'll report a bug to groff(1) about the issue with the line count.
> 
> Thanks, Alex.  There appear to be some very old bugs around input line
> number tracking in GNU eqn, possibly going back 30+ years.

Heh!  It feels good finding 30-yr-old bugs :)

> 
> I've committed a regression test[1] and fix.[2]  The fix can be expected
> (along with literally hundreds of others) in groff 1.23.

Yeah, I'm waiting[1] for that release to have 'make lint' be usable by 
contributors.  But it's been already proved in this patch set that it can be 
useful to catch things that I miss, even if I have to run it.

[1]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/INSTALL#n98>

> 
> And now I see I managed to sneak a cosmetic indentation error into the
> commit message for the second (but not the ChangeLog), where I'm stuck
> with it forever.  Oh well.

:)

Cheers,
Alex

> 
> Regards,
> Branden
> 
> [1] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=7e23e1342077a6d7c0b02c3d666f131d95f2b510
> [2] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=dc98a8b09e7f3dcfe968b978eb210f468db78cc9

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-10-21 22:33 ` [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
  2022-10-31 21:15   ` Alejandro Colomar
@ 2022-12-11 17:59   ` Alejandro Colomar
  2022-12-11 21:51     ` Zach O'Keefe
  1 sibling, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-11 17:59 UTC (permalink / raw)
  To: Zach OKeefe; +Cc: Yang Shi, linux-mm, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 7073 bytes --]

Hi Zach,

On 10/22/22 00:33, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> process_madvise(2).
> 
> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Please see a few comments below.

Cheers,

Alex

> ---
>   man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
>   man2/process_madvise.2 | 10 +++++
>   2 files changed, 98 insertions(+), 2 deletions(-)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index df3413cc8..b03fc731d 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -385,9 +385,10 @@ set (see
>   .BR prctl (2) ).
>   .IP
>   The
> -.B MADV_HUGEPAGE
> +.BR MADV_HUGEPAGE ,
> +.BR MADV_NOHUGEPAGE ,
>   and
> -.B MADV_NOHUGEPAGE
> +.B MADV_COLLAPSE
>   operations are available only if the kernel was configured with
>   .B CONFIG_TRANSPARENT_HUGEPAGE
>   and file/shmem memory is only supported if the kernel was configured with
> @@ -400,6 +401,81 @@ and
>   .I length
>   will not be backed by transparent hugepages.
>   .TP
> +.BR MADV_COLLAPSE " (since Linux 6.1)"
> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> +Perform a best-effort synchronous collapse of the native pages mapped by the

Please use semantic line breaks.  In this case, I'd break after "pages".

man-pages(7):
    Use semantic newlines
        In  the source of a manual page, new sentences should be started on new
        lines, long sentences should be split into lines at clause breaks (com‐
        mas, semicolons, colons, and so on), and long clauses should  be  split
        at  phrase  boundaries.   This convention, sometimes known as "semantic
        newlines", makes it easier to see the effect of  patches,  which  often
        operate at the level of individual sentences, clauses, or phrases.

> +memory range into Transparent Huge Pages (THPs).
> +.B MADV_COLLAPSE
> +operates on the current state of memory of the calling process and makes no

Here I'd break after "and".

> +persistent changes or guarantees on how pages will be mapped,
> +constructed,
> +or faulted in the future.
> +.IP
> +.B MADV_COLLAPSE
> +supports private anonymous pages (see
> +.BR mmap (2)),
> +shmem pages,
> +and file-backed pages.
> +See
> +.B MADV_HUGEPAGE
> +for general information on memory requirements for THP.
> +If the range provided spans multiple VMAs,
> +the semantics of the collapse over each VMA is independent from the others.
> +If collapse of a given huge page-aligned/sized region fails,
> +the operation may continue to attempt collapsing the remainder of the

Break after "collapsing".

> +specified memory.
> +.B MADV_COLLAPSE
> +will automatically clamp the provided range to be hugepage-aligned.
> +.IP
> +All non-resident pages covered by the range will first be

Break after "range".

> +swapped/faulted-in,
> +before being copied onto a freshly allocated hugepage.
> +If the native pages compose the same PTE-mapped hugepage,
> +and are suitably aligned,
> +allocation of a new hugepage may be elided and collapse may happen

Break before or after "and".

> +in-place.
> +Unmapped pages will have their data directly initialized to 0 in the new

Break after "0".

> +hugepage.
> +However,
> +for every eligible hugepage-aligned/sized region to be collapsed,
> +at least one page must currently be backed by physical memory.
> +.IP
> +.BR MADV_COLLAPSE

s/BR/B/

> +is independent of any sysfs
> +(see
> +.BR sysfs (5))
> +setting under
> +.IR /sys/kernel/mm/transparent_hugepage ,
> +both in terms of determining THP eligibility,
> +and allocation semantics.
> +See Linux kernel source file
> +.I Documentation/admin\-guide/mm/transhuge.rst
> +for more information.
> +.BR MADV_COLLAPSE

s/BR/B/

> +also ignores
> +.B huge=
> +tmpfs mount when operating on tmpfs files.
> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> +regardless of VMA flags
> +(though
> +.BR VM_NOHUGEPAGE

s/BR/B/

> +is still respected).
> +.IP
> +When the system has multiple NUMA nodes,
> +the hugepage will be allocated from the node providing the most native

Break after "from".

> +pages.
> +.IP
> +If all hugepage-sized/aligned regions covered by the provided range were

Prefer English rather than "/".

> +either successfully collapsed,
> +or were already PMD-mapped THPs,
> +this operation will be deemed successful.
> +Note that this doesn’t guarantee anything about other possible mappings of

Break after "about".

> +the memory.
> +Also note that many failures might have occurred since the operation may
> +continue to collapse in the event collapse of a single hugepage-sized/aligned

Add some omitted "that" or something that will help readability to 
non-native-English readers.

And break at a better place.

> +region fails.
> +.TP
>   .BR MADV_DONTDUMP " (since Linux 3.4)"
>   .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>   .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
>   .B EBADF
>   The map exists, but the area maps something that isn't a file.
>   .TP
> +.B EBUSY
> +(for
> +.BR MADV_COLLAPSE )
> +Could not charge hugepage to cgroup: cgroup limit exceeded.
> +.TP
>   .B EFAULT
>   .I advice
>   is
> @@ -716,6 +797,11 @@ maximum resident set size.
>   Not enough memory: paging in failed.
>   .TP
>   .B ENOMEM
> +(for
> +.BR MADV_COLLAPSE )
> +Not enough memory: could not allocate hugepage.
> +.TP
> +.B ENOMEM
>   Addresses in the specified range are not currently
>   mapped, or are outside the address space of the process.
>   .TP
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 44d3b94e8..8b0ddccdd 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -73,6 +73,10 @@ argument is one of the following values:
>   See
>   .BR madvise (2).
>   .TP
> +.B MADV_COLLAPSE
> +See
> +.BR madvise (2).
> +.TP
>   .B MADV_PAGEOUT
>   See
>   .BR madvise (2).
> @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
>   .TP
>   .B ESRCH
>   The target process does not exist (i.e., it has terminated and been waited on).
> +.PP
> +See
> +.BR madvise (2)
> +for
> +.IR advice -specific
> +errors.
>   .SH VERSIONS
>   This system call first appeared in Linux 5.10.
>   .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-12-11 17:59   ` Alejandro Colomar
@ 2022-12-11 21:51     ` Zach O'Keefe
  2022-12-11 21:55       ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Zach O'Keefe @ 2022-12-11 21:51 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Yang Shi, linux-mm, linux-man

On Sun, Dec 11, 2022 at 9:59 AM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hi Zach,

Hey Alex,

> On 10/22/22 00:33, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> > ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> > upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> > MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> > process_madvise(2).
> >
> > Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> > Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> Please see a few comments below.
>

Thanks for the mail. So, this patch was taken as commit b106cd5bf
("madvise.2: add documentation for MADV_COLLAPSE"). Some of your
comments below were
applied (I think, by you) as fixes pre-commit. However, there are some
new comments (or ones
that address the same lines, but in different ways). Is this mail to
log ~ what changes were done,
or is there anything actionable here on my side?

Best,
Zach

Thanks for this.
> Cheers,
>
> Alex
>
> > ---
> >   man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
> >   man2/process_madvise.2 | 10 +++++
> >   2 files changed, 98 insertions(+), 2 deletions(-)
> >
> > diff --git a/man2/madvise.2 b/man2/madvise.2
> > index df3413cc8..b03fc731d 100644
> > --- a/man2/madvise.2
> > +++ b/man2/madvise.2
> > @@ -385,9 +385,10 @@ set (see
> >   .BR prctl (2) ).
> >   .IP
> >   The
> > -.B MADV_HUGEPAGE
> > +.BR MADV_HUGEPAGE ,
> > +.BR MADV_NOHUGEPAGE ,
> >   and
> > -.B MADV_NOHUGEPAGE
> > +.B MADV_COLLAPSE
> >   operations are available only if the kernel was configured with
> >   .B CONFIG_TRANSPARENT_HUGEPAGE
> >   and file/shmem memory is only supported if the kernel was configured with
> > @@ -400,6 +401,81 @@ and
> >   .I length
> >   will not be backed by transparent hugepages.
> >   .TP
> > +.BR MADV_COLLAPSE " (since Linux 6.1)"
> > +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> > +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> > +Perform a best-effort synchronous collapse of the native pages mapped by the
>
> Please use semantic line breaks.  In this case, I'd break after "pages".
>
> man-pages(7):
>     Use semantic newlines
>         In  the source of a manual page, new sentences should be started on new
>         lines, long sentences should be split into lines at clause breaks (com‐
>         mas, semicolons, colons, and so on), and long clauses should  be  split
>         at  phrase  boundaries.   This convention, sometimes known as "semantic
>         newlines", makes it easier to see the effect of  patches,  which  often
>         operate at the level of individual sentences, clauses, or phrases.
>
> > +memory range into Transparent Huge Pages (THPs).
> > +.B MADV_COLLAPSE
> > +operates on the current state of memory of the calling process and makes no
>
> Here I'd break after "and".
>
> > +persistent changes or guarantees on how pages will be mapped,
> > +constructed,
> > +or faulted in the future.
> > +.IP
> > +.B MADV_COLLAPSE
> > +supports private anonymous pages (see
> > +.BR mmap (2)),
> > +shmem pages,
> > +and file-backed pages.
> > +See
> > +.B MADV_HUGEPAGE
> > +for general information on memory requirements for THP.
> > +If the range provided spans multiple VMAs,
> > +the semantics of the collapse over each VMA is independent from the others.
> > +If collapse of a given huge page-aligned/sized region fails,
> > +the operation may continue to attempt collapsing the remainder of the
>
> Break after "collapsing".
>
> > +specified memory.
> > +.B MADV_COLLAPSE
> > +will automatically clamp the provided range to be hugepage-aligned.
> > +.IP
> > +All non-resident pages covered by the range will first be
>
> Break after "range".
>
> > +swapped/faulted-in,
> > +before being copied onto a freshly allocated hugepage.
> > +If the native pages compose the same PTE-mapped hugepage,
> > +and are suitably aligned,
> > +allocation of a new hugepage may be elided and collapse may happen
>
> Break before or after "and".
>
> > +in-place.
> > +Unmapped pages will have their data directly initialized to 0 in the new
>
> Break after "0".
>
> > +hugepage.
> > +However,
> > +for every eligible hugepage-aligned/sized region to be collapsed,
> > +at least one page must currently be backed by physical memory.
> > +.IP
> > +.BR MADV_COLLAPSE
>
> s/BR/B/
>
> > +is independent of any sysfs
> > +(see
> > +.BR sysfs (5))
> > +setting under
> > +.IR /sys/kernel/mm/transparent_hugepage ,
> > +both in terms of determining THP eligibility,
> > +and allocation semantics.
> > +See Linux kernel source file
> > +.I Documentation/admin\-guide/mm/transhuge.rst
> > +for more information.
> > +.BR MADV_COLLAPSE
>
> s/BR/B/
>
> > +also ignores
> > +.B huge=
> > +tmpfs mount when operating on tmpfs files.
> > +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> > +regardless of VMA flags
> > +(though
> > +.BR VM_NOHUGEPAGE
>
> s/BR/B/
>
> > +is still respected).
> > +.IP
> > +When the system has multiple NUMA nodes,
> > +the hugepage will be allocated from the node providing the most native
>
> Break after "from".
>
> > +pages.
> > +.IP
> > +If all hugepage-sized/aligned regions covered by the provided range were
>
> Prefer English rather than "/".
>
> > +either successfully collapsed,
> > +or were already PMD-mapped THPs,
> > +this operation will be deemed successful.
> > +Note that this doesn’t guarantee anything about other possible mappings of
>
> Break after "about".
>
> > +the memory.
> > +Also note that many failures might have occurred since the operation may
> > +continue to collapse in the event collapse of a single hugepage-sized/aligned
>
> Add some omitted "that" or something that will help readability to
> non-native-English readers.
>
> And break at a better place.
>
> > +region fails.
> > +.TP
> >   .BR MADV_DONTDUMP " (since Linux 3.4)"
> >   .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
> >   .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> > @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
> >   .B EBADF
> >   The map exists, but the area maps something that isn't a file.
> >   .TP
> > +.B EBUSY
> > +(for
> > +.BR MADV_COLLAPSE )
> > +Could not charge hugepage to cgroup: cgroup limit exceeded.
> > +.TP
> >   .B EFAULT
> >   .I advice
> >   is
> > @@ -716,6 +797,11 @@ maximum resident set size.
> >   Not enough memory: paging in failed.
> >   .TP
> >   .B ENOMEM
> > +(for
> > +.BR MADV_COLLAPSE )
> > +Not enough memory: could not allocate hugepage.
> > +.TP
> > +.B ENOMEM
> >   Addresses in the specified range are not currently
> >   mapped, or are outside the address space of the process.
> >   .TP
> > diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> > index 44d3b94e8..8b0ddccdd 100644
> > --- a/man2/process_madvise.2
> > +++ b/man2/process_madvise.2
> > @@ -73,6 +73,10 @@ argument is one of the following values:
> >   See
> >   .BR madvise (2).
> >   .TP
> > +.B MADV_COLLAPSE
> > +See
> > +.BR madvise (2).
> > +.TP
> >   .B MADV_PAGEOUT
> >   See
> >   .BR madvise (2).
> > @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
> >   .TP
> >   .B ESRCH
> >   The target process does not exist (i.e., it has terminated and been waited on).
> > +.PP
> > +See
> > +.BR madvise (2)
> > +for
> > +.IR advice -specific
> > +errors.
> >   .SH VERSIONS
> >   This system call first appeared in Linux 5.10.
> >   .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
>
> --
> <http://www.alejandro-colomar.es/>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-12-11 21:51     ` Zach O'Keefe
@ 2022-12-11 21:55       ` Alejandro Colomar
  2022-12-11 22:37         ` Zach O'Keefe
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-11 21:55 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Yang Shi, linux-mm, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 8659 bytes --]

Hey Zach,

On 12/11/22 22:51, Zach O'Keefe wrote:
> On Sun, Dec 11, 2022 at 9:59 AM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> Hi Zach,
> 
> Hey Alex,
> 
>> On 10/22/22 00:33, Zach OKeefe wrote:
>>> From: Zach O'Keefe <zokeefe@google.com>
>>>
>>> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
>>> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
>>> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
>>> MADV_COLLAPSE").  Update the man-pages for madvise(2) and
>>> process_madvise(2).
>>>
>>> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
>>> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
>>> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>>
>> Please see a few comments below.
>>
> 
> Thanks for the mail. So, this patch was taken as commit b106cd5bf
> ("madvise.2: add documentation for MADV_COLLAPSE"). Some of your
> comments below were
> applied (I think, by you) as fixes pre-commit. However, there are some
> new comments (or ones
> that address the same lines, but in different ways). Is this mail to
> log ~ what changes were done,
> or is there anything actionable here on my side?

Ah no, it's just that I had it marked as unread for some reason, so I thought I 
had forgotten to respond (and I forgot that I had applied it).  :-)

So, no action required.

Regarding different suggestions, heh, it demonstrates that it's not exactly 
deterministic :P

Cheers,

Alex

P.S.: Do you know if I have anything missing from you or any of your collegues?

> 
> Best,
> Zach
> 
> Thanks for this.
>> Cheers,
>>
>> Alex
>>
>>> ---
>>>    man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
>>>    man2/process_madvise.2 | 10 +++++
>>>    2 files changed, 98 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/man2/madvise.2 b/man2/madvise.2
>>> index df3413cc8..b03fc731d 100644
>>> --- a/man2/madvise.2
>>> +++ b/man2/madvise.2
>>> @@ -385,9 +385,10 @@ set (see
>>>    .BR prctl (2) ).
>>>    .IP
>>>    The
>>> -.B MADV_HUGEPAGE
>>> +.BR MADV_HUGEPAGE ,
>>> +.BR MADV_NOHUGEPAGE ,
>>>    and
>>> -.B MADV_NOHUGEPAGE
>>> +.B MADV_COLLAPSE
>>>    operations are available only if the kernel was configured with
>>>    .B CONFIG_TRANSPARENT_HUGEPAGE
>>>    and file/shmem memory is only supported if the kernel was configured with
>>> @@ -400,6 +401,81 @@ and
>>>    .I length
>>>    will not be backed by transparent hugepages.
>>>    .TP
>>> +.BR MADV_COLLAPSE " (since Linux 6.1)"
>>> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
>>> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
>>> +Perform a best-effort synchronous collapse of the native pages mapped by the
>>
>> Please use semantic line breaks.  In this case, I'd break after "pages".
>>
>> man-pages(7):
>>      Use semantic newlines
>>          In  the source of a manual page, new sentences should be started on new
>>          lines, long sentences should be split into lines at clause breaks (com‐
>>          mas, semicolons, colons, and so on), and long clauses should  be  split
>>          at  phrase  boundaries.   This convention, sometimes known as "semantic
>>          newlines", makes it easier to see the effect of  patches,  which  often
>>          operate at the level of individual sentences, clauses, or phrases.
>>
>>> +memory range into Transparent Huge Pages (THPs).
>>> +.B MADV_COLLAPSE
>>> +operates on the current state of memory of the calling process and makes no
>>
>> Here I'd break after "and".
>>
>>> +persistent changes or guarantees on how pages will be mapped,
>>> +constructed,
>>> +or faulted in the future.
>>> +.IP
>>> +.B MADV_COLLAPSE
>>> +supports private anonymous pages (see
>>> +.BR mmap (2)),
>>> +shmem pages,
>>> +and file-backed pages.
>>> +See
>>> +.B MADV_HUGEPAGE
>>> +for general information on memory requirements for THP.
>>> +If the range provided spans multiple VMAs,
>>> +the semantics of the collapse over each VMA is independent from the others.
>>> +If collapse of a given huge page-aligned/sized region fails,
>>> +the operation may continue to attempt collapsing the remainder of the
>>
>> Break after "collapsing".
>>
>>> +specified memory.
>>> +.B MADV_COLLAPSE
>>> +will automatically clamp the provided range to be hugepage-aligned.
>>> +.IP
>>> +All non-resident pages covered by the range will first be
>>
>> Break after "range".
>>
>>> +swapped/faulted-in,
>>> +before being copied onto a freshly allocated hugepage.
>>> +If the native pages compose the same PTE-mapped hugepage,
>>> +and are suitably aligned,
>>> +allocation of a new hugepage may be elided and collapse may happen
>>
>> Break before or after "and".
>>
>>> +in-place.
>>> +Unmapped pages will have their data directly initialized to 0 in the new
>>
>> Break after "0".
>>
>>> +hugepage.
>>> +However,
>>> +for every eligible hugepage-aligned/sized region to be collapsed,
>>> +at least one page must currently be backed by physical memory.
>>> +.IP
>>> +.BR MADV_COLLAPSE
>>
>> s/BR/B/
>>
>>> +is independent of any sysfs
>>> +(see
>>> +.BR sysfs (5))
>>> +setting under
>>> +.IR /sys/kernel/mm/transparent_hugepage ,
>>> +both in terms of determining THP eligibility,
>>> +and allocation semantics.
>>> +See Linux kernel source file
>>> +.I Documentation/admin\-guide/mm/transhuge.rst
>>> +for more information.
>>> +.BR MADV_COLLAPSE
>>
>> s/BR/B/
>>
>>> +also ignores
>>> +.B huge=
>>> +tmpfs mount when operating on tmpfs files.
>>> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
>>> +regardless of VMA flags
>>> +(though
>>> +.BR VM_NOHUGEPAGE
>>
>> s/BR/B/
>>
>>> +is still respected).
>>> +.IP
>>> +When the system has multiple NUMA nodes,
>>> +the hugepage will be allocated from the node providing the most native
>>
>> Break after "from".
>>
>>> +pages.
>>> +.IP
>>> +If all hugepage-sized/aligned regions covered by the provided range were
>>
>> Prefer English rather than "/".
>>
>>> +either successfully collapsed,
>>> +or were already PMD-mapped THPs,
>>> +this operation will be deemed successful.
>>> +Note that this doesn’t guarantee anything about other possible mappings of
>>
>> Break after "about".
>>
>>> +the memory.
>>> +Also note that many failures might have occurred since the operation may
>>> +continue to collapse in the event collapse of a single hugepage-sized/aligned
>>
>> Add some omitted "that" or something that will help readability to
>> non-native-English readers.
>>
>> And break at a better place.
>>
>>> +region fails.
>>> +.TP
>>>    .BR MADV_DONTDUMP " (since Linux 3.4)"
>>>    .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>>>    .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
>>> @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
>>>    .B EBADF
>>>    The map exists, but the area maps something that isn't a file.
>>>    .TP
>>> +.B EBUSY
>>> +(for
>>> +.BR MADV_COLLAPSE )
>>> +Could not charge hugepage to cgroup: cgroup limit exceeded.
>>> +.TP
>>>    .B EFAULT
>>>    .I advice
>>>    is
>>> @@ -716,6 +797,11 @@ maximum resident set size.
>>>    Not enough memory: paging in failed.
>>>    .TP
>>>    .B ENOMEM
>>> +(for
>>> +.BR MADV_COLLAPSE )
>>> +Not enough memory: could not allocate hugepage.
>>> +.TP
>>> +.B ENOMEM
>>>    Addresses in the specified range are not currently
>>>    mapped, or are outside the address space of the process.
>>>    .TP
>>> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
>>> index 44d3b94e8..8b0ddccdd 100644
>>> --- a/man2/process_madvise.2
>>> +++ b/man2/process_madvise.2
>>> @@ -73,6 +73,10 @@ argument is one of the following values:
>>>    See
>>>    .BR madvise (2).
>>>    .TP
>>> +.B MADV_COLLAPSE
>>> +See
>>> +.BR madvise (2).
>>> +.TP
>>>    .B MADV_PAGEOUT
>>>    See
>>>    .BR madvise (2).
>>> @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
>>>    .TP
>>>    .B ESRCH
>>>    The target process does not exist (i.e., it has terminated and been waited on).
>>> +.PP
>>> +See
>>> +.BR madvise (2)
>>> +for
>>> +.IR advice -specific
>>> +errors.
>>>    .SH VERSIONS
>>>    This system call first appeared in Linux 5.10.
>>>    .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
>>
>> --
>> <http://www.alejandro-colomar.es/>

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-12-11 21:55       ` Alejandro Colomar
@ 2022-12-11 22:37         ` Zach O'Keefe
  2022-12-11 22:39           ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Zach O'Keefe @ 2022-12-11 22:37 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Yang Shi, linux-mm, linux-man

On Sun, Dec 11, 2022 at 1:55 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hey Zach,
>
> On 12/11/22 22:51, Zach O'Keefe wrote:
> > On Sun, Dec 11, 2022 at 9:59 AM Alejandro Colomar
> > <alx.manpages@gmail.com> wrote:
> >>
> >> Hi Zach,
> >
> > Hey Alex,
> >
> >> On 10/22/22 00:33, Zach OKeefe wrote:
> >>> From: Zach O'Keefe <zokeefe@google.com>
> >>>
> >>> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> >>> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> >>> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> >>> MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> >>> process_madvise(2).
> >>>
> >>> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> >>> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> >>> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> >>
> >> Please see a few comments below.
> >>
> >
> > Thanks for the mail. So, this patch was taken as commit b106cd5bf
> > ("madvise.2: add documentation for MADV_COLLAPSE"). Some of your
> > comments below were
> > applied (I think, by you) as fixes pre-commit. However, there are some
> > new comments (or ones
> > that address the same lines, but in different ways). Is this mail to
> > log ~ what changes were done,
> > or is there anything actionable here on my side?
>
> Ah no, it's just that I had it marked as unread for some reason, so I thought I
> had forgotten to respond (and I forgot that I had applied it).  :-)
>
> So, no action required.
>
> Regarding different suggestions, heh, it demonstrates that it's not exactly
> deterministic :P
>

Heh -- no worries :) Thanks for following up!

> Cheers,
>
> Alex
>
> P.S.: Do you know if I have anything missing from you or any of your collegues?

At least on my part, I think you've taken all my patches (with help &
edits -- thank you!). I can't speak for anyone else at Google, however
(though, just a very hasty cross reference between git log and
lore.kernel.org/linux-man seems to indicate patches sent from
*@google.com since man-pages-6.00 have previously made it into
man-pages-6.01, and nothing afterwards).

Have a great rest of your weekend,

Best,
Zach

>
> >
> > Best,
> > Zach
> >
> > Thanks for this.
> >> Cheers,
> >>
> >> Alex
> >>
> >>> ---
> >>>    man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
> >>>    man2/process_madvise.2 | 10 +++++
> >>>    2 files changed, 98 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/man2/madvise.2 b/man2/madvise.2
> >>> index df3413cc8..b03fc731d 100644
> >>> --- a/man2/madvise.2
> >>> +++ b/man2/madvise.2
> >>> @@ -385,9 +385,10 @@ set (see
> >>>    .BR prctl (2) ).
> >>>    .IP
> >>>    The
> >>> -.B MADV_HUGEPAGE
> >>> +.BR MADV_HUGEPAGE ,
> >>> +.BR MADV_NOHUGEPAGE ,
> >>>    and
> >>> -.B MADV_NOHUGEPAGE
> >>> +.B MADV_COLLAPSE
> >>>    operations are available only if the kernel was configured with
> >>>    .B CONFIG_TRANSPARENT_HUGEPAGE
> >>>    and file/shmem memory is only supported if the kernel was configured with
> >>> @@ -400,6 +401,81 @@ and
> >>>    .I length
> >>>    will not be backed by transparent hugepages.
> >>>    .TP
> >>> +.BR MADV_COLLAPSE " (since Linux 6.1)"
> >>> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> >>> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> >>> +Perform a best-effort synchronous collapse of the native pages mapped by the
> >>
> >> Please use semantic line breaks.  In this case, I'd break after "pages".
> >>
> >> man-pages(7):
> >>      Use semantic newlines
> >>          In  the source of a manual page, new sentences should be started on new
> >>          lines, long sentences should be split into lines at clause breaks (com‐
> >>          mas, semicolons, colons, and so on), and long clauses should  be  split
> >>          at  phrase  boundaries.   This convention, sometimes known as "semantic
> >>          newlines", makes it easier to see the effect of  patches,  which  often
> >>          operate at the level of individual sentences, clauses, or phrases.
> >>
> >>> +memory range into Transparent Huge Pages (THPs).
> >>> +.B MADV_COLLAPSE
> >>> +operates on the current state of memory of the calling process and makes no
> >>
> >> Here I'd break after "and".
> >>
> >>> +persistent changes or guarantees on how pages will be mapped,
> >>> +constructed,
> >>> +or faulted in the future.
> >>> +.IP
> >>> +.B MADV_COLLAPSE
> >>> +supports private anonymous pages (see
> >>> +.BR mmap (2)),
> >>> +shmem pages,
> >>> +and file-backed pages.
> >>> +See
> >>> +.B MADV_HUGEPAGE
> >>> +for general information on memory requirements for THP.
> >>> +If the range provided spans multiple VMAs,
> >>> +the semantics of the collapse over each VMA is independent from the others.
> >>> +If collapse of a given huge page-aligned/sized region fails,
> >>> +the operation may continue to attempt collapsing the remainder of the
> >>
> >> Break after "collapsing".
> >>
> >>> +specified memory.
> >>> +.B MADV_COLLAPSE
> >>> +will automatically clamp the provided range to be hugepage-aligned.
> >>> +.IP
> >>> +All non-resident pages covered by the range will first be
> >>
> >> Break after "range".
> >>
> >>> +swapped/faulted-in,
> >>> +before being copied onto a freshly allocated hugepage.
> >>> +If the native pages compose the same PTE-mapped hugepage,
> >>> +and are suitably aligned,
> >>> +allocation of a new hugepage may be elided and collapse may happen
> >>
> >> Break before or after "and".
> >>
> >>> +in-place.
> >>> +Unmapped pages will have their data directly initialized to 0 in the new
> >>
> >> Break after "0".
> >>
> >>> +hugepage.
> >>> +However,
> >>> +for every eligible hugepage-aligned/sized region to be collapsed,
> >>> +at least one page must currently be backed by physical memory.
> >>> +.IP
> >>> +.BR MADV_COLLAPSE
> >>
> >> s/BR/B/
> >>
> >>> +is independent of any sysfs
> >>> +(see
> >>> +.BR sysfs (5))
> >>> +setting under
> >>> +.IR /sys/kernel/mm/transparent_hugepage ,
> >>> +both in terms of determining THP eligibility,
> >>> +and allocation semantics.
> >>> +See Linux kernel source file
> >>> +.I Documentation/admin\-guide/mm/transhuge.rst
> >>> +for more information.
> >>> +.BR MADV_COLLAPSE
> >>
> >> s/BR/B/
> >>
> >>> +also ignores
> >>> +.B huge=
> >>> +tmpfs mount when operating on tmpfs files.
> >>> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> >>> +regardless of VMA flags
> >>> +(though
> >>> +.BR VM_NOHUGEPAGE
> >>
> >> s/BR/B/
> >>
> >>> +is still respected).
> >>> +.IP
> >>> +When the system has multiple NUMA nodes,
> >>> +the hugepage will be allocated from the node providing the most native
> >>
> >> Break after "from".
> >>
> >>> +pages.
> >>> +.IP
> >>> +If all hugepage-sized/aligned regions covered by the provided range were
> >>
> >> Prefer English rather than "/".
> >>
> >>> +either successfully collapsed,
> >>> +or were already PMD-mapped THPs,
> >>> +this operation will be deemed successful.
> >>> +Note that this doesn’t guarantee anything about other possible mappings of
> >>
> >> Break after "about".
> >>
> >>> +the memory.
> >>> +Also note that many failures might have occurred since the operation may
> >>> +continue to collapse in the event collapse of a single hugepage-sized/aligned
> >>
> >> Add some omitted "that" or something that will help readability to
> >> non-native-English readers.
> >>
> >> And break at a better place.
> >>
> >>> +region fails.
> >>> +.TP
> >>>    .BR MADV_DONTDUMP " (since Linux 3.4)"
> >>>    .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
> >>>    .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> >>> @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
> >>>    .B EBADF
> >>>    The map exists, but the area maps something that isn't a file.
> >>>    .TP
> >>> +.B EBUSY
> >>> +(for
> >>> +.BR MADV_COLLAPSE )
> >>> +Could not charge hugepage to cgroup: cgroup limit exceeded.
> >>> +.TP
> >>>    .B EFAULT
> >>>    .I advice
> >>>    is
> >>> @@ -716,6 +797,11 @@ maximum resident set size.
> >>>    Not enough memory: paging in failed.
> >>>    .TP
> >>>    .B ENOMEM
> >>> +(for
> >>> +.BR MADV_COLLAPSE )
> >>> +Not enough memory: could not allocate hugepage.
> >>> +.TP
> >>> +.B ENOMEM
> >>>    Addresses in the specified range are not currently
> >>>    mapped, or are outside the address space of the process.
> >>>    .TP
> >>> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> >>> index 44d3b94e8..8b0ddccdd 100644
> >>> --- a/man2/process_madvise.2
> >>> +++ b/man2/process_madvise.2
> >>> @@ -73,6 +73,10 @@ argument is one of the following values:
> >>>    See
> >>>    .BR madvise (2).
> >>>    .TP
> >>> +.B MADV_COLLAPSE
> >>> +See
> >>> +.BR madvise (2).
> >>> +.TP
> >>>    .B MADV_PAGEOUT
> >>>    See
> >>>    .BR madvise (2).
> >>> @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
> >>>    .TP
> >>>    .B ESRCH
> >>>    The target process does not exist (i.e., it has terminated and been waited on).
> >>> +.PP
> >>> +See
> >>> +.BR madvise (2)
> >>> +for
> >>> +.IR advice -specific
> >>> +errors.
> >>>    .SH VERSIONS
> >>>    This system call first appeared in Linux 5.10.
> >>>    .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
> >>
> >> --
> >> <http://www.alejandro-colomar.es/>
>
> --
> <http://www.alejandro-colomar.es/>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-12-11 22:37         ` Zach O'Keefe
@ 2022-12-11 22:39           ` Alejandro Colomar
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-11 22:39 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Yang Shi, linux-mm, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 858 bytes --]

Hi Zach!

On 12/11/22 23:37, Zach O'Keefe wrote:
> Heh -- no worries :) Thanks for following up!

:)

> 
>> Cheers,
>>
>> Alex
>>
>> P.S.: Do you know if I have anything missing from you or any of your collegues?
> 
> At least on my part, I think you've taken all my patches (with help &
> edits -- thank you!). I can't speak for anyone else at Google, however
> (though, just a very hasty cross reference between git log and
> lore.kernel.org/linux-man seems to indicate patches sent from
> *@google.com since man-pages-6.00 have previously made it into
> man-pages-6.01, and nothing afterwards).

Makes sense.  6.01 is very recent, and I don't remember any patches since then.


> 
> Have a great rest of your weekend,

Have a nice weekend!

Cheers,

Alex
> 
> Best,
> Zach
> 

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-12-11 22:40 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-21 22:32 [PATCH man-pages v3 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
2022-10-21 22:32 ` [PATCH man-pages v3 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
2022-10-30 11:41   ` Alejandro Colomar
2022-10-31 16:33     ` Zach O'Keefe
2022-10-31 20:11       ` Alejandro Colomar
2022-10-21 22:32 ` [PATCH man-pages v3 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
2022-10-30 11:44   ` Alejandro Colomar
2022-10-31 16:33     ` Zach O'Keefe
2022-10-31 20:21       ` Alejandro Colomar
2022-10-31 21:24         ` Zach O'Keefe
2022-10-21 22:32 ` [PATCH man-pages v3 3/4] process_madvise.2: fix capability and ptrace requirements Zach OKeefe
2022-10-30 11:50   ` Alejandro Colomar
2022-10-31 19:13     ` Suren Baghdasaryan
2022-10-31 20:24       ` Alejandro Colomar
2022-10-31 21:25         ` Zach O'Keefe
2022-10-21 22:33 ` [PATCH man-pages v3 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
2022-10-31 21:15   ` Alejandro Colomar
2022-10-31 23:00     ` Zach O'Keefe
2022-11-01  1:51     ` G. Branden Robinson
2022-11-01 12:12       ` Alejandro Colomar
2022-12-11 17:59   ` Alejandro Colomar
2022-12-11 21:51     ` Zach O'Keefe
2022-12-11 21:55       ` Alejandro Colomar
2022-12-11 22:37         ` Zach O'Keefe
2022-12-11 22:39           ` Alejandro Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox