linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Add MADV_COLLAPSE documentation
@ 2022-10-17 17:55 Zach OKeefe
  2022-10-17 17:55 ` [PATCH man-pages 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Zach OKeefe @ 2022-10-17 17:55 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk; +Cc: Yang Shi, linux-mm, linux-man

From: Zach O'Keefe <zokeefe@google.com>

Hey Alex / Michael,

This series adds MADV_COLLAPSE (currently in 6.1-rc1 and expected for Linux 6.1)
documentation to madvise(2) and process_madvise(2).  A few prerequisite patches
are included to fix up existing MADV_HUGEPAGE and process_madvise(2)
documentation, as well as add some additional clarity for madvise(2) "advice"
probing.

Series applies on man-pages-6.00, and though I've attempted to use semantic
newlines, I can't claim I've made the right choices everywhere.

Thanks,
Zach


Zach O'Keefe (4):
  madvise.2: update THP file/shmem documentation for +5.4
  madvise.2: document reliable probe for advice support
  process_madvise.2: CAP_SYS_ADMIN cleanup
  madvise.2: add documentation for MADV_COLLAPSE

 man2/madvise.2         | 135 +++++++++++++++++++++++++++++++++++++++--
 man2/process_madvise.2 |  13 +++-
 2 files changed, 142 insertions(+), 6 deletions(-)

-- 
2.38.0.413.g74048e4d9e-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH man-pages 1/4] madvise.2: update THP file/shmem documentation for +5.4
  2022-10-17 17:55 [PATCH 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
@ 2022-10-17 17:55 ` Zach OKeefe
  2022-10-18 10:32   ` Alex Colomar
  2022-10-17 17:55 ` [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Zach OKeefe @ 2022-10-17 17:55 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk; +Cc: Yang Shi, linux-mm, linux-man

From: Zach O'Keefe <zokeefe@google.com>

Since Linux 5.4, Transparent Huge Pages now support both file-backed
memory and shmem memory. Update MADV_HUGEPAGE advice description to
reflect this.

Additionally, expand the description of requirements for memory to be
considered eligible for THP: alignment / mapping requirements, VMA
flags, prctl(2) settings, inode status, etc.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/madvise.2 | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/man2/madvise.2 b/man2/madvise.2
index 81cce56af..e14e0f7fb 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -320,8 +320,6 @@ Enable Transparent Huge Pages (THP) for pages in the range specified by
 .I addr
 and
 .IR length .
-Currently, Transparent Huge Pages work only with private anonymous pages (see
-.BR mmap (2)).
 The kernel will regularly scan the areas marked as huge page candidates
 to replace them with huge pages.
 The kernel will also allocate huge pages directly when the region is
@@ -354,12 +352,45 @@ an access pattern that the developer knows in advance won't risk
 to increase the memory footprint of the application when transparent
 hugepages are enabled.
 .IP
+.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
+Since Linux 5.4,
+automatic scan of eligible areas and replacement by huge pages works with
+private anonymous pages (see
+.BR mmap (2)),
+shmem-backed pages (including tmpfs (see
+.BR tmpfs (5)),
+and file-backed pages.
+For all memory types,
+memory may only be replaced by huge pages on hugepage-aligned boundaries.
+For file-mapped memory (including tmpfs) the mapping must also be naturally
+hugepage-aligned within the file.
+Additionally,
+for file-backed (not tmpfs) memory,
+the file must not be open for write and the mapping must be executable.
+.IP
+The VMA must not be marked
+.BR VM_NOHUGEPAGE ,
+.BR VM_HUGETLB ,
+.BR VM_IO ,
+.BR VM_DONTEXPAND ,
+.BR VM_MIXEDMAP ,
+or
+.BR VM_PFNMAP ,
+nor can it be stack memory or backed by a DAX-enabled device
+(unless the DAX device is hot-plugged as System RAM).
+The process must also not have
+.B PR_SET_THP_DISABLE
+set (see
+.BR prctl (2) ).
+.IP
 The
 .B MADV_HUGEPAGE
 and
 .B MADV_NOHUGEPAGE
 operations are available only if the kernel was configured with
-.BR CONFIG_TRANSPARENT_HUGEPAGE .
+.B CONFIG_TRANSPARENT_HUGEPAGE
+and file/shmem memory is only supported if the kernel was configured with
+.BR CONFIG_READ_ONLY_THP_FOR_FS .
 .TP
 .BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
 Ensures that memory in the address range specified by
-- 
2.38.0.413.g74048e4d9e-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support
  2022-10-17 17:55 [PATCH 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
  2022-10-17 17:55 ` [PATCH man-pages 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
@ 2022-10-17 17:55 ` Zach OKeefe
  2022-10-18 10:36   ` Alex Colomar
  2022-10-17 17:55 ` [PATCH man-pages 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
  2022-10-17 17:55 ` [PATCH man-pages 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
  3 siblings, 1 reply; 15+ messages in thread
From: Zach OKeefe @ 2022-10-17 17:55 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk; +Cc: Yang Shi, linux-mm, linux-man

From: Zach O'Keefe <zokeefe@google.com>

EINVAL is an overloaded error code for madvise(2) and it's not clear
under what context it means "advice is not valid" vs another error.

Explicitly document that madvise(0, 0, advice) can reliably be used to
probe for kernel support for "advice", returning zero iff "advice" is
supported by the kernel.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/madvise.2 | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/man2/madvise.2 b/man2/madvise.2
index e14e0f7fb..adfe24c24 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -789,6 +789,13 @@ that are not mapped, the Linux version of
 ignores them and applies the call to the rest (but returns
 .B ENOMEM
 from the system call, as it should).
+.PP
+.BR madvise (0,
+0,
+.IR advice )
+will return zero iff
+.I advice
+is supported by the kernel and can be relied on to probe for support.
 .\" .SH HISTORY
 .\" The
 .\" .BR madvise ()
-- 
2.38.0.413.g74048e4d9e-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH man-pages 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
  2022-10-17 17:55 [PATCH 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
  2022-10-17 17:55 ` [PATCH man-pages 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
  2022-10-17 17:55 ` [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
@ 2022-10-17 17:55 ` Zach OKeefe
  2022-10-18 10:38   ` Alex Colomar
  2022-10-17 17:55 ` [PATCH man-pages 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
  3 siblings, 1 reply; 15+ messages in thread
From: Zach OKeefe @ 2022-10-17 17:55 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk; +Cc: Yang Shi, linux-mm, linux-man

From: Zach O'Keefe <zokeefe@google.com>

Since commit 96cfe2c0fd23 ("mm/madvise: replace ptrace attach
requirement for process_madvise"), process_madvise(2) has only
required CAP_SYS_NICE capability.  Update the man page to reflect this.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/process_madvise.2 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
index 6208206e4..7bee1a098 100644
--- a/man2/process_madvise.2
+++ b/man2/process_madvise.2
@@ -113,7 +113,8 @@ check (see
 in addition,
 because of the performance implications of applying the advice,
 the caller must have the
-.B CAP_SYS_ADMIN
+.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
+.B CAP_SYS_NICE
 capability.
 .SH RETURN VALUE
 On success,
-- 
2.38.0.413.g74048e4d9e-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH man-pages 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-10-17 17:55 [PATCH 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
                   ` (2 preceding siblings ...)
  2022-10-17 17:55 ` [PATCH man-pages 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
@ 2022-10-17 17:55 ` Zach OKeefe
  2022-10-18 10:47   ` Alex Colomar
  3 siblings, 1 reply; 15+ messages in thread
From: Zach OKeefe @ 2022-10-17 17:55 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk; +Cc: Yang Shi, linux-mm, linux-man

From: Zach O'Keefe <zokeefe@google.com>

Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE").  Update the man-pages for madvise(2) and
process_madvise(2).

Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 man2/madvise.2         | 91 +++++++++++++++++++++++++++++++++++++++++-
 man2/process_madvise.2 | 10 +++++
 2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/man2/madvise.2 b/man2/madvise.2
index adfe24c24..7da44fac4 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -384,9 +384,10 @@ set (see
 .BR prctl (2) ).
 .IP
 The
-.B MADV_HUGEPAGE
-and
+.BR MADV_HUGEPAGE ,
 .B MADV_NOHUGEPAGE
+and
+.B MADV_COLLAPSE
 operations are available only if the kernel was configured with
 .B CONFIG_TRANSPARENT_HUGEPAGE
 and file/shmem memory is only supported if the kernel was configured with
@@ -399,6 +400,82 @@ and
 .I length
 will not be backed by transparent hugepages.
 .TP
+.BR MADV_COLLAPSE " (since Linux 6.1)"
+.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
+.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
+Perform a best-effort synchronous collapse of the native pages mapped by the
+memory range into Transparent Huge Pages (THPs).
+.B MADV_COLLAPSE
+operates on the current state of memory of the calling process and makes no
+persistent changes or guarantees on how pages will be mapped,
+constructed,
+or faulted in the future.
+.IP
+.B MADV_COLLAPSE
+supports private anonymous pages (see
+.BR mmap (2)),
+shmem-backed pages
+(including tmpfs (see
+.BR tmpfs (5)),
+and file-backed pages. See
+.B MADV_HUGEPAGE
+for general information on memory requirements for THP.
+If the range provided spans multiple VMAs,
+the semantics of the collapse over each VMA is independent from the others.
+If collapse of a given huge page-aligned/sized region fails,
+the operation may continue to attempt collapsing the remainder of the
+specified memory.
+.B MADV_COLLAPSE
+will automatically clamp the provided range to be hugepage-aligned.
+.IP
+All non-resident pages covered by the range will first be
+swapped/faulted-in,
+before being copied onto a freshly allocated hugepage.
+If the native pages compose the same PTE-mapped hugepage,
+and are suitably aligned,
+allocation of a new hugepage may be elided and collapse may happen
+in-place.
+Unmapped pages will have their data directly initialized to 0 in the new
+hugepage.
+However,
+for every eligible hugepage-aligned/sized region to be collapsed,
+at least one page must currently be backed by physical memory.
+.IP
+.BR MADV_COLLAPSE
+is independent of any sysfs
+(see
+.BR sysfs (5) )
+setting under
+.IR /sys/kernel/mm/transparent_hugepage ,
+both in terms of determining THP eligibility,
+and allocation semantics.
+See Linux kernel source file
+.I Documentation/admin\-guide/mm/transhuge.rst
+for more information.
+.BR MADV_COLLAPSE
+also ignores
+.B huge=
+tmpfs mount when operating on tmpfs files.
+Allocation for the new hugepage may enter direct reclaim and/or compaction,
+regardless of VMA flags
+(though
+.BR VM_NOHUGEPAGE
+is still respected).
+.IP
+When the system has multiple NUMA nodes,
+the hugepage will be allocated from the node providing the most native
+pages.
+.IP
+If all hugepage-sized/aligned regions covered by the provided range were
+either successfully collapsed,
+or were already PMD-mapped THPs,
+this operation will be deemed successful.
+Note that this doesn’t guarantee anything about other possible mappings of
+the memory.
+Also note that many failures might have occurred since the operation may
+continue to collapse in the event collapse of a single hugepage-sized/aligned
+region fails.
+.TP
 .BR MADV_DONTDUMP " (since Linux 3.4)"
 .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
 .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
@@ -618,6 +695,11 @@ A kernel resource was temporarily unavailable.
 .B EBADF
 The map exists, but the area maps something that isn't a file.
 .TP
+.B EBUSY
+(for
+.BR MADV_COLLAPSE )
+Could not charge hugepage to cgroup: cgroup limit exceeded.
+.TP
 .B EFAULT
 .I advice
 is
@@ -715,6 +797,11 @@ maximum resident set size.
 Not enough memory: paging in failed.
 .TP
 .B ENOMEM
+(for
+.BR MADV_COLLAPSE )
+Not enough memory: could not allocate hugepage.
+.TP
+.B ENOMEM
 Addresses in the specified range are not currently
 mapped, or are outside the address space of the process.
 .TP
diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
index 7bee1a098..900210106 100644
--- a/man2/process_madvise.2
+++ b/man2/process_madvise.2
@@ -73,6 +73,10 @@ argument is one of the following values:
 See
 .BR madvise (2).
 .TP
+.B MADV_COLLAPSE
+See
+.BR madvise (2).
+.TP
 .B MADV_PAGEOUT
 See
 .BR madvise (2).
@@ -170,6 +174,12 @@ The caller does not have permission to access the address space of the process
 .TP
 .B ESRCH
 The target process does not exist (i.e., it has terminated and been waited on).
+.PP
+See
+.BR madvise (2)
+for
+.IR advice -specific
+errors.
 .SH VERSIONS
 This system call first appeared in Linux 5.10.
 .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
-- 
2.38.0.413.g74048e4d9e-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 1/4] madvise.2: update THP file/shmem documentation for +5.4
  2022-10-17 17:55 ` [PATCH man-pages 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
@ 2022-10-18 10:32   ` Alex Colomar
  2022-10-18 16:52     ` Zach O'Keefe
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Colomar @ 2022-10-18 10:32 UTC (permalink / raw)
  To: Zach OKeefe; +Cc: Yang Shi, linux-mm, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 2308 bytes --]

Hi Zach,

On 10/17/22 19:55, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> Since Linux 5.4, Transparent Huge Pages now support both file-backed
> memory and shmem memory. Update MADV_HUGEPAGE advice description to
> reflect this.
> 
> Additionally, expand the description of requirements for memory to be
> considered eligible for THP: alignment / mapping requirements, VMA
> flags, prctl(2) settings, inode status, etc.
> 
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> ---
>   man2/madvise.2 | 37 ++++++++++++++++++++++++++++++++++---
>   1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index 81cce56af..e14e0f7fb 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -320,8 +320,6 @@ Enable Transparent Huge Pages (THP) for pages in the range specified by
>   .I addr
>   and
>   .IR length .
> -Currently, Transparent Huge Pages work only with private anonymous pages (see
> -.BR mmap (2)).
>   The kernel will regularly scan the areas marked as huge page candidates
>   to replace them with huge pages.
>   The kernel will also allocate huge pages directly when the region is
> @@ -354,12 +352,45 @@ an access pattern that the developer knows in advance won't risk
>   to increase the memory footprint of the application when transparent
>   hugepages are enabled.
>   .IP
> +.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
> +Since Linux 5.4,
> +automatic scan of eligible areas and replacement by huge pages works with
> +private anonymous pages (see
> +.BR mmap (2)),
> +shmem-backed pages (including tmpfs (see
> +.BR tmpfs (5)),

I think here's missing a closing parenthesis.  Please check.  Maybe you 
prefer combining em dashes or commas with parentheses to improve 
visually discerning subclauses.

Cheers,
Alex

> +and file-backed pages.
> +For all memory types,
> +memory may only be replaced by huge pages on hugepage-aligned boundaries.
> +For file-mapped memory (including tmpfs) the mapping must also be naturally
> +hugepage-aligned within the file.
> +Additionally,
> +for file-backed (not tmpfs) memory,
> +the file must not be open for write and the mapping must be executable.
> +.IP

[...]

-- 
<http://www.alejandro-colomar.es/>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support
  2022-10-17 17:55 ` [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
@ 2022-10-18 10:36   ` Alex Colomar
  2022-10-18 17:53     ` Zach O'Keefe
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Colomar @ 2022-10-18 10:36 UTC (permalink / raw)
  To: Zach OKeefe, Michael Kerrisk; +Cc: Yang Shi, linux-mm, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 1562 bytes --]

Hi Zach,

On 10/17/22 19:55, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> EINVAL is an overloaded error code for madvise(2) and it's not clear
> under what context it means "advice is not valid" vs another error.
> 
> Explicitly document that madvise(0, 0, advice) can reliably be used to
> probe for kernel support for "advice", returning zero iff "advice" is
> supported by the kernel.
> 
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> ---
>   man2/madvise.2 | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index e14e0f7fb..adfe24c24 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -789,6 +789,13 @@ that are not mapped, the Linux version of
>   ignores them and applies the call to the rest (but returns
>   .B ENOMEM
>   from the system call, as it should).
> +.PP
> +.BR madvise (0,
> +0,
> +.IR advice )

For expressions, we don't follow the same highlighting rules as in 
identifiers and man-page references.  Instead we use all italics.  See 
man-pages(7):

        Expressions, if not written on a separate indented  line,
        should  be  specified in italics.  Again, the use of non‐
        breaking spaces may be appropriate if the  expression  is
        inlined with normal text.

Cheers,
Alex

> +will return zero iff
> +.I advice
> +is supported by the kernel and can be relied on to probe for support.
>   .\" .SH HISTORY
>   .\" The
>   .\" .BR madvise ()

-- 
<http://www.alejandro-colomar.es/>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
  2022-10-17 17:55 ` [PATCH man-pages 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
@ 2022-10-18 10:38   ` Alex Colomar
  2022-10-18 21:22     ` Zach O'Keefe
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Colomar @ 2022-10-18 10:38 UTC (permalink / raw)
  To: Zach OKeefe, Michael Kerrisk; +Cc: Yang Shi, linux-mm, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 1110 bytes --]

Hi Zach,

On 10/17/22 19:55, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> Since commit 96cfe2c0fd23 ("mm/madvise: replace ptrace attach
> requirement for process_madvise"), process_madvise(2) has only
> required CAP_SYS_NICE capability.  Update the man page to reflect this.
> 
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> ---
>   man2/process_madvise.2 | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 6208206e4..7bee1a098 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -113,7 +113,8 @@ check (see
>   in addition,
>   because of the performance implications of applying the advice,
>   the caller must have the
> -.B CAP_SYS_ADMIN
> +.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
> +.B CAP_SYS_NICE

Would it make sense to keep some parentheses specifying that in old 
kernels CAP_SYS_ADMIN was requiring instead?

Cheers,
Alex

>   capability.
>   .SH RETURN VALUE
>   On success,

-- 
<http://www.alejandro-colomar.es/>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-10-17 17:55 ` [PATCH man-pages 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
@ 2022-10-18 10:47   ` Alex Colomar
  2022-10-18 21:54     ` Zach O'Keefe
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Colomar @ 2022-10-18 10:47 UTC (permalink / raw)
  To: Zach OKeefe; +Cc: Yang Shi, linux-mm, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 6426 bytes --]

Hi Zach,

On 10/17/22 19:55, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
> 
> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> process_madvise(2).
> 
> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Please see some comments below.
There are a few more cases were I'd break the lines at different points, 
but there are few, so I'll apply them with an amend.

Thanks!

Alex

> ---
>   man2/madvise.2         | 91 +++++++++++++++++++++++++++++++++++++++++-
>   man2/process_madvise.2 | 10 +++++
>   2 files changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index adfe24c24..7da44fac4 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -384,9 +384,10 @@ set (see
>   .BR prctl (2) ).
>   .IP
>   The
> -.B MADV_HUGEPAGE
> -and
> +.BR MADV_HUGEPAGE ,
>   .B MADV_NOHUGEPAGE

Please add a comma before 'and' (Oxford comma).

> +and
> +.B MADV_COLLAPSE
>   operations are available only if the kernel was configured with
>   .B CONFIG_TRANSPARENT_HUGEPAGE
>   and file/shmem memory is only supported if the kernel was configured with
> @@ -399,6 +400,82 @@ and
>   .I length
>   will not be backed by transparent hugepages.
>   .TP
> +.BR MADV_COLLAPSE " (since Linux 6.1)"
> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> +Perform a best-effort synchronous collapse of the native pages mapped by the
> +memory range into Transparent Huge Pages (THPs).
> +.B MADV_COLLAPSE
> +operates on the current state of memory of the calling process and makes no
> +persistent changes or guarantees on how pages will be mapped,
> +constructed,
> +or faulted in the future.
> +.IP
> +.B MADV_COLLAPSE
> +supports private anonymous pages (see
> +.BR mmap (2)),
> +shmem-backed pages
> +(including tmpfs (see
> +.BR tmpfs (5)),

s/))/)))/

probably, but maybe you want to reword using commas or em dashes. 
Please check.

> +and file-backed pages. See

No continuation after '.'.  :)

> +.B MADV_HUGEPAGE
> +for general information on memory requirements for THP.
> +If the range provided spans multiple VMAs,
> +the semantics of the collapse over each VMA is independent from the others.
> +If collapse of a given huge page-aligned/sized region fails,
> +the operation may continue to attempt collapsing the remainder of the
> +specified memory.
> +.B MADV_COLLAPSE
> +will automatically clamp the provided range to be hugepage-aligned.
> +.IP
> +All non-resident pages covered by the range will first be
> +swapped/faulted-in,
> +before being copied onto a freshly allocated hugepage.
> +If the native pages compose the same PTE-mapped hugepage,
> +and are suitably aligned,
> +allocation of a new hugepage may be elided and collapse may happen
> +in-place.
> +Unmapped pages will have their data directly initialized to 0 in the new
> +hugepage.
> +However,
> +for every eligible hugepage-aligned/sized region to be collapsed,
> +at least one page must currently be backed by physical memory.
> +.IP
> +.BR MADV_COLLAPSE
> +is independent of any sysfs
> +(see
> +.BR sysfs (5) )

s/) )/))/

> +setting under
> +.IR /sys/kernel/mm/transparent_hugepage ,
> +both in terms of determining THP eligibility,
> +and allocation semantics.
> +See Linux kernel source file
> +.I Documentation/admin\-guide/mm/transhuge.rst
> +for more information.
> +.BR MADV_COLLAPSE
> +also ignores
> +.B huge=
> +tmpfs mount when operating on tmpfs files.
> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> +regardless of VMA flags
> +(though
> +.BR VM_NOHUGEPAGE
> +is still respected).
> +.IP
> +When the system has multiple NUMA nodes,
> +the hugepage will be allocated from the node providing the most native
> +pages.
> +.IP
> +If all hugepage-sized/aligned regions covered by the provided range were
> +either successfully collapsed,
> +or were already PMD-mapped THPs,
> +this operation will be deemed successful.
> +Note that this doesn’t guarantee anything about other possible mappings of
> +the memory.
> +Also note that many failures might have occurred since the operation may
> +continue to collapse in the event collapse of a single hugepage-sized/aligned
> +region fails.
> +.TP
>   .BR MADV_DONTDUMP " (since Linux 3.4)"
>   .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>   .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> @@ -618,6 +695,11 @@ A kernel resource was temporarily unavailable.
>   .B EBADF
>   The map exists, but the area maps something that isn't a file.
>   .TP
> +.B EBUSY
> +(for
> +.BR MADV_COLLAPSE )
> +Could not charge hugepage to cgroup: cgroup limit exceeded.
> +.TP
>   .B EFAULT
>   .I advice
>   is
> @@ -715,6 +797,11 @@ maximum resident set size.
>   Not enough memory: paging in failed.
>   .TP
>   .B ENOMEM
> +(for
> +.BR MADV_COLLAPSE )
> +Not enough memory: could not allocate hugepage.
> +.TP
> +.B ENOMEM
>   Addresses in the specified range are not currently
>   mapped, or are outside the address space of the process.
>   .TP
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 7bee1a098..900210106 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -73,6 +73,10 @@ argument is one of the following values:
>   See
>   .BR madvise (2).
>   .TP
> +.B MADV_COLLAPSE
> +See
> +.BR madvise (2).
> +.TP
>   .B MADV_PAGEOUT
>   See
>   .BR madvise (2).
> @@ -170,6 +174,12 @@ The caller does not have permission to access the address space of the process
>   .TP
>   .B ESRCH
>   The target process does not exist (i.e., it has terminated and been waited on).
> +.PP
> +See
> +.BR madvise (2)
> +for
> +.IR advice -specific
> +errors.
>   .SH VERSIONS
>   This system call first appeared in Linux 5.10.
>   .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc

-- 
<http://www.alejandro-colomar.es/>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 1/4] madvise.2: update THP file/shmem documentation for +5.4
  2022-10-18 10:32   ` Alex Colomar
@ 2022-10-18 16:52     ` Zach O'Keefe
  0 siblings, 0 replies; 15+ messages in thread
From: Zach O'Keefe @ 2022-10-18 16:52 UTC (permalink / raw)
  To: Alex Colomar; +Cc: Yang Shi, linux-mm, linux-man

Hey Alex, thanks for taking a look!

> > diff --git a/man2/madvise.2 b/man2/madvise.2
> > index 81cce56af..e14e0f7fb 100644
> > --- a/man2/madvise.2
> > +++ b/man2/madvise.2
> > @@ -320,8 +320,6 @@ Enable Transparent Huge Pages (THP) for pages in the range specified by
> >   .I addr
> >   and
> >   .IR length .
> > -Currently, Transparent Huge Pages work only with private anonymous pages (see
> > -.BR mmap (2)).
> >   The kernel will regularly scan the areas marked as huge page candidates
> >   to replace them with huge pages.
> >   The kernel will also allocate huge pages directly when the region is
> > @@ -354,12 +352,45 @@ an access pattern that the developer knows in advance won't risk
> >   to increase the memory footprint of the application when transparent
> >   hugepages are enabled.
> >   .IP
> > +.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
> > +Since Linux 5.4,
> > +automatic scan of eligible areas and replacement by huge pages works with
> > +private anonymous pages (see
> > +.BR mmap (2)),
> > +shmem-backed pages (including tmpfs (see
> > +.BR tmpfs (5)),
>
> I think here's missing a closing parenthesis.  Please check.  Maybe you
> prefer combining em dashes or commas with parentheses to improve
> visually discerning subclauses.

Not only a good catch on the parenthesis, but I didn't actually know
about the "em dash" previously (though I'm a frequent user of it in
spirit). Thank you! With that in-hand, there are few places that could
be cleaned up to avoid nested paracentesis.

Best,
Zach


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support
  2022-10-18 10:36   ` Alex Colomar
@ 2022-10-18 17:53     ` Zach O'Keefe
  2022-10-18 18:04       ` Alejandro Colomar
  0 siblings, 1 reply; 15+ messages in thread
From: Zach O'Keefe @ 2022-10-18 17:53 UTC (permalink / raw)
  To: Alex Colomar; +Cc: Michael Kerrisk, Yang Shi, linux-mm, linux-man

Hey Alex,

> > diff --git a/man2/madvise.2 b/man2/madvise.2
> > index e14e0f7fb..adfe24c24 100644
> > --- a/man2/madvise.2
> > +++ b/man2/madvise.2
> > @@ -789,6 +789,13 @@ that are not mapped, the Linux version of
> >   ignores them and applies the call to the rest (but returns
> >   .B ENOMEM
> >   from the system call, as it should).
> > +.PP
> > +.BR madvise (0,
> > +0,
> > +.IR advice )
>
> For expressions, we don't follow the same highlighting rules as in
> identifiers and man-page references.  Instead we use all italics.  See
> man-pages(7):
>
>         Expressions, if not written on a separate indented  line,
>         should  be  specified in italics.  Again, the use of non‐
>         breaking spaces may be appropriate if the  expression  is
>         inlined with normal text.

Just to confirm, by "expression", you mean "madvise(0, 0, advice)"? If
so, to be consistent with the other note, perhaps best to break this
into a phrase such as:

--8<---
.BR madvise ()
called with zero for both
.IR addr
and
.IR length
will return zero iff
.I advice
is supported by the kernel and can be relied on to probe for support.
--8<---

Thanks,
Zach


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support
  2022-10-18 17:53     ` Zach O'Keefe
@ 2022-10-18 18:04       ` Alejandro Colomar
  2022-10-18 18:30         ` Zach O'Keefe
  0 siblings, 1 reply; 15+ messages in thread
From: Alejandro Colomar @ 2022-10-18 18:04 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Michael Kerrisk, Yang Shi, linux-mm, linux-man


[-- Attachment #1.1: Type: text/plain, Size: 1446 bytes --]

Hey Zach!

On 10/18/22 19:53, Zach O'Keefe wrote:
> Hey Alex,
> 
>>> diff --git a/man2/madvise.2 b/man2/madvise.2
>>> index e14e0f7fb..adfe24c24 100644
>>> --- a/man2/madvise.2
>>> +++ b/man2/madvise.2
>>> @@ -789,6 +789,13 @@ that are not mapped, the Linux version of
>>>    ignores them and applies the call to the rest (but returns
>>>    .B ENOMEM
>>>    from the system call, as it should).
>>> +.PP
>>> +.BR madvise (0,
>>> +0,
>>> +.IR advice )
>>
>> For expressions, we don't follow the same highlighting rules as in
>> identifiers and man-page references.  Instead we use all italics.  See
>> man-pages(7):
>>
>>          Expressions, if not written on a separate indented  line,
>>          should  be  specified in italics.  Again, the use of non‐
>>          breaking spaces may be appropriate if the  expression  is
>>          inlined with normal text.
> 
> Just to confirm, by "expression", you mean "madvise(0, 0, advice)"?

Yes, I meant that.

> If
> so, to be consistent with the other note, perhaps best to break this
> into a phrase such as:
> 
> --8<---
> .BR madvise ()
> called with zero for both
> .IR addr
> and
> .IR length
> will return zero iff
> .I advice
> is supported by the kernel and can be relied on to probe for support.
> --8<---

I think the C expression was more readable.

Cheers,
Alex

> 
> Thanks,
> Zach

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support
  2022-10-18 18:04       ` Alejandro Colomar
@ 2022-10-18 18:30         ` Zach O'Keefe
  0 siblings, 0 replies; 15+ messages in thread
From: Zach O'Keefe @ 2022-10-18 18:30 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Michael Kerrisk, Yang Shi, linux-mm, linux-man

On Tue, Oct 18, 2022 at 11:04 AM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hey Zach!
>
> On 10/18/22 19:53, Zach O'Keefe wrote:
> > Hey Alex,
> >
> >>> diff --git a/man2/madvise.2 b/man2/madvise.2
> >>> index e14e0f7fb..adfe24c24 100644
> >>> --- a/man2/madvise.2
> >>> +++ b/man2/madvise.2
> >>> @@ -789,6 +789,13 @@ that are not mapped, the Linux version of
> >>>    ignores them and applies the call to the rest (but returns
> >>>    .B ENOMEM
> >>>    from the system call, as it should).
> >>> +.PP
> >>> +.BR madvise (0,
> >>> +0,
> >>> +.IR advice )
> >>
> >> For expressions, we don't follow the same highlighting rules as in
> >> identifiers and man-page references.  Instead we use all italics.  See
> >> man-pages(7):
> >>
> >>          Expressions, if not written on a separate indented  line,
> >>          should  be  specified in italics.  Again, the use of non‐
> >>          breaking spaces may be appropriate if the  expression  is
> >>          inlined with normal text.
> >
> > Just to confirm, by "expression", you mean "madvise(0, 0, advice)"?
>
> Yes, I meant that.
>
> > If
> > so, to be consistent with the other note, perhaps best to break this
> > into a phrase such as:
> >
> > --8<---
> > .BR madvise ()
> > called with zero for both
> > .IR addr
> > and
> > .IR length
> > will return zero iff
> > .I advice
> > is supported by the kernel and can be relied on to probe for support.
> > --8<---
>
> I think the C expression was more readable.

SGTM - will update for v2.  Appreciate your help here!

Best,
Zach


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
  2022-10-18 10:38   ` Alex Colomar
@ 2022-10-18 21:22     ` Zach O'Keefe
  0 siblings, 0 replies; 15+ messages in thread
From: Zach O'Keefe @ 2022-10-18 21:22 UTC (permalink / raw)
  To: Alex Colomar
  Cc: Michael Kerrisk, Yang Shi, linux-mm, linux-man, Suren Baghdasaryan

Hey Alex,

On Tue, Oct 18, 2022 at 3:38 AM Alex Colomar <alx.manpages@gmail.com> wrote:
>
> Hi Zach,
>
> On 10/17/22 19:55, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Since commit 96cfe2c0fd23 ("mm/madvise: replace ptrace attach
> > requirement for process_madvise"), process_madvise(2) has only
> > required CAP_SYS_NICE capability.  Update the man page to reflect this.
> >
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > ---
> >   man2/process_madvise.2 | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> > index 6208206e4..7bee1a098 100644
> > --- a/man2/process_madvise.2
> > +++ b/man2/process_madvise.2
> > @@ -113,7 +113,8 @@ check (see
> >   in addition,
> >   because of the performance implications of applying the advice,
> >   the caller must have the
> > -.B CAP_SYS_ADMIN
> > +.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
> > +.B CAP_SYS_NICE
>
> Would it make sense to keep some parentheses specifying that in old
> kernels CAP_SYS_ADMIN was requiring instead?

Thanks for the suggestion. I've chatted with Suren on this
(process_madvise(2) author and contributor of initial
process_madvise(2) man-page) and the initial CAP_SYS_ADMIN appears to
have been a mistake; it was CAP_SYS_NICE from the beginning. I'll
reword the commit description and resend it as part of v2.

Thanks,
Zach

> Cheers,
> Alex
>
> >   capability.
> >   .SH RETURN VALUE
> >   On success,
>
> --
> <http://www.alejandro-colomar.es/>
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH man-pages 4/4] madvise.2: add documentation for MADV_COLLAPSE
  2022-10-18 10:47   ` Alex Colomar
@ 2022-10-18 21:54     ` Zach O'Keefe
  0 siblings, 0 replies; 15+ messages in thread
From: Zach O'Keefe @ 2022-10-18 21:54 UTC (permalink / raw)
  To: Alex Colomar; +Cc: Yang Shi, linux-mm, linux-man

Hey Alex,

On Tue, Oct 18, 2022 at 3:47 AM Alex Colomar <alx.manpages@gmail.com> wrote:
>
> Hi Zach,
>
> On 10/17/22 19:55, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> > ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> > upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> > MADV_COLLAPSE").  Update the man-pages for madvise(2) and
> > process_madvise(2).
> >
> > Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> > Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> Please see some comments below.
> There are a few more cases were I'd break the lines at different points,
> but there are few, so I'll apply them with an amend.
>
> Thanks!
>
> Alex

Thank you :) Greatly appreciated. I'll take a look at the patch
post-amend to see what I could have done. All the mentioned fixes
(thanks for pointing them out) will be included in v2.

Best,
Zach


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-10-18 21:54 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-17 17:55 [PATCH 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
2022-10-17 17:55 ` [PATCH man-pages 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
2022-10-18 10:32   ` Alex Colomar
2022-10-18 16:52     ` Zach O'Keefe
2022-10-17 17:55 ` [PATCH man-pages 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
2022-10-18 10:36   ` Alex Colomar
2022-10-18 17:53     ` Zach O'Keefe
2022-10-18 18:04       ` Alejandro Colomar
2022-10-18 18:30         ` Zach O'Keefe
2022-10-17 17:55 ` [PATCH man-pages 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
2022-10-18 10:38   ` Alex Colomar
2022-10-18 21:22     ` Zach O'Keefe
2022-10-17 17:55 ` [PATCH man-pages 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
2022-10-18 10:47   ` Alex Colomar
2022-10-18 21:54     ` Zach O'Keefe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox