* [PATCH man-pages v2 0/4] Add MADV_COLLAPSE documentation
@ 2022-10-18 23:50 Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Zach OKeefe @ 2022-10-18 23:50 UTC (permalink / raw)
To: Alejandro Colomar, Michael Kerrisk
Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe
From: Zach O'Keefe <zokeefe@google.com>
v2 Forward
Changes from v1[1]:
- All patches have seen some reformatting and/or typo corrections
(Alex Colomar).
- Patch 3 has had the commit description reworded to reflect that this commit
is a "fix" rather than an "update" (Alex Colomar).
- Rebased on man-pages-6.01
[1] https://lore.kernel.org/linux-man/20221017175523.2048887-1-zokeefe@google.com/T/#m8e9e94ed52c99e7cf4969bd992492359c59a0faa
----------------------------------------------
Hey Alex / Michael,
This series adds MADV_COLLAPSE (expected for Linux 6.1) documentation to
madvise(2) and process_madvise(2). A few prerequisite patches are included to
fix up existing MADV_HUGEPAGE and process_madvise(2) documentation, as well as
add some additional clarity for madvise(2) "advice" probing.
Series applies on man-pages-6.00, and I've attempted to use semantic newlines,
though I can't claim I've made the right choices everywhere for long clauses.
Thanks,
Zach
Zach O'Keefe (4):
madvise.2: update THP file/shmem documentation for +5.4
madvise.2: document reliable probe for advice support
process_madvise.2: CAP_SYS_ADMIN cleanup
madvise.2: add documentation for MADV_COLLAPSE
man2/madvise.2 | 133 +++++++++++++++++++++++++++++++++++++++--
man2/process_madvise.2 | 13 +++-
2 files changed, 140 insertions(+), 6 deletions(-)
--
2.38.0.413.g74048e4d9e-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH man-pages v2 1/4] madvise.2: update THP file/shmem documentation for +5.4
2022-10-18 23:50 [PATCH man-pages v2 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
@ 2022-10-18 23:50 ` Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Zach OKeefe @ 2022-10-18 23:50 UTC (permalink / raw)
To: Alejandro Colomar, Michael Kerrisk
Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe
From: Zach O'Keefe <zokeefe@google.com>
Since Linux 5.4, Transparent Huge Pages now support both file-backed
memory and shmem memory. Update MADV_HUGEPAGE advice description to
reflect this.
Additionally, expand the description of requirements for memory to be
considered eligible for THP: alignment / mapping requirements, VMA
flags, prctl(2) settings, inode status, etc.
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
man2/madvise.2 | 38 +++++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)
diff --git a/man2/madvise.2 b/man2/madvise.2
index 81cce56af..64f788ace 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -320,8 +320,6 @@ Enable Transparent Huge Pages (THP) for pages in the range specified by
.I addr
and
.IR length .
-Currently, Transparent Huge Pages work only with private anonymous pages (see
-.BR mmap (2)).
The kernel will regularly scan the areas marked as huge page candidates
to replace them with huge pages.
The kernel will also allocate huge pages directly when the region is
@@ -354,12 +352,46 @@ an access pattern that the developer knows in advance won't risk
to increase the memory footprint of the application when transparent
hugepages are enabled.
.IP
+.\" commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
+Since Linux 5.4,
+automatic scan of eligible areas and replacement by huge pages works with
+private anonymous pages (see
+.BR mmap (2)),
+shmem pages,
+and file-backed pages.
+For all memory types,
+memory may only be replaced by huge pages on hugepage-aligned boundaries.
+For file-mapped memory \(em including tmpfs (see
+.BR tmpfs (2))
+\(em the mapping must also be naturally hugepage-aligned within the file.
+Additionally,
+for file-backed,
+non-tmpfs memory,
+the file must not be open for write and the mapping must be executable.
+.IP
+The VMA must not be marked
+.BR VM_NOHUGEPAGE ,
+.BR VM_HUGETLB ,
+.BR VM_IO ,
+.BR VM_DONTEXPAND ,
+.BR VM_MIXEDMAP ,
+or
+.BR VM_PFNMAP ,
+nor can it be stack memory or backed by a DAX-enabled device
+(unless the DAX device is hot-plugged as System RAM).
+The process must also not have
+.B PR_SET_THP_DISABLE
+set (see
+.BR prctl (2) ).
+.IP
The
.B MADV_HUGEPAGE
and
.B MADV_NOHUGEPAGE
operations are available only if the kernel was configured with
-.BR CONFIG_TRANSPARENT_HUGEPAGE .
+.B CONFIG_TRANSPARENT_HUGEPAGE
+and file/shmem memory is only supported if the kernel was configured with
+.BR CONFIG_READ_ONLY_THP_FOR_FS .
.TP
.BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
Ensures that memory in the address range specified by
--
2.38.0.413.g74048e4d9e-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH man-pages v2 2/4] madvise.2: document reliable probe for advice support
2022-10-18 23:50 [PATCH man-pages v2 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
@ 2022-10-18 23:50 ` Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
3 siblings, 0 replies; 10+ messages in thread
From: Zach OKeefe @ 2022-10-18 23:50 UTC (permalink / raw)
To: Alejandro Colomar, Michael Kerrisk
Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe
From: Zach O'Keefe <zokeefe@google.com>
EINVAL is an overloaded error code for madvise(2) and it's not clear
under what context it means "advice is not valid" vs another error.
Explicitly document that madvise(0, 0, advice) can reliably be used to
probe for kernel support for "advice", returning zero iff "advice" is
supported by the kernel.
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
man2/madvise.2 | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/man2/madvise.2 b/man2/madvise.2
index 64f788ace..df3413cc8 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -790,6 +790,11 @@ that are not mapped, the Linux version of
ignores them and applies the call to the rest (but returns
.B ENOMEM
from the system call, as it should).
+.PP
+.IR madvise(0,\ 0,\ advice)
+will return zero iff
+.I advice
+is supported by the kernel and can be relied on to probe for support.
.\" .SH HISTORY
.\" The
.\" .BR madvise ()
--
2.38.0.413.g74048e4d9e-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
2022-10-18 23:50 [PATCH man-pages v2 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
@ 2022-10-18 23:50 ` Zach OKeefe
2022-10-19 0:11 ` Suren Baghdasaryan
2022-10-21 12:37 ` Alejandro Colomar
2022-10-18 23:50 ` [PATCH man-pages v2 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
3 siblings, 2 replies; 10+ messages in thread
From: Zach OKeefe @ 2022-10-18 23:50 UTC (permalink / raw)
To: Alejandro Colomar, Michael Kerrisk
Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe,
Suren Baghdasaryan, Minchan Kim
From: Zach O'Keefe <zokeefe@google.com>
The initial commit of process_madvise(2) to man-pages project included
an error, indicating that CAP_SYS_ADMIN capability was required when, in
fact, CAP_SYS_NICE was the required capability.
The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
("mm/madvise: introduce process_madvise() syscall: an external memory
hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
attach requirement for process_madvise") which replaced this with a
combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
performance).
Correct this in the man-page for process_madvise(2).
Fixes: a144f458b ("process_madvise.2: Document process_madvise(2)")
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
man2/process_madvise.2 | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
index 6208206e4..7bee1a098 100644
--- a/man2/process_madvise.2
+++ b/man2/process_madvise.2
@@ -113,7 +113,8 @@ check (see
in addition,
because of the performance implications of applying the advice,
the caller must have the
-.B CAP_SYS_ADMIN
+.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
+.B CAP_SYS_NICE
capability.
.SH RETURN VALUE
On success,
--
2.38.0.413.g74048e4d9e-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH man-pages v2 4/4] madvise.2: add documentation for MADV_COLLAPSE
2022-10-18 23:50 [PATCH man-pages v2 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
` (2 preceding siblings ...)
2022-10-18 23:50 ` [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
@ 2022-10-18 23:50 ` Zach OKeefe
3 siblings, 0 replies; 10+ messages in thread
From: Zach OKeefe @ 2022-10-18 23:50 UTC (permalink / raw)
To: Alejandro Colomar, Michael Kerrisk
Cc: Yang Shi, linux-mm, linux-man, Zach O'Keefe
From: Zach O'Keefe <zokeefe@google.com>
Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE"). Update the man-pages for madvise(2) and
process_madvise(2).
Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
man2/madvise.2 | 90 +++++++++++++++++++++++++++++++++++++++++-
man2/process_madvise.2 | 10 +++++
2 files changed, 98 insertions(+), 2 deletions(-)
diff --git a/man2/madvise.2 b/man2/madvise.2
index df3413cc8..b03fc731d 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -385,9 +385,10 @@ set (see
.BR prctl (2) ).
.IP
The
-.B MADV_HUGEPAGE
+.BR MADV_HUGEPAGE ,
+.BR MADV_NOHUGEPAGE ,
and
-.B MADV_NOHUGEPAGE
+.B MADV_COLLAPSE
operations are available only if the kernel was configured with
.B CONFIG_TRANSPARENT_HUGEPAGE
and file/shmem memory is only supported if the kernel was configured with
@@ -400,6 +401,81 @@ and
.I length
will not be backed by transparent hugepages.
.TP
+.BR MADV_COLLAPSE " (since Linux 6.1)"
+.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
+.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
+Perform a best-effort synchronous collapse of the native pages mapped by the
+memory range into Transparent Huge Pages (THPs).
+.B MADV_COLLAPSE
+operates on the current state of memory of the calling process and makes no
+persistent changes or guarantees on how pages will be mapped,
+constructed,
+or faulted in the future.
+.IP
+.B MADV_COLLAPSE
+supports private anonymous pages (see
+.BR mmap (2)),
+shmem pages,
+and file-backed pages.
+See
+.B MADV_HUGEPAGE
+for general information on memory requirements for THP.
+If the range provided spans multiple VMAs,
+the semantics of the collapse over each VMA is independent from the others.
+If collapse of a given huge page-aligned/sized region fails,
+the operation may continue to attempt collapsing the remainder of the
+specified memory.
+.B MADV_COLLAPSE
+will automatically clamp the provided range to be hugepage-aligned.
+.IP
+All non-resident pages covered by the range will first be
+swapped/faulted-in,
+before being copied onto a freshly allocated hugepage.
+If the native pages compose the same PTE-mapped hugepage,
+and are suitably aligned,
+allocation of a new hugepage may be elided and collapse may happen
+in-place.
+Unmapped pages will have their data directly initialized to 0 in the new
+hugepage.
+However,
+for every eligible hugepage-aligned/sized region to be collapsed,
+at least one page must currently be backed by physical memory.
+.IP
+.BR MADV_COLLAPSE
+is independent of any sysfs
+(see
+.BR sysfs (5))
+setting under
+.IR /sys/kernel/mm/transparent_hugepage ,
+both in terms of determining THP eligibility,
+and allocation semantics.
+See Linux kernel source file
+.I Documentation/admin\-guide/mm/transhuge.rst
+for more information.
+.BR MADV_COLLAPSE
+also ignores
+.B huge=
+tmpfs mount when operating on tmpfs files.
+Allocation for the new hugepage may enter direct reclaim and/or compaction,
+regardless of VMA flags
+(though
+.BR VM_NOHUGEPAGE
+is still respected).
+.IP
+When the system has multiple NUMA nodes,
+the hugepage will be allocated from the node providing the most native
+pages.
+.IP
+If all hugepage-sized/aligned regions covered by the provided range were
+either successfully collapsed,
+or were already PMD-mapped THPs,
+this operation will be deemed successful.
+Note that this doesn’t guarantee anything about other possible mappings of
+the memory.
+Also note that many failures might have occurred since the operation may
+continue to collapse in the event collapse of a single hugepage-sized/aligned
+region fails.
+.TP
.BR MADV_DONTDUMP " (since Linux 3.4)"
.\" commit 909af768e88867016f427264ae39d27a57b6a8ed
.\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
@@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
.B EBADF
The map exists, but the area maps something that isn't a file.
.TP
+.B EBUSY
+(for
+.BR MADV_COLLAPSE )
+Could not charge hugepage to cgroup: cgroup limit exceeded.
+.TP
.B EFAULT
.I advice
is
@@ -716,6 +797,11 @@ maximum resident set size.
Not enough memory: paging in failed.
.TP
.B ENOMEM
+(for
+.BR MADV_COLLAPSE )
+Not enough memory: could not allocate hugepage.
+.TP
+.B ENOMEM
Addresses in the specified range are not currently
mapped, or are outside the address space of the process.
.TP
diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
index 7bee1a098..900210106 100644
--- a/man2/process_madvise.2
+++ b/man2/process_madvise.2
@@ -73,6 +73,10 @@ argument is one of the following values:
See
.BR madvise (2).
.TP
+.B MADV_COLLAPSE
+See
+.BR madvise (2).
+.TP
.B MADV_PAGEOUT
See
.BR madvise (2).
@@ -170,6 +174,12 @@ The caller does not have permission to access the address space of the process
.TP
.B ESRCH
The target process does not exist (i.e., it has terminated and been waited on).
+.PP
+See
+.BR madvise (2)
+for
+.IR advice -specific
+errors.
.SH VERSIONS
This system call first appeared in Linux 5.10.
.\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
--
2.38.0.413.g74048e4d9e-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
2022-10-18 23:50 ` [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
@ 2022-10-19 0:11 ` Suren Baghdasaryan
2022-10-21 12:37 ` Alejandro Colomar
1 sibling, 0 replies; 10+ messages in thread
From: Suren Baghdasaryan @ 2022-10-19 0:11 UTC (permalink / raw)
To: Zach OKeefe
Cc: Alejandro Colomar, Michael Kerrisk, Yang Shi, linux-mm,
linux-man, Minchan Kim
On Tue, Oct 18, 2022 at 4:51 PM Zach OKeefe <zokeefe@google.com> wrote:
>
> From: Zach O'Keefe <zokeefe@google.com>
>
> The initial commit of process_madvise(2) to man-pages project included
> an error, indicating that CAP_SYS_ADMIN capability was required when, in
> fact, CAP_SYS_NICE was the required capability.
>
> The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
> ("mm/madvise: introduce process_madvise() syscall: an external memory
> hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
> but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
> attach requirement for process_madvise") which replaced this with a
> combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
> prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
> performance).
>
> Correct this in the man-page for process_madvise(2).
Thanks for fixing my mistake!
>
> Fixes: a144f458b ("process_madvise.2: Document process_madvise(2)")
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> man2/process_madvise.2 | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 6208206e4..7bee1a098 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -113,7 +113,8 @@ check (see
> in addition,
> because of the performance implications of applying the advice,
> the caller must have the
> -.B CAP_SYS_ADMIN
> +.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
> +.B CAP_SYS_NICE
> capability.
> .SH RETURN VALUE
> On success,
> --
> 2.38.0.413.g74048e4d9e-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
2022-10-18 23:50 ` [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
2022-10-19 0:11 ` Suren Baghdasaryan
@ 2022-10-21 12:37 ` Alejandro Colomar
2022-10-21 12:41 ` Alejandro Colomar
1 sibling, 1 reply; 10+ messages in thread
From: Alejandro Colomar @ 2022-10-21 12:37 UTC (permalink / raw)
To: Zach OKeefe
Cc: Yang Shi, linux-mm, linux-man, Suren Baghdasaryan, Minchan Kim,
Michael Kerrisk
[-- Attachment #1.1: Type: text/plain, Size: 2092 bytes --]
Hi Zach!
Thanks for the work! :-)
On 10/19/22 01:50, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
>
> The initial commit of process_madvise(2) to man-pages project included
> an error, indicating that CAP_SYS_ADMIN capability was required when, in
> fact, CAP_SYS_NICE was the required capability.
>
> The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
> ("mm/madvise: introduce process_madvise() syscall: an external memory
> hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
> but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
> attach requirement for process_madvise") which replaced this with a
> combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
> prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
> performance).
Those two commits are several versions apart:
alx@asus5775:~/src/linux/linux$ git describe --contains ecb8ac8b1f14
v5.10-rc1~87^2~14
alx@asus5775:~/src/linux/linux$ git describe --contains 96cfe2c0fd23
v5.12-rc3~12^2~9
If I understand the paragraph above, from 5.10 to 5.12 the capability required
was CAP_SYS_ADMIN?
Cheers,
Alex
>
> Correct this in the man-page for process_madvise(2).
>
> Fixes: a144f458b ("process_madvise.2: Document process_madvise(2)")
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> ---
> man2/process_madvise.2 | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 6208206e4..7bee1a098 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -113,7 +113,8 @@ check (see
> in addition,
> because of the performance implications of applying the advice,
> the caller must have the
> -.B CAP_SYS_ADMIN
> +.\" commit 96cfe2c0fd23ea7c2368d14f769d287e7ae1082e
> +.B CAP_SYS_NICE
> capability.
> .SH RETURN VALUE
> On success,
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
2022-10-21 12:37 ` Alejandro Colomar
@ 2022-10-21 12:41 ` Alejandro Colomar
2022-10-21 16:16 ` Zach O'Keefe
0 siblings, 1 reply; 10+ messages in thread
From: Alejandro Colomar @ 2022-10-21 12:41 UTC (permalink / raw)
To: Zach OKeefe
Cc: Yang Shi, linux-mm, linux-man, Suren Baghdasaryan, Minchan Kim,
Michael Kerrisk
[-- Attachment #1.1: Type: text/plain, Size: 1063 bytes --]
On 10/21/22 14:37, Alejandro Colomar wrote:
> On 10/19/22 01:50, Zach OKeefe wrote:
>> From: Zach O'Keefe <zokeefe@google.com>
>>
>> The initial commit of process_madvise(2) to man-pages project included
>> an error, indicating that CAP_SYS_ADMIN capability was required when, in
>> fact, CAP_SYS_NICE was the required capability.
>>
>> The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
>> ("mm/madvise: introduce process_madvise() syscall: an external memory
>> hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
>> but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
>> attach requirement for process_madvise") which replaced this with a
>> combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
>> prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
>> performance).
[...]
> If I understand the paragraph above, from 5.10 to 5.12 the capability required
> was CAP_SYS_ADMIN?
Or was it CAP_SYS_PTRACE?
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
2022-10-21 12:41 ` Alejandro Colomar
@ 2022-10-21 16:16 ` Zach O'Keefe
2022-10-21 16:30 ` Alejandro Colomar
0 siblings, 1 reply; 10+ messages in thread
From: Zach O'Keefe @ 2022-10-21 16:16 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Yang Shi, linux-mm, linux-man, Suren Baghdasaryan, Minchan Kim,
Michael Kerrisk
Hey Alex!
Thanks for taking the time to review!
On Fri, Oct 21, 2022 at 5:41 AM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> On 10/21/22 14:37, Alejandro Colomar wrote:
> > On 10/19/22 01:50, Zach OKeefe wrote:
> >> From: Zach O'Keefe <zokeefe@google.com>
> >>
> >> The initial commit of process_madvise(2) to man-pages project included
> >> an error, indicating that CAP_SYS_ADMIN capability was required when, in
> >> fact, CAP_SYS_NICE was the required capability.
> >>
> >> The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
> >> ("mm/madvise: introduce process_madvise() syscall: an external memory
> >> hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
> >> but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
> >> attach requirement for process_madvise") which replaced this with a
> >> combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
> >> prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
> >> performance).
>
> [...]
>
> > If I understand the paragraph above, from 5.10 to 5.12 the capability required
> > was CAP_SYS_ADMIN?
>
> Or was it CAP_SYS_PTRACE?
Starting in 5.10, there was no CAP_* capability requirement - only
PTRACE_MODE_ATTACH_FSCREDS (aka PTRACE_MODE_ATTACH |
PTRACE_MODE_REALCREDS). Now, my understanding of the algorithm
employed for ptrace access mode checking isn't to be trusted, but
AFAIK, a caller having CAP_SYS_PTRACE in the target's user namespace
(directly or transitively) isn't required to pass this (though it
makes it easier). ptrace(2) has an overview of the algorithm.
Starting in 5.12, CAP_SYS_NICE was added as a requirement, and the
ptrace algorithm used changed to PTRACE_MODE_READ.
If you think recording the differences in kernel versions in the
man-page is important, let me know and I can amend this patch.
Thanks,
Zcah
> --
> <http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup
2022-10-21 16:16 ` Zach O'Keefe
@ 2022-10-21 16:30 ` Alejandro Colomar
0 siblings, 0 replies; 10+ messages in thread
From: Alejandro Colomar @ 2022-10-21 16:30 UTC (permalink / raw)
To: Zach O'Keefe
Cc: Yang Shi, linux-mm, linux-man, Suren Baghdasaryan, Minchan Kim,
Michael Kerrisk
[-- Attachment #1.1: Type: text/plain, Size: 2298 bytes --]
Hey Zach,
On 10/21/22 18:16, Zach O'Keefe wrote:
> Hey Alex!
>
> Thanks for taking the time to review!
>
> On Fri, Oct 21, 2022 at 5:41 AM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> On 10/21/22 14:37, Alejandro Colomar wrote:
>>> On 10/19/22 01:50, Zach OKeefe wrote:
>>>> From: Zach O'Keefe <zokeefe@google.com>
>>>>
>>>> The initial commit of process_madvise(2) to man-pages project included
>>>> an error, indicating that CAP_SYS_ADMIN capability was required when, in
>>>> fact, CAP_SYS_NICE was the required capability.
>>>>
>>>> The initial commit of process_madvise(2) to Linux, commit ecb8ac8b1f14
>>>> ("mm/madvise: introduce process_madvise() syscall: an external memory
>>>> hinting API"), relied on PTRACE_MODE_ATTACH_FSCREDS (see ptrace(2)),
>>>> but was amended by commit 96cfe2c0fd23 ("mm/madvise: replace ptrace
>>>> attach requirement for process_madvise") which replaced this with a
>>>> combination of PTRACE_MODE_READ and CAP_SYS_NICE (PTRACE_MODE_READ to
>>>> prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process
>>>> performance).
>>
>> [...]
>>
>>> If I understand the paragraph above, from 5.10 to 5.12 the capability required
>>> was CAP_SYS_ADMIN?
>>
>> Or was it CAP_SYS_PTRACE?
>
> Starting in 5.10, there was no CAP_* capability requirement - only
> PTRACE_MODE_ATTACH_FSCREDS (aka PTRACE_MODE_ATTACH |
> PTRACE_MODE_REALCREDS). Now, my understanding of the algorithm
> employed for ptrace access mode checking isn't to be trusted, but
> AFAIK, a caller having CAP_SYS_PTRACE in the target's user namespace
> (directly or transitively) isn't required to pass this (though it
> makes it easier). ptrace(2) has an overview of the algorithm.
>
> Starting in 5.12, CAP_SYS_NICE was added as a requirement, and the
> ptrace algorithm used changed to PTRACE_MODE_READ.
Understood.
>
> If you think recording the differences in kernel versions in the
> man-page is important, let me know and I can amend this patch.
Yes; since it was live during 2 versions, I think we should at least mention it.
A couple of lines in NOTES might be enough.
Thanks,
Alex
>
> Thanks,
> Zcah
>
>> --
>> <http://www.alejandro-colomar.es/>
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-10-21 16:31 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-18 23:50 [PATCH man-pages v2 0/4] Add MADV_COLLAPSE documentation Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 1/4] madvise.2: update THP file/shmem documentation for +5.4 Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 2/4] madvise.2: document reliable probe for advice support Zach OKeefe
2022-10-18 23:50 ` [PATCH man-pages v2 3/4] process_madvise.2: CAP_SYS_ADMIN cleanup Zach OKeefe
2022-10-19 0:11 ` Suren Baghdasaryan
2022-10-21 12:37 ` Alejandro Colomar
2022-10-21 12:41 ` Alejandro Colomar
2022-10-21 16:16 ` Zach O'Keefe
2022-10-21 16:30 ` Alejandro Colomar
2022-10-18 23:50 ` [PATCH man-pages v2 4/4] madvise.2: add documentation for MADV_COLLAPSE Zach OKeefe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox