linux-mm.kvack.org archive mirror
* [PATCH man-pages v2] madvise.2: add MADV_GUARD_INSTALL, MADV_GUARD_REMOVE description
@ 2024-11-29 15:59 Lorenzo Stoakes
  2024-11-29 18:13 ` Jann Horn
  0 siblings, 1 reply; 4+ messages in thread
From: Lorenzo Stoakes @ 2024-11-29 15:59 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, Suren Baghdasaryan, Liam R . Howlett, Matthew Wilcox,
	Vlastimil Babka, Jann Horn, linux-mm

Lightweight guard region support has been added to Linux 6.13, which adds
MADV_GUARD_INSTALL and MADV_GUARD_REMOVE flags to the madvise() system
call. Therefore, update the manpage for madvise() and describe these
operations.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
v2:
* Updated to use semantic newlines as suggested by Alejandro.
* Avoided emboldening parens as suggested by Alejandro.
* One very minor grammatical fix.

v1:
https://lore.kernel.org/all/20241129093205.8664-1-lorenzo.stoakes@oracle.com

 man/man2/madvise.2 | 93 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
index 4f2210ee2..4cb5e7302 100644
--- a/man/man2/madvise.2
+++ b/man/man2/madvise.2
@@ -676,6 +676,91 @@ or secret memory regions created using
 Note that with
 .BR MADV_POPULATE_WRITE ,
 the process can be killed at any moment when the system runs out of memory.
+.TP
+.BR MADV_GUARD_INSTALL " (since Linux 6.13)"
+Install a lightweight guard region into the range specified by
+.I addr
+and
+.IR size ,
+causing any read or write in the range to result in a fatal
+.B SIGSEGV
+signal being raised.
+.IP
+If the region maps memory pages they will be cleared as part of the operation,
+though if
+.B MADV_GUARD_INSTALL
+is applied to regions containing pre-existing lightweight guard regions,
+they are left in place.
+.IP
+This operation is only supported for writable anonymous private mappings which
+have not been mlock'd.
+An
+.B EINVAL
+error is returned if it is attempted on any other kind of mapping.
+.IP
+This operation is more efficient than mapping a new region of memory
+.BR PROT_NONE ,
+as it does not require the establishment of new mappings,
+instead regions of an existing mapping simply have their page tables
+manipulated to establish the desired behavior.
+No additional memory is used.
+.IP
+Lightweight guard regions remain on fork
+(except for any parts which have had
+.B MADV_WIPEONFORK
+applied to them),
+and are not removed by
+.BR MADV_DONTNEED ,
+.BR MADV_FREE ,
+.BR MADV_PAGEOUT ,
+or
+.BR MADV_COLD .
+.IP
+Attempting to
+.BR mlock ()
+lightweight guard regions will fail,
+as will
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE .
+.IP
+If the mapping has its attributes changed,
+or is split or partially unmapped,
+any existing guard regions remain in place
+(except if they are unmapped).
+.IP
+If a mapping is moved using
+.BR mremap (),
+lightweight guard regions are moved with it.
+.IP
+Lightweight guard regions are removed when unmapped,
+on process teardown,
+or when the
+.B MADV_GUARD_REMOVE
+operation is applied to them.
+.TP
+.BR MADV_GUARD_REMOVE " (since Linux 6.13)"
+Remove any lightweight guard regions which exist in the range specified by
+.I addr
+and
+.IR size .
+.IP
+All mappings in the range other than lightweight guard regions are left in place
+(including mlock'd mappings).
+The operation is,
+however,
+only valid for writable anonymous private mappings,
+returning an
+.B EINVAL
+error otherwise.
+.IP
+When lightweight guard regions are removed,
+they act as empty regions of the containing mapping.
+Since only writable anonymous private mappings are supported,
+they therefore become zero-fill-on-demand pages.
+.IP
+If any transparent huge pages are encountered in the operation,
+they are left in place.
 .SH RETURN VALUE
 On success,
 .BR madvise ()
@@ -787,6 +872,14 @@ or
 or secret memory regions created using
 .BR memfd_secret(2) .
 .TP
+.B EINVAL
+.I advice
+is
+.B MADV_GUARD_INSTALL
+or
+.BR MADV_GUARD_REMOVE ,
+but the specified address range contains an unsupported mapping.
+.TP
 .B EIO
 (for
 .BR MADV_WILLNEED )
--
2.47.1



* Re: [PATCH man-pages v2] madvise.2: add MADV_GUARD_INSTALL, MADV_GUARD_REMOVE description
  2024-11-29 15:59 [PATCH man-pages v2] madvise.2: add MADV_GUARD_INSTALL, MADV_GUARD_REMOVE description Lorenzo Stoakes
@ 2024-11-29 18:13 ` Jann Horn
  2024-12-02 14:05   ` Lorenzo Stoakes
  0 siblings, 1 reply; 4+ messages in thread
From: Jann Horn @ 2024-11-29 18:13 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Alejandro Colomar, linux-man, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Vlastimil Babka, linux-mm

On Fri, Nov 29, 2024 at 4:59 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> Lightweight guard region support has been added to Linux 6.13, which adds
> MADV_GUARD_INSTALL and MADV_GUARD_REMOVE flags to the madvise() system
> call. Therefore, update the manpage for madvise() and describe these
> operations.
[...]
> +.TP
> +.BR MADV_GUARD_INSTALL " (since Linux 6.13)"
> +Install a lightweight guard region into the range specified by
> +.I addr
> +and
> +.IR size ,
> +causing any read or write in the range to result in a fatal
> +.B SIGSEGV
> +signal being raised.

Single-word nitpick: Maybe remove the word "fatal"?

I think the term "fatal signal" normally refers to a signal that is
guaranteed to terminate the task (that's how the signal handling code
uses the term, more or less); but a SIGSEGV caused by VM_FAULT_SIGSEGV
can AFAIK be handled by a userspace signal handler.

SIGKILL is the one signal that is always fatal; the kernel can also
send other signals in an always-fatal way, like with force_fatal_sig()
or force_exit_sig(), but those are not used for VM_FAULT_SIGSEGV.
(Those functions are mostly for cases where we can't continue because
something is in an unsafe state, like if a signal return failed and
the register state might now be messed up.)
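
To illustrate the point above, here is a minimal sketch (not taken from the
patch) that installs a guard region on the middle page of a three-page
anonymous mapping and catches the resulting SIGSEGV in a userspace handler.
The helper name guard_probe() is hypothetical, and the fallback values 102
and 103 for the two constants are an assumption taken from the Linux 6.13
uapi headers, used only when the libc headers do not define them yet.
On kernels without guard region support madvise() is expected to fail, in
which case the helper returns -1.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: values from the Linux 6.13 uapi headers (older libc headers). */
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103
#endif

static sigjmp_buf guard_jmp;

/* Ordinary SIGSEGV handler: jump back to the probe instead of dying. */
static void on_segv(int sig)
{
	(void)sig;
	siglongjmp(guard_jmp, 1);
}

/*
 * Hypothetical helper.
 * Returns  1 if reading the guard page raised a SIGSEGV that was handled,
 *          0 if the read did not fault,
 *         -1 if setup failed (for example, no guard region support).
 */
int guard_probe(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 3 * (size_t)page;
	struct sigaction sa, old;
	int result;

	/* Guard regions require a writable anonymous private mapping. */
	char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Guard only the middle page; its neighbours stay usable. */
	if (madvise(base + page, (size_t)page, MADV_GUARD_INSTALL) != 0) {
		munmap(base, len);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_segv;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0) {
		munmap(base, len);
		return -1;
	}

	if (sigsetjmp(guard_jmp, 1) == 0) {
		volatile char *p = base + page;
		(void)*p;		/* read inside the guard region */
		result = 0;		/* no fault occurred */
	} else {
		result = 1;		/* handler ran and jumped back here */
	}

	/* Restore the previous handler and clean up. */
	sigaction(SIGSEGV, &old, NULL);
	madvise(base + page, (size_t)page, MADV_GUARD_REMOVE);
	munmap(base, len);
	return result;
}
```

If guard_probe() returns 1, the process survived the access, which is the
sense in which the signal is not "fatal" in the kernel's usage of the term.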



* Re: [PATCH man-pages v2] madvise.2: add MADV_GUARD_INSTALL, MADV_GUARD_REMOVE description
  2024-11-29 18:13 ` Jann Horn
@ 2024-12-02 14:05   ` Lorenzo Stoakes
  2024-12-02 14:21     ` Alejandro Colomar
  0 siblings, 1 reply; 4+ messages in thread
From: Lorenzo Stoakes @ 2024-12-02 14:05 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alejandro Colomar, linux-man, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Vlastimil Babka, linux-mm

On Fri, Nov 29, 2024 at 07:13:22PM +0100, Jann Horn wrote:
> On Fri, Nov 29, 2024 at 4:59 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > Lightweight guard region support has been added to Linux 6.13, which adds
> > MADV_GUARD_INSTALL and MADV_GUARD_REMOVE flags to the madvise() system
> > call. Therefore, update the manpage for madvise() and describe these
> > operations.
> [...]
> > +.TP
> > +.BR MADV_GUARD_INSTALL " (since Linux 6.13)"
> > +Install a lightweight guard region into the range specified by
> > +.I addr
> > +and
> > +.IR size ,
> > +causing any read or write in the range to result in a fatal
> > +.B SIGSEGV
> > +signal being raised.
>
> Single-word nitpick: Maybe remove the word "fatal"?
>
> I think the term "fatal signal" normally refers to a signal that is
> guaranteed to terminate the task (that's how the signal handling code
> uses the term, more or less); but a SIGSEGV caused by VM_FAULT_SIGSEGV
> can AFAIK be handled by a userspace signal handler.
>
> SIGKILL is the one signal that is always fatal; the kernel can also
> send other signals in an always-fatal way, like with force_fatal_sig()
> or force_exit_sig(), but those are not used for VM_FAULT_SIGSEGV.
> (Those functions are mostly for cases where we can't continue because
> something is in an unsafe state, like if a signal return failed and
> the register state might now be messed up.)

I think there's a bit of a disconnect between the meaning of a fatal signal
in userland and the kernel. From the kernel's perspective, as per
fatal_signal_pending(), it is, as you say, SIGKILL.

From a user's perspective, and as per sig_fatal(), it is one that is, by
default, fatal if not handled.

So I think here it's fine to say 'fatal' in the latter sense, and the fact
we immediately mention SIGSEGV clarifies in what sense we mean 'fatal'.

The intent here also is that a user would treat this as a fatal event, a
thread that accesses a guard area is accessing memory that it shouldn't.

However I also see it from your perspective, I mean we say what signal
we're sending so it's not hugely necessary and eliminates a possible
confusion.

Not sure if Alejandro has any objection to this turn of phrase?

From my perspective I don't think it's too problematic to leave it in, but
if it's easy for Alejandro to pull out I have no objection.

If people feel strongly + Alejandro would find it easier, I could just send
a v3 with it removed.

Thanks, Lorenzo



* Re: [PATCH man-pages v2] madvise.2: add MADV_GUARD_INSTALL, MADV_GUARD_REMOVE description
  2024-12-02 14:05   ` Lorenzo Stoakes
@ 2024-12-02 14:21     ` Alejandro Colomar
  0 siblings, 0 replies; 4+ messages in thread
From: Alejandro Colomar @ 2024-12-02 14:21 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Jann Horn, linux-man, Suren Baghdasaryan, Liam R . Howlett,
	Matthew Wilcox, Vlastimil Babka, linux-mm


Hi Lorenzo, Jann,

On Mon, Dec 02, 2024 at 02:05:54PM +0000, Lorenzo Stoakes wrote:
> On Fri, Nov 29, 2024 at 07:13:22PM +0100, Jann Horn wrote:
> > On Fri, Nov 29, 2024 at 4:59 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > > Lightweight guard region support has been added to Linux 6.13, which adds
> > > MADV_GUARD_INSTALL and MADV_GUARD_REMOVE flags to the madvise() system
> > > call. Therefore, update the manpage for madvise() and describe these
> > > operations.
> > [...]
> > > +.TP
> > > +.BR MADV_GUARD_INSTALL " (since Linux 6.13)"
> > > +Install a lightweight guard region into the range specified by
> > > +.I addr
> > > +and
> > > +.IR size ,
> > > +causing any read or write in the range to result in a fatal
> > > +.B SIGSEGV
> > > +signal being raised.
> >
> > Single-word nitpick: Maybe remove the word "fatal"?
> >
> > I think the term "fatal signal" normally refers to a signal that is
> > guaranteed to terminate the task (that's how the signal handling code
> > uses the term, more or less); but a SIGSEGV caused by VM_FAULT_SIGSEGV
> > can AFAIK be handled by a userspace signal handler.
> >
> > SIGKILL is the one signal that is always fatal; the kernel can also
> > send other signals in an always-fatal way, like with force_fatal_sig()
> > or force_exit_sig(), but those are not used for VM_FAULT_SIGSEGV.
> > (Those functions are mostly for cases where we can't continue because
> > something is in an unsafe state, like if a signal return failed and
> > the register state might now be messed up.)
> 
> I think there's a bit of a disconnect between the meaning of a fatal signal
> in userland and the kernel. From the kernel's perspective, as per
> fatal_signal_pending(), it is, as you say, SIGKILL.
> 
> From a user's perspective, and as per sig_fatal(), it is one that is, by
> default, fatal if not handled.
> 
> So I think here it's fine to say 'fatal' in the latter sense, and the fact
> we immediately mention SIGSEGV clarifies in what sense we mean 'fatal'.
> 
> The intent here also is that a user would treat this as a fatal event, a
> thread that accesses a guard area is accessing memory that it shouldn't.
> 
> However I also see it from your perspective, I mean we say what signal
> we're sending so it's not hugely necessary and eliminates a possible
> confusion.
> 
> Not sure if Alejandro has any objection to this turn of phrase?

I agree with Jann.

With your interpretation, fatal SIGSEGV is redundant, as SIGSEGV is
always "fatal" in that sense.  It's better to just say SIGSEGV.

In Jann's more formal interpretation, fatal SIGSEGV means a different
thing.

I prefer just SIGSEGV.

> 
> From my perspective I don't think it's too problematic to leave it in, but
> if it's easy for Alejandro to pull out I have no objection.
> 
> If people feel strongly + Alejandro would find it easier, I could just send
> a v3 with it removed.

Yeah, please send a v3.  Thanks!

Have a lovely day!
Alex

> 
> Thanks, Lorenzo

-- 
<https://www.alejandro-colomar.es/>
