* [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
@ 2024-11-30  4:54 Kees Cook
  2024-11-30  5:55 ` Aleksa Sarai
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Kees Cook @ 2024-11-30  4:54 UTC (permalink / raw)
  To: Al Viro
  Cc: Kees Cook, Zbigniew Jędrzejewski-Szmek, Tycho Andersen,
	Linus Torvalds, Aleksa Sarai, Eric Biederman, Christian Brauner,
	Jan Kara, linux-mm, linux-fsdevel, linux-kernel, linux-hardening

Zbigniew mentioned at Linux Plumbers that systemd is interested in
switching to execveat() for service execution, but can't, because the
contents of /proc/pid/comm are the file descriptor which was used,
instead of the path to the binary. This makes the output of tools like
top and ps useless, especially in a world where most fds are opened
CLOEXEC so the number is truly meaningless.

When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the
dentry's filename for "comm" instead of using the useless numeral from
the synthetic fdpath construction. This way the actual exec machinery
is unchanged, but cosmetically the comm looks reasonable to admins
investigating things.

Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused
flag bits to indicate that we need to set "comm" from the dentry.

Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Suggested-by: Tycho Andersen <tandersen@netflix.com>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
CC: Aleksa Sarai <cyphar@cyphar.com>
Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
Signed-off-by: Kees Cook <kees@kernel.org>
---
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org

Here's what I've put together from the various suggestions. I didn't
want to needlessly grow bprm, so I just added a flag instead. Otherwise,
this is very similar to what Linus and Al suggested.
---
 fs/exec.c               | 22 +++++++++++++++++++---
 include/linux/binfmts.h |  4 +++-
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 5f16500ac325..d897d60ca5c2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
 		set_dumpable(current->mm, SUID_DUMP_USER);
 
 	perf_event_exec();
-	__set_task_comm(me, kbasename(bprm->filename), true);
+
+	/*
+	 * If the original filename was empty, alloc_bprm() made up a path
+	 * that will probably not be useful to admins running ps or similar.
+	 * Let's fix it up to be something reasonable.
+	 */
+	if (bprm->comm_from_dentry) {
+		rcu_read_lock();
+		/* The dentry name won't change while we hold the rcu read lock. */
+		__set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
+				true);
+		rcu_read_unlock();
+	} else {
+		__set_task_comm(me, kbasename(bprm->filename), true);
+	}
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
@@ -1521,11 +1535,13 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
 	if (fd == AT_FDCWD || filename->name[0] == '/') {
 		bprm->filename = filename->name;
 	} else {
-		if (filename->name[0] == '\0')
+		if (filename->name[0] == '\0') {
 			bprm->fdpath = kasprintf(GFP_KERNEL, "/dev/fd/%d", fd);
-		else
+			bprm->comm_from_dentry = 1;
+		} else {
 			bprm->fdpath = kasprintf(GFP_KERNEL, "/dev/fd/%d/%s",
 						  fd, filename->name);
+		}
 		if (!bprm->fdpath)
 			goto out_free;
 
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index e6c00e860951..3305c849abd6 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -42,7 +42,9 @@ struct linux_binprm {
 		 * Set when errors can no longer be returned to the
 		 * original userspace.
 		 */
-		point_of_no_return:1;
+		point_of_no_return:1,
+		/* Set when "comm" must come from the dentry. */
+		comm_from_dentry:1;
 	struct file *executable; /* Executable to pass to the interpreter */
 	struct file *interpreter;
 	struct file *file;
-- 
2.34.1
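
For readers following along, the effect on comm can be modelled in
userspace. This is only a sketch, not kernel code: basename_of() and
set_comm() are hypothetical stand-ins for kbasename() and
__set_task_comm(), and the 16-byte limit mirrors TASK_COMM_LEN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* same value as the kernel's TASK_COMM_LEN */

/* Stand-in for kbasename(): the text after the last '/', if any. */
static const char *basename_of(const char *path)
{
	const char *tail = strrchr(path, '/');

	return tail ? tail + 1 : path;
}

/* Stand-in for __set_task_comm(): truncate and always NUL-terminate. */
static void set_comm(char comm[TASK_COMM_LEN], const char *name)
{
	snprintf(comm, TASK_COMM_LEN, "%s", name);
}

/* Before the patch: comm is the basename of the synthetic "/dev/fd/N". */
static void comm_before_patch(char comm[TASK_COMM_LEN], int fd)
{
	char fdpath[32];

	assert(fd >= 0);
	snprintf(fdpath, sizeof(fdpath), "/dev/fd/%d", fd);
	set_comm(comm, basename_of(fdpath));
}

/* After the patch: comm is taken from the dentry name of the file. */
static void comm_after_patch(char comm[TASK_COMM_LEN], const char *dentry_name)
{
	set_comm(comm, dentry_name);
}
```

With fd 3, the first helper yields "3", which is what ps and top would
show; the second yields the file's own name, cut to 15 characters.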




* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-11-30  4:54 [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case Kees Cook
@ 2024-11-30  5:55 ` Aleksa Sarai
  2024-12-04 23:50   ` Zbigniew Jędrzejewski-Szmek
  2024-11-30 12:29 ` Christian Brauner
  2024-11-30 20:28 ` Mateusz Guzik
  2 siblings, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2024-11-30  5:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Al Viro, Zbigniew Jędrzejewski-Szmek, Tycho Andersen,
	Linus Torvalds, Eric Biederman, Christian Brauner, Jan Kara,
	linux-mm, linux-fsdevel, linux-kernel, linux-hardening


On 2024-11-29, Kees Cook <kees@kernel.org> wrote:
> Zbigniew mentioned at Linux Plumbers that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
> 
> When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the
> dentry's filename for "comm" instead of using the useless numeral from
> the synthetic fdpath construction. This way the actual exec machinery
> is unchanged, but cosmetically the comm looks reasonable to admins
> investigating things.
> 
> Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused
> flag bits to indicate that we need to set "comm" from the dentry.

Looks reasonable to me, feel free to take my

Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>



* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-11-30  4:54 [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case Kees Cook
  2024-11-30  5:55 ` Aleksa Sarai
@ 2024-11-30 12:29 ` Christian Brauner
  2024-11-30 18:02   ` Linus Torvalds
  2024-11-30 20:28 ` Mateusz Guzik
  2 siblings, 1 reply; 10+ messages in thread
From: Christian Brauner @ 2024-11-30 12:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Al Viro, Zbigniew Jędrzejewski-Szmek, Tycho Andersen,
	Linus Torvalds, Aleksa Sarai, Eric Biederman, Jan Kara, linux-mm,
	linux-fsdevel, linux-kernel, linux-hardening

On Fri, Nov 29, 2024 at 08:54:38PM -0800, Kees Cook wrote:
> Zbigniew mentioned at Linux Plumbers that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
> 
> When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the
> dentry's filename for "comm" instead of using the useless numeral from
> the synthetic fdpath construction. This way the actual exec machinery
> is unchanged, but cosmetically the comm looks reasonable to admins
> investigating things.
> 
> Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused
> flag bits to indicate that we need to set "comm" from the dentry.
> 
> Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
> Suggested-by: Tycho Andersen <tandersen@netflix.com>
> Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Aleksa Sarai <cyphar@cyphar.com>
> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-fsdevel@vger.kernel.org
> 
> Here's what I've put together from the various suggestions. I didn't
> want to needlessly grow bprm, so I just added a flag instead. Otherwise,
> this is very similar to what Linus and Al suggested.
> ---
>  fs/exec.c               | 22 +++++++++++++++++++---
>  include/linux/binfmts.h |  4 +++-
>  2 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 5f16500ac325..d897d60ca5c2 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
>  		set_dumpable(current->mm, SUID_DUMP_USER);
>  
>  	perf_event_exec();
> -	__set_task_comm(me, kbasename(bprm->filename), true);
> +
> +	/*
> +	 * If the original filename was empty, alloc_bprm() made up a path
> +	 * that will probably not be useful to admins running ps or similar.
> +	 * Let's fix it up to be something reasonable.
> +	 */
> +	if (bprm->comm_from_dentry) {
> +		rcu_read_lock();
> +		/* The dentry name won't change while we hold the rcu read lock. */
> +		__set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),

What does the smp_load_acquire() pair with?



* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-11-30 12:29 ` Christian Brauner
@ 2024-11-30 18:02   ` Linus Torvalds
  2024-12-01 14:17     ` Christian Brauner
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2024-11-30 18:02 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Kees Cook, Al Viro, Zbigniew Jędrzejewski-Szmek,
	Tycho Andersen, Aleksa Sarai, Eric Biederman, Jan Kara, linux-mm,
	linux-fsdevel, linux-kernel, linux-hardening

On Sat, 30 Nov 2024 at 04:30, Christian Brauner <brauner@kernel.org> wrote:
>
> What does the smp_load_acquire() pair with?

I'm not sure we have them everywhere, but at least this one at dentry
creation time.

__d_alloc():
        /* Make sure we always see the terminating NUL character */
        smp_store_release(&dentry->d_name.name, dname); /* ^^^ */

so even at rename time, when we swap the d_name.name pointers
(*without* using a store-release at that time), both of the dentry
names had memory orderings before.

That said, looking at swap_name() at the non-"swap just the pointers"
case, there we do just "memcpy()" the name, and it would probably be
good to update the target d_name.name with a smp_store_release.

In practice, none of this ever matters. Anybody who uses the dentry
name without locking either doesn't care enough (like comm[]) or will
use the sequence number thing to serialize at a much higher level. So
the smp_load_acquire() could probably be a READ_ONCE(), and nobody
would ever see the difference.

            Linus
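
The pairing described above can be sketched in userspace with C11
atomics. This is a model under stated assumptions, not the dcache code:
struct fake_dentry, publish_name() and read_name() are hypothetical
names, and memory_order_release / memory_order_acquire stand in for
smp_store_release() / smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: only the published name pointer. */
struct fake_dentry {
	_Atomic(const char *) name;
};

/* Writer side, modelled on __d_alloc(): fill the buffer, then publish it. */
static int publish_name(struct fake_dentry *d, const char *src)
{
	size_t len;
	char *buf;

	assert(src != NULL);
	len = strlen(src);
	buf = malloc(len + 1);
	if (!buf)
		return -1;
	memcpy(buf, src, len);
	buf[len] = '\0';
	/* Release: the bytes and the NUL are visible before the pointer is. */
	atomic_store_explicit(&d->name, buf, memory_order_release);
	return 0;
}

/* Reader side: the acquire load pairs with the release store above. */
static const char *read_name(struct fake_dentry *d)
{
	return atomic_load_explicit(&d->name, memory_order_acquire);
}
```

A reader that obtains the pointer through read_name() is guaranteed to
see the terminating NUL written before the release store. Swapping the
load to memory_order_relaxed would be the rough analogue of the
READ_ONCE() variant mentioned above.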



* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-11-30  4:54 [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case Kees Cook
  2024-11-30  5:55 ` Aleksa Sarai
  2024-11-30 12:29 ` Christian Brauner
@ 2024-11-30 20:28 ` Mateusz Guzik
  2024-11-30 21:34   ` Linus Torvalds
  2 siblings, 1 reply; 10+ messages in thread
From: Mateusz Guzik @ 2024-11-30 20:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Al Viro, Zbigniew Jędrzejewski-Szmek, Tycho Andersen,
	Linus Torvalds, Aleksa Sarai, Eric Biederman, Christian Brauner,
	Jan Kara, linux-mm, linux-fsdevel, linux-kernel, linux-hardening

On Fri, Nov 29, 2024 at 08:54:38PM -0800, Kees Cook wrote:
> Zbigniew mentioned at Linux Plumbers that systemd is interested in
> switching to execveat() for service execution, but can't, because the
> contents of /proc/pid/comm are the file descriptor which was used,
> instead of the path to the binary. This makes the output of tools like
> top and ps useless, especially in a world where most fds are opened
> CLOEXEC so the number is truly meaningless.
> 
> When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the
> dentry's filename for "comm" instead of using the useless numeral from
> the synthetic fdpath construction. This way the actual exec machinery
> is unchanged, but cosmetically the comm looks reasonable to admins
> investigating things.
> 
> Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused
> flag bits to indicate that we need to set "comm" from the dentry.
> 
> Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
> Suggested-by: Tycho Andersen <tandersen@netflix.com>
> Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Aleksa Sarai <cyphar@cyphar.com>
> Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
> Signed-off-by: Kees Cook <kees@kernel.org>
> ---
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-fsdevel@vger.kernel.org
> 
> Here's what I've put together from the various suggestions. I didn't
> want to needlessly grow bprm, so I just added a flag instead. Otherwise,
> this is very similar to what Linus and Al suggested.
> ---
>  fs/exec.c               | 22 +++++++++++++++++++---
>  include/linux/binfmts.h |  4 +++-
>  2 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 5f16500ac325..d897d60ca5c2 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
>  		set_dumpable(current->mm, SUID_DUMP_USER);
>  
>  	perf_event_exec();
> -	__set_task_comm(me, kbasename(bprm->filename), true);
> +
> +	/*
> +	 * If the original filename was empty, alloc_bprm() made up a path
> +	 * that will probably not be useful to admins running ps or similar.
> +	 * Let's fix it up to be something reasonable.
> +	 */
> +	if (bprm->comm_from_dentry) {
> +		rcu_read_lock();
> +		/* The dentry name won't change while we hold the rcu read lock. */
> +		__set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> +				true);

This does not sound legit whatsoever as it would indicate all renames
wait for rcu grace periods to end, which would be pretty weird.

Even commentary above dentry_cmp states:
         * Be careful about RCU walk racing with rename:
         * use 'READ_ONCE' to fetch the name pointer.
         *
         * NOTE! Even if a rename will mean that the length
         * was not loaded atomically, we don't care.

It may be this is considered tolerable, but there should be no
difficulty getting a real name there?

Regardless, the comment looks bogus.



* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-11-30 20:28 ` Mateusz Guzik
@ 2024-11-30 21:34   ` Linus Torvalds
  0 siblings, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2024-11-30 21:34 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Kees Cook, Al Viro, Zbigniew Jędrzejewski-Szmek,
	Tycho Andersen, Aleksa Sarai, Eric Biederman, Christian Brauner,
	Jan Kara, linux-mm, linux-fsdevel, linux-kernel, linux-hardening

On Sat, 30 Nov 2024 at 12:28, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> > +             /* The dentry name won't change while we hold the rcu read lock. */
> > +             __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
> > +                             true);
>
> This does not sound legit whatsoever as it would indicate all renames
> wait for rcu grace periods to end, which would be pretty weird.

Yes, the "won't change" should be "won't go away from under us".

          Linus



* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-11-30 18:02   ` Linus Torvalds
@ 2024-12-01 14:17     ` Christian Brauner
  2024-12-01 16:54       ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Brauner @ 2024-12-01 14:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Al Viro, Zbigniew Jędrzejewski-Szmek,
	Tycho Andersen, Aleksa Sarai, Eric Biederman, Jan Kara, linux-mm,
	linux-fsdevel, linux-kernel, linux-hardening

On Sat, Nov 30, 2024 at 10:02:38AM -0800, Linus Torvalds wrote:
> On Sat, 30 Nov 2024 at 04:30, Christian Brauner <brauner@kernel.org> wrote:
> >
> > What does the smp_load_acquire() pair with?
> 
> I'm not sure we have them everywhere, but at least this one at dentry
> creation time.
> 
> __d_alloc():
>         /* Make sure we always see the terminating NUL character */
>         smp_store_release(&dentry->d_name.name, dname); /* ^^^ */
> 
> so even at rename time, when we swap the d_name.name pointers
> (*without* using a store-release at that time), both of the dentry
> names had memory orderings before.
> 
> That said, looking at swap_name() at the non-"swap just the pointers"
> case, there we do just "memcpy()" the name, and it would probably be
> good to update the target d_name.name with a smp_store_release.
> 
> In practice, none of this ever matters. Anybody who uses the dentry
> name without locking either doesn't care enough (like comm[]) or will
> use the sequence number thing to serialize at a much higher level. So
> the smp_load_acquire() could probably be a READ_ONCE(), and nobody
> would ever see the difference.

Right now it's confusing. So no matter whether we use READ_ONCE() or
smp_load_acquire(), there should please be a comment explaining why, so
we don't pointlessly leave everyone wondering about that barrier.

/*
 * Hold rcu lock to keep the name from being freed behind our back.
 * Use acquire semantics to make sure the terminating NUL from
 * __d_alloc() is seen.
 *
 * Note, we're deliberately sloppy here. We don't need to care about
 * detecting a concurrent rename and just want a sensible name.
 */
rcu_read_lock();
__set_task_comm(me, smp_load_acquire(&file_dentry(bprm->file)->d_name.name), true);
rcu_read_unlock();

or something better.



* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-12-01 14:17     ` Christian Brauner
@ 2024-12-01 16:54       ` Linus Torvalds
  2024-12-01 18:37         ` Christian Brauner
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2024-12-01 16:54 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Kees Cook, Al Viro, Zbigniew Jędrzejewski-Szmek,
	Tycho Andersen, Aleksa Sarai, Eric Biederman, Jan Kara, linux-mm,
	linux-fsdevel, linux-kernel, linux-hardening

On Sun, 1 Dec 2024 at 06:17, Christian Brauner <brauner@kernel.org> wrote:
>
> /*
>  * Hold rcu lock to keep the name from being freed behind our back.
>  * Use acquire semantics to make sure the terminating NUL from
>  * __d_alloc() is seen.
>  *
>  * Note, we're deliberately sloppy here. We don't need to care about
>  * detecting a concurrent rename and just want a sensible name.
>  */

Sure. Note that even "sensible" isn't truly guaranteed in theory,
since a concurrent rename could be doing a "memcpy()" into the
dentry->d_name.name area at the same time on another CPU.

But "theoretically hard guarantees" isn't what this code cares about.

The only really hard rule is that the end result in comm[] needs to be
NUL-terminated at all times (and hey, even *that* is arguably a "don't
print garbage" rule rather than something truly fatal), and everything
else is "do the best you can".

Could we take the dentry lock to be really careful? Sure. We simply
don't care enough, and while other parts of execve() are much more
expensive, let's not.

              Linus
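
The "always NUL-terminated, otherwise best effort" rule can be
illustrated with a small userspace sketch. copy_comm_sloppy() is a
hypothetical helper, not the kernel's copy routine; it assumes the
source memory stays readable (which is what the RCU read lock provides
in the patch) and either contains a NUL or is at least
TASK_COMM_LEN - 1 bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* same value as the kernel's TASK_COMM_LEN */

/*
 * Copy a name that may be changing underneath us. Each source byte is
 * read once; the destination is always NUL-terminated and zero-padded,
 * whatever the source happens to contain at the time.
 */
static void copy_comm_sloppy(char dst[TASK_COMM_LEN], const volatile char *src)
{
	size_t i = 0;

	assert(src != NULL);
	while (i < TASK_COMM_LEN - 1) {
		char c = src[i];	/* single read of this byte */

		if (c == '\0')
			break;
		dst[i++] = c;
	}
	memset(dst + i, 0, TASK_COMM_LEN - i);	/* terminate and pad */
}
```

If a rename rewrites the source mid-copy, the result may mix old and new
bytes, but the last byte of dst is always zero, so readers never run off
the end.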



* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-12-01 16:54       ` Linus Torvalds
@ 2024-12-01 18:37         ` Christian Brauner
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2024-12-01 18:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Al Viro, Zbigniew Jędrzejewski-Szmek,
	Tycho Andersen, Aleksa Sarai, Eric Biederman, Jan Kara, linux-mm,
	linux-fsdevel, linux-kernel, linux-hardening

On Sun, Dec 01, 2024 at 08:54:41AM -0800, Linus Torvalds wrote:
> On Sun, 1 Dec 2024 at 06:17, Christian Brauner <brauner@kernel.org> wrote:
> >
> > /*
> >  * Hold rcu lock to keep the name from being freed behind our back.
> >  * Use acquire semantics to make sure the terminating NUL from
> >  * __d_alloc() is seen.
> >  *
> >  * Note, we're deliberately sloppy here. We don't need to care about
> >  * detecting a concurrent rename and just want a sensible name.
> >  */
> 
> Sure. Note that even "sensible" isn't truly guaranteed in theory,
> since a concurrent rename could be doing a "memcpy()" into the
> dentry->d_name.name area at the same time on another CPU.

Yeah, I saw: that happens if the d_name.name assignment is reordered
before the memcpy(), afaict. Anyway, it's not that important, especially
since PR_SET_MM_MAP puts comm, auxv etc. fully under user control anyway.



* Re: [PATCH] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  2024-11-30  5:55 ` Aleksa Sarai
@ 2024-12-04 23:50   ` Zbigniew Jędrzejewski-Szmek
  0 siblings, 0 replies; 10+ messages in thread
From: Zbigniew Jędrzejewski-Szmek @ 2024-12-04 23:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Aleksa Sarai, Al Viro, Tycho Andersen, Linus Torvalds,
	Eric Biederman, Christian Brauner, Jan Kara, linux-mm,
	linux-fsdevel, linux-kernel, linux-hardening

On Sat, Nov 30, 2024 at 04:55:09PM +1100, Aleksa Sarai wrote:
> On 2024-11-29, Kees Cook <kees@kernel.org> wrote:
> > Zbigniew mentioned at Linux Plumbers that systemd is interested in
> > switching to execveat() for service execution, but can't, because the
> > contents of /proc/pid/comm are the file descriptor which was used,
> > instead of the path to the binary. This makes the output of tools like
> > top and ps useless, especially in a world where most fds are opened
> > CLOEXEC so the number is truly meaningless.
> > 
> > When the filename passed in is empty (e.g. with AT_EMPTY_PATH), use the
> > dentry's filename for "comm" instead of using the useless numeral from
> > the synthetic fdpath construction. This way the actual exec machinery
> > is unchanged, but cosmetically the comm looks reasonable to admins
> > investigating things.
> > 
> > Instead of adding TASK_COMM_LEN more bytes to bprm, use one of the unused
> > flag bits to indicate that we need to set "comm" from the dentry.
> 
> Looks reasonable to me, feel free to take my
> 
> Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>

Thank you for making another version of the patch.

I tested this with systemd compiled to use fexecve and everything
seems to work as expected (the filename shows up in /proc/<pid>/comm).

Zbyszek
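
For reference, a caller on the fexecve() path might look like the sketch
below. exec_via_fd() is a hypothetical helper, not systemd's code. On
glibc 2.27 and later, fexecve() is documented to use
execveat(fd, "", argv, envp, AT_EMPTY_PATH) when the kernel supports it,
which is the case the patch addresses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Execute `path` through a file descriptor instead of by name.
 * Returns -1 on failure; does not return on success.
 */
static int exec_via_fd(const char *path, char *const argv[], char *const envp[])
{
	int fd;

	assert(path != NULL);
	/* O_CLOEXEC: the fd number is gone in the new image. */
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	fexecve(fd, argv, envp);	/* only returns on error */
	close(fd);
	return -1;
}
```

Because the descriptor is close-on-exec, the number that used to end up
in comm refers to nothing in the new process, which is why the old
behaviour was unhelpful.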


