linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
       [not found] ` <CAHk-=wh_P7UR6RiYmgBDQ4L-kgmmLMziGarLsx_0bUn5vYTJUw@mail.gmail.com>
@ 2024-08-09 14:39   ` Eric W. Biederman
  2024-08-09 15:13     ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2024-08-09 14:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Brian Mak, Kees Cook, Alexander Viro, Christian Brauner,
	Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 6 Aug 2024 at 11:16, Brian Mak <makb@juniper.net> wrote:
>>
>> @@ -1253,5 +1266,8 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
>>                 cprm->vma_data_size += m->dump_size;
>>         }
>>
>> +       sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
>> +               cmp_vma_size, NULL);
>> +
>>         return true;
>>  }
>
> Hmm. Realistically we only dump core in ELF, and the order of the
> segments shouldn't matter.
>
> But I wonder if we should do this in the ->core_dump() function
> itself, in case it would have mattered for other dump formats?
>
> IOW, instead of being at the bottom of dump_vma_snapshot(), maybe the
> sorting should be at the top of elf_core_dump()?
>
> And yes, in practice I doubt we'll ever have other dump formats, and
> no, a.out isn't doing some miraculous comeback either.
>
> But I bet you didn't test elf_fdpic_core_dump() even if I bet it (a)
> works and (b) nobody cares.
>
> So moving it to the ELF side might be conceptually the right thing to do?
>
> (Or is there some reason it needs to be done at snapshot time that I
> just didn't fully appreciate?)

I asked him to perform this at snapshot time.  Plus it is obvious at
snapshot time that you can change the allocated array, while it is
not so obvious in the ->core_dump methods.

I would argue that the long term maintainable thing to do is to
merge elf_core_dump and elf_fdpic_core_dump and put all of the code
in fs/coredump.c

Performing the sort at snapshot time avoids introducing one extra reason
why the two elf implementations of elf coredumping are different.

I did read through the elf fdpic code quickly and it looks like it
should just work no matter which order the vma's are dumped in.  Just
like the other elf coredump code does.




My practical concern is that someone has a coredump thing that walks
through the program headers and short circuits the walk because it knows
the program headers are all written in order.  But the only way to find
one of those is to just try it.

Eric


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2024-08-09 14:39   ` [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores Eric W. Biederman
@ 2024-08-09 15:13     ` Linus Torvalds
  0 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2024-08-09 15:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Brian Mak, Kees Cook, Alexander Viro, Christian Brauner,
	Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov

On Fri, 9 Aug 2024 at 07:40, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> I asked him to perform this at snapshot time.  Plus it is obvious at
> snapshot time that you can change the allocated array, while it is
> not so obvious in the ->core_dump methods.

Fair enough. The days when we supported a.out dumps are obviously long
long gone, and aren't coming back.

So I am not adamant that it has to be done in the dumper, and I
probably just have that historical "we have multiple different dumpers
with different rules" mindset that isn't really relevant any more.

> I would argue that the long term maintainable thing to do is to
> merge elf_core_dump and elf_fdpic_core_dump and put all of the code
> in fs/coredump.c

I wouldn't object. It's not like there's any foreseeable new core dump
format that we'd expect, and the indirection makes the code flow
harder to follow.

Not that most people look at this code a lot.

             Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
       [not found] ` <172300808013.2419749.16446009147309523545.b4-ty@kernel.org>
@ 2024-08-10  0:52   ` Brian Mak
  2024-08-10  4:06     ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Mak @ 2024-08-10  0:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric W. Biederman, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov,
	Linus Torvalds

On Aug 6, 2024, at 10:21 PM, Kees Cook <kees@kernel.org> wrote:
> 
> On Tue, 06 Aug 2024 18:16:02 +0000, Brian Mak wrote:
>> Large cores may be truncated in some scenarios, such as with daemons
>> with stop timeouts that are not large enough or lack of disk space. This
>> impacts debuggability with large core dumps since critical information
>> necessary to form a usable backtrace, such as stacks and shared library
>> information, are omitted.
>> 
>> We attempted to figure out which VMAs are needed to create a useful
>> backtrace, and it turned out to be a non-trivial problem. Instead, we
>> try simply sorting the VMAs by size, which has the intended effect.
>> 
>> [...]
> 
> While waiting on rr test validation, and since we're at the start of the
> dev cycle, I figure let's get this into -next ASAP to see if anything
> else pops out. We can drop/revise if there are problems. (And as always,
> I will add any Acks/Reviews/etc that show up on the thread.)
> 
> Applied to for-next/execve, thanks!
> 
> [1/1] binfmt_elf: Dump smaller VMAs first in ELF cores
>      https://urldefense.com/v3/__https://git.kernel.org/kees/c/9c531dfdc1bc__;!!NEt6yMaO-gk!FK3UfXVndoYpve8Y7q7vacIoHOrTj2nJgSJbugqUB5LfciKy0_Xvit9aXz_XCWlXHpdRQO2ArP0$

Thanks, Kees! And, thanks Linus + Eric for taking the time to comment on
this.

Regarding the rr tests, it was not an easy task to get the environment
set up to do this, but I did it and was able to run the tests. The rr
tests require a lot of kernel config options and there's no list
documenting what's needed anywhere...

All the tests pass except for the sioc and sioc-no-syscallbuf tests.
However, these test failures are due to an incompatibility with the
network adapter I'm using. It seems that it only likes older network
adapters. I've switched my virtualized network adapter twice now, and
each time, the test gets a bit further than the previous time. Will
continue trying different network adapters until something hopefully
works. In any case, since this error isn't directly related to my
changes and the rest of the tests pass, then I think we can be pretty
confident that this change is not breaking rr.

Best,
Brian Mak

> Take care,
> 
> --
> Kees Cook
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2024-08-10  0:52   ` Brian Mak
@ 2024-08-10  4:06     ` Kees Cook
  0 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2024-08-10  4:06 UTC (permalink / raw)
  To: Brian Mak
  Cc: Eric W. Biederman, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov,
	Linus Torvalds

On Sat, Aug 10, 2024 at 12:52:16AM +0000, Brian Mak wrote:
> On Aug 6, 2024, at 10:21 PM, Kees Cook <kees@kernel.org> wrote:
> > 
> > On Tue, 06 Aug 2024 18:16:02 +0000, Brian Mak wrote:
> >> Large cores may be truncated in some scenarios, such as with daemons
> >> with stop timeouts that are not large enough or lack of disk space. This
> >> impacts debuggability with large core dumps since critical information
> >> necessary to form a usable backtrace, such as stacks and shared library
> >> information, are omitted.
> >> 
> >> We attempted to figure out which VMAs are needed to create a useful
> >> backtrace, and it turned out to be a non-trivial problem. Instead, we
> >> try simply sorting the VMAs by size, which has the intended effect.
> >> 
> >> [...]
> > 
> > While waiting on rr test validation, and since we're at the start of the
> > dev cycle, I figure let's get this into -next ASAP to see if anything
> > else pops out. We can drop/revise if there are problems. (And as always,
> > I will add any Acks/Reviews/etc that show up on the thread.)
> > 
> > Applied to for-next/execve, thanks!
> > 
> > [1/1] binfmt_elf: Dump smaller VMAs first in ELF cores
> >      https://urldefense.com/v3/__https://git.kernel.org/kees/c/9c531dfdc1bc__;!!NEt6yMaO-gk!FK3UfXVndoYpve8Y7q7vacIoHOrTj2nJgSJbugqUB5LfciKy0_Xvit9aXz_XCWlXHpdRQO2ArP0$
> 
> Thanks, Kees! And, thanks Linus + Eric for taking the time to comment on
> this.
> 
> Regarding the rr tests, it was not an easy task to get the environment
> set up to do this, but I did it and was able to run the tests. The rr
> tests require a lot of kernel config options and there's no list
> documenting what's needed anywhere...

Thanks for suffering through that!

> All the tests pass except for the sioc and sioc-no-syscallbuf tests.
> However, these test failures are due to an incompatibility with the
> network adapter I'm using. It seems that it only likes older network
> adapters. I've switched my virtualized network adapter twice now, and
> each time, the test gets a bit further than the previous time. Will
> continue trying different network adapters until something hopefully
> works. In any case, since this error isn't directly related to my
> changes and the rest of the tests pass, then I think we can be pretty
> confident that this change is not breaking rr.

Perfect! Okay, we'll keep our eyes open for any reports of breakage. :)

-Kees

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
       [not found] <036CD6AE-C560-4FC7-9B02-ADD08E380DC9@juniper.net>
       [not found] ` <CAHk-=wh_P7UR6RiYmgBDQ4L-kgmmLMziGarLsx_0bUn5vYTJUw@mail.gmail.com>
       [not found] ` <172300808013.2419749.16446009147309523545.b4-ty@kernel.org>
@ 2024-08-10 12:28 ` Eric W. Biederman
  2024-08-12 18:05   ` Kees Cook
  2025-02-18  8:54 ` Michael Stapelberg
  3 siblings, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2024-08-10 12:28 UTC (permalink / raw)
  To: Brian Mak
  Cc: Kees Cook, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov,
	Linus Torvalds

Brian Mak <makb@juniper.net> writes:

> Large cores may be truncated in some scenarios, such as with daemons
> with stop timeouts that are not large enough or lack of disk space. This
> impacts debuggability with large core dumps since critical information
> necessary to form a usable backtrace, such as stacks and shared library
> information, are omitted.
>
> We attempted to figure out which VMAs are needed to create a useful
> backtrace, and it turned out to be a non-trivial problem. Instead, we
> try simply sorting the VMAs by size, which has the intended effect.
>
> By sorting VMAs by dump size and dumping in that order, we have a
> simple, yet effective heuristic.

To make finding the history easier I would include:
v1: https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net
v2: https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

As Kees has already picked this up this is quite possibly silly.
But *shrug* that was when I was out.


> Signed-off-by: Brian Mak <makb@juniper.net>
> ---
>
> Hi all,
>
> Still need to run rr tests on this, per Kees Cook's suggestion, will
> update back once done. GDB and readelf show that this patch works
> without issue though.
>
> Thanks,
> Brian Mak
>
> v3: Edited commit message to better convey alternative solution as
>     non-trivial
>
>     Moved sorting logic to fs/coredump.c to make it in place
>
>     Above edits suggested by Eric Biederman <ebiederm@xmission.com>
>
> v2: Edited commit message to include more reasoning for sorting VMAs
>     
>     Removed conditional VMA sorting with debugfs knob
>     
>     Above edits suggested by Eric Biederman <ebiederm@xmission.com>
>
>  fs/coredump.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 7f12ff6ad1d3..33c5ac53ab31 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -18,6 +18,7 @@
>  #include <linux/personality.h>
>  #include <linux/binfmts.h>
>  #include <linux/coredump.h>
> +#include <linux/sort.h>
>  #include <linux/sched/coredump.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/task_stack.h>
> @@ -1191,6 +1192,18 @@ static void free_vma_snapshot(struct coredump_params *cprm)
>  	}
>  }
>  
> +static int cmp_vma_size(const void *vma_meta_lhs_ptr, const void *vma_meta_rhs_ptr)
> +{
> +	const struct core_vma_metadata *vma_meta_lhs = vma_meta_lhs_ptr;
> +	const struct core_vma_metadata *vma_meta_rhs = vma_meta_rhs_ptr;
> +
> +	if (vma_meta_lhs->dump_size < vma_meta_rhs->dump_size)
> +		return -1;
> +	if (vma_meta_lhs->dump_size > vma_meta_rhs->dump_size)
> +		return 1;
> +	return 0;
> +}
> +
>  /*
>   * Under the mmap_lock, take a snapshot of relevant information about the task's
>   * VMAs.
> @@ -1253,5 +1266,8 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
>  		cprm->vma_data_size += m->dump_size;
>  	}
>  
> +	sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
> +		cmp_vma_size, NULL);
> +
>  	return true;
>  }
>
> base-commit: eb5e56d1491297e0881c95824e2050b7c205f0d4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2024-08-10 12:28 ` Eric W. Biederman
@ 2024-08-12 18:05   ` Kees Cook
  2024-08-12 18:21     ` Brian Mak
  0 siblings, 1 reply; 21+ messages in thread
From: Kees Cook @ 2024-08-12 18:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Brian Mak, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov,
	Linus Torvalds

On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote:
> Brian Mak <makb@juniper.net> writes:
> 
> > Large cores may be truncated in some scenarios, such as with daemons
> > with stop timeouts that are not large enough or lack of disk space. This
> > impacts debuggability with large core dumps since critical information
> > necessary to form a usable backtrace, such as stacks and shared library
> > information, are omitted.
> >
> > We attempted to figure out which VMAs are needed to create a useful
> > backtrace, and it turned out to be a non-trivial problem. Instead, we
> > try simply sorting the VMAs by size, which has the intended effect.
> >
> > By sorting VMAs by dump size and dumping in that order, we have a
> > simple, yet effective heuristic.
> 
> To make finding the history easier I would include:
> v1: https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net
> v2: https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> As Kees has already picked this up this is quite possibly silly.
> But *shrug* that was when I was out.

I've updated the trailers. Thanks for the review!

-Kees

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2024-08-12 18:05   ` Kees Cook
@ 2024-08-12 18:21     ` Brian Mak
  2024-08-12 18:25       ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Mak @ 2024-08-12 18:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric W. Biederman, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov,
	Linus Torvalds

On Aug 12, 2024, at 11:05 AM, Kees Cook <kees@kernel.org> wrote

> On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote:
>> Brian Mak <makb@juniper.net> writes:
>> 
>>> Large cores may be truncated in some scenarios, such as with daemons
>>> with stop timeouts that are not large enough or lack of disk space. This
>>> impacts debuggability with large core dumps since critical information
>>> necessary to form a usable backtrace, such as stacks and shared library
>>> information, are omitted.
>>> 
>>> We attempted to figure out which VMAs are needed to create a useful
>>> backtrace, and it turned out to be a non-trivial problem. Instead, we
>>> try simply sorting the VMAs by size, which has the intended effect.
>>> 
>>> By sorting VMAs by dump size and dumping in that order, we have a
>>> simple, yet effective heuristic.
>> 
>> To make finding the history easier I would include:
>> v1: https://urldefense.com/v3/__https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0rwhadvc$
>> v2: https://urldefense.com/v3/__https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0G7RQv4o$
>> 
>> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> 
>> As Kees has already picked this up this is quite possibly silly.
>> But *shrug* that was when I was out.
> 
> I've updated the trailers. Thanks for the review!

Hi Kees,

Thanks! I think you added it to the wrong commit though.

Please double check and update accordingly.

Regarding the sioc tests from earlier, I've reached a point where I
think I have a compatible virtual NIC (no more ioctl errors), but it's
giving me some mismatched registers error, causing the test to fail. I
can see this same test failure on a vanilla kernel with my setup, so
this is probably either some environment issue or a bug with rr or the
tests. Since all the other tests pass, I'm just going to leave it at
that.

Best,
Brian Mak




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2024-08-12 18:21     ` Brian Mak
@ 2024-08-12 18:25       ` Kees Cook
  0 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2024-08-12 18:25 UTC (permalink / raw)
  To: Brian Mak
  Cc: Eric W. Biederman, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov,
	Linus Torvalds

On Mon, Aug 12, 2024 at 06:21:15PM +0000, Brian Mak wrote:
> On Aug 12, 2024, at 11:05 AM, Kees Cook <kees@kernel.org> wrote
> 
> > On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote:
> >> Brian Mak <makb@juniper.net> writes:
> >> 
> >>> Large cores may be truncated in some scenarios, such as with daemons
> >>> with stop timeouts that are not large enough or lack of disk space. This
> >>> impacts debuggability with large core dumps since critical information
> >>> necessary to form a usable backtrace, such as stacks and shared library
> >>> information, are omitted.
> >>> 
> >>> We attempted to figure out which VMAs are needed to create a useful
> >>> backtrace, and it turned out to be a non-trivial problem. Instead, we
> >>> try simply sorting the VMAs by size, which has the intended effect.
> >>> 
> >>> By sorting VMAs by dump size and dumping in that order, we have a
> >>> simple, yet effective heuristic.
> >> 
> >> To make finding the history easier I would include:
> >> v1: https://urldefense.com/v3/__https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0rwhadvc$
> >> v2: https://urldefense.com/v3/__https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0G7RQv4o$
> >> 
> >> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> 
> >> As Kees has already picked this up this is quite possibly silly.
> >> But *shrug* that was when I was out.
> > 
> > I've updated the trailers. Thanks for the review!
> 
> Hi Kees,
> 
> Thanks! I think you added it to the wrong commit though.

Ugh. Time for more coffee. Thanks; fixed. I need to update my "b4" -- it
was hanging doing the trailers update so I did it myself manually...
That'll teach me. ;)

> tests. Since all the other tests pass, I'm just going to leave it at
> that.

Yeah, I think you're good. Thank you for taking the time to test rr!

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
       [not found] <036CD6AE-C560-4FC7-9B02-ADD08E380DC9@juniper.net>
                   ` (2 preceding siblings ...)
  2024-08-10 12:28 ` Eric W. Biederman
@ 2025-02-18  8:54 ` Michael Stapelberg
  2025-02-18 19:53   ` Brian Mak
  3 siblings, 1 reply; 21+ messages in thread
From: Michael Stapelberg @ 2025-02-18  8:54 UTC (permalink / raw)
  To: makb
  Cc: brauner, ebiederm, jack, kees, linux-fsdevel, linux-kernel,
	linux-mm, oleg, torvalds, viro

Hey Brian and folks

> […]
> backtrace, and it turned out to be a non-trivial problem. Instead, we
> try simply sorting the VMAs by size, which has the intended effect.
> […]
> Still need to run rr tests on this, per Kees Cook's suggestion, will
> update back once done. GDB and readelf show that this patch works
> without issue though.

I think in your testing, you probably did not try the eu-stack tool
from the elfutils package, because I think I found a bug:

Current elfutils cannot symbolize core dumps created by Linux 6.12+.
I noticed this because systemd-coredump(8) uses elfutils, and when
a program crashed on my machine, syslog did not show function names.

I reported this issue with elfutils at:
https://sourceware.org/bugzilla/show_bug.cgi?id=32713
…but figured it would be good to give a heads-up here, too.

Is this breakage sufficient reason to revert the commit?
Or are we saying userspace just needs to be updated to cope?

Thanks
Best regards
Michael


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-18  8:54 ` Michael Stapelberg
@ 2025-02-18 19:53   ` Brian Mak
  2025-02-19 13:28     ` Sam James
  2025-02-19 16:20     ` Jan Kara
  0 siblings, 2 replies; 21+ messages in thread
From: Brian Mak @ 2025-02-18 19:53 UTC (permalink / raw)
  To: Michael Stapelberg
  Cc: Christian Brauner, Eric W. Biederman, Jan Kara, Kees Cook,
	linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov,
	Linus Torvalds, Alexander Viro

On Feb 18, 2025, at 12:54 AM, Michael Stapelberg <michael@stapelberg.ch> wrote:

> I think in your testing, you probably did not try the eu-stack tool
> from the elfutils package, because I think I found a bug:

Hi Michael,

Thanks for the report. I can confirm that this issue does seem to be
from this commit. I tested it with Juniper's Linux kernel with and
without the changes.

You're correct that the original testing done did not include the
eu-stack tool.

> Current elfutils cannot symbolize core dumps created by Linux 6.12+.
> I noticed this because systemd-coredump(8) uses elfutils, and when
> a program crashed on my machine, syslog did not show function names.
> 
> I reported this issue with elfutils at:
> https://urldefense.com/v3/__https://sourceware.org/bugzilla/show_bug.cgi?id=32713__;!!NEt6yMaO-gk!DbttKuHxkBdrV4Cj9axM3ED6mlBHXeQGY3NVzvfDlthl-K39e9QIrZcwT8iCXLRu0OivWRGgficcD-aCuus$
> …but figured it would be good to give a heads-up here, too.
> 
> Is this breakage sufficient reason to revert the commit?
> Or are we saying userspace just needs to be updated to cope?

The way I see it is that, as long as we're in compliance with the
applicable ELF specifications, then the issue lies with userspace apps
to ensure that they are not making additional erroneous assumptions.

However, Eric mentioned a while ago in v1 of this patch that he believes
that the ELF specification requires program headers be written in memory
order. Digging through the ELF specifications, I found that any loadable
segment entries in the program header table must be sorted on the
virtual address of the first byte of which the segment resides in
memory.

This indicates that we have deviated from the ELF specification with
this commit. One thing we can do to remedy this is to have program
headers sorted according to the specification, but then continue dumping
in VMA size ordering. This would make the dumping logic significantly
more complex though.

Seeing how most popular userspace apps, with the exception of eu-stack,
seem to work, we could also just leave it, and tell userspace apps to
fix it on their end.

Eric and Kees, thoughts? I'm open to going either way.

Best,
Brian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-18 19:53   ` Brian Mak
@ 2025-02-19 13:28     ` Sam James
  2025-02-19 16:20     ` Jan Kara
  1 sibling, 0 replies; 21+ messages in thread
From: Sam James @ 2025-02-19 13:28 UTC (permalink / raw)
  To: makb
  Cc: brauner, ebiederm, jack, kees, linux-fsdevel, linux-kernel,
	linux-mm, michael, oleg, torvalds, viro

> Seeing how most popular userspace apps, with the exception of eu-stack,
> seem to work, we could also just leave it, and tell userspace apps to
> fix it on their end.

I'm not sure we do know that most work. 6.12 was released in November,
we're only getting this report about elfutils now. It's not like it's
been years with no complaints.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-18 19:53   ` Brian Mak
  2025-02-19 13:28     ` Sam James
@ 2025-02-19 16:20     ` Jan Kara
  2025-02-19 19:52       ` Kees Cook
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Kara @ 2025-02-19 16:20 UTC (permalink / raw)
  To: Brian Mak
  Cc: Michael Stapelberg, Christian Brauner, Eric W. Biederman,
	Jan Kara, Kees Cook, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Linus Torvalds, Alexander Viro

On Tue 18-02-25 19:53:51, Brian Mak wrote:
> On Feb 18, 2025, at 12:54 AM, Michael Stapelberg <michael@stapelberg.ch> wrote:
> 
> > I think in your testing, you probably did not try the eu-stack tool
> > from the elfutils package, because I think I found a bug:
> 
> Hi Michael,
> 
> Thanks for the report. I can confirm that this issue does seem to be
> from this commit. I tested it with Juniper's Linux kernel with and
> without the changes.
> 
> You're correct that the original testing done did not include the
> eu-stack tool.
> 
> > Current elfutils cannot symbolize core dumps created by Linux 6.12+.
> > I noticed this because systemd-coredump(8) uses elfutils, and when
> > a program crashed on my machine, syslog did not show function names.
> > 
> > I reported this issue with elfutils at:
> > https://urldefense.com/v3/__https://sourceware.org/bugzilla/show_bug.cgi?id=32713__;!!NEt6yMaO-gk!DbttKuHxkBdrV4Cj9axM3ED6mlBHXeQGY3NVzvfDlthl-K39e9QIrZcwT8iCXLRu0OivWRGgficcD-aCuus$
> > …but figured it would be good to give a heads-up here, too.
> > 
> > Is this breakage sufficient reason to revert the commit?
> > Or are we saying userspace just needs to be updated to cope?
> 
> The way I see it is that, as long as we're in compliance with the
> applicable ELF specifications, then the issue lies with userspace apps
> to ensure that they are not making additional erroneous assumptions.
> 
> However, Eric mentioned a while ago in v1 of this patch that he believes
> that the ELF specification requires program headers be written in memory
> order. Digging through the ELF specifications, I found that any loadable
> segment entries in the program header table must be sorted on the
> virtual address of the first byte of which the segment resides in
> memory.
> 
> This indicates that we have deviated from the ELF specification with
> this commit. One thing we can do to remedy this is to have program
> headers sorted according to the specification, but then continue dumping
> in VMA size ordering. This would make the dumping logic significantly
> more complex though.
> 
> Seeing how most popular userspace apps, with the exception of eu-stack,
> seem to work, we could also just leave it, and tell userspace apps to
> fix it on their end.

Well, it does not seem eu-stack is that unpopular and we really try hard to
avoid user visible regressions. So I think we should revert the change. Also
the fact that the patch breaks ELF spec is an indication there may be other
tools that would get confused by this and another reason for a revert...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-19 16:20     ` Jan Kara
@ 2025-02-19 19:52       ` Kees Cook
  2025-02-19 20:38         ` Brian Mak
                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Kees Cook @ 2025-02-19 19:52 UTC (permalink / raw)
  To: Jan Kara, Michael Stapelberg, Brian Mak
  Cc: Christian Brauner, Eric W. Biederman, linux-fsdevel,
	linux-kernel, linux-mm, Oleg Nesterov, Linus Torvalds,
	Alexander Viro

On Wed, Feb 19, 2025 at 05:20:17PM +0100, Jan Kara wrote:
> On Tue 18-02-25 19:53:51, Brian Mak wrote:
> > On Feb 18, 2025, at 12:54 AM, Michael Stapelberg <michael@stapelberg.ch> wrote:
> > 
> > > I think in your testing, you probably did not try the eu-stack tool
> > > from the elfutils package, because I think I found a bug:
> > 
> > Hi Michael,
> > 
> > Thanks for the report. I can confirm that this issue does seem to be
> > from this commit. I tested it with Juniper's Linux kernel with and
> > without the changes.
> > 
> > You're correct that the original testing done did not include the
> > eu-stack tool.
> > 
> > > Current elfutils cannot symbolize core dumps created by Linux 6.12+.
> > > I noticed this because systemd-coredump(8) uses elfutils, and when
> > > a program crashed on my machine, syslog did not show function names.
> > > 
> > > I reported this issue with elfutils at:
> > > https://urldefense.com/v3/__https://sourceware.org/bugzilla/show_bug.cgi?id=32713__;!!NEt6yMaO-gk!DbttKuHxkBdrV4Cj9axM3ED6mlBHXeQGY3NVzvfDlthl-K39e9QIrZcwT8iCXLRu0OivWRGgficcD-aCuus$
> > > …but figured it would be good to give a heads-up here, too.
> > > 
> > > Is this breakage sufficient reason to revert the commit?
> > > Or are we saying userspace just needs to be updated to cope?
> > 
> > The way I see it is that, as long as we're in compliance with the
> > applicable ELF specifications, then the issue lies with userspace apps
> > to ensure that they are not making additional erroneous assumptions.
> > 
> > However, Eric mentioned a while ago in v1 of this patch that he believes
> > that the ELF specification requires program headers be written in memory
> > order. Digging through the ELF specifications, I found that any loadable
> > segment entries in the program header table must be sorted on the
> > virtual address of the first byte of which the segment resides in
> > memory.
> > 
> > This indicates that we have deviated from the ELF specification with
> > this commit. One thing we can do to remedy this is to have program
> > headers sorted according to the specification, but then continue dumping
> > in VMA size ordering. This would make the dumping logic significantly
> > more complex though.
> > 
> > Seeing how most popular userspace apps, with the exception of eu-stack,
> > seem to work, we could also just leave it, and tell userspace apps to
> > fix it on their end.
> 
> Well, it does not seem eu-stack is that unpopular and we really try hard to
> avoid user visible regressions. So I think we should revert the change. Also
> the fact that the patch breaks ELF spec is an indication there may be other
> tools that would get confused by this and another reason for a revert...

Yeah, I think we need to make this a tunable. Updating the kernel breaks
elftools, which isn't some weird custom corner case. :P

So, while it took a few months, here is a report of breakage that I said
we'd need to watch for[1]. :)

Is anyone able to test this patch? And Brian will setting a sysctl be
okay for your use-case?


diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index a43b78b4b646..35d5d86cff69 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -222,6 +222,17 @@ and ``core_uses_pid`` is set, then .PID will be appended to
 the filename.
 
 
+core_sort_vma
+=============
+
+The default coredump writes VMAs in address order. By setting
+``core_sort_vma`` to 1, VMAs will be written from smallest size
+to largest size. This is known to break at least elfutils, but
+can be handy when dealing with very large (and truncated)
+coredumps where the more useful debugging details are included
+in the smaller VMAs.
+
+
 ctrl-alt-del
 ============
 
diff --git a/fs/coredump.c b/fs/coredump.c
index 591700e1b2ce..4375c70144d0 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -63,6 +63,7 @@ static void free_vma_snapshot(struct coredump_params *cprm);
 
 static int core_uses_pid;
 static unsigned int core_pipe_limit;
+static unsigned int core_sort_vma;
 static char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
 unsigned int core_file_note_size_limit = CORE_FILE_NOTE_SIZE_DEFAULT;
@@ -1026,6 +1027,15 @@ static const struct ctl_table coredump_sysctls[] = {
 		.extra1		= (unsigned int *)&core_file_note_size_min,
 		.extra2		= (unsigned int *)&core_file_note_size_max,
 	},
+	{
+		.procname	= "core_sort_vma",
+		.data		= &core_sort_vma,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 };
 
 static int __init init_fs_coredump_sysctls(void)
@@ -1256,8 +1266,9 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
 		cprm->vma_data_size += m->dump_size;
 	}
 
-	sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
-		cmp_vma_size, NULL);
+	if (core_sort_vma)
+		sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
+		     cmp_vma_size, NULL);
 
 	return true;
 }



-Kees

[1] https://lore.kernel.org/all/202408092104.FCE51021@keescook/

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-19 19:52       ` Kees Cook
@ 2025-02-19 20:38         ` Brian Mak
  2025-02-22  2:13           ` Brian Mak
  2025-02-20  0:23         ` Brian Mak
  2025-02-20  0:39         ` Linus Torvalds
  2 siblings, 1 reply; 21+ messages in thread
From: Brian Mak @ 2025-02-19 20:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jan Kara, Michael Stapelberg, Christian Brauner,
	Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Linus Torvalds, Alexander Viro

On Feb 19, 2025, at 11:52 AM, Kees Cook <kees@kernel.org> wrote:

> Yeah, I think we need to make this a tunable. Updating the kernel breaks
> elftools, which isn't some weird custom corner case. :P
> 
> So, while it took a few months, here is a report of breakage that I said
> we'd need to watch for[1]. :)
> 
> Is anyone able to test this patch? And Brian will setting a sysctl be
> okay for your use-case?

Hi Kees,

Yes, a sysctl tunable would be good here. I can test this patch in the
next day or two.

I will also scratch up a patch to bring us back into compliance with the
ELF specifications, and see if that fixes the userspace breakage with
elfutils, while not breaking gdb or rr.

Thanks,
Brian


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-19 19:52       ` Kees Cook
  2025-02-19 20:38         ` Brian Mak
@ 2025-02-20  0:23         ` Brian Mak
  2025-02-20  0:39         ` Linus Torvalds
  2 siblings, 0 replies; 21+ messages in thread
From: Brian Mak @ 2025-02-20  0:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jan Kara, Michael Stapelberg, Christian Brauner,
	Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Linus Torvalds, Alexander Viro

On Feb 19, 2025, at 11:52 AM, Kees Cook <kees@kernel.org> wrote

> Is anyone able to test this patch? And Brian will setting a sysctl be
> okay for your use-case?

Hi Kees,

I've verified that the sysctl tunable works as expected. readelf is
showing that the VMAs are unsorted by default, with the tunable set to
0. When the tunable is set to 1, the VMAs are sorted.

I also verified that the backtrace for unsorted and sorted cores are
viewable in GDB. The backtrace reported by eu-stack shows up fine in
the unsorted case, when attempting to reproduce with Michael's steps.
As expected, I see the same error as Michael in the sorted case, when
reproducing with his steps.

Interestingly enough, in the sorted case, I found that if the crashing
program is not linked statically, eu-stack will work fine. However, if
the crashing program is linked statically, eu-stack will throw an error,
as reported.

Anyway, this patch looks pretty good.

> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index a43b78b4b646..35d5d86cff69 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -222,6 +222,17 @@ and ``core_uses_pid`` is set, then .PID will be appended to
> the filename.
> 
> 
> +core_sort_vma
> +=============
> +
> +The default coredump writes VMAs in address order. By setting
> +``core_sort_vma`` to 1, VMAs will be written from smallest size
> +to largest size. This is known to break at least elfutils, but
> +can be handy when dealing with very large (and truncated)
> +coredumps where the more useful debugging details are included
> +in the smaller VMAs.
> +
> +

Just one comment here, this should go up one entry to maintain
alphabetical ordering.

Thanks,
Brian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-19 19:52       ` Kees Cook
  2025-02-19 20:38         ` Brian Mak
  2025-02-20  0:23         ` Brian Mak
@ 2025-02-20  0:39         ` Linus Torvalds
  2025-02-20  1:36           ` Kees Cook
  2 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2025-02-20  0:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jan Kara, Michael Stapelberg, Brian Mak, Christian Brauner,
	Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Alexander Viro

On Wed, 19 Feb 2025 at 11:52, Kees Cook <kees@kernel.org> wrote:
>
> Yeah, I think we need to make this a tunable. Updating the kernel breaks
> elftools, which isn't some weird custom corner case. :P

I wonder if we could also make the default be "no sorting" if the
vma's are all fairly small...

IOW, only trigger the new behavior when nity actually *matters*.

We already have the code to count how big the core dump is, it's that

                cprm->vma_data_size += m->dump_size;

in dump_vma_snapshot() thing, so I think this could all basically be a
one-liner that does the sort() call only if that vma_data_size is
larger than the core-dump limit, or something like that?

That way, the normal case could basically work for everybody, and the
system tunable would be only for people who want to force a certain
situation.

Something trivial like this (ENTIRELY UNTESTED) patch, perhaps:

  --- a/fs/coredump.c
  +++ b/fs/coredump.c
  @@ -1256,6 +1256,10 @@ static bool dump_vma_snapshot(struct
coredump_params *cprm)
                  cprm->vma_data_size += m->dump_size;
          }

  +       /* Only sort the vmas by size if they don't all fit in the
core dump */
  +       if (cprm->vma_data_size < cprm->limit)
  +               return true;
  +
          sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
                  cmp_vma_size, NULL);

Hmm?

             Linus


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-20  0:39         ` Linus Torvalds
@ 2025-02-20  1:36           ` Kees Cook
  2025-02-20 22:59             ` Brian Mak
  0 siblings, 1 reply; 21+ messages in thread
From: Kees Cook @ 2025-02-20  1:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, Michael Stapelberg, Brian Mak, Christian Brauner,
	Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Alexander Viro

On Wed, Feb 19, 2025 at 04:39:41PM -0800, Linus Torvalds wrote:
> On Wed, 19 Feb 2025 at 11:52, Kees Cook <kees@kernel.org> wrote:
> >
> > Yeah, I think we need to make this a tunable. Updating the kernel breaks
> > elftools, which isn't some weird custom corner case. :P
> 
> I wonder if we could also make the default be "no sorting" if the
> vma's are all fairly small...
> 
> IOW, only trigger the new behavior when nity actually *matters*.
> 
> We already have the code to count how big the core dump is, it's that
> 
>                 cprm->vma_data_size += m->dump_size;
> 
> in dump_vma_snapshot() thing, so I think this could all basically be a
> one-liner that does the sort() call only if that vma_data_size is
> larger than the core-dump limit, or something like that?
> 
> That way, the normal case could basically work for everybody, and the
> system tunable would be only for people who want to force a certain
> situation.
> 
> Something trivial like this (ENTIRELY UNTESTED) patch, perhaps:
> 
>   --- a/fs/coredump.c
>   +++ b/fs/coredump.c
>   @@ -1256,6 +1256,10 @@ static bool dump_vma_snapshot(struct
> coredump_params *cprm)
>                   cprm->vma_data_size += m->dump_size;
>           }
> 
>   +       /* Only sort the vmas by size if they don't all fit in the
> core dump */
>   +       if (cprm->vma_data_size < cprm->limit)
>   +               return true;
>   +
>           sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
>                   cmp_vma_size, NULL);
> 
> Hmm?

Oh! That's a good idea. In theory, a truncated dump is going to be
traditionally "unusable", so a sort shouldn't hurt tools that are
expecting a complete dump.

Brian, are you able to test this for your case?

-Kees

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-20  1:36           ` Kees Cook
@ 2025-02-20 22:59             ` Brian Mak
  2025-02-22 15:15               ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Mak @ 2025-02-20 22:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Jan Kara, Michael Stapelberg, Christian Brauner,
	Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Alexander Viro

On Feb 19, 2025, at 5:36 PM, Kees Cook <kees@kernel.org> wrote:

>> We already have the code to count how big the core dump is, it's that
>> 
>>                cprm->vma_data_size += m->dump_size;
>> 
>> in dump_vma_snapshot() thing, so I think this could all basically be a
>> one-liner that does the sort() call only if that vma_data_size is
>> larger than the core-dump limit, or something like that?
...
> Oh! That's a good idea. In theory, a truncated dump is going to be
> traditionally "unusable", so a sort shouldn't hurt tools that are
> expecting a complete dump.
> 
> Brian, are you able to test this for your case?

Hi Kees and Linus,

I like the idea, but the suggested patch seems to have issues in
practice.

First, the vma_data_size does not include the ELF header, program header
table, notes, etc. That's not a terribly big issue though. We can live
with that since it makes the estimated core dump size *smaller*. An even
bigger problem is that the vma_data_size doesn't take into account the
sparseness of the core dump, while the core dump size limit does.

If a generated core dump is very sparse, the vma_data_size will be much
larger than the actual size on disk of the core dump, triggering the
sorting logic earlier than expected.

One thing we can do though is to iterate through the pages for all VMAs
and see if get_dump_page() returns NULL. Then, we use that information
to calculate a more accurate predicted core dump size.

Patch is below. Thoughts?

Best,
Brian

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index a43b78b4b646..dd49a89a62d3 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -212,6 +212,17 @@ pid>/``).
 This value defaults to 0.
 
 
+core_sort_vma
+=============
+
+The default coredump writes VMAs in address order. By setting
+``core_sort_vma`` to 1, VMAs will be written from smallest size
+to largest size. This is known to break at least elfutils, but
+can be handy when dealing with very large (and truncated)
+coredumps where the more useful debugging details are included
+in the smaller VMAs.
+
+
 core_uses_pid
 =============
 
diff --git a/fs/coredump.c b/fs/coredump.c
index 591700e1b2ce..496cc7234aa7 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -63,6 +63,7 @@ static void free_vma_snapshot(struct coredump_params *cprm);
 
 static int core_uses_pid;
 static unsigned int core_pipe_limit;
+static unsigned int core_sort_vma;
 static char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
 unsigned int core_file_note_size_limit = CORE_FILE_NOTE_SIZE_DEFAULT;
@@ -1026,6 +1027,15 @@ static const struct ctl_table coredump_sysctls[] = {
 		.extra1		= (unsigned int *)&core_file_note_size_min,
 		.extra2		= (unsigned int *)&core_file_note_size_max,
 	},
+	{
+		.procname	= "core_sort_vma",
+		.data		= &core_sort_vma,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 };
 
 static int __init init_fs_coredump_sysctls(void)
@@ -1204,6 +1214,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
 	struct mm_struct *mm = current->mm;
 	VMA_ITERATOR(vmi, mm, 0);
 	int i = 0;
+	size_t sparse_vma_dump_size = 0;
 
 	/*
 	 * Once the stack expansion code is fixed to not change VMA bounds
@@ -1241,6 +1252,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
 
 	for (i = 0; i < cprm->vma_count; i++) {
 		struct core_vma_metadata *m = cprm->vma_meta + i;
+		unsigned long addr;
 
 		if (m->dump_size == DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER) {
 			char elfmag[SELFMAG];
@@ -1254,10 +1266,27 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
 		}
 
 		cprm->vma_data_size += m->dump_size;
+		sparse_vma_dump_size += m->dump_size;
+
+		/* Subtract zero pages from the sparse_vma_dump_size. */
+		for (addr = m->start; addr < m->start + m->dump_size; addr += PAGE_SIZE) {
+			struct page *page = get_dump_page(addr);
+
+			if (!page)
+				sparse_vma_dump_size -= PAGE_SIZE;
+			else
+				put_page(page);
+		}
 	}
 
-	sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
-		cmp_vma_size, NULL);
+	/*
+	 * Only sort the vmas by size if:
+	 * a) the sysctl is set to do so, or
+	 * b) the vmas don't all fit in within the core dump size limit.
+	 */
+	if (core_sort_vma || sparse_vma_dump_size >= cprm->limit)
+		sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
+		     cmp_vma_size, NULL);
 
 	return true;
 }



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-19 20:38         ` Brian Mak
@ 2025-02-22  2:13           ` Brian Mak
  2025-02-22 14:51             ` Kees Cook
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Mak @ 2025-02-22  2:13 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jan Kara, Michael Stapelberg, Christian Brauner,
	Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Linus Torvalds, Alexander Viro

On Feb 19, 2025, at 12:38 PM, Brian Mak <makb@juniper.net> wrote

> I will also scratch up a patch to bring us back into compliance with the
> ELF specifications, and see if that fixes the userspace breakage with
> elfutils, while not breaking gdb or rr.

I did scratch up something for this to fix up the program header
ordering, but it seems eu-stack is still broken, even with the fix. GDB
continues to work fine with the fix.

Given that there's no known utilities that get fixed as a result of the
program header sorting, I'm not sure if it's worth taking the patch.
Maybe we can just proceed with the sysctl + sorting if the core dump
size limit is hit, and leave it at that. Thoughts?

The program header ordering fix is below if someone wants to peek at it.

Best,
Brian

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 8054f44d39cf..8cf2bbc3cedf 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2021,6 +2021,7 @@ static int elf_core_dump(struct coredump_params *cprm)
 	struct elf_shdr *shdr4extnum = NULL;
 	Elf_Half e_phnum;
 	elf_addr_t e_shoff;
+	struct elf_phdr *phdrs = NULL;
 
 	/*
 	 * The number of segs are recored into ELF header as 16bit value.
@@ -2084,7 +2085,11 @@ static int elf_core_dump(struct coredump_params *cprm)
 	if (!dump_emit(cprm, phdr4note, sizeof(*phdr4note)))
 		goto end_coredump;
 
-	/* Write program headers for segments dump */
+	phdrs = kvmalloc_array(cprm->vma_count, sizeof(*phdrs), GFP_KERNEL);
+	if (!phdrs)
+		goto end_coredump;
+
+	/* Construct sorted program headers for segments dump */
 	for (i = 0; i < cprm->vma_count; i++) {
 		struct core_vma_metadata *meta = cprm->vma_meta + i;
 		struct elf_phdr phdr;
@@ -2104,8 +2109,14 @@ static int elf_core_dump(struct coredump_params *cprm)
 		if (meta->flags & VM_EXEC)
 			phdr.p_flags |= PF_X;
 		phdr.p_align = ELF_EXEC_PAGESIZE;
+		phdrs[meta->index] = phdr;
+	}
+
+	/* Write program headers for segments dump */
+	for (i = 0; i < cprm->vma_count; i++) {
+		struct elf_phdr *phdr = phdrs + i;
 
-		if (!dump_emit(cprm, &phdr, sizeof(phdr)))
+		if (!dump_emit(cprm, phdr, sizeof(*phdr)))
 			goto end_coredump;
 	}
 
@@ -2140,6 +2151,7 @@ static int elf_core_dump(struct coredump_params *cprm)
 
 end_coredump:
 	free_note_info(&info);
+	kvfree(phdrs);
 	kfree(shdr4extnum);
 	kfree(phdr4note);
 	return has_dumped;
diff --git a/fs/coredump.c b/fs/coredump.c
index 591700e1b2ce..0ddd75c3a914 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1226,6 +1226,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
 	while ((vma = coredump_next_vma(&vmi, vma, gate_vma)) != NULL) {
 		struct core_vma_metadata *m = cprm->vma_meta + i;
 
+		m->index = i;
 		m->start = vma->vm_start;
 		m->end = vma->vm_end;
 		m->flags = vma->vm_flags;
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 77e6e195d1d6..cf1b9e53cd1e 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -9,6 +9,7 @@
 
 #ifdef CONFIG_COREDUMP
 struct core_vma_metadata {
+	unsigned int  index;
 	unsigned long start, end;
 	unsigned long flags;
 	unsigned long dump_size;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-22  2:13           ` Brian Mak
@ 2025-02-22 14:51             ` Kees Cook
  0 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2025-02-22 14:51 UTC (permalink / raw)
  To: Brian Mak
  Cc: Jan Kara, Michael Stapelberg, Christian Brauner,
	Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Linus Torvalds, Alexander Viro

On Sat, Feb 22, 2025 at 02:13:06AM +0000, Brian Mak wrote:
> On Feb 19, 2025, at 12:38 PM, Brian Mak <makb@juniper.net> wrote
> 
> > I will also scratch up a patch to bring us back into compliance with the
> > ELF specifications, and see if that fixes the userspace breakage with
> > elfutils, while not breaking gdb or rr.
> 
> I did scratch up something for this to fix up the program header
> ordering, but it seems eu-stack is still broken, even with the fix. GDB
> continues to work fine with the fix.

Okay, thanks for testing this!

> Given that there's no known utilities that get fixed as a result of the
> program header sorting, I'm not sure if it's worth taking the patch.
> Maybe we can just proceed with the sysctl + sorting if the core dump
> size limit is hit, and leave it at that. Thoughts?

Yeah, I like that this will automatically kick on under the condition
where the coredump will already be unreadable by some tools. And having
the sysctl means it can be enabled for testing, etc.

-Kees

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores
  2025-02-20 22:59             ` Brian Mak
@ 2025-02-22 15:15               ` Kees Cook
  0 siblings, 0 replies; 21+ messages in thread
From: Kees Cook @ 2025-02-22 15:15 UTC (permalink / raw)
  To: Brian Mak
  Cc: Linus Torvalds, Jan Kara, Michael Stapelberg, Christian Brauner,
	Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm,
	Oleg Nesterov, Alexander Viro

On Thu, Feb 20, 2025 at 10:59:06PM +0000, Brian Mak wrote:
> One thing we can do though is to iterate through the pages for all VMAs
> and see if get_dump_page() returns NULL. Then, we use that information
> to calculate a more accurate predicted core dump size.
> 
> Patch is below. Thoughts?

I've pushed this to -next for a few days of testing, and if it's all
good, I'll send it to Linus next week for -rc5 (and -stable).

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-linus/execve&id=ff41385709f01519a97379ce7671ee4e91e301e1

-Kees

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-02-22 15:15 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <036CD6AE-C560-4FC7-9B02-ADD08E380DC9@juniper.net>
     [not found] ` <CAHk-=wh_P7UR6RiYmgBDQ4L-kgmmLMziGarLsx_0bUn5vYTJUw@mail.gmail.com>
2024-08-09 14:39   ` [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores Eric W. Biederman
2024-08-09 15:13     ` Linus Torvalds
     [not found] ` <172300808013.2419749.16446009147309523545.b4-ty@kernel.org>
2024-08-10  0:52   ` Brian Mak
2024-08-10  4:06     ` Kees Cook
2024-08-10 12:28 ` Eric W. Biederman
2024-08-12 18:05   ` Kees Cook
2024-08-12 18:21     ` Brian Mak
2024-08-12 18:25       ` Kees Cook
2025-02-18  8:54 ` Michael Stapelberg
2025-02-18 19:53   ` Brian Mak
2025-02-19 13:28     ` Sam James
2025-02-19 16:20     ` Jan Kara
2025-02-19 19:52       ` Kees Cook
2025-02-19 20:38         ` Brian Mak
2025-02-22  2:13           ` Brian Mak
2025-02-22 14:51             ` Kees Cook
2025-02-20  0:23         ` Brian Mak
2025-02-20  0:39         ` Linus Torvalds
2025-02-20  1:36           ` Kees Cook
2025-02-20 22:59             ` Brian Mak
2025-02-22 15:15               ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox