* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores [not found] ` <CAHk-=wh_P7UR6RiYmgBDQ4L-kgmmLMziGarLsx_0bUn5vYTJUw@mail.gmail.com> @ 2024-08-09 14:39 ` Eric W. Biederman 2024-08-09 15:13 ` Linus Torvalds 0 siblings, 1 reply; 21+ messages in thread From: Eric W. Biederman @ 2024-08-09 14:39 UTC (permalink / raw) To: Linus Torvalds Cc: Brian Mak, Kees Cook, Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, 6 Aug 2024 at 11:16, Brian Mak <makb@juniper.net> wrote: >> >> @@ -1253,5 +1266,8 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) >> cprm->vma_data_size += m->dump_size; >> } >> >> + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), >> + cmp_vma_size, NULL); >> + >> return true; >> } > > Hmm. Realistically we only dump core in ELF, and the order of the > segments shouldn't matter. > > But I wonder if we should do this in the ->core_dump() function > itself, in case it would have mattered for other dump formats? > > IOW, instead of being at the bottom of dump_vma_snapshot(), maybe the > sorting should be at the top of elf_core_dump()? > > And yes, in practice I doubt we'll ever have other dump formats, and > no, a.out isn't doing some miraculous comeback either. > > But I bet you didn't test elf_fdpic_core_dump() even if I bet it (a) > works and (b) nobody cares. > > So moving it to the ELF side might be conceptually the right thing to do? > > (Or is there some reason it needs to be done at snapshot time that I > just didn't fully appreciate?) I asked him to perform this at snapshot time. Plus it is obvious at snapshot time that you can change the allocated array, while it is not so obvious in the ->core_dump methods. I would argue that the long term maintainable thing to do is to merge elf_core_dump and elf_fdpic_core_dump and put all of the code in fs/coredump.c Performing the sort at snapshot time avoids introducing one extra reason why the two elf implementations of elf coredumping are different. I did read through the elf fdpic code quickly and it looks like it should just work no matter which order the vma's are dumped in. Just like the other elf coredump code does. My practical concern is that someone has a coredump thing that walks through the program headers and short circuits the walk because it knows the program headers are all written in order. But the only way to find one of those is to just try it. Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2024-08-09 14:39 ` [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores Eric W. Biederman @ 2024-08-09 15:13 ` Linus Torvalds 0 siblings, 0 replies; 21+ messages in thread From: Linus Torvalds @ 2024-08-09 15:13 UTC (permalink / raw) To: Eric W. Biederman Cc: Brian Mak, Kees Cook, Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov On Fri, 9 Aug 2024 at 07:40, Eric W. Biederman <ebiederm@xmission.com> wrote: > > I asked him to perform this at snapshot time. Plus it is obvious at > snapshot time that you can change the allocated array, while it is > not so obvious in the ->core_dump methods. Fair enough. The days when we supported a.out dumps are obviously long long gone, and aren't coming back. So I am not adamant that it has to be done in the dumper, and I probably just have that historical "we have multiple different dumpers with different rules" mindset that isn't really relevant any more. > I would argue that the long term maintainable thing to do is to > merge elf_core_dump and elf_fdpic_core_dump and put all of the code > in fs/coredump.c I wouldn't object. It's not like there's any foreseeable new core dump format that we'd expect, and the indirection makes the code flow harder to follow. Not that most people look at this code a lot. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <172300808013.2419749.16446009147309523545.b4-ty@kernel.org>]
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores [not found] ` <172300808013.2419749.16446009147309523545.b4-ty@kernel.org> @ 2024-08-10 0:52 ` Brian Mak 2024-08-10 4:06 ` Kees Cook 0 siblings, 1 reply; 21+ messages in thread From: Brian Mak @ 2024-08-10 0:52 UTC (permalink / raw) To: Kees Cook Cc: Eric W. Biederman, Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov, Linus Torvalds On Aug 6, 2024, at 10:21 PM, Kees Cook <kees@kernel.org> wrote: > > On Tue, 06 Aug 2024 18:16:02 +0000, Brian Mak wrote: >> Large cores may be truncated in some scenarios, such as with daemons >> with stop timeouts that are not large enough or lack of disk space. This >> impacts debuggability with large core dumps since critical information >> necessary to form a usable backtrace, such as stacks and shared library >> information, are omitted. >> >> We attempted to figure out which VMAs are needed to create a useful >> backtrace, and it turned out to be a non-trivial problem. Instead, we >> try simply sorting the VMAs by size, which has the intended effect. >> >> [...] > > While waiting on rr test validation, and since we're at the start of the > dev cycle, I figure let's get this into -next ASAP to see if anything > else pops out. We can drop/revise if there are problems. (And as always, > I will add any Acks/Reviews/etc that show up on the thread.) > > Applied to for-next/execve, thanks! > > [1/1] binfmt_elf: Dump smaller VMAs first in ELF cores > https://urldefense.com/v3/__https://git.kernel.org/kees/c/9c531dfdc1bc__;!!NEt6yMaO-gk!FK3UfXVndoYpve8Y7q7vacIoHOrTj2nJgSJbugqUB5LfciKy0_Xvit9aXz_XCWlXHpdRQO2ArP0$ Thanks, Kees! And, thanks Linus + Eric for taking the time to comment on this. Regarding the rr tests, it was not an easy task to get the environment set up to do this, but I did it and was able to run the tests. The rr tests require a lot of kernel config options and there's no list documenting what's needed anywhere... All the tests pass except for the sioc and sioc-no-syscallbuf tests. However, these test failures are due to an incompatibility with the network adapter I'm using. It seems that it only likes older network adapters. I've switched my virtualized network adapter twice now, and each time, the test gets a bit further than the previous time. Will continue trying different network adapters until something hopefully works. In any case, since this error isn't directly related to my changes and the rest of the tests pass, then I think we can be pretty confident that this change is not breaking rr. Best, Brian Mak > Take care, > > -- > Kees Cook > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2024-08-10 0:52 ` Brian Mak @ 2024-08-10 4:06 ` Kees Cook 0 siblings, 0 replies; 21+ messages in thread From: Kees Cook @ 2024-08-10 4:06 UTC (permalink / raw) To: Brian Mak Cc: Eric W. Biederman, Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov, Linus Torvalds On Sat, Aug 10, 2024 at 12:52:16AM +0000, Brian Mak wrote: > On Aug 6, 2024, at 10:21 PM, Kees Cook <kees@kernel.org> wrote: > > > > On Tue, 06 Aug 2024 18:16:02 +0000, Brian Mak wrote: > >> Large cores may be truncated in some scenarios, such as with daemons > >> with stop timeouts that are not large enough or lack of disk space. This > >> impacts debuggability with large core dumps since critical information > >> necessary to form a usable backtrace, such as stacks and shared library > >> information, are omitted. > >> > >> We attempted to figure out which VMAs are needed to create a useful > >> backtrace, and it turned out to be a non-trivial problem. Instead, we > >> try simply sorting the VMAs by size, which has the intended effect. > >> > >> [...] > > > > While waiting on rr test validation, and since we're at the start of the > > dev cycle, I figure let's get this into -next ASAP to see if anything > > else pops out. We can drop/revise if there are problems. (And as always, > > I will add any Acks/Reviews/etc that show up on the thread.) > > > > Applied to for-next/execve, thanks! > > > > [1/1] binfmt_elf: Dump smaller VMAs first in ELF cores > > https://urldefense.com/v3/__https://git.kernel.org/kees/c/9c531dfdc1bc__;!!NEt6yMaO-gk!FK3UfXVndoYpve8Y7q7vacIoHOrTj2nJgSJbugqUB5LfciKy0_Xvit9aXz_XCWlXHpdRQO2ArP0$ > > Thanks, Kees! And, thanks Linus + Eric for taking the time to comment on > this. > > Regarding the rr tests, it was not an easy task to get the environment > set up to do this, but I did it and was able to run the tests. The rr > tests require a lot of kernel config options and there's no list > documenting what's needed anywhere... Thanks for suffering through that! > All the tests pass except for the sioc and sioc-no-syscallbuf tests. > However, these test failures are due to an incompatibility with the > network adapter I'm using. It seems that it only likes older network > adapters. I've switched my virtualized network adapter twice now, and > each time, the test gets a bit further than the previous time. Will > continue trying different network adapters until something hopefully > works. In any case, since this error isn't directly related to my > changes and the rest of the tests pass, then I think we can be pretty > confident that this change is not breaking rr. Perfect! Okay, we'll keep our eyes open for any reports of breakage. :) -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores [not found] <036CD6AE-C560-4FC7-9B02-ADD08E380DC9@juniper.net> [not found] ` <CAHk-=wh_P7UR6RiYmgBDQ4L-kgmmLMziGarLsx_0bUn5vYTJUw@mail.gmail.com> [not found] ` <172300808013.2419749.16446009147309523545.b4-ty@kernel.org> @ 2024-08-10 12:28 ` Eric W. Biederman 2024-08-12 18:05 ` Kees Cook 2025-02-18 8:54 ` Michael Stapelberg 3 siblings, 1 reply; 21+ messages in thread From: Eric W. Biederman @ 2024-08-10 12:28 UTC (permalink / raw) To: Brian Mak Cc: Kees Cook, Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov, Linus Torvalds Brian Mak <makb@juniper.net> writes: > Large cores may be truncated in some scenarios, such as with daemons > with stop timeouts that are not large enough or lack of disk space. This > impacts debuggability with large core dumps since critical information > necessary to form a usable backtrace, such as stacks and shared library > information, are omitted. > > We attempted to figure out which VMAs are needed to create a useful > backtrace, and it turned out to be a non-trivial problem. Instead, we > try simply sorting the VMAs by size, which has the intended effect. > > By sorting VMAs by dump size and dumping in that order, we have a > simple, yet effective heuristic. To make finding the history easier I would include: v1: https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net v2: https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> As Kees has already picked this up this is quite possibly silly. But *shrug* that was when I was out. > Signed-off-by: Brian Mak <makb@juniper.net> > --- > > Hi all, > > Still need to run rr tests on this, per Kees Cook's suggestion, will > update back once done. GDB and readelf show that this patch works > without issue though. > > Thanks, > Brian Mak > > v3: Edited commit message to better convey alternative solution as > non-trivial > > Moved sorting logic to fs/coredump.c to make it in place > > Above edits suggested by Eric Biederman <ebiederm@xmission.com> > > v2: Edited commit message to include more reasoning for sorting VMAs > > Removed conditional VMA sorting with debugfs knob > > Above edits suggested by Eric Biederman <ebiederm@xmission.com> > > fs/coredump.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/fs/coredump.c b/fs/coredump.c > index 7f12ff6ad1d3..33c5ac53ab31 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -18,6 +18,7 @@ > #include <linux/personality.h> > #include <linux/binfmts.h> > #include <linux/coredump.h> > +#include <linux/sort.h> > #include <linux/sched/coredump.h> > #include <linux/sched/signal.h> > #include <linux/sched/task_stack.h> > @@ -1191,6 +1192,18 @@ static void free_vma_snapshot(struct coredump_params *cprm) > } > } > > +static int cmp_vma_size(const void *vma_meta_lhs_ptr, const void *vma_meta_rhs_ptr) > +{ > + const struct core_vma_metadata *vma_meta_lhs = vma_meta_lhs_ptr; > + const struct core_vma_metadata *vma_meta_rhs = vma_meta_rhs_ptr; > + > + if (vma_meta_lhs->dump_size < vma_meta_rhs->dump_size) > + return -1; > + if (vma_meta_lhs->dump_size > vma_meta_rhs->dump_size) > + return 1; > + return 0; > +} > + > /* > * Under the mmap_lock, take a snapshot of relevant information about the task's > * VMAs. > @@ -1253,5 +1266,8 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) > cprm->vma_data_size += m->dump_size; > } > > + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), > + cmp_vma_size, NULL); > + > return true; > } > > base-commit: eb5e56d1491297e0881c95824e2050b7c205f0d4 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2024-08-10 12:28 ` Eric W. Biederman @ 2024-08-12 18:05 ` Kees Cook 2024-08-12 18:21 ` Brian Mak 0 siblings, 1 reply; 21+ messages in thread From: Kees Cook @ 2024-08-12 18:05 UTC (permalink / raw) To: Eric W. Biederman Cc: Brian Mak, Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov, Linus Torvalds On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote: > Brian Mak <makb@juniper.net> writes: > > > Large cores may be truncated in some scenarios, such as with daemons > > with stop timeouts that are not large enough or lack of disk space. This > > impacts debuggability with large core dumps since critical information > > necessary to form a usable backtrace, such as stacks and shared library > > information, are omitted. > > > > We attempted to figure out which VMAs are needed to create a useful > > backtrace, and it turned out to be a non-trivial problem. Instead, we > > try simply sorting the VMAs by size, which has the intended effect. > > > > By sorting VMAs by dump size and dumping in that order, we have a > > simple, yet effective heuristic. > > To make finding the history easier I would include: > v1: https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net > v2: https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > As Kees has already picked this up this is quite possibly silly. > But *shrug* that was when I was out. I've updated the trailers. Thanks for the review! -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2024-08-12 18:05 ` Kees Cook @ 2024-08-12 18:21 ` Brian Mak 2024-08-12 18:25 ` Kees Cook 0 siblings, 1 reply; 21+ messages in thread From: Brian Mak @ 2024-08-12 18:21 UTC (permalink / raw) To: Kees Cook Cc: Eric W. Biederman, Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov, Linus Torvalds On Aug 12, 2024, at 11:05 AM, Kees Cook <kees@kernel.org> wrote > On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote: >> Brian Mak <makb@juniper.net> writes: >> >>> Large cores may be truncated in some scenarios, such as with daemons >>> with stop timeouts that are not large enough or lack of disk space. This >>> impacts debuggability with large core dumps since critical information >>> necessary to form a usable backtrace, such as stacks and shared library >>> information, are omitted. >>> >>> We attempted to figure out which VMAs are needed to create a useful >>> backtrace, and it turned out to be a non-trivial problem. Instead, we >>> try simply sorting the VMAs by size, which has the intended effect. >>> >>> By sorting VMAs by dump size and dumping in that order, we have a >>> simple, yet effective heuristic. >> >> To make finding the history easier I would include: >> v1: https://urldefense.com/v3/__https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0rwhadvc$ >> v2: https://urldefense.com/v3/__https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0G7RQv4o$ >> >> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> >> >> As Kees has already picked this up this is quite possibly silly. >> But *shrug* that was when I was out. > > I've updated the trailers. Thanks for the review! Hi Kees, Thanks! I think you added it to the wrong commit though. Please double check and update accordingly. Regarding the sioc tests from earlier, I've reached a point where I think I have a compatible virtual NIC (no more ioctl errors), but it's giving me some mismatched registers error, causing the test to fail. I can see this same test failure on a vanilla kernel with my setup, so this is probably either some environment issue or a bug with rr or the tests. Since all the other tests pass, I'm just going to leave it at that. Best, Brian Mak ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2024-08-12 18:21 ` Brian Mak @ 2024-08-12 18:25 ` Kees Cook 0 siblings, 0 replies; 21+ messages in thread From: Kees Cook @ 2024-08-12 18:25 UTC (permalink / raw) To: Brian Mak Cc: Eric W. Biederman, Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-mm, linux-kernel, Oleg Nesterov, Linus Torvalds On Mon, Aug 12, 2024 at 06:21:15PM +0000, Brian Mak wrote: > On Aug 12, 2024, at 11:05 AM, Kees Cook <kees@kernel.org> wrote > > > On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote: > >> Brian Mak <makb@juniper.net> writes: > >> > >>> Large cores may be truncated in some scenarios, such as with daemons > >>> with stop timeouts that are not large enough or lack of disk space. This > >>> impacts debuggability with large core dumps since critical information > >>> necessary to form a usable backtrace, such as stacks and shared library > >>> information, are omitted. > >>> > >>> We attempted to figure out which VMAs are needed to create a useful > >>> backtrace, and it turned out to be a non-trivial problem. Instead, we > >>> try simply sorting the VMAs by size, which has the intended effect. > >>> > >>> By sorting VMAs by dump size and dumping in that order, we have a > >>> simple, yet effective heuristic. > >> > >> To make finding the history easier I would include: > >> v1: https://urldefense.com/v3/__https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0rwhadvc$ > >> v2: https://urldefense.com/v3/__https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0G7RQv4o$ > >> > >> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> > >> > >> As Kees has already picked this up this is quite possibly silly. > >> But *shrug* that was when I was out. > > > > I've updated the trailers. Thanks for the review! > > Hi Kees, > > Thanks! I think you added it to the wrong commit though. Ugh. Time for more coffee. Thanks; fixed. I need to update my "b4" -- it was hanging doing the trailers update so I did it myself manually... That'll teach me. ;) > tests. Since all the other tests pass, I'm just going to leave it at > that. Yeah, I think you're good. Thank you for taking the time to test rr! -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores [not found] <036CD6AE-C560-4FC7-9B02-ADD08E380DC9@juniper.net> ` (2 preceding siblings ...) 2024-08-10 12:28 ` Eric W. Biederman @ 2025-02-18 8:54 ` Michael Stapelberg 2025-02-18 19:53 ` Brian Mak 3 siblings, 1 reply; 21+ messages in thread From: Michael Stapelberg @ 2025-02-18 8:54 UTC (permalink / raw) To: makb Cc: brauner, ebiederm, jack, kees, linux-fsdevel, linux-kernel, linux-mm, oleg, torvalds, viro Hey Brian and folks > […] > backtrace, and it turned out to be a non-trivial problem. Instead, we > try simply sorting the VMAs by size, which has the intended effect. > […] > Still need to run rr tests on this, per Kees Cook's suggestion, will > update back once done. GDB and readelf show that this patch works > without issue though. I think in your testing, you probably did not try the eu-stack tool from the elfutils package, because I think I found a bug: Current elfutils cannot symbolize core dumps created by Linux 6.12+. I noticed this because systemd-coredump(8) uses elfutils, and when a program crashed on my machine, syslog did not show function names. I reported this issue with elfutils at: https://sourceware.org/bugzilla/show_bug.cgi?id=32713 …but figured it would be good to give a heads-up here, too. Is this breakage sufficient reason to revert the commit? Or are we saying userspace just needs to be updated to cope? Thanks Best regards Michael ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-18 8:54 ` Michael Stapelberg @ 2025-02-18 19:53 ` Brian Mak 2025-02-19 13:28 ` Sam James 2025-02-19 16:20 ` Jan Kara 0 siblings, 2 replies; 21+ messages in thread From: Brian Mak @ 2025-02-18 19:53 UTC (permalink / raw) To: Michael Stapelberg Cc: Christian Brauner, Eric W. Biederman, Jan Kara, Kees Cook, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Linus Torvalds, Alexander Viro On Feb 18, 2025, at 12:54 AM, Michael Stapelberg <michael@stapelberg.ch> wrote: > I think in your testing, you probably did not try the eu-stack tool > from the elfutils package, because I think I found a bug: Hi Michael, Thanks for the report. I can confirm that this issue does seem to be from this commit. I tested it with Juniper's Linux kernel with and without the changes. You're correct that the original testing done did not include the eu-stack tool. > Current elfutils cannot symbolize core dumps created by Linux 6.12+. > I noticed this because systemd-coredump(8) uses elfutils, and when > a program crashed on my machine, syslog did not show function names. > > I reported this issue with elfutils at: > https://urldefense.com/v3/__https://sourceware.org/bugzilla/show_bug.cgi?id=32713__;!!NEt6yMaO-gk!DbttKuHxkBdrV4Cj9axM3ED6mlBHXeQGY3NVzvfDlthl-K39e9QIrZcwT8iCXLRu0OivWRGgficcD-aCuus$ > …but figured it would be good to give a heads-up here, too. > > Is this breakage sufficient reason to revert the commit? > Or are we saying userspace just needs to be updated to cope? The way I see it is that, as long as we're in compliance with the applicable ELF specifications, then the issue lies with userspace apps to ensure that they are not making additional erroneous assumptions. However, Eric mentioned a while ago in v1 of this patch that he believes that the ELF specification requires program headers be written in memory order. Digging through the ELF specifications, I found that any loadable segment entries in the program header table must be sorted on the virtual address of the first byte of which the segment resides in memory. This indicates that we have deviated from the ELF specification with this commit. One thing we can do to remedy this is to have program headers sorted according to the specification, but then continue dumping in VMA size ordering. This would make the dumping logic significantly more complex though. Seeing how most popular userspace apps, with the exception of eu-stack, seem to work, we could also just leave it, and tell userspace apps to fix it on their end. Eric and Kees, thoughts? I'm open to going either way. Best, Brian ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-18 19:53 ` Brian Mak @ 2025-02-19 13:28 ` Sam James 2025-02-19 16:20 ` Jan Kara 1 sibling, 0 replies; 21+ messages in thread From: Sam James @ 2025-02-19 13:28 UTC (permalink / raw) To: makb Cc: brauner, ebiederm, jack, kees, linux-fsdevel, linux-kernel, linux-mm, michael, oleg, torvalds, viro > Seeing how most popular userspace apps, with the exception of eu-stack, > seem to work, we could also just leave it, and tell userspace apps to > fix it on their end. I'm not sure we do know that most work. 6.12 was released in November, we're only getting this report about elfutils now. It's not like it's been years with no complaints. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-18 19:53 ` Brian Mak 2025-02-19 13:28 ` Sam James @ 2025-02-19 16:20 ` Jan Kara 2025-02-19 19:52 ` Kees Cook 1 sibling, 1 reply; 21+ messages in thread From: Jan Kara @ 2025-02-19 16:20 UTC (permalink / raw) To: Brian Mak Cc: Michael Stapelberg, Christian Brauner, Eric W. Biederman, Jan Kara, Kees Cook, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Linus Torvalds, Alexander Viro On Tue 18-02-25 19:53:51, Brian Mak wrote: > On Feb 18, 2025, at 12:54 AM, Michael Stapelberg <michael@stapelberg.ch> wrote: > > > I think in your testing, you probably did not try the eu-stack tool > > from the elfutils package, because I think I found a bug: > > Hi Michael, > > Thanks for the report. I can confirm that this issue does seem to be > from this commit. I tested it with Juniper's Linux kernel with and > without the changes. > > You're correct that the original testing done did not include the > eu-stack tool. > > > Current elfutils cannot symbolize core dumps created by Linux 6.12+. > > I noticed this because systemd-coredump(8) uses elfutils, and when > > a program crashed on my machine, syslog did not show function names. > > > > I reported this issue with elfutils at: > > https://urldefense.com/v3/__https://sourceware.org/bugzilla/show_bug.cgi?id=32713__;!!NEt6yMaO-gk!DbttKuHxkBdrV4Cj9axM3ED6mlBHXeQGY3NVzvfDlthl-K39e9QIrZcwT8iCXLRu0OivWRGgficcD-aCuus$ > > …but figured it would be good to give a heads-up here, too. > > > > Is this breakage sufficient reason to revert the commit? > > Or are we saying userspace just needs to be updated to cope? > > The way I see it is that, as long as we're in compliance with the > applicable ELF specifications, then the issue lies with userspace apps > to ensure that they are not making additional erroneous assumptions. > > However, Eric mentioned a while ago in v1 of this patch that he believes > that the ELF specification requires program headers be written in memory > order. Digging through the ELF specifications, I found that any loadable > segment entries in the program header table must be sorted on the > virtual address of the first byte of which the segment resides in > memory. > > This indicates that we have deviated from the ELF specification with > this commit. One thing we can do to remedy this is to have program > headers sorted according to the specification, but then continue dumping > in VMA size ordering. This would make the dumping logic significantly > more complex though. > > Seeing how most popular userspace apps, with the exception of eu-stack, > seem to work, we could also just leave it, and tell userspace apps to > fix it on their end. Well, it does not seem eu-stack is that unpopular and we really try hard to avoid user visible regressions. So I think we should revert the change. Also the fact that the patch breaks ELF spec is an indication there may be other tools that would get confused by this and another reason for a revert... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-19 16:20 ` Jan Kara @ 2025-02-19 19:52 ` Kees Cook 2025-02-19 20:38 ` Brian Mak ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Kees Cook @ 2025-02-19 19:52 UTC (permalink / raw) To: Jan Kara, Michael Stapelberg, Brian Mak Cc: Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Linus Torvalds, Alexander Viro On Wed, Feb 19, 2025 at 05:20:17PM +0100, Jan Kara wrote: > On Tue 18-02-25 19:53:51, Brian Mak wrote: > > On Feb 18, 2025, at 12:54 AM, Michael Stapelberg <michael@stapelberg.ch> wrote: > > > > > I think in your testing, you probably did not try the eu-stack tool > > > from the elfutils package, because I think I found a bug: > > > > Hi Michael, > > > > Thanks for the report. I can confirm that this issue does seem to be > > from this commit. I tested it with Juniper's Linux kernel with and > > without the changes. > > > > You're correct that the original testing done did not include the > > eu-stack tool. > > > > > Current elfutils cannot symbolize core dumps created by Linux 6.12+. > > > I noticed this because systemd-coredump(8) uses elfutils, and when > > > a program crashed on my machine, syslog did not show function names. > > > > > > I reported this issue with elfutils at: > > > https://urldefense.com/v3/__https://sourceware.org/bugzilla/show_bug.cgi?id=32713__;!!NEt6yMaO-gk!DbttKuHxkBdrV4Cj9axM3ED6mlBHXeQGY3NVzvfDlthl-K39e9QIrZcwT8iCXLRu0OivWRGgficcD-aCuus$ > > > …but figured it would be good to give a heads-up here, too. > > > > > > Is this breakage sufficient reason to revert the commit? > > > Or are we saying userspace just needs to be updated to cope? > > > > The way I see it is that, as long as we're in compliance with the > > applicable ELF specifications, then the issue lies with userspace apps > > to ensure that they are not making additional erroneous assumptions. > > > > However, Eric mentioned a while ago in v1 of this patch that he believes > > that the ELF specification requires program headers be written in memory > > order. Digging through the ELF specifications, I found that any loadable > > segment entries in the program header table must be sorted on the > > virtual address of the first byte of which the segment resides in > > memory. > > > > This indicates that we have deviated from the ELF specification with > > this commit. One thing we can do to remedy this is to have program > > headers sorted according to the specification, but then continue dumping > > in VMA size ordering. This would make the dumping logic significantly > > more complex though. > > > > Seeing how most popular userspace apps, with the exception of eu-stack, > > seem to work, we could also just leave it, and tell userspace apps to > > fix it on their end. > > Well, it does not seem eu-stack is that unpopular and we really try hard to > avoid user visible regressions. So I think we should revert the change. Also > the fact that the patch breaks ELF spec is an indication there may be other > tools that would get confused by this and another reason for a revert... Yeah, I think we need to make this a tunable. Updating the kernel breaks elftools, which isn't some weird custom corner case. :P So, while it took a few months, here is a report of breakage that I said we'd need to watch for[1]. :) Is anyone able to test this patch? And Brian will setting a sysctl be okay for your use-case? diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index a43b78b4b646..35d5d86cff69 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -222,6 +222,17 @@ and ``core_uses_pid`` is set, then .PID will be appended to the filename. +core_sort_vma +============= + +The default coredump writes VMAs in address order. By setting +``core_sort_vma`` to 1, VMAs will be written from smallest size +to largest size. This is known to break at least elfutils, but +can be handy when dealing with very large (and truncated) +coredumps where the more useful debugging details are included +in the smaller VMAs. + + ctrl-alt-del ============ diff --git a/fs/coredump.c b/fs/coredump.c index 591700e1b2ce..4375c70144d0 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -63,6 +63,7 @@ static void free_vma_snapshot(struct coredump_params *cprm); static int core_uses_pid; static unsigned int core_pipe_limit; +static unsigned int core_sort_vma; static char core_pattern[CORENAME_MAX_SIZE] = "core"; static int core_name_size = CORENAME_MAX_SIZE; unsigned int core_file_note_size_limit = CORE_FILE_NOTE_SIZE_DEFAULT; @@ -1026,6 +1027,15 @@ static const struct ctl_table coredump_sysctls[] = { .extra1 = (unsigned int *)&core_file_note_size_min, .extra2 = (unsigned int *)&core_file_note_size_max, }, + { + .procname = "core_sort_vma", + .data = &core_sort_vma, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, }; static int __init init_fs_coredump_sysctls(void) @@ -1256,8 +1266,9 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) cprm->vma_data_size += m->dump_size; } - sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), - cmp_vma_size, NULL); + if (core_sort_vma) + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), + cmp_vma_size, NULL); return true; } -Kees [1] https://lore.kernel.org/all/202408092104.FCE51021@keescook/ -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-19 19:52 ` Kees Cook @ 2025-02-19 20:38 ` Brian Mak 2025-02-22 2:13 ` Brian Mak 2025-02-20 0:23 ` Brian Mak 2025-02-20 0:39 ` Linus Torvalds 2 siblings, 1 reply; 21+ messages in thread From: Brian Mak @ 2025-02-19 20:38 UTC (permalink / raw) To: Kees Cook Cc: Jan Kara, Michael Stapelberg, Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Linus Torvalds, Alexander Viro On Feb 19, 2025, at 11:52 AM, Kees Cook <kees@kernel.org> wrote: > Yeah, I think we need to make this a tunable. Updating the kernel breaks > elftools, which isn't some weird custom corner case. :P > > So, while it took a few months, here is a report of breakage that I said > we'd need to watch for[1]. :) > > Is anyone able to test this patch? And Brian will setting a sysctl be > okay for your use-case? Hi Kees, Yes, a sysctl tunable would be good here. I can test this patch in the next day or two. I will also scratch up a patch to bring us back into compliance with the ELF specifications, and see if that fixes the userspace breakage with elfutils, while not breaking gdb or rr. Thanks, Brian ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-19 20:38 ` Brian Mak @ 2025-02-22 2:13 ` Brian Mak 2025-02-22 14:51 ` Kees Cook 0 siblings, 1 reply; 21+ messages in thread From: Brian Mak @ 2025-02-22 2:13 UTC (permalink / raw) To: Kees Cook Cc: Jan Kara, Michael Stapelberg, Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Linus Torvalds, Alexander Viro On Feb 19, 2025, at 12:38 PM, Brian Mak <makb@juniper.net> wrote > I will also scratch up a patch to bring us back into compliance with the > ELF specifications, and see if that fixes the userspace breakage with > elfutils, while not breaking gdb or rr. I did scratch up something for this to fix up the program header ordering, but it seems eu-stack is still broken, even with the fix. GDB continues to work fine with the fix. Given that there's no known utilities that get fixed as a result of the program header sorting, I'm not sure if it's worth taking the patch. Maybe we can just proceed with the sysctl + sorting if the core dump size limit is hit, and leave it at that. Thoughts? The program header ordering fix is below if someone wants to peek at it. Best, Brian diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 8054f44d39cf..8cf2bbc3cedf 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -2021,6 +2021,7 @@ static int elf_core_dump(struct coredump_params *cprm) struct elf_shdr *shdr4extnum = NULL; Elf_Half e_phnum; elf_addr_t e_shoff; + struct elf_phdr *phdrs = NULL; /* * The number of segs are recored into ELF header as 16bit value. @@ -2084,7 +2085,11 @@ static int elf_core_dump(struct coredump_params *cprm) if (!dump_emit(cprm, phdr4note, sizeof(*phdr4note))) goto end_coredump; - /* Write program headers for segments dump */ + phdrs = kvmalloc_array(cprm->vma_count, sizeof(*phdrs), GFP_KERNEL); + if (!phdrs) + goto end_coredump; + + /* Construct sorted program headers for segments dump */ for (i = 0; i < cprm->vma_count; i++) { struct core_vma_metadata *meta = cprm->vma_meta + i; struct elf_phdr phdr; @@ -2104,8 +2109,14 @@ static int elf_core_dump(struct coredump_params *cprm) if (meta->flags & VM_EXEC) phdr.p_flags |= PF_X; phdr.p_align = ELF_EXEC_PAGESIZE; + phdrs[meta->index] = phdr; + } + + /* Write program headers for segments dump */ + for (i = 0; i < cprm->vma_count; i++) { + struct elf_phdr *phdr = phdrs + i; - if (!dump_emit(cprm, &phdr, sizeof(phdr))) + if (!dump_emit(cprm, phdr, sizeof(*phdr))) goto end_coredump; } @@ -2140,6 +2151,7 @@ static int elf_core_dump(struct coredump_params *cprm) end_coredump: free_note_info(&info); + kvfree(phdrs); kfree(shdr4extnum); kfree(phdr4note); return has_dumped; diff --git a/fs/coredump.c b/fs/coredump.c index 591700e1b2ce..0ddd75c3a914 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -1226,6 +1226,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) while ((vma = coredump_next_vma(&vmi, vma, gate_vma)) != NULL) { struct core_vma_metadata *m = cprm->vma_meta + i; + m->index = i; m->start = vma->vm_start; m->end = vma->vm_end; m->flags = vma->vm_flags; diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 77e6e195d1d6..cf1b9e53cd1e 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -9,6 +9,7 @@ #ifdef CONFIG_COREDUMP struct core_vma_metadata { + unsigned int index; unsigned long start, end; unsigned long flags; unsigned long dump_size; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-22 2:13 ` Brian Mak @ 2025-02-22 14:51 ` Kees Cook 0 siblings, 0 replies; 21+ messages in thread From: Kees Cook @ 2025-02-22 14:51 UTC (permalink / raw) To: Brian Mak Cc: Jan Kara, Michael Stapelberg, Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Linus Torvalds, Alexander Viro On Sat, Feb 22, 2025 at 02:13:06AM +0000, Brian Mak wrote: > On Feb 19, 2025, at 12:38 PM, Brian Mak <makb@juniper.net> wrote > > > I will also scratch up a patch to bring us back into compliance with the > > ELF specifications, and see if that fixes the userspace breakage with > > elfutils, while not breaking gdb or rr. > > I did scratch up something for this to fix up the program header > ordering, but it seems eu-stack is still broken, even with the fix. GDB > continues to work fine with the fix. Okay, thanks for testing this! > Given that there's no known utilities that get fixed as a result of the > program header sorting, I'm not sure if it's worth taking the patch. > Maybe we can just proceed with the sysctl + sorting if the core dump > size limit is hit, and leave it at that. Thoughts? Yeah, I like that this will automatically kick on under the condition where the coredump will already be unreadable by some tools. And having the sysctl means it can be enabled for testing, etc. -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-19 19:52 ` Kees Cook 2025-02-19 20:38 ` Brian Mak @ 2025-02-20 0:23 ` Brian Mak 2025-02-20 0:39 ` Linus Torvalds 2 siblings, 0 replies; 21+ messages in thread From: Brian Mak @ 2025-02-20 0:23 UTC (permalink / raw) To: Kees Cook Cc: Jan Kara, Michael Stapelberg, Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Linus Torvalds, Alexander Viro On Feb 19, 2025, at 11:52 AM, Kees Cook <kees@kernel.org> wrote > Is anyone able to test this patch? And Brian will setting a sysctl be > okay for your use-case? Hi Kees, I've verified that the sysctl tunable works as expected. readelf is showing that the VMAs are unsorted by default, with the tunable set to 0. When the tunable is set to 1, the VMAs are sorted. I also verified that the backtrace for unsorted and sorted cores are viewable in GDB. The backtrace reported by eu-stack shows up fine in the unsorted case, when attempting to reproduce with Michael's steps. As expected, I see the same error as Michael in the sorted case, when reproducing with his steps. Interestingly enough, in the sorted case, I found that if the crashing program is not linked statically, eu-stack will work fine. However, if the crashing program is linked statically, eu-stack will throw an error, as reported. Anyway, this patch looks pretty good. > > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst > index a43b78b4b646..35d5d86cff69 100644 > --- a/Documentation/admin-guide/sysctl/kernel.rst > +++ b/Documentation/admin-guide/sysctl/kernel.rst > @@ -222,6 +222,17 @@ and ``core_uses_pid`` is set, then .PID will be appended to > the filename. > > > +core_sort_vma > +============= > + > +The default coredump writes VMAs in address order. By setting > +``core_sort_vma`` to 1, VMAs will be written from smallest size > +to largest size. This is known to break at least elfutils, but > +can be handy when dealing with very large (and truncated) > +coredumps where the more useful debugging details are included > +in the smaller VMAs. > + > + Just one comment here, this should go up one entry to maintain alphabetical ordering. Thanks, Brian ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-19 19:52 ` Kees Cook 2025-02-19 20:38 ` Brian Mak 2025-02-20 0:23 ` Brian Mak @ 2025-02-20 0:39 ` Linus Torvalds 2025-02-20 1:36 ` Kees Cook 2 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2025-02-20 0:39 UTC (permalink / raw) To: Kees Cook Cc: Jan Kara, Michael Stapelberg, Brian Mak, Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Alexander Viro On Wed, 19 Feb 2025 at 11:52, Kees Cook <kees@kernel.org> wrote: > > Yeah, I think we need to make this a tunable. Updating the kernel breaks > elftools, which isn't some weird custom corner case. :P I wonder if we could also make the default be "no sorting" if the vma's are all fairly small... IOW, only trigger the new behavior when nity actually *matters*. We already have the code to count how big the core dump is, it's that cprm->vma_data_size += m->dump_size; in dump_vma_snapshot() thing, so I think this could all basically be a one-liner that does the sort() call only if that vma_data_size is larger than the core-dump limit, or something like that? That way, the normal case could basically work for everybody, and the system tunable would be only for people who want to force a certain situation. Something trivial like this (ENTIRELY UNTESTED) patch, perhaps: --- a/fs/coredump.c +++ b/fs/coredump.c @@ -1256,6 +1256,10 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) cprm->vma_data_size += m->dump_size; } + /* Only sort the vmas by size if they don't all fit in the core dump */ + if (cprm->vma_data_size < cprm->limit) + return true; + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), cmp_vma_size, NULL); Hmm? Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-20 0:39 ` Linus Torvalds @ 2025-02-20 1:36 ` Kees Cook 2025-02-20 22:59 ` Brian Mak 0 siblings, 1 reply; 21+ messages in thread From: Kees Cook @ 2025-02-20 1:36 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Kara, Michael Stapelberg, Brian Mak, Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Alexander Viro On Wed, Feb 19, 2025 at 04:39:41PM -0800, Linus Torvalds wrote: > On Wed, 19 Feb 2025 at 11:52, Kees Cook <kees@kernel.org> wrote: > > > > Yeah, I think we need to make this a tunable. Updating the kernel breaks > > elftools, which isn't some weird custom corner case. :P > > I wonder if we could also make the default be "no sorting" if the > vma's are all fairly small... > > IOW, only trigger the new behavior when nity actually *matters*. > > We already have the code to count how big the core dump is, it's that > > cprm->vma_data_size += m->dump_size; > > in dump_vma_snapshot() thing, so I think this could all basically be a > one-liner that does the sort() call only if that vma_data_size is > larger than the core-dump limit, or something like that? > > That way, the normal case could basically work for everybody, and the > system tunable would be only for people who want to force a certain > situation. > > Something trivial like this (ENTIRELY UNTESTED) patch, perhaps: > > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -1256,6 +1256,10 @@ static bool dump_vma_snapshot(struct > coredump_params *cprm) > cprm->vma_data_size += m->dump_size; > } > > + /* Only sort the vmas by size if they don't all fit in the > core dump */ > + if (cprm->vma_data_size < cprm->limit) > + return true; > + > sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), > cmp_vma_size, NULL); > > Hmm? Oh! That's a good idea. In theory, a truncated dump is going to be traditionally "unusable", so a sort shouldn't hurt tools that are expecting a complete dump. Brian, are you able to test this for your case? -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-20 1:36 ` Kees Cook @ 2025-02-20 22:59 ` Brian Mak 2025-02-22 15:15 ` Kees Cook 0 siblings, 1 reply; 21+ messages in thread From: Brian Mak @ 2025-02-20 22:59 UTC (permalink / raw) To: Kees Cook Cc: Linus Torvalds, Jan Kara, Michael Stapelberg, Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Alexander Viro On Feb 19, 2025, at 5:36 PM, Kees Cook <kees@kernel.org> wrote: >> We already have the code to count how big the core dump is, it's that >> >> cprm->vma_data_size += m->dump_size; >> >> in dump_vma_snapshot() thing, so I think this could all basically be a >> one-liner that does the sort() call only if that vma_data_size is >> larger than the core-dump limit, or something like that? ... > Oh! That's a good idea. In theory, a truncated dump is going to be > traditionally "unusable", so a sort shouldn't hurt tools that are > expecting a complete dump. > > Brian, are you able to test this for your case? Hi Kees and Linus, I like the idea, but the suggested patch seems to have issues in practice. First, the vma_data_size does not include the ELF header, program header table, notes, etc. That's not a terribly big issue though. We can live with that since it makes the estimated core dump size *smaller*. An even bigger problem is that the vma_data_size doesn't take into account the sparseness of the core dump, while the core dump size limit does. If a generated core dump is very sparse, the vma_data_size will be much larger than the actual size on disk of the core dump, triggering the sorting logic earlier than expected. One thing we can do though is to iterate through the pages for all VMAs and see if get_dump_page() returns NULL. Then, we use that information to calculate a more accurate predicted core dump size. Patch is below. Thoughts? Best, Brian diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index a43b78b4b646..dd49a89a62d3 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -212,6 +212,17 @@ pid>/``). This value defaults to 0. +core_sort_vma +============= + +The default coredump writes VMAs in address order. By setting +``core_sort_vma`` to 1, VMAs will be written from smallest size +to largest size. This is known to break at least elfutils, but +can be handy when dealing with very large (and truncated) +coredumps where the more useful debugging details are included +in the smaller VMAs. + + core_uses_pid ============= diff --git a/fs/coredump.c b/fs/coredump.c index 591700e1b2ce..496cc7234aa7 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -63,6 +63,7 @@ static void free_vma_snapshot(struct coredump_params *cprm); static int core_uses_pid; static unsigned int core_pipe_limit; +static unsigned int core_sort_vma; static char core_pattern[CORENAME_MAX_SIZE] = "core"; static int core_name_size = CORENAME_MAX_SIZE; unsigned int core_file_note_size_limit = CORE_FILE_NOTE_SIZE_DEFAULT; @@ -1026,6 +1027,15 @@ static const struct ctl_table coredump_sysctls[] = { .extra1 = (unsigned int *)&core_file_note_size_min, .extra2 = (unsigned int *)&core_file_note_size_max, }, + { + .procname = "core_sort_vma", + .data = &core_sort_vma, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, }; static int __init init_fs_coredump_sysctls(void) @@ -1204,6 +1214,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) struct mm_struct *mm = current->mm; VMA_ITERATOR(vmi, mm, 0); int i = 0; + size_t sparse_vma_dump_size = 0; /* * Once the stack expansion code is fixed to not change VMA bounds @@ -1241,6 +1252,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) for (i = 0; i < cprm->vma_count; i++) { struct core_vma_metadata *m = cprm->vma_meta + i; + unsigned long addr; if (m->dump_size == DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER) { char elfmag[SELFMAG]; @@ -1254,10 +1266,27 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) } cprm->vma_data_size += m->dump_size; + sparse_vma_dump_size += m->dump_size; + + /* Subtract zero pages from the sparse_vma_dump_size. */ + for (addr = m->start; addr < m->start + m->dump_size; addr += PAGE_SIZE) { + struct page *page = get_dump_page(addr); + + if (!page) + sparse_vma_dump_size -= PAGE_SIZE; + else + put_page(page); + } } - sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), - cmp_vma_size, NULL); + /* + * Only sort the vmas by size if: + * a) the sysctl is set to do so, or + * b) the vmas don't all fit in within the core dump size limit. + */ + if (core_sort_vma || sparse_vma_dump_size >= cprm->limit) + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), + cmp_vma_size, NULL); return true; } ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores 2025-02-20 22:59 ` Brian Mak @ 2025-02-22 15:15 ` Kees Cook 0 siblings, 0 replies; 21+ messages in thread From: Kees Cook @ 2025-02-22 15:15 UTC (permalink / raw) To: Brian Mak Cc: Linus Torvalds, Jan Kara, Michael Stapelberg, Christian Brauner, Eric W. Biederman, linux-fsdevel, linux-kernel, linux-mm, Oleg Nesterov, Alexander Viro On Thu, Feb 20, 2025 at 10:59:06PM +0000, Brian Mak wrote: > One thing we can do though is to iterate through the pages for all VMAs > and see if get_dump_page() returns NULL. Then, we use that information > to calculate a more accurate predicted core dump size. > > Patch is below. Thoughts? I've pushed this to -next for a few days of testing, and if it's all good, I'll send it to Linus next week for -rc5 (and -stable). https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-linus/execve&id=ff41385709f01519a97379ce7671ee4e91e301e1 -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2025-02-22 15:15 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <036CD6AE-C560-4FC7-9B02-ADD08E380DC9@juniper.net>
[not found] ` <CAHk-=wh_P7UR6RiYmgBDQ4L-kgmmLMziGarLsx_0bUn5vYTJUw@mail.gmail.com>
2024-08-09 14:39 ` [PATCH v3] binfmt_elf: Dump smaller VMAs first in ELF cores Eric W. Biederman
2024-08-09 15:13 ` Linus Torvalds
[not found] ` <172300808013.2419749.16446009147309523545.b4-ty@kernel.org>
2024-08-10 0:52 ` Brian Mak
2024-08-10 4:06 ` Kees Cook
2024-08-10 12:28 ` Eric W. Biederman
2024-08-12 18:05 ` Kees Cook
2024-08-12 18:21 ` Brian Mak
2024-08-12 18:25 ` Kees Cook
2025-02-18 8:54 ` Michael Stapelberg
2025-02-18 19:53 ` Brian Mak
2025-02-19 13:28 ` Sam James
2025-02-19 16:20 ` Jan Kara
2025-02-19 19:52 ` Kees Cook
2025-02-19 20:38 ` Brian Mak
2025-02-22 2:13 ` Brian Mak
2025-02-22 14:51 ` Kees Cook
2025-02-20 0:23 ` Brian Mak
2025-02-20 0:39 ` Linus Torvalds
2025-02-20 1:36 ` Kees Cook
2025-02-20 22:59 ` Brian Mak
2025-02-22 15:15 ` Kees Cook
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox