From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Roman Kisel <romank@linux.microsoft.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
akpm@linux-foundation.org, apais@linux.microsoft.com,
ardb@kernel.org, brauner@kernel.org, jack@suse.cz,
keescook@chromium.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
nagvijay@microsoft.com, oleg@redhat.com, tandersen@netflix.com,
vincent.whitchurch@axis.com, viro@zeniv.linux.org.uk,
apais@microsoft.com, ssengar@microsoft.com,
sunilmut@microsoft.com, vdso@hexbites.dev
Subject: Re: [PATCH 1/1] binfmt_elf, coredump: Log the reason of the failed core dumps
Date: Tue, 18 Jun 2024 16:21:09 -0500
Message-ID: <87sexakkvu.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <c4644f2c-fad3-4d98-8301-acdc0ff2f3a6@linux.microsoft.com> (Roman Kisel's message of "Tue, 18 Jun 2024 09:30:22 -0700")
Roman Kisel <romank@linux.microsoft.com> writes:
> On 6/17/2024 11:18 PM, Sebastian Andrzej Siewior wrote:
>> On 2024-06-17 16:41:30 [-0700], Roman Kisel wrote:
>>> Missing, failed, or corrupted core dumps might impede crash
>>> investigations. To improve reliability of that process and consequently
>>> the programs themselves, one needs to trace the path from producing
>>> a core dumpfile to analyzing it. That path starts from the core dump file
>>> written to the disk by the kernel or to the standard input of a user
>>> mode helper program to which the kernel streams the coredump contents.
>>> There are cases where the kernel will interrupt writing the core out or
>>> produce a truncated/not-well-formed core dump.
>> How much of this happened and how much of this is just "let me handle
>> everything that could go wrong".
> Some of that must be happening as there are truncated dump files. Haven't run
> the logging code at large scale yet with the systems being stressed a lot by the
> customer workloads to hit all edge cases. Sent the changes to the kernel mail
> list out of abundance of caution first, and being ecstatic about that: on the
> other thread Kees noticed I didn't use the ratelimited logging. That has
> absolutely made my day and whole week, just glowing :) Might've been a close
> call due to something in a crash loop.
Another reason you could have truncated coredumps is the coredumping
process being killed.
I suspect if you want reasons why the coredump is truncated you are
going to want to instrument dump_interrupted, dump_skip and dump_emit
rather than their callers, as they don't actually report why they
failed.
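
As an illustration of the point above, here is a minimal userspace
sketch (not kernel code) of an emit helper that records why it failed
instead of only returning success or failure. Everything prefixed
with sketch_, the reason enum, and the fake sink are hypothetical
names made up for this example; the three checks (interrupted, size
limit, short write) are assumptions about what is worth telling
apart, not a copy of fs/coredump.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reason codes; these do not exist in the kernel. */
enum sketch_fail_reason {
	SKETCH_OK = 0,
	SKETCH_FAIL_INTERRUPTED,	/* dumping task was signalled */
	SKETCH_FAIL_LIMIT,		/* write would exceed the size limit */
	SKETCH_FAIL_SHORT_WRITE,	/* sink accepted fewer bytes than asked */
};

/* Hypothetical stand-in for the per-dump state. */
struct sketch_params {
	size_t written;			/* bytes emitted so far */
	size_t limit;			/* maximum dump size */
	int interrupted;		/* stand-in for a pending-signal check */
	size_t sink_room;		/* bytes the fake sink will still accept */
	enum sketch_fail_reason reason;	/* why the last failure happened */
};

/* Fake sink: accepts at most sink_room bytes, like a full disk or pipe. */
static size_t sketch_sink_write(struct sketch_params *p, const void *buf,
				size_t nr)
{
	size_t n = nr < p->sink_room ? nr : p->sink_room;

	(void)buf;
	p->sink_room -= n;
	return n;
}

/* Returns 1 on success, 0 on failure; on failure p->reason says why. */
static int sketch_dump_emit(struct sketch_params *p, const void *buf,
			    size_t nr)
{
	size_t n;

	if (p->interrupted) {
		p->reason = SKETCH_FAIL_INTERRUPTED;
		return 0;
	}
	if (p->written > p->limit || nr > p->limit - p->written) {
		p->reason = SKETCH_FAIL_LIMIT;
		return 0;
	}
	n = sketch_sink_write(p, buf, nr);
	p->written += n;
	if (n != nr) {
		p->reason = SKETCH_FAIL_SHORT_WRITE;
		return 0;
	}
	return 1;
}

/* Human-readable text a (ratelimited) log line could carry. */
static const char *sketch_reason_str(enum sketch_fail_reason r)
{
	switch (r) {
	case SKETCH_OK:			return "ok";
	case SKETCH_FAIL_INTERRUPTED:	return "interrupted by signal";
	case SKETCH_FAIL_LIMIT:		return "size limit reached";
	case SKETCH_FAIL_SHORT_WRITE:	return "short write";
	}
	return "unknown";
}
```

With the reason stored at the point of failure, a caller needs only
one log statement, and it no longer has to guess which of the three
conditions stopped the dump.
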
Are you using systemd-coredump? Or another pipe based coredump
collector? It might be the dump collector is truncating things.
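
One quick way to answer the pipe-collector question on a given
machine: the kernel treats a core_pattern whose first character is
'|' as a pipe to a user-mode helper. The sketch below checks that.
The function names are made up for this example, and the
systemd-coredump path in the usage note is only a typical value, not
something taken from this thread.

```c
#include <stdio.h>

/* Returns 1 if the pattern names a pipe helper, 0 otherwise. */
static int sketch_core_pattern_is_pipe(const char *pattern)
{
	return pattern != NULL && pattern[0] == '|';
}

/*
 * Reads the first line of a core_pattern file (normally
 * /proc/sys/kernel/core_pattern) and classifies it.
 * Returns 1 for a pipe helper, 0 for a file pattern,
 * and -1 if the file cannot be read.
 */
static int sketch_core_pattern_file_is_pipe(const char *path)
{
	char buf[256];
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return sketch_core_pattern_is_pipe(buf);
}
```

For example, sketch_core_pattern_is_pipe("|/usr/lib/systemd/systemd-coredump %P")
returns 1, while sketch_core_pattern_is_pipe("core") returns 0. If a
pipe helper is in use, its own logs are worth checking before
suspecting the kernel side.
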
Do you know if your application uses io_uring? There were some weird
issues with io_uring and coredumps that were causing things to get
truncated at one point. As I recall, a hack was put in the coredump
code so that it worked, but maybe there is another odd case that still
needs to be handled.
>
> I think it'd be fair to say that I am asking to please "let me handle (log)
> everything that could go wrong", ratelimited, as these error cases are present
> in the code, and logging can give a clue why the core dump collection didn't
> succeed and what one would need to explore to increase reliability of the
> system.
If you are looking for reasons, you definitely want to instrument
fs/coredump.c much more than fs/binfmt_elf.c, as fs/coredump.c is the
code that actually performs the writes.
One of these days, if someone is ambitious, we should probably merge the
coredump code from fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c and just
hardcode the coredump code to always produce an ELF format coredump.
Just for the simplicity of it all.
Eric