From: Edward Chron <echron@arista.com>
To: Qian Cai <cai@lca.pw>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
Johannes Weiner <hannes@cmpxchg.org>,
David Rientjes <rientjes@google.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Shakeel Butt <shakeelb@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Ivan Delalande <colona@arista.com>
Subject: Re: [PATCH 00/10] OOM Debug print selection and additional information
Date: Tue, 27 Aug 2019 18:13:20 -0700
Message-ID: <CAM3twVSdxJaEpmWXu2m_F1MxFMB58C6=LWWCDYNn5yT3Ns+0sQ@mail.gmail.com>
In-Reply-To: <79FC3DA1-47F0-4FFC-A92B-9A7EBCE3F15F@lca.pw>

On Tue, Aug 27, 2019 at 5:50 PM Qian Cai <cai@lca.pw> wrote:
>
>
>
> > On Aug 27, 2019, at 8:23 PM, Edward Chron <echron@arista.com> wrote:
> >
> >
> >
> > On Tue, Aug 27, 2019 at 5:40 AM Qian Cai <cai@lca.pw> wrote:
> > On Mon, 2019-08-26 at 12:36 -0700, Edward Chron wrote:
> > > This patch series provides code that works as a debug option through
> > > debugfs to provide additional controls to limit how much information
> > > gets printed when an OOM event occurs and/or optionally print additional
> > > information about slab usage, vmalloc allocations, user process memory
> > > usage, the number of processes / tasks and some summary information
> > > about these tasks (number runnable, I/O wait), system information
> > > (#CPUs, Kernel Version and other useful state of the system),
> > > ARP and ND Cache entry information.
> > >
> > > Linux OOM can optionally provide a lot of information, what's missing?
> > > ----------------------------------------------------------------------
> > > Linux provides a variety of detailed information when an OOM event occurs
> > > but has limited options to control how much output is produced. The
> > > system related information is produced unconditionally and limited per
> > > user process information is produced as a default enabled option. The
> > > per user process information may be disabled.
> > >
> > > Slab usage information was recently added and is output only if slab
> > > usage exceeds user memory usage.
> > >
> > > Many OOM events are due to user application memory usage, sometimes in
> > > combination with kernel resource usage, that exceeds the expected
> > > memory usage. Detailed information about how memory was being
> > > used when the event occurred may be required to identify the root cause
> > > of the OOM event.
> > >
> > > However, some environments are very large and printing all of the
> > > information about processes, slabs and/or vmalloc allocations may
> > > not be feasible. For other environments printing as much information
> > > about these as possible may be needed to root cause OOM events.
> > >
> >
> > For more in-depth analysis of OOM events, people could use kdump to save a
> > vmcore by setting "panic_on_oom", and then use the crash utility to analyze the
> > vmcore, which contains pretty much all the information you need.
> >
> > Certainly, this is the ideal. A full system dump would give you the maximum amount of
> > information.
> >
> > Unfortunately some environments may lack space to store the dump,
>
> Kdump usually also supports dumping to a remote target via NFS, SSH, etc.
>
> > let alone the time to dump the storage contents and restart the system. Some
>
> There is also “makedumpfile”, which can compress the dump and filter out unwanted memory to reduce
> the vmcore size, and speed up the dumping process by using multiple threads.
>
> > systems can take many minutes to fully boot up, to reset and reinitialize all the
> > devices. So unfortunately this is not always an option, and we need an OOM Report.
>
> I am not sure how the system needing some minutes to reboot would be relevant for the
> discussion here. The idea is to save a vmcore, and it can be analyzed offline even on
> another system as long as it has a matching “vmlinux”.
>
>
If selecting a dump on an OOM event doesn't reboot the system, and if it
runs fast enough that it doesn't slow processing enough to appreciably
affect the system's responsiveness, then it would be an ideal solution.
For some it would be overkill, but since it is an option, it is a choice
to consider or not.
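
To make the trade-off concrete, here is a minimal sketch of how one might check which path a given host would take on an OOM event today. It assumes the standard /proc/sys/vm/panic_on_oom sysctl and the /sys/kernel/kexec_crash_loaded flag; the helper names (read_int, oom_outcome, current_outcome) and the outcome labels are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Real procfs/sysfs paths on Linux; the helper names below are hypothetical.
PANIC_ON_OOM = Path("/proc/sys/vm/panic_on_oom")
KEXEC_CRASH_LOADED = Path("/sys/kernel/kexec_crash_loaded")


def read_int(path, default=None):
    """Return the integer stored in a one-line proc/sysfs file, or default."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return default


def oom_outcome(panic_on_oom, crash_kernel_loaded):
    """Classify what an OOM event leads to for the given settings."""
    if panic_on_oom is None:
        # The sysctl could not be read (for example, not running on Linux).
        return "unknown"
    if panic_on_oom == 0:
        # Default: the OOM killer runs and only the OOM report is logged.
        return "oom-killer"
    if crash_kernel_loaded == 1:
        # Panic with a crash kernel loaded: kdump can save a vmcore,
        # typically followed by a reboot.
        return "panic-kdump-reboot"
    # Panic without a crash kernel loaded: no vmcore is captured.
    return "panic-no-dump"


def current_outcome():
    """Classify the running system, treating a missing kexec flag as 0."""
    return oom_outcome(read_int(PANIC_ON_OOM), read_int(KEXEC_CRASH_LOADED, 0))
```

Calling current_outcome() on a host reports which of these cases applies. Only the "oom-killer" case leaves the system running, which is the case where the OOM report is the only information available.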