From: Qian Cai <cai@lca.pw>
To: Edward Chron <echron@arista.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
Johannes Weiner <hannes@cmpxchg.org>,
David Rientjes <rientjes@google.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Shakeel Butt <shakeelb@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Ivan Delalande <colona@arista.com>
Subject: Re: [PATCH 00/10] OOM Debug print selection and additional information
Date: Tue, 27 Aug 2019 21:32:41 -0400
Message-ID: <2A1D8FFC-9E9E-4D86-9A0E-28F8263CC508@lca.pw>
In-Reply-To: <CAM3twVSdxJaEpmWXu2m_F1MxFMB58C6=LWWCDYNn5yT3Ns+0sQ@mail.gmail.com>
> On Aug 27, 2019, at 9:13 PM, Edward Chron <echron@arista.com> wrote:
>
> On Tue, Aug 27, 2019 at 5:50 PM Qian Cai <cai@lca.pw> wrote:
>>
>>
>>
>>> On Aug 27, 2019, at 8:23 PM, Edward Chron <echron@arista.com> wrote:
>>>
>>>
>>>
>>> On Tue, Aug 27, 2019 at 5:40 AM Qian Cai <cai@lca.pw> wrote:
>>> On Mon, 2019-08-26 at 12:36 -0700, Edward Chron wrote:
>>>> This patch series provides code that works as a debug option through
>>>> debugfs to provide additional controls to limit how much information
>>>> gets printed when an OOM event occurs and/or optionally print additional
>>>> information about slab usage, vmalloc allocations, user process memory
>>>> usage, the number of processes / tasks and some summary information
>>>> about these tasks (number runnable, I/O wait), system information
>>>> (#CPUs, Kernel Version and other useful state of the system),
>>>> ARP and ND Cache entry information.
>>>>
>>>> Linux OOM can optionally provide a lot of information, what's missing?
>>>> ----------------------------------------------------------------------
>>>> Linux provides a variety of detailed information when an OOM event occurs
>>>> but has limited options to control how much output is produced. The
>>>> system related information is produced unconditionally and limited per
>>>> user process information is produced as a default enabled option. The
>>>> per user process information may be disabled.
>>>>
>>>> Slab usage information was recently added and is output only if slab
>>>> usage exceeds user memory usage.
>>>>
>>>> Many OOM events are due to user application memory usage, sometimes in
>>>> combination with kernel resource usage, that exceeds what is expected.
>>>> Detailed information about how memory was being used when the event
>>>> occurred may be required to identify the root cause of the OOM event.
>>>>
>>>> However, some environments are very large and printing all of the
>>>> information about processes, slabs and/or vmalloc allocations may
>>>> not be feasible. For other environments, printing as much information
>>>> about these as possible may be needed to root cause OOM events.
>>>>
>>>
>>> For more in-depth analysis of OOM events, people could use kdump to save a
>>> vmcore by setting "panic_on_oom", and then use the crash utility to analyze the
>>> vmcore, which contains pretty much all the information you need.
>>>
>>> Certainly, this is the ideal. A full system dump would give you the maximum amount of
>>> information.
>>>
>>> Unfortunately some environments may lack space to store the dump,
>>
>> Kdump usually also supports dumping to a remote target via NFS, SSH, etc.
>>
>>> let alone the time to dump the storage contents and restart the system. Some
>>
>> There is also “makedumpfile”, which can compress the dump and filter out unwanted
>> memory to reduce the vmcore size, and speed up dumping by using multiple threads.
>>
>>> systems can take many minutes to fully boot up, to reset and reinitialize all the
>>> devices. So unfortunately this is not always an option, and we need an OOM Report.
>>
>> I am not sure how the system needing some minutes to reboot is relevant to the
>> discussion here. The idea is to save a vmcore, which can then be analyzed offline,
>> even on another system, as long as it has a matching “vmlinux”.
>>
>>
>
> If selecting a dump on an OOM event doesn't reboot the system, and if it
> runs fast enough that it doesn't slow processing enough to appreciably
> affect the system's responsiveness, then it would be an ideal solution.
> For some it would be overkill, but since it is an option it is a choice
> to consider or not.
It sounds like you are looking for something more like this:
https://github.com/iovisor/bcc/blob/master/tools/oomkill.py
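
For reference on the "panic_on_oom" setting mentioned earlier in the thread,
here is a minimal sketch of checking how a system is currently configured. It
assumes the sysctl is exposed at /proc/sys/vm/panic_on_oom and that the values
0, 1 and 2 carry the meanings given in the kernel's vm sysctl documentation.
The function and dictionary names are made up for illustration.

```python
from pathlib import Path

# Meanings of vm.panic_on_oom, paraphrased from the kernel's vm sysctl
# documentation (assumption: these three values are the only defined ones).
PANIC_ON_OOM_MODES = {
    0: "no panic: the OOM killer picks a task to kill",
    1: "panic on OOM, unless the OOM is confined by mempolicy/cpusets",
    2: "always panic on OOM, even for confined or memcg OOMs",
}


def describe_panic_on_oom(raw: str) -> str:
    """Turn the raw sysctl text (e.g. "1\\n") into a short description."""
    value = int(raw.strip())
    return PANIC_ON_OOM_MODES.get(value, f"unknown value {value}")


def read_panic_on_oom(path: str = "/proc/sys/vm/panic_on_oom") -> str:
    """Read the sysctl file and describe the current setting."""
    return describe_panic_on_oom(Path(path).read_text())
```

Calling read_panic_on_oom() on a Linux host reports the current mode. A kdump
capture on OOM would only be triggered in mode 1 or 2, and only if a crash
kernel has been loaded separately.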