linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Edward Chron <echron@arista.com>
Cc: Qian Cai <cai@lca.pw>, Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Shakeel Butt <shakeelb@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Ivan Delalande <colona@arista.com>
Subject: Re: [PATCH 00/10] OOM Debug print selection and additional information
Date: Wed, 28 Aug 2019 09:08:45 +0200
Message-ID: <20190828070845.GC7386@dhcp22.suse.cz>
In-Reply-To: <CAM3twVR5TVuuZSLM2qRJYnkCEKVZmA3XDNREaB+wdKH2Ne9vVA@mail.gmail.com>

On Tue 27-08-19 19:47:22, Edward Chron wrote:
> On Tue, Aug 27, 2019 at 6:32 PM Qian Cai <cai@lca.pw> wrote:
> >
> >
> >
> > > On Aug 27, 2019, at 9:13 PM, Edward Chron <echron@arista.com> wrote:
> > >
> > > On Tue, Aug 27, 2019 at 5:50 PM Qian Cai <cai@lca.pw> wrote:
> > >>
> > >>
> > >>
> > >>> On Aug 27, 2019, at 8:23 PM, Edward Chron <echron@arista.com> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Aug 27, 2019 at 5:40 AM Qian Cai <cai@lca.pw> wrote:
> > >>> On Mon, 2019-08-26 at 12:36 -0700, Edward Chron wrote:
> > >>>> This patch series provides code that works as a debug option through
> > >>>> debugfs to provide additional controls to limit how much information
> > >>>> gets printed when an OOM event occurs and/or optionally print additional
> > >>>> information about slab usage, vmalloc allocations, user process memory
> > >>>> usage, the number of processes / tasks and some summary information
> > >>>> about these tasks (number runnable, i/o wait), system information
> > >>>> (#CPUs, Kernel Version and other useful state of the system),
> > >>>> ARP and ND Cache entry information.
> > >>>>
> > >>>> Linux OOM can optionally provide a lot of information, what's missing?
> > >>>> ----------------------------------------------------------------------
> > >>>> Linux provides a variety of detailed information when an OOM event occurs
> > >>>> but has limited options to control how much output is produced. The
> > >>>> system related information is produced unconditionally and limited per
> > >>>> user process information is produced as a default enabled option. The
> > >>>> per user process information may be disabled.
> > >>>>
> > >>>> Slab usage information was recently added and is output only if slab
> > >>>> usage exceeds user memory usage.
> > >>>>
> > >>>> Many OOM events are due to user application memory usage, sometimes in
> > >>>> combination with kernel resource usage, that exceeds what is
> > >>>> expected. Detailed information about how memory was being
> > >>>> used when the event occurred may be required to identify the root cause
> > >>>> of the OOM event.
> > >>>>
> > >>>> However, some environments are very large and printing all of the
> > >>>> information about processes, slabs and/or vmalloc allocations may
> > >>>> not be feasible. For other environments printing as much information
> > >>>> about these as possible may be needed to root cause OOM events.
> > >>>>
> > >>>
> > >>> For more in-depth analysis of OOM events, people could use kdump to save a
> > >>> vmcore by setting "panic_on_oom", and then use the crash utility to analyze the
> > >>> vmcore, which contains pretty much all the information you need.
> > >>>
> > >>> Certainly, this is the ideal. A full system dump would give you the maximum amount of
> > >>> information.
> > >>>
> > >>> Unfortunately some environments may lack space to store the dump,
> > >>
> > >> Kdump usually also supports dumping to a remote target via NFS, SSH, etc.
> > >>
> > >>> let alone the time to dump the storage contents and restart the system. Some
> > >>
> > >> There is also “makedumpfile”, which can compress and filter unwanted memory to reduce
> > >> the vmcore size and speed up the dumping process by using multiple threads.
> > >>
> > >>> systems can take many minutes to fully boot up, to reset and reinitialize all the
> > >>> devices. So unfortunately this is not always an option, and we need an OOM Report.
> > >>
> > >> I am not sure how the system needing some minutes to reboot is relevant to the
> > >> discussion here. The idea is to save a vmcore, which can be analyzed offline even on
> > >> another system as long as it has a matching “vmlinux”.
> > >>
> > >>
> > >
> > > If selecting a dump on an OOM event doesn't reboot the system, and if
> > > it runs fast enough that it doesn't slow processing enough to
> > > appreciably affect the system's responsiveness, then it would be an
> > > ideal solution. For some it would be overkill, but since it is an
> > > option it is a choice to consider or not.
> >
> > It sounds like you are looking for more of this,
> 
> If you want to supplement the OOM Report and keep the information
> together then you could use EBPF to do that. If that really is the
> preference it might make sense to put the entire report as an EBPF
> script; then you can modify the script however you choose. That would
> be very flexible. You can change your configuration on the fly. As
> long as it has access to everything you need it should work.
> 
> Michal would know what direction OOM is headed and if he thinks that fits with
> where things are headed.

It seems we have landed on similar thinking here. As mentioned in my
earlier email in this thread, I can see the extensibility being achieved
by eBPF. Essentially we would have a base form of the oom report like
now, and scripts would then hook in there to provide whatever a specific
use case needs. My practical experience with eBPF is close to zero, so I
have no idea how that would actually work out though.
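
As a rough illustration of the "base report plus hooks" shape described
above, here is a minimal sketch in plain userspace Python. It is not eBPF
or kernel code, and every name in it (register_hook, build_report, the
context keys) is hypothetical; it only shows a fixed base report with
optional extensions attached by whoever needs them.

```python
from typing import Callable, Dict, List

# Hypothetical model only: plain Python, not eBPF and not the kernel's
# OOM reporting code. The context keys below are invented for the sketch.
OomContext = Dict[str, int]
Hook = Callable[[OomContext], List[str]]

_hooks: List[Hook] = []


def register_hook(hook: Hook) -> Hook:
    """Attach an extra reporter that runs after the base report."""
    _hooks.append(hook)
    return hook


def base_report(ctx: OomContext) -> List[str]:
    """The always-present part of the report."""
    return [
        f"oom-kill: victim pid={ctx['victim_pid']} "
        f"rss_kb={ctx['victim_rss_kb']}"
    ]


def build_report(ctx: OomContext) -> List[str]:
    """Base report first, then whatever each registered hook adds."""
    lines = base_report(ctx)
    for hook in _hooks:
        lines.extend(hook(ctx))
    return lines


@register_hook
def slab_summary(ctx: OomContext) -> List[str]:
    """Example site-specific extension: one line of slab usage."""
    return [f"slab: unreclaimable_kb={ctx['slab_unreclaimable_kb']}"]
```

A site that does not care about slab usage would simply not register
that hook; the base report stays the same either way.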

[...]
> For production systems installing and updating EBPF scripts may someday
> be very common, but I wonder how data center managers feel about it now?
> Developers are very excited about it and it is a very powerful tool but can I
> get permission to add or replace an existing EBPF script on production systems?

I am not sure I understand. There must be somebody trusted to take care
of systems, right?
-- 
Michal Hocko
SUSE Labs

