From: Miles Chen <miles.chen@mediatek.com>
To: Christopher Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Jonathan Corbet <corbet@lwn.net>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-mediatek@lists.infradead.org>
Subject: Re: [PATCH v2] mm/slub: introduce SLAB_WARN_ON_ERROR
Date: Wed, 30 Jan 2019 09:43:27 +0800
Message-ID: <1548812607.3832.11.camel@mtkswgap22>
In-Reply-To: <010001689b25e696-3caebea9-56c2-46eb-bb49-34e504a123ee-000000@email.amazonses.com>

On Tue, 2019-01-29 at 19:46 +0000, Christopher Lameter wrote:
> On Tue, 29 Jan 2019, Miles Chen wrote:
> 
> > a) classic slub issue. e.g., use-after-free, redzone overwritten. It's
> > more efficient to report an issue as soon as slub detects it (compared
> > to monitoring the log, setting a breakpoint, and reproducing the issue).
> > With the coredump file, we can analyze the issue.
> 
> What usually happens is that the system fails with a strange error
> message. Then the system is rebooted using slub_debug options and the
> issue is reproduced yielding more information about the problem.
> 
> Then you run the scenario again with additional debugging in the subsystem
> that caused the problem.

Thanks for your comments and patience.

I now understand the difference between our workflows.
I usually build with CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y, set
slub_debug by default, and run all tests that way (eng mode).
I do not wait to hit an issue first and then set up slub_debug to
reproduce it.

CONFIG_SLUB_DEBUG is disabled for products.

> 
> So you are already reproducing the issue because you need to activate
> debugging to get more information. Doing it for the 3rd time is not that
> much more difficult.
> 
> None of your modifications will be active in a production kernel.
> slub_debug must be activated to use it and thus you are already
> reproducing the issue.
> 
> > b) memory corruption issues caused by h/w write. e.g., memory
> > overwritten by a DMA engine. Memory corruptions may or may not be
> > related to the slab cache that reports any error. For example:
> > kmalloc-256 or dentry may report the same errors. If we can preserve
> > the coredump file without any restore/reset processing in slub, we
> > could have more information about this memory corruption.
> 
> If debugging is active then reporting will include the accurate slab cache
> affected. The memory layout is already changing when you enable the
> existing debugging code. None of your code runs without that and thus it
> cannot add a coredump for the prod case without debugging.

I usually set slub_debug by default and get the coredump file.

> > c) memory corruption issues caused by unstable h/w. e.g., bit flipping
> > because of xxxx DRAM die or applying new power settings. It's hard to
> > reproduce this kind of issue and it is much easier to identify this
> > kind of issue in the coredump file without any restore/reset processing.
> 
> But then your patch does not help in this situation because the code has to
> be enabled by special slub debug options.
> 
> 
> > Users can set the option by slub_debug. We can still have the original
> > behavior (keep the system alive) if the option is not set. We can turn on
> > the option when we need the coredump file (with panic_on_warn set,
> > of course).
> 
> I think we would need to turn on debugging by default and have your patch
> for this to make sense. We are already reproducing the issue multiple times
> for debugging. This patch does not change that.
> 
Yes, I turn on debugging by default. Does that make sense now?
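
To make the intended behavior concrete, here is a minimal userspace sketch
of the decision discussed in this thread. It is not the patch itself and it
is not kernel code: the flag value, the struct, the enum, and the function
name are all hypothetical and exist only for illustration. It also assumes
that, when the option is set but panic_on_warn is not, the warning is
emitted and the system keeps running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; NOT the value used by the patch. */
#define MODEL_SLAB_WARN_ON_ERROR 0x1u

/* Possible outcomes when the (modelled) slab checker detects an error. */
enum model_action {
	MODEL_RESTORE_AND_CONTINUE,	/* original behavior: fix up, stay alive */
	MODEL_WARN_AND_CONTINUE,	/* warning only (assumed), stay alive */
	MODEL_WARN_AND_PANIC,		/* warning escalates to panic, dump taken */
};

/* Hypothetical stand-in for a slab cache: only the fields the model needs. */
struct model_cache {
	const char *name;
	unsigned int flags;
};

/*
 * Decide what happens on a detected slab error.
 * Without the option the original behavior is kept; with the option the
 * outcome depends on whether panic_on_warn is set.
 */
enum model_action model_slab_error(const struct model_cache *s,
				   bool panic_on_warn)
{
	assert(s != NULL);

	if (!(s->flags & MODEL_SLAB_WARN_ON_ERROR))
		return MODEL_RESTORE_AND_CONTINUE;

	return panic_on_warn ? MODEL_WARN_AND_PANIC : MODEL_WARN_AND_CONTINUE;
}
```

In this model, a cache without the flag always takes the original path, so
a setup that does not enable the option sees no change in behavior. Only the
combination of the option and panic_on_warn stops the system before any
restore/reset processing.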

Thanks again for your comments.


