linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: rientjes@google.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, hannes@cmpxchg.org, mhocko@kernel.org,
	sgruszka@redhat.com
Subject: Re: [PATCH] mm,page_alloc: Split stall warning and failure warning.
Date: Tue, 18 Apr 2017 20:49:20 +0900	[thread overview]
Message-ID: <201704182049.BIE34837.FJOFOMFOQSLHVt@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <alpine.DEB.2.10.1704171539190.46404@chino.kir.corp.google.com>

David Rientjes wrote:
> On Mon, 10 Apr 2017, Andrew Morton wrote:
> > I interpret __GFP_NOWARN to mean "don't warn about this allocation
> > attempt failing", not "don't warn about anything at all".  It's a very
> > minor issue but yes, methinks that stall warning should still come out.
> > 
> 
> Agreed, and we have found this to be helpful in automated memory stress 
> tests.
> 
> I agree that masking off __GFP_NOWARN and then reporting the gfp_mask to 
> the user is only harmful.  If the allocation stalls vs allocation failure 
> warnings are separated such as you have done, it is easily preventable.
> 
> I have a couple of suggestions for Tetsuo about this patch, though:
> 
>  - We now have show_mem_rs, stall_rs, and nopage_rs.  Ugh.  I think it's
>    better to get rid of show_mem_rs and let warn_alloc_common() not 
>    enforce any ratelimiting at all and leave it to the callers.

Commit aa187507ef8bb317 ("mm: throttle show_mem() from warn_alloc()") says
that show_mem_rs was added because a big part of the output is show_mem()
which can generate a lot of output even on a small machines. Thus, I think
ratelimiting at warn_alloc_common() makes sense for users who want to use
warn_alloc_stall() for reporting stalls.

> 
>  - warn_alloc() is probably better off renamed to warn_alloc_failed()
>    since it enforces __GFP_NOWARN and uses an allocation failure ratelimit 
>    regardless of what the passed text is.

I'm OK to rename warn_alloc() back to warn_alloc_failed() for reporting
allocation failures. Maybe we can remove debug_guardpage_minorder() > 0
check from warn_alloc_failed() anyway.

> 
> It may also be slightly off-topic, but I think it would be useful to print 
> current's pid.  I find printing its parent's pid and comm helpful when 
> using shared libraries, but you may not agree.

I think additional actions such as printing more variables can be controlled
using SystemTap (or IO Visor) hooks as long as triggers and relevant
information are available. For example, running

----------
# stap -DSTP_NO_OVERLOAD=1 -F -g -e 'function gfp_str:string(gfp_flags:long) %{ snprintf(STAP_RETVALUE, MAXSTRINGLEN, "%pGg", &STAP_ARG_gfp_flags); %}
probe kernel.function("warn_alloc") { printk(6, sprintf("MemAlloc gfp=%#x(%s) self=%s/%u parent=%s/%u", $gfp_mask, gfp_str($gfp_mask), execname(), pid(), pexecname(), ppid())); }'
----------

will give us output like below.

----------
[  275.848932] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=systemd/1 parent=swapper/0/0
[  276.434211] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=a.out/3339 parent=a.out/2371
[  276.456524] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=systemd-journal/566 parent=systemd/1
[  276.463857] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=gmain/703 parent=systemd/1
[  276.560590] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=rs:main Q:Reg/1013 parent=systemd/1
[  276.643430] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=tuned/1019 parent=systemd/1
[  276.654054] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=postgres/2220 parent=postgres/1561
[  276.668904] postgres invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
[  276.676866] postgres cpuset=/ mems_allowed=0
[  276.679809] CPU: 3 PID: 2220 Comm: postgres Tainted: G           OE   4.11.0-rc7 #217
----------

Thus, passing relevant information as-is

  warn_alloc_stall(gfp_t gfp_mask, nodemask_t *nodemask, unsigned long alloc_start, int order)

rather than via printf() arguments

  warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask, "page allocation stalls for %ums, order:%u", jiffies_to_msecs(jiffies-alloc_start), order);

will give us a lot of flexibility including e.g. ratelimit calling
show_mem() using timers.

If relevant information were available via off-stack memory (e.g. via
"struct task_struct"), kmallocwd-like behavior which allows us to report
all possibly-relevant threads timely (and take actions including e.g.
taking memory snapshots for analysis via commands sent from KVM host
environment if running as a KVM guest as a reaction to kernel messages
sent via netconsole) becomes possible rather than
needlessly-spammable-and-possibly-unreportable after-the-fact stall reports.

> 
> Otherwise, I think this is a good direction.

So, here we got a conflict. Michal thinks this is a pointless code and
David thinks this is a good direction. Michal, can you accept
warn_alloc_stall()/warn_alloc_failed() separation?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-04-18 11:49 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-10 11:58 Tetsuo Handa
2017-04-10 12:39 ` Michal Hocko
2017-04-10 14:23   ` Tetsuo Handa
2017-04-10 22:03 ` Andrew Morton
2017-04-11  7:15   ` Michal Hocko
2017-04-11 11:43     ` Tetsuo Handa
2017-04-11 11:54       ` Michal Hocko
2017-04-11 13:26         ` Tetsuo Handa
2017-04-17 22:48   ` David Rientjes
2017-04-18 11:49     ` Tetsuo Handa [this message]
2017-04-18 12:14       ` Michal Hocko
2017-04-18 21:47       ` David Rientjes
2017-04-19 11:13         ` Michal Hocko
2017-04-19 13:22           ` Stanislaw Gruszka
2017-04-19 13:33             ` Michal Hocko
2017-04-22  8:10               ` Stanislaw Gruszka
2017-04-24  8:42                 ` Michal Hocko
2017-04-24 13:06                   ` Stanislaw Gruszka
2017-04-24 15:06                     ` Tetsuo Handa
2017-04-25  6:36                       ` Stanislaw Gruszka
2017-04-19 22:34             ` David Rientjes
2017-04-20 11:46         ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201704182049.BIE34837.FJOFOMFOQSLHVt@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=sgruszka@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox