From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: rientjes@google.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, hannes@cmpxchg.org, mhocko@kernel.org,
sgruszka@redhat.com
Subject: Re: [PATCH] mm,page_alloc: Split stall warning and failure warning.
Date: Tue, 18 Apr 2017 20:49:20 +0900
Message-ID: <201704182049.BIE34837.FJOFOMFOQSLHVt@I-love.SAKURA.ne.jp>
In-Reply-To: <alpine.DEB.2.10.1704171539190.46404@chino.kir.corp.google.com>
David Rientjes wrote:
> On Mon, 10 Apr 2017, Andrew Morton wrote:
> > I interpret __GFP_NOWARN to mean "don't warn about this allocation
> > attempt failing", not "don't warn about anything at all". It's a very
> > minor issue but yes, methinks that stall warning should still come out.
> >
>
> Agreed, and we have found this to be helpful in automated memory stress
> tests.
>
> I agree that masking off __GFP_NOWARN and then reporting the gfp_mask to
> the user is only harmful. If the allocation stalls vs allocation failure
> warnings are separated such as you have done, it is easily preventable.
>
> I have a couple of suggestions for Tetsuo about this patch, though:
>
> - We now have show_mem_rs, stall_rs, and nopage_rs. Ugh. I think it's
> better to get rid of show_mem_rs and let warn_alloc_common() not
> enforce any ratelimiting at all and leave it to the callers.
Commit aa187507ef8bb317 ("mm: throttle show_mem() from warn_alloc()") says
that show_mem_rs was added because a big part of the output is show_mem(),
which can generate a lot of output even on small machines. Thus, I think
ratelimiting in warn_alloc_common() makes sense for users who want to use
warn_alloc_stall() for reporting stalls.
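To illustrate what such a ratelimit does to the expensive part of the report, here is a minimal userspace sketch of an interval/burst limiter. It is only a model: the struct and function names are hypothetical, the tick values stand in for jiffies, and it is not the kernel's ratelimit code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not the kernel's ratelimit code. */
struct ratelimit_model {
	unsigned long interval;	/* window length, in ticks (stand-in for jiffies) */
	unsigned int burst;	/* dumps allowed per window */
	unsigned long begin;	/* tick at which the current window started */
	unsigned int printed;	/* dumps allowed so far in this window */
	unsigned int missed;	/* dumps suppressed so far in this window */
	bool started;		/* false until the first call opens a window */
};

/* Return true if the caller may emit the expensive dump at tick "now". */
static bool ratelimit_model_ok(struct ratelimit_model *rs, unsigned long now)
{
	assert(rs != NULL);

	/* Open a new window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over budget for this window: suppress and remember that we did. */
	rs->missed++;
	return false;
}
```

With an interval of 100 ticks and a burst of 1, a caller that reports at ticks 0, 50 and 100 gets the dump at 0 and 100 only, so the short one-line message can still be printed every time while the show_mem()-sized part is throttled.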
>
> - warn_alloc() is probably better off renamed to warn_alloc_failed()
> since it enforces __GFP_NOWARN and uses an allocation failure ratelimit
> regardless of what the passed text is.
I'm OK with renaming warn_alloc() back to warn_alloc_failed() for reporting
allocation failures. Maybe we can remove the debug_guardpage_minorder() > 0
check from warn_alloc_failed() anyway.
>
> It may also be slightly off-topic, but I think it would be useful to print
> current's pid. I find printing its parent's pid and comm helpful when
> using shared libraries, but you may not agree.
I think additional actions such as printing more variables can be controlled
using SystemTap (or IO Visor) hooks as long as triggers and relevant
information are available. For example, running
----------
# stap -DSTP_NO_OVERLOAD=1 -F -g -e 'function gfp_str:string(gfp_flags:long) %{ snprintf(STAP_RETVALUE, MAXSTRINGLEN, "%pGg", &STAP_ARG_gfp_flags); %}
probe kernel.function("warn_alloc") { printk(6, sprintf("MemAlloc gfp=%#x(%s) self=%s/%u parent=%s/%u", $gfp_mask, gfp_str($gfp_mask), execname(), pid(), pexecname(), ppid())); }'
----------
will give us output like below.
----------
[ 275.848932] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=systemd/1 parent=swapper/0/0
[ 276.434211] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=a.out/3339 parent=a.out/2371
[ 276.456524] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=systemd-journal/566 parent=systemd/1
[ 276.463857] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=gmain/703 parent=systemd/1
[ 276.560590] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=rs:main Q:Reg/1013 parent=systemd/1
[ 276.643430] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=tuned/1019 parent=systemd/1
[ 276.654054] MemAlloc gfp=0x142134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) self=postgres/2220 parent=postgres/1561
[ 276.668904] postgres invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
[ 276.676866] postgres cpuset=/ mems_allowed=0
[ 276.679809] CPU: 3 PID: 2220 Comm: postgres Tainted: G OE 4.11.0-rc7 #217
----------
Thus, passing relevant information as-is

  warn_alloc_stall(gfp_t gfp_mask, nodemask_t *nodemask, unsigned long alloc_start, int order)

rather than via printf() arguments

  warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask, "page allocation stalls for %ums, order:%u", jiffies_to_msecs(jiffies-alloc_start), order);

will give us a lot of flexibility, including e.g. ratelimiting calls to
show_mem() using timers.
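As a rough userspace sketch of the separation (not the patch itself), the two report paths can decide independently whether to honor __GFP_NOWARN, and the stall path can receive the mask unmodified. Everything below is hypothetical: the type, the function names, and the 0x200 bit value are assumptions for illustration, and the stall duration is passed in milliseconds instead of being derived from jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned int mock_gfp_t;	/* stand-in for the kernel's gfp_t */
#define MOCK_GFP_NOWARN 0x200u		/* assumed bit value, illustration only */

/* Failure reports honor the caller's "do not warn" request. */
static bool want_failure_report(mock_gfp_t gfp_mask)
{
	return !(gfp_mask & MOCK_GFP_NOWARN);
}

/* Stall reports ignore it, so nobody has to clear the bit beforehand. */
static bool want_stall_report(mock_gfp_t gfp_mask)
{
	(void)gfp_mask;
	return true;
}

/*
 * Format a stall report from raw values. Because the mask arrives
 * unmodified, the printed value is what the allocating caller passed.
 */
static int format_stall_report(char *buf, size_t len, mock_gfp_t gfp_mask,
			       unsigned int stall_ms, unsigned int order)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"page allocation stalls for %ums, order:%u, gfp=%#x",
			stall_ms, order, gfp_mask);
}
```

Under the assumed bit value, a mask like the 0x142134a seen in the log above would suppress the failure report but still produce a stall report, and the report would show the mask with the no-warn bit intact.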
If relevant information were available via off-stack memory (e.g. via
"struct task_struct"), kmallocwd-like behavior becomes possible: we could
report all possibly-relevant threads in a timely manner, and take actions
in reaction to kernel messages sent via netconsole (e.g. taking memory
snapshots for analysis via commands sent from the KVM host environment if
running as a KVM guest). That would be better than
needlessly-spammable-and-possibly-unreportable after-the-fact stall reports.
>
> Otherwise, I think this is a good direction.
So, here we have a conflict. Michal thinks this is pointless code and
David thinks this is a good direction. Michal, can you accept the
warn_alloc_stall()/warn_alloc_failed() separation?