From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: tj@kernel.org
Cc: mgorman@techsingularity.net, vbabka@suse.cz, linux-mm@kvack.org,
hillf.zj@alibaba-inc.com, brouer@redhat.com
Subject: Re: mm: Why WQ_MEM_RECLAIM workqueue remains pending?
Date: Tue, 11 Jul 2017 19:51:07 +0900 [thread overview]
Message-ID: <201707111951.IHA98084.OHQtVOFJMLOSFF@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20170710181214.GD1305447@devbig577.frc2.facebook.com>
Tejun Heo wrote:
> Hello, Tetsuo.
>
> I went through the logs and it doesn't look like the mm workqueue
> actually stalled, it was just slow to make progress. Please see
> below.
>
> On Fri, Jul 07, 2017 at 07:27:06PM +0900, Tetsuo Handa wrote:
> > Since drain_local_pages_wq work was stalling for 144 seconds as of uptime = 541,
> > drain_local_pages_wq work was queued around uptime = 397 (which is about 6 seconds
> > since the OOM killer/reaper reclaimed some memory for the last time).
> >
> > But as far as I can see from traces, the mm_percpu_wq thread as of uptime = 444 was
> > idle, while drain_local_pages_wq work was pending from uptime = 541 to uptime = 605.
> > This means that the mm_percpu_wq thread did not start processing drain_local_pages_wq
> > work immediately. (I don't know what made drain_local_pages_wq work be processed.)
> >
> > Why? Is this a workqueue implementation bug? Is this a workqueue usage bug?
>
> So, rescuer doesn't kick as soon as the workqueue becomes slow. It
> kicks in if the worker pool that the workqueue is associated with
> hangs. That is, if you have other work items actively running, e.g.,
> for reclaim on the pool, the pool isn't stalled and rescuers won't be
> woken up. IOW, having a rescuer prevents a workqueue from deadlocking
> due to resource starvation but it doesn't necessarily make it go
> faster. It's a deadlock prevention mechanism, not a priority raising
> one. If the work items need preferential execution, it should use
> WQ_HIGHPRI.
Thank you for the explanation.
I tried the change below. It did reduce the delays, but is a delay of up to a
few seconds unavoidable even with WQ_HIGHPRI? I had hoped the work would be
processed within a few jiffies.
----------
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 9a4441b..c099ebf 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1768,7 +1768,8 @@ void __init init_mm_internals(void)
 {
 	int ret __maybe_unused;
 
-	mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);
+	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
+				       WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
 
 #ifdef CONFIG_SMP
 	ret = cpuhp_setup_state_nocalls(CPUHP_MM_VMSTAT_DEAD, "mm/vmstat:dead",
----------
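For reference, below is a rough sketch of how I understand the drain_local_pages_wq
items in question get queued onto mm_percpu_wq (simplified from my reading of
mm/page_alloc.c, not verbatim kernel code), which is why any queueing delay there
directly delays drain_all_pages():
----------
/* Simplified sketch, not verbatim kernel code. */
static DEFINE_PER_CPU(struct work_struct, pcpu_drain);

void drain_all_pages(struct zone *zone)
{
	struct cpumask cpus_with_pcps;
	int cpu;

	/* ... computing cpus_with_pcps from the per-cpu page sets is omitted ... */

	for_each_cpu(cpu, &cpus_with_pcps) {
		struct work_struct *work = per_cpu_ptr(&pcpu_drain, cpu);

		INIT_WORK(work, drain_local_pages_wq);
		/* With WQ_HIGHPRI this now lands on the nice=-20 pool. */
		queue_work_on(cpu, mm_percpu_wq, work);
	}
	/* The caller waits synchronously, so any queueing delay is felt by the allocator. */
	for_each_cpu(cpu, &cpus_with_pcps)
		flush_work(per_cpu_ptr(&pcpu_drain, cpu));
}
----------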
Before:
----------
[ 906.781160] Showing busy workqueues and worker pools:
[ 906.789620] workqueue events: flags=0x0
[ 906.796439] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=7/256
[ 906.805291] in-flight: 99:vmw_fb_dirty_flush [vmwgfx]{513504}
[ 906.809676] pending: vmpressure_work_fn{571835}, e1000_watchdog [e1000]{571413}, vmstat_shepherd{571093}, vmw_fb_dirty_flush [vmwgfx]{513504}, free_work{50571}, free_obj_work{50546}
[ 906.821048] workqueue events_power_efficient: flags=0x80
[ 906.825113] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256
[ 906.829567] pending: fb_flashcursor{571684}, do_cache_clean{566868}, neigh_periodic_work{564836}
[ 906.835508] workqueue events_freezable_power_: flags=0x84
[ 906.838162] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[ 906.840906] in-flight: 200:disk_events_workfn{571819}
[ 906.843485] workqueue mm_percpu_wq: flags=0x8
[ 906.845675] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 906.848754] pending: drain_local_pages_wq{50498} BAR(2104){50498}
[ 906.851717] workqueue writeback: flags=0x4e
[ 906.853841] pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256
[ 906.856556] in-flight: 354:wb_workfn{571030} wb_workfn{571030}
[ 906.859881] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=571s workers=2 manager: 33
[ 906.863663] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=7s workers=3 idle: 29 1548
[ 906.867153] pool 128: cpus=0-63 flags=0x4 nice=0 hung=43s workers=3 idle: 355 2115
----------
After:
----------
[ 778.377896] Showing busy workqueues and worker pools:
[ 778.380129] workqueue events: flags=0x0
[ 778.381879] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=4/256
[ 778.384371] pending: vmpressure_work_fn{117854}, free_work{117845}, e1000_watchdog [e1000]{117406}, e1000_watchdog [e1000]{117406}
[ 778.389198] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[ 778.391732] in-flight: 1635:vmw_fb_dirty_flush [vmwgfx]{154837} vmw_fb_dirty_flush [vmwgfx]{154837}
[ 778.395522] workqueue events_power_efficient: flags=0x80
[ 778.397828] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 778.400385] pending: do_cache_clean{96351}
[ 778.402395] workqueue events_freezable_power_: flags=0x84
[ 778.404734] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[ 778.407360] in-flight: 185:disk_events_workfn{156895}
[ 778.409902] workqueue mm_percpu_wq: flags=0x18
[ 778.412035] pwq 7: cpus=3 node=0 flags=0x0 nice=-20 active=1/256
[ 778.414675] pending: vmstat_update{5644}
[ 778.416681] workqueue writeback: flags=0x4e
[ 778.418701] pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256
[ 778.421210] in-flight: 358:wb_workfn{546} wb_workfn{546}
[ 778.424101] workqueue xfs-eofblocks/sda1: flags=0xc
[ 778.426304] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 778.429112] in-flight: 52:xfs_eofblocks_worker [xfs]{117964}
[ 778.431749] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=2 manager: 231 idle: 3
[ 778.435105] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 1606 49
[ 778.438235] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=117s workers=2 manager: 215
[ 778.441389] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=6s workers=3 idle: 63 29
[ 778.444882] pool 128: cpus=0-63 flags=0x4 nice=0 hung=0s workers=3 idle: 2449 360
----------
By the way, I think it might be useful if the delay of each work item were printed
together with it, along the lines of the sketch below.
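(Illustration only: mainline struct work_struct carries no queue timestamp, so this
assumes a hypothetical queued_at field recorded at queueing time, e.g. by a debug
patch, and the helper name is made up.)
----------
/*
 * Illustration only, not mainline code: assumes a hypothetical work->queued_at
 * (jiffies stored in __queue_work() by a debug patch).
 */
static void show_pwq_work_delay(struct work_struct *work)
{
	pr_cont(" %pf{%u}", work->func,
		jiffies_to_msecs(jiffies - work->queued_at));
}
----------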