* A pitfall of mempool?
@ 2017-04-25 11:22 Tetsuo Handa
From: Tetsuo Handa @ 2017-04-25 11:22 UTC (permalink / raw)
To: linux-mm
Today's testing showed an interesting lockup. xfs_alloc_ioend(), called from
xfs_add_to_ioend() by wb_workfn(), requested bio_alloc_bioset(GFP_NOFS) and got
stuck in the too_many_isolated() loop (waiting for kswapd) from the backing
allocation inside mempool_alloc(). But xfs_alloc_ioend(), called from
xfs_add_to_ioend() by kswapd(), also entered mempool_alloc() and was waiting
for somebody to return an element to the same memory pool that wb_workfn() was
waiting for. The result was a silent hang up, as explained at
http://lkml.kernel.org/r/20170307133057.26182-1-mhocko@kernel.org .
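For reference, the too_many_isolated() loop I mention is the throttling loop at
the top of shrink_inactive_list(). Roughly (paraphrased from mm/vmscan.c of this
kernel; this is my reading of the code, not a verbatim quote):
----------
	/*
	 * shrink_inactive_list(): a direct reclaimer is throttled here until
	 * enough isolated pages are put back, which in practice means waiting
	 * for kswapd to make progress.
	 */
	while (unlikely(too_many_isolated(pgdat, file, sc))) {
		congestion_wait(BLK_RW_ASYNC, HZ/10);

		/* We are about to die and free our memory. Return now. */
		if (fatal_signal_pending(current))
			return SWAP_CLUSTER_MAX;
	}
----------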
But why was there no memory in the memory pool? Doesn't a memory pool
guarantee that at least one element is available? I hope it does.
Then, was there a race window where both wb_workfn() and kswapd() looked at
the memory pool and found it empty, and somebody returned memory to the pool
only after wb_workfn() had entered __alloc_pages_nodemask()? If so, why can't
kswapd() find the returned memory? I can't tell whether kswapd() was making
progress...
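To make the question concrete, here is roughly what mempool_alloc() does in
this kernel (again paraphrased and heavily trimmed from mm/mempool.c; treat it
as my reading of the code rather than a verbatim quote):
----------
void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
{
	void *element;
	unsigned long flags;
	wait_queue_t wait;
	gfp_t gfp_temp;

	/*
	 * A mempool-backed allocation never dips into memory reserves,
	 * never retries hard in the page allocator and never warns.
	 */
	gfp_mask |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;

	/* First attempt: no direct reclaim, no I/O. */
	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);

repeat_alloc:
	/*
	 * wb_workfn() in the traces below is stuck here, inside the backing
	 * kmem_cache_alloc() -> __alloc_pages_nodemask() call.
	 */
	element = pool->alloc(gfp_temp, pool->pool_data);
	if (element)
		return element;

	/* The backing allocation failed; try to pop a reserved element. */
	spin_lock_irqsave(&pool->lock, flags);
	if (pool->curr_nr) {
		element = remove_element(pool);
		spin_unlock_irqrestore(&pool->lock, flags);
		return element;
	}

	/* The pool is empty. Retry once with the full gfp mask. */
	if (gfp_temp != gfp_mask) {
		spin_unlock_irqrestore(&pool->lock, flags);
		gfp_temp = gfp_mask;
		goto repeat_alloc;
	}

	/*
	 * Otherwise sleep until mempool_free() returns an element and wakes
	 * us up. kswapd0 in the traces below is stuck in this sleep.
	 */
	init_wait(&wait);
	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
	spin_unlock_irqrestore(&pool->lock, flags);
	io_schedule_timeout(5 * HZ);
	finish_wait(&wait_queue_head(&pool->wait) ? &pool->wait : &pool->wait, &wait);
	goto repeat_alloc;
}
----------
In the traces below, kswapd0 is sitting in that io_schedule_timeout() sleep
while wb_workfn() is still inside pool->alloc() (kmem_cache_alloc() ->
__alloc_pages_nodemask()).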
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20170425.txt.xz .
----------
[ 0.000000] Linux version 4.11.0-rc7-next-20170424 (root@localhost.localdomain) (gcc version 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC) ) #88 SMP Tue Apr 25 17:04:45 JST 2017
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.11.0-rc7-next-20170424 root=UUID=98df1583-260a-423a-a193-182dade5d085 ro crashkernel=256M security=none sysrq_always_enabled console=ttyS0,115200n8 console=tty0 LANG=en_US.UTF-8
[ 279.056280] kswapd0 D12368 60 2 0x00000000
[ 279.058541] Call Trace:
[ 279.059888] __schedule+0x403/0x940
[ 279.061442] schedule+0x3d/0x90
[ 279.062903] schedule_timeout+0x23b/0x510
[ 279.064764] ? init_timer_on_stack_key+0x60/0x60
[ 279.066737] io_schedule_timeout+0x1e/0x50
[ 279.068505] ? io_schedule_timeout+0x1e/0x50
[ 279.070397] mempool_alloc+0x162/0x190
[ 279.072105] ? remove_wait_queue+0x70/0x70
[ 279.073872] bvec_alloc+0x90/0xf0
[ 279.075394] bio_alloc_bioset+0x17b/0x230
[ 279.077181] xfs_add_to_ioend+0x7b/0x260 [xfs]
[ 279.079109] xfs_do_writepage+0x3d9/0x8f0 [xfs]
[ 279.081056] ? clear_page_dirty_for_io+0xab/0x2c0
[ 279.083077] xfs_vm_writepage+0x3b/0x70 [xfs]
[ 279.085019] pageout.isra.55+0x1a0/0x430
[ 279.086755] shrink_page_list+0xa5b/0xce0
[ 279.088510] shrink_inactive_list+0x1ba/0x590
[ 279.090430] ? __list_lru_count_one.isra.2+0x22/0x70
[ 279.092528] shrink_node_memcg+0x378/0x750
[ 279.094356] shrink_node+0xe1/0x310
[ 279.095916] ? shrink_node+0xe1/0x310
[ 279.097566] kswapd+0x3eb/0x9d0
[ 279.099008] kthread+0x117/0x150
[ 279.100514] ? mem_cgroup_shrink_node+0x350/0x350
[ 279.102563] ? kthread_create_on_node+0x70/0x70
[ 279.104560] ret_from_fork+0x31/0x40
[ 281.159433] kworker/u128:29 D10424 381 2 0x00000000
[ 281.161730] Workqueue: writeback wb_workfn (flush-8:0)
[ 281.163867] Call Trace:
[ 281.165100] __schedule+0x403/0x940
[ 281.166690] schedule+0x3d/0x90
[ 281.168143] schedule_timeout+0x23b/0x510
[ 281.169967] ? init_timer_on_stack_key+0x60/0x60
[ 281.171998] io_schedule_timeout+0x1e/0x50
[ 281.174004] ? io_schedule_timeout+0x1e/0x50
[ 281.176070] congestion_wait+0x86/0x210
[ 281.177765] ? remove_wait_queue+0x70/0x70
[ 281.179623] shrink_inactive_list+0x45e/0x590
[ 281.181514] ? __list_lru_count_one.isra.2+0x22/0x70
[ 281.183599] shrink_node_memcg+0x378/0x750
[ 281.185483] ? lock_acquire+0xfb/0x200
[ 281.187161] ? mem_cgroup_iter+0xa1/0x570
[ 281.188911] shrink_node+0xe1/0x310
[ 281.190569] ? shrink_node+0xe1/0x310
[ 281.192226] do_try_to_free_pages+0xef/0x370
[ 281.194116] try_to_free_pages+0x12c/0x370
[ 281.195896] __alloc_pages_slowpath+0x4d4/0x1110
[ 281.197885] __alloc_pages_nodemask+0x2dd/0x390
[ 281.199927] alloc_pages_current+0xa1/0x1f0
[ 281.201780] new_slab+0x2dc/0x680
[ 281.203279] ? free_object+0x19/0xa0
[ 281.204908] ___slab_alloc+0x443/0x640
[ 281.206709] ? mempool_alloc_slab+0x1d/0x30
[ 281.208986] ? mempool_alloc_slab+0x1d/0x30
[ 281.210867] __slab_alloc+0x51/0x90
[ 281.212450] ? __slab_alloc+0x51/0x90
[ 281.214126] ? mempool_alloc_slab+0x1d/0x30
[ 281.215919] kmem_cache_alloc+0x283/0x350
[ 281.217717] ? trace_hardirqs_on+0xd/0x10
[ 281.219505] mempool_alloc_slab+0x1d/0x30
[ 281.221238] mempool_alloc+0x77/0x190
[ 281.222879] ? remove_wait_queue+0x70/0x70
[ 281.224769] bvec_alloc+0x90/0xf0
[ 281.226260] bio_alloc_bioset+0x17b/0x230
[ 281.228055] xfs_add_to_ioend+0x7b/0x260 [xfs]
[ 281.230015] xfs_do_writepage+0x3d9/0x8f0 [xfs]
[ 281.231988] write_cache_pages+0x230/0x680
[ 281.233768] ? xfs_add_to_ioend+0x260/0x260 [xfs]
[ 281.235800] ? xfs_vm_writepages+0x48/0xa0 [xfs]
[ 281.237863] xfs_vm_writepages+0x6b/0xa0 [xfs]
[ 281.239856] do_writepages+0x21/0x30
[ 281.241733] __writeback_single_inode+0x68/0x7f0
[ 281.244117] ? _raw_spin_unlock+0x27/0x40
[ 281.245860] writeback_sb_inodes+0x328/0x700
[ 281.247758] __writeback_inodes_wb+0x92/0xc0
[ 281.249655] wb_writeback+0x3c0/0x5f0
[ 281.251263] wb_workfn+0xaf/0x650
[ 281.252777] ? wb_workfn+0xaf/0x650
[ 281.254371] ? process_one_work+0x1c2/0x690
[ 281.256163] process_one_work+0x250/0x690
[ 281.257939] worker_thread+0x4e/0x3b0
[ 281.259701] kthread+0x117/0x150
[ 281.261150] ? process_one_work+0x690/0x690
[ 281.263018] ? kthread_create_on_node+0x70/0x70
[ 281.265005] ret_from_fork+0x31/0x40
[ 430.018312] kswapd0 D12368 60 2 0x00000000
[ 430.020792] Call Trace:
[ 430.022061] __schedule+0x403/0x940
[ 430.023623] schedule+0x3d/0x90
[ 430.025093] schedule_timeout+0x23b/0x510
[ 430.026921] ? init_timer_on_stack_key+0x60/0x60
[ 430.028912] io_schedule_timeout+0x1e/0x50
[ 430.030687] ? io_schedule_timeout+0x1e/0x50
[ 430.032594] mempool_alloc+0x162/0x190
[ 430.034299] ? remove_wait_queue+0x70/0x70
[ 430.036070] bvec_alloc+0x90/0xf0
[ 430.037673] bio_alloc_bioset+0x17b/0x230
[ 430.039736] xfs_add_to_ioend+0x7b/0x260 [xfs]
[ 430.042116] xfs_do_writepage+0x3d9/0x8f0 [xfs]
[ 430.044225] ? clear_page_dirty_for_io+0xab/0x2c0
[ 430.046533] xfs_vm_writepage+0x3b/0x70 [xfs]
[ 430.048640] pageout.isra.55+0x1a0/0x430
[ 430.050712] shrink_page_list+0xa5b/0xce0
[ 430.052586] shrink_inactive_list+0x1ba/0x590
[ 430.054538] ? __list_lru_count_one.isra.2+0x22/0x70
[ 430.056713] shrink_node_memcg+0x378/0x750
[ 430.058511] shrink_node+0xe1/0x310
[ 430.060147] ? shrink_node+0xe1/0x310
[ 430.061997] kswapd+0x3eb/0x9d0
[ 430.063435] kthread+0x117/0x150
[ 430.065150] ? mem_cgroup_shrink_node+0x350/0x350
[ 430.067476] ? kthread_create_on_node+0x70/0x70
[ 430.069452] ret_from_fork+0x31/0x40
[ 431.716061] kworker/u128:29 D10424 381 2 0x00000000
[ 431.718359] Workqueue: writeback wb_workfn (flush-8:0)
[ 431.720536] Call Trace:
[ 431.721777] __schedule+0x403/0x940
[ 431.723343] schedule+0x3d/0x90
[ 431.724791] schedule_timeout+0x23b/0x510
[ 431.726612] ? init_timer_on_stack_key+0x60/0x60
[ 431.728573] io_schedule_timeout+0x1e/0x50
[ 431.730391] ? io_schedule_timeout+0x1e/0x50
[ 431.732283] congestion_wait+0x86/0x210
[ 431.734016] ? remove_wait_queue+0x70/0x70
[ 431.735794] shrink_inactive_list+0x45e/0x590
[ 431.737711] ? __list_lru_count_one.isra.2+0x22/0x70
[ 431.739837] shrink_node_memcg+0x378/0x750
[ 431.741666] ? lock_acquire+0xfb/0x200
[ 431.743322] ? mem_cgroup_iter+0xa1/0x570
[ 431.745129] shrink_node+0xe1/0x310
[ 431.746752] ? shrink_node+0xe1/0x310
[ 431.748379] do_try_to_free_pages+0xef/0x370
[ 431.750263] try_to_free_pages+0x12c/0x370
[ 431.752144] __alloc_pages_slowpath+0x4d4/0x1110
[ 431.754108] __alloc_pages_nodemask+0x2dd/0x390
[ 431.756031] alloc_pages_current+0xa1/0x1f0
[ 431.757860] new_slab+0x2dc/0x680
[ 431.759457] ? free_object+0x19/0xa0
[ 431.761049] ___slab_alloc+0x443/0x640
[ 431.762844] ? mempool_alloc_slab+0x1d/0x30
[ 431.764687] ? mempool_alloc_slab+0x1d/0x30
[ 431.766646] __slab_alloc+0x51/0x90
[ 431.768213] ? __slab_alloc+0x51/0x90
[ 431.769851] ? mempool_alloc_slab+0x1d/0x30
[ 431.771673] kmem_cache_alloc+0x283/0x350
[ 431.773430] ? trace_hardirqs_on+0xd/0x10
[ 431.775217] mempool_alloc_slab+0x1d/0x30
[ 431.777001] mempool_alloc+0x77/0x190
[ 431.778615] ? remove_wait_queue+0x70/0x70
[ 431.780454] bvec_alloc+0x90/0xf0
[ 431.781995] bio_alloc_bioset+0x17b/0x230
[ 431.783804] xfs_add_to_ioend+0x7b/0x260 [xfs]
[ 431.785708] xfs_do_writepage+0x3d9/0x8f0 [xfs]
[ 431.787691] write_cache_pages+0x230/0x680
[ 431.789519] ? xfs_add_to_ioend+0x260/0x260 [xfs]
[ 431.791561] ? xfs_vm_writepages+0x48/0xa0 [xfs]
[ 431.793546] xfs_vm_writepages+0x6b/0xa0 [xfs]
[ 431.796642] do_writepages+0x21/0x30
[ 431.798244] __writeback_single_inode+0x68/0x7f0
[ 431.800236] ? _raw_spin_unlock+0x27/0x40
[ 431.802043] writeback_sb_inodes+0x328/0x700
[ 431.803873] __writeback_inodes_wb+0x92/0xc0
[ 431.805693] wb_writeback+0x3c0/0x5f0
[ 431.807327] wb_workfn+0xaf/0x650
[ 431.808854] ? wb_workfn+0xaf/0x650
[ 431.810397] ? process_one_work+0x1c2/0x690
[ 431.812210] process_one_work+0x250/0x690
[ 431.813957] worker_thread+0x4e/0x3b0
[ 431.815561] kthread+0x117/0x150
[ 431.817055] ? process_one_work+0x690/0x690
[ 431.818867] ? kthread_create_on_node+0x70/0x70
[ 431.820775] ret_from_fork+0x31/0x40
----------
* block: mempool allocation hangs under OOM. (Re: A pitfall of mempool?)
@ 2017-04-27 11:19 Tetsuo Handa
From: Tetsuo Handa @ 2017-04-27 11:19 UTC (permalink / raw)
To: linux-block, linux-mm; +Cc: axboe, mhocko
Hello.
I noticed a hang up where kswapd was unable to get memory for a bio from the
mempool for a bio_alloc_bioset(GFP_NOFS) request, reported at
http://lkml.kernel.org/r/201704252022.DFB26076.FMOQVFOtJOSFHL@I-love.SAKURA.ne.jp .
Since there is no means to check whether kswapd was making progress, I tried
the patch below on top of the allocation watchdog patch at
http://lkml.kernel.org/r/1489578541-81526-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp .
----------
include/linux/gfp.h | 8 ++++++++
kernel/hung_task.c | 5 ++++-
mm/mempool.c | 9 ++++++++-
mm/page_alloc.c | 6 ++----
4 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 2b1a44f5..cc16050 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -469,6 +469,14 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
return __alloc_pages_node(nid, gfp_mask, order);
}
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+extern void start_memalloc_timer(const gfp_t gfp_mask, const int order);
+extern void stop_memalloc_timer(const gfp_t gfp_mask);
+#else
+#define start_memalloc_timer(gfp_mask, order) do { } while (0)
+#define stop_memalloc_timer(gfp_mask) do { } while (0)
+#endif
+
#ifdef CONFIG_NUMA
extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 8f237c0..7d11e8e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -17,6 +17,7 @@
#include <linux/sysctl.h>
#include <linux/utsname.h>
#include <linux/oom.h>
+#include <linux/console.h>
#include <linux/sched/signal.h>
#include <linux/sched/debug.h>
@@ -149,7 +150,9 @@ static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
get_task_struct(g);
get_task_struct(t);
rcu_read_unlock();
- cond_resched();
+ if (console_trylock())
+ console_unlock();
+ //cond_resched();
rcu_read_lock();
can_cont = pid_alive(g) && pid_alive(t);
put_task_struct(t);
diff --git a/mm/mempool.c b/mm/mempool.c
index 47a659d..8b449af 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -324,11 +324,14 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
+ start_memalloc_timer(gfp_temp, -1);
repeat_alloc:
element = pool->alloc(gfp_temp, pool->pool_data);
- if (likely(element != NULL))
+ if (likely(element != NULL)) {
+ stop_memalloc_timer(gfp_temp);
return element;
+ }
spin_lock_irqsave(&pool->lock, flags);
if (likely(pool->curr_nr)) {
@@ -341,6 +344,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
* for debugging.
*/
kmemleak_update_trace(element);
+ stop_memalloc_timer(gfp_temp);
return element;
}
@@ -350,13 +354,16 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
*/
if (gfp_temp != gfp_mask) {
spin_unlock_irqrestore(&pool->lock, flags);
+ stop_memalloc_timer(gfp_temp);
gfp_temp = gfp_mask;
+ start_memalloc_timer(gfp_temp, -1);
goto repeat_alloc;
}
/* We must not sleep if !__GFP_DIRECT_RECLAIM */
if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
spin_unlock_irqrestore(&pool->lock, flags);
+ stop_memalloc_timer(gfp_temp);
return NULL;
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f539752..652ba4f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4008,7 +4008,7 @@ bool memalloc_maybe_stalling(void)
return false;
}
-static void start_memalloc_timer(const gfp_t gfp_mask, const int order)
+void start_memalloc_timer(const gfp_t gfp_mask, const int order)
{
struct memalloc_info *m = &current->memalloc;
@@ -4032,7 +4032,7 @@ static void start_memalloc_timer(const gfp_t gfp_mask, const int order)
m->in_flight++;
}
-static void stop_memalloc_timer(const gfp_t gfp_mask)
+void stop_memalloc_timer(const gfp_t gfp_mask)
{
struct memalloc_info *m = &current->memalloc;
@@ -4055,8 +4055,6 @@ static void memalloc_counter_fold(int cpu)
}
#else
-#define start_memalloc_timer(gfp_mask, order) do { } while (0)
-#define stop_memalloc_timer(gfp_mask) do { } while (0)
#define memalloc_counter_fold(cpu) do { } while (0)
#endif
----------
These patches reported that mempool_alloc(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_NOMEMALLOC)
was not able to get memory for more than 15 minutes. As a result it was unable to
submit block I/O requests for reclaiming memory via FS writeback, and thus unable
to unblock __GFP_FS allocation requests waiting for FS writeback, and thus unable
to invoke the OOM killer for reclaiming memory for more than 12 minutes, and thus
unable to unblock mempool_alloc(), resulting in a silent hang up.
(The order=4294967295 in the MemAlloc: lines below is the -1 that the mempool hook
above passes to start_memalloc_timer(); it marks a mempool allocation rather than
a real allocation order.)
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20170427.txt.xz .
----------
[ 0.000000] Linux version 4.11.0-rc8-next-20170426+ (root@localhost.localdomain) (gcc version 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC) ) #96 SMP Thu Apr 27 09:45:31 JST 2017
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.11.0-rc8-next-20170426+ root=UUID=98df1583-260a-423a-a193-182dade5d085 ro crashkernel=256M security=none sysrq_always_enabled console=ttyS0,115200n8 console=tty0 LANG=en_US.UTF-8
(...snipped...)
[ 588.687385] Out of memory: Kill process 8050 (a.out) score 999 or sacrifice child
[ 588.691029] Killed process 8050 (a.out) total-vm:4168kB, anon-rss:80kB, file-rss:0kB, shmem-rss:0kB
[ 588.699415] oom_reaper: reaped process 8050 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
(...snipped...)
[ 1401.127653] MemAlloc: a.out(8865) flags=0x400040 switches=1816 seq=25 gfp=0x1411240(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_NOMEMALLOC) order=4294967295 delay=1054846 uninterruptible
[ 1401.134726] a.out D11768 8865 7842 0x00000080
[ 1401.137339] Call Trace:
[ 1401.138802] __schedule+0x403/0x940
[ 1401.140626] schedule+0x3d/0x90
[ 1401.142324] schedule_timeout+0x23b/0x510
[ 1401.144348] ? init_timer_on_stack_key+0x60/0x60
[ 1401.146578] io_schedule_timeout+0x1e/0x50
[ 1401.148618] ? io_schedule_timeout+0x1e/0x50
[ 1401.150689] mempool_alloc+0x18b/0x1c0
[ 1401.152587] ? remove_wait_queue+0x70/0x70
[ 1401.154619] bio_alloc_bioset+0xae/0x230
[ 1401.156580] xfs_add_to_ioend+0x7b/0x260 [xfs]
[ 1401.158750] xfs_do_writepage+0x3d9/0x8f0 [xfs]
[ 1401.160916] write_cache_pages+0x230/0x680
[ 1401.162935] ? xfs_add_to_ioend+0x260/0x260 [xfs]
[ 1401.165169] ? sched_clock+0x9/0x10
[ 1401.166961] ? xfs_vm_writepages+0x55/0xa0 [xfs]
[ 1401.169183] xfs_vm_writepages+0x6b/0xa0 [xfs]
[ 1401.171312] do_writepages+0x21/0x30
[ 1401.173115] __filemap_fdatawrite_range+0xc6/0x100
[ 1401.175381] filemap_write_and_wait_range+0x2d/0x70
[ 1401.177705] xfs_file_fsync+0x86/0x2b0 [xfs]
[ 1401.179785] vfs_fsync_range+0x4b/0xb0
[ 1401.181660] ? __audit_syscall_exit+0x21f/0x2c0
[ 1401.183824] do_fsync+0x3d/0x70
[ 1401.185473] SyS_fsync+0x10/0x20
[ 1401.187160] do_syscall_64+0x6c/0x1c0
[ 1401.189004] entry_SYSCALL64_slow_path+0x25/0x25
(...snipped...)
[ 1401.211964] MemAlloc: a.out(8866) flags=0x400040 switches=1825 seq=31 gfp=0x1411240(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_NOMEMALLOC) order=4294967295 delay=1055258 uninterruptible
[ 1401.219018] a.out D11320 8866 7842 0x00000080
[ 1401.221583] Call Trace:
[ 1401.223032] __schedule+0x403/0x940
[ 1401.224871] schedule+0x3d/0x90
[ 1401.226545] schedule_timeout+0x23b/0x510
[ 1401.228550] ? init_timer_on_stack_key+0x60/0x60
[ 1401.230773] io_schedule_timeout+0x1e/0x50
[ 1401.232791] ? io_schedule_timeout+0x1e/0x50
[ 1401.234890] mempool_alloc+0x18b/0x1c0
[ 1401.236768] ? remove_wait_queue+0x70/0x70
[ 1401.238757] bvec_alloc+0x90/0xf0
[ 1401.240444] bio_alloc_bioset+0x17b/0x230
[ 1401.242425] xfs_add_to_ioend+0x7b/0x260 [xfs]
[ 1401.244566] xfs_do_writepage+0x3d9/0x8f0 [xfs]
[ 1401.246709] write_cache_pages+0x230/0x680
[ 1401.248736] ? xfs_add_to_ioend+0x260/0x260 [xfs]
[ 1401.250944] ? sched_clock+0x9/0x10
[ 1401.252720] ? xfs_vm_writepages+0x55/0xa0 [xfs]
[ 1401.254918] xfs_vm_writepages+0x6b/0xa0 [xfs]
[ 1401.257028] do_writepages+0x21/0x30
[ 1401.258832] __filemap_fdatawrite_range+0xc6/0x100
[ 1401.261079] filemap_write_and_wait_range+0x2d/0x70
[ 1401.263383] xfs_file_fsync+0x86/0x2b0 [xfs]
[ 1401.265435] vfs_fsync_range+0x4b/0xb0
[ 1401.267301] ? __audit_syscall_exit+0x21f/0x2c0
[ 1401.269449] do_fsync+0x3d/0x70
[ 1401.271080] SyS_fsync+0x10/0x20
[ 1401.272738] do_syscall_64+0x6c/0x1c0
[ 1401.274574] entry_SYSCALL64_slow_path+0x25/0x25
----------
# ./scripts/faddr2line vmlinux bio_alloc_bioset+0xae/0x230
bio_alloc_bioset+0xae/0x230:
bio_alloc_bioset at block/bio.c:484
# ./scripts/faddr2line vmlinux bio_alloc_bioset+0x17b/0x230
bio_alloc_bioset+0x17b/0x230:
bio_alloc_bioset at block/bio.c:501
# ./scripts/faddr2line vmlinux bvec_alloc+0x90/0xf0
bvec_alloc+0x90/0xf0:
bvec_alloc at block/bio.c:216
Something is wrong with the assumptions mempool_alloc() makes under memory pressure.
I was expecting a patch that forces shrink_inactive_list() not to wait for FS
writeback forever, proposed at
http://lkml.kernel.org/r/20170307133057.26182-1-mhocko@kernel.org , but
I think this mempool allocation hang should be examined before we give up and
work around it in shrink_inactive_list().
A reproducer is shown below. It just write()s and fsync()s files on a plain
XFS filesystem ( /dev/sda1 ) on a plain SCSI disk under memory pressure.
No RAID/LVM/loopback etc. is used.
----------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
	static char buffer[128] = { };
	char *buf = NULL;
	unsigned long size;
	unsigned long i;

	/* Writers: 1024 easily-killable children doing write() + fsync(). */
	for (i = 0; i < 1024; i++) {
		if (fork() == 0) {
			int fd = open("/proc/self/oom_score_adj", O_WRONLY);
			write(fd, "1000", 4);
			close(fd);
			snprintf(buffer, sizeof(buffer), "/tmp/file.%u", getpid());
			fd = open(buffer, O_WRONLY | O_CREAT | O_APPEND, 0600);
			sleep(1);
			while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer))
				fsync(fd);
			_exit(0);
		}
	}
	/* Grab as much virtual memory as overcommit allows (up to 512GB). */
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sleep(2);
	/* Will cause OOM due to overcommit */
	for (i = 0; i < size; i += 4096) {
		buf[i] = 0;
	}
	pause();
	return 0;
}
----------
Regards.