From: "Mike Snitzer" <snitzer@gmail.com>
To: linux-mm@kvack.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Christoph Lameter <clameter@sgi.com>,
Nick Piggin <npiggin@suse.de>
Subject: deadlock w/ parallel mke2fs on 2 servers that each host an MD w/ an NBD member?
Date: Fri, 14 Sep 2007 19:04:16 -0400
Message-ID: <170fa0d20709141604o301dfcceqc3652e23a72e639@mail.gmail.com>
Hello,
I'm interested in any insight into how to avoid the following deadlock
scenario. Here is an overview of the configuration: each of two 4GB
servers hosts an lvm2 LV on an MD raid1 with two 750GB members (one
local, one remote via nbd):
server A:
  [lvm2 vg1/lv1]
  [raid1 md0]
  [sda]  [nbd0]
  nbd-server -> [sdb]

server B:
  [lvm2 vg2/lv2]
  [raid1 md0]
  [sdb]  [nbd0]
  nbd-server -> [sda]
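For completeness, the stack on server A was assembled roughly like this
(a from-memory sketch; the hostname "serverB", port 2000 and the LV
size are illustrative, the device names are the ones in the diagram,
and this uses the old two-argument nbd-server/nbd-client syntax):

  # on server B: export its sda to server A
  nbd-server 2000 /dev/sda

  # on server A: attach the remote disk, mirror it with the local one,
  # then layer lvm2 on top of the md device
  nbd-client serverB 2000 /dev/nbd0
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/nbd0
  pvcreate /dev/md0
  vgcreate vg1 /dev/md0
  lvcreate -L 700G -n lv1 vg1

Server B is set up as the mirror image of this (its local sdb plus
server A's exported sdb via nbd0, under vg2/lv2).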
The deadlock occurs when the following are started simultaneously:
on server A: mke2fs -j /dev/vg1/lv1
on server B: mke2fs -j /dev/vg2/lv2
This deadlocks with both 2.6.15.7 and 2.6.19.7. I can easily try any
newer kernel with any patchset that might help (peterz's net deadlock
avoidance or per-bdi dirty accounting or CFS or ...).
All of the following data is from a 2.6.15.7 kernel to which I've
applied two nbd patches that peterz posted to LKML over the past year:
one pins nbd to the noop scheduler, the other is the proposed nbd
request_fn fix:
http://lkml.org/lkml/2006/7/7/164
http://lkml.org/lkml/2007/4/29/283
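For reference, without the first patch the same pinning can be
approximated at runtime via sysfs, e.g.:

  echo noop > /sys/block/nbd0/queue/scheduler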
I've tried to stave off the inevitable deadlock by raising dirty_ratio
to 60, dropping dirty_background_ratio to 1, and running mke2fs at nice
19. The deadlock still hits once dirty_ratio is reached on servers A
and B.
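Concretely, that tuning amounts to nothing more than:

  echo 60 > /proc/sys/vm/dirty_ratio
  echo 1  > /proc/sys/vm/dirty_background_ratio
  nice -n 19 mke2fs -j /dev/vg1/lv1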
I could easily be missing a quick fix (via an existing patchset), but
it feels like the nbd-server _needs_ to be able to reserve a pool of
memory in the kernel to guarantee progress on its contribution to the
overall cross-connected systems' writeback. If not that, then what?
And if that, how?
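The only userspace-reachable knob I can think of in that direction is
crude and global, and clearly not the dedicated per-connection reserve
I'm describing, but for the record:

  # crude approximation only: keep ~64MB of pages free at all times
  echo 65536 > /proc/sys/vm/min_free_kbytes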
I have the full vmcore from server B and can pull out any data you'd
like to see via crash. Here are some traces that may be useful:
PID: 5185 TASK: ffff81015e0497f0 CPU: 1 COMMAND: "md0_raid1"
#0 [ffff81015543fbe8] schedule at ffffffff8031db68
#1 [ffff81015543fca0] io_schedule at ffffffff8031e52f
#2 [ffff81015543fcc0] get_request_wait at ffffffff801e084f
#3 [ffff81015543fd60] __make_request at ffffffff801e1565
#4 [ffff81015543fdb0] generic_make_request at ffffffff801e18af
#5 [ffff81015543fdd8] raid1d at ffffffff8806a6c7
#6 [ffff81015543fe00] raid1d at ffffffff8806a6d8
#7 [ffff81015543fe40] del_timer_sync at ffffffff80138bc5
#8 [ffff81015543fe50] schedule_timeout at ffffffff8031e614
#9 [ffff81015543fea0] md_thread at ffffffff802ac1bf
#10 [ffff81015543ff20] kthread at ffffffff80143e9f
#11 [ffff81015543ff50] kernel_thread at ffffffff8010e97e
PID: 5176 TASK: ffff81015fbce080 CPU: 0 COMMAND: "nbd-server"
#0 [ffff810157395938] schedule at ffffffff8031db68
#1 [ffff8101573959f0] schedule_timeout at ffffffff8031e60c
#2 [ffff810157395a40] io_schedule_timeout at ffffffff8031e568
#3 [ffff810157395a60] blk_congestion_wait at ffffffff801e1106
#4 [ffff810157395a90] get_writeback_state at ffffffff80158894
#5 [ffff810157395ae0] balance_dirty_pages_ratelimited at ffffffff80158a9d
#6 [ffff810157395ae8] blkdev_get_block at ffffffff80177d21
#7 [ffff810157395ba0] generic_file_buffered_write at ffffffff80155026
#8 [ffff810157395c40] skb_copy_datagram_iovec at ffffffff802bd137
#9 [ffff810157395c70] current_fs_time at ffffffff8013540d
#10 [ffff810157395ce0] __generic_file_aio_write_nolock at ffffffff80155676
#11 [ffff810157395d40] sock_aio_read at ffffffff802b66f9
#12 [ffff810157395dc0] generic_file_aio_write_nolock at ffffffff801559ec
#13 [ffff810157395e00] generic_file_write_nolock at ffffffff80155b24
#14 [ffff810157395e10] generic_file_read at ffffffff80155e50
#15 [ffff810157395ef0] blkdev_file_write at ffffffff80178bfa
#16 [ffff810157395f10] vfs_write at ffffffff801710f8
#17 [ffff810157395f40] sys_write at ffffffff80171249
#18 [ffff810157395f80] system_call at ffffffff8010d84a
RIP: 0000003ccbdb9302 RSP: 00007fffff71f3d8 RFLAGS: 00000246
RAX: 0000000000000001 RBX: ffffffff8010d84a RCX: 0000003ccbdb9302
RDX: 0000000000001000 RSI: 00007fffff71f3e0 RDI: 0000000000000003
RBP: 0000000000001000 R8: 0000000000000000 R9: 0000000000000000
R10: 00007fffff71f301 R11: 0000000000000246 R12: 0000000000505a40
R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ff71f301
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
PID: 5274 TASK: ffff81015facd040 CPU: 0 COMMAND: "mke2fs"
#0 [ffff81014a7e3938] schedule at ffffffff8031db68
#1 [ffff81014a7e39f0] schedule_timeout at ffffffff8031e60c
#2 [ffff81014a7e3a40] io_schedule_timeout at ffffffff8031e568
#3 [ffff81014a7e3a60] blk_congestion_wait at ffffffff801e1106
#4 [ffff81014a7e3a90] get_writeback_state at ffffffff80158894
#5 [ffff81014a7e3ae0] balance_dirty_pages_ratelimited at ffffffff80158a9d
#6 [ffff81014a7e3ae8] blkdev_get_block at ffffffff80177d21
#7 [ffff81014a7e3ba0] generic_file_buffered_write at ffffffff80155026
#8 [ffff81014a7e3c80] __mark_inode_dirty at ffffffff80191e89
#9 [ffff81014a7e3ce0] __generic_file_aio_write_nolock at ffffffff80155676
#10 [ffff81014a7e3d30] thread_return at ffffffff8031dbcd
#11 [ffff81014a7e3dc0] generic_file_aio_write_nolock at ffffffff801559ec
#12 [ffff81014a7e3e00] generic_file_write_nolock at ffffffff80155b24
#13 [ffff81014a7e3e50] __wake_up at ffffffff8012c124
#14 [ffff81014a7e3ef0] blkdev_file_write at ffffffff80178bfa
#15 [ffff81014a7e3f10] vfs_write at ffffffff801710f8
#16 [ffff81014a7e3f40] sys_write at ffffffff80171249
#17 [ffff81014a7e3f80] system_call at ffffffff8010d84a
RIP: 0000003ccbdb9302 RSP: 00007fffff988b18 RFLAGS: 00000246
RAX: 0000000000000001 RBX: ffffffff8010d84a RCX: 0000003ccbdc6902
RDX: 0000000000008000 RSI: 0000000000514c60 RDI: 0000000000000003
RBP: 0000000000008000 R8: 0000000000514c60 R9: 00007fffff988c4c
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000050b470
R13: 0000000000000008 R14: 000000191804a000 R15: 0000000000000000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM   969567       3.7 GB         ----
      FREE   337073       1.3 GB   34% of TOTAL MEM
      USED   632494       2.4 GB   65% of TOTAL MEM
    SHARED   340951       1.3 GB   35% of TOTAL MEM
   BUFFERS   479526       1.8 GB   49% of TOTAL MEM
    CACHED    35846       140 MB    3% of TOTAL MEM
      SLAB   100397     392.2 MB   10% of TOTAL MEM

TOTAL HIGH        0            0    0% of TOTAL MEM
 FREE HIGH        0            0    0% of TOTAL HIGH
 TOTAL LOW   969567       3.7 GB  100% of TOTAL MEM
  FREE LOW   337073       1.3 GB   34% of TOTAL LOW

TOTAL SWAP  2096472         8 GB         ----
 SWAP USED        0            0    0% of TOTAL SWAP
 SWAP FREE  2096472         8 GB  100% of TOTAL SWAP
SysRq : Show Memory
Mem-info:
DMA per-cpu:
cpu 0 hot: low 0, high 0, batch 1 used:0
cpu 0 cold: low 0, high 0, batch 1 used:0
cpu 1 hot: low 0, high 0, batch 1 used:0
cpu 1 cold: low 0, high 0, batch 1 used:0
DMA32 per-cpu:
cpu 0 hot: low 0, high 186, batch 31 used:138
cpu 0 cold: low 0, high 62, batch 15 used:0
cpu 1 hot: low 0, high 186, batch 31 used:28
cpu 1 cold: low 0, high 62, batch 15 used:0
Normal per-cpu:
cpu 0 hot: low 0, high 186, batch 31 used:178
cpu 0 cold: low 0, high 62, batch 15 used:14
cpu 1 hot: low 0, high 186, batch 31 used:104
cpu 1 cold: low 0, high 62, batch 15 used:3
HighMem per-cpu: empty
Free pages: 1352420kB (0kB HighMem)
Active:28937 inactive:496921 dirty:1 writeback:409317 unstable:0 free:338105 slab:100341 mapped:12395 pagetables:373
DMA free:10732kB min:56kB low:68kB high:84kB active:0kB inactive:0kB present:11368kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 2466 3975 3975
DMA32 free:1331696kB min:12668kB low:15832kB high:19000kB active:0kB inactive:698504kB present:2526132kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 1509 1509
Normal free:9992kB min:7748kB low:9684kB high:11620kB active:115748kB inactive:1289180kB present:1545216kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 1*8kB 2*16kB 4*32kB 5*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 2*4096kB = 10728kB
DMA32: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 325*4096kB = 1331696kB
Normal: 76*4kB 7*8kB 2*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 2*4096kB = 9992kB
HighMem: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap = 8385888kB
Total swap = 8385888kB
Free swap: 8385888kB
1441792 pages of RAM
472225 reserved pages
551969 pages shared
0 pages swap cached