* Memory pressure handling with iSCSI
@ 2005-07-26 17:35 Badari Pulavarty
2005-07-26 18:04 ` Roland Dreier
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 17:35 UTC (permalink / raw)
To: lkml, linux-mm; +Cc: akpm
[-- Attachment #1: Type: text/plain, Size: 419 bytes --]
Hi Andrew,
After KS & OLS discussions about memory pressure, I wanted to re-do
iSCSI testing with "dd"s to see if we are throttling writes.
I created 50 10-GB ext3 filesystems on iSCSI LUNs. The test is simple:
50 dds (one per filesystem). The system seems to throttle memory properly
and is making progress. (The machine doesn't respond very well to anything
else, but my vmstat keeps running - 100% sys time.)
Thanks,
Badari
[-- Attachment #2: vmstat.out --]
[-- Type: text/plain, Size: 1461 bytes --]
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
38 96 30500 43360 16612 6671064 2 0 103 11079 9860 2960 0 100 0 0
43 94 30500 43872 16704 6670460 0 0 124 11232 10993 3624 0 100 0 0
41 95 30500 44756 16780 6670304 22 0 41 11615 10864 3702 0 100 0 0
43 91 30500 43392 16580 6672096 6 0 11 10885 9736 2528 0 100 0 0
44 88 30500 43268 16468 6672204 6 0 14 12084 10361 1971 0 100 0 0
42 90 30500 43640 16556 6672116 0 0 26 12094 10447 3550 0 100 0 0
45 90 30500 46120 16584 6670016 6 0 22 11546 10690 3815 0 100 0 0
42 89 30500 43516 16560 6672564 11 0 48 12902 9368 3464 0 100 0 0
40 91 30500 43640 16572 6671540 6 0 87 10866 9253 2943 0 100 0 0
37 90 30500 43516 16608 6672040 6 0 25 14411 9374 2595 0 100 0 0
36 99 30500 43268 16568 6672080 0 0 23 14071 9524 2401 0 100 0 0
36 93 30500 43268 16596 6671504 6 0 16 11502 9403 3185 0 100 0 0
33 91 30500 43392 16588 6671540 0 0 11 10191 9837 3374 0 100 0 0
33 91 30500 43392 16552 6672092 0 0 15 11762 9703 2915 0 100 0 0
33 90 30500 43268 16648 6671480 0 0 131 11692 9784 3154 0 100 0 0
33 97 30500 43640 16640 6672004 0 0 18 9253 9491 1998 0 100 0 0
* Re: Memory pressure handling with iSCSI
  2005-07-26 17:35 Memory pressure handling with iSCSI Badari Pulavarty
@ 2005-07-26 18:04 ` Roland Dreier
  2005-07-26 18:11 ` Andrew Morton
  2005-07-26 20:59 ` Rik van Riel
  2 siblings, 0 replies; 28+ messages in thread
From: Roland Dreier @ 2005-07-26 18:04 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: lkml, linux-mm, akpm

Thanks, this is a good test.  It would be interesting to know if the
system does eventually deadlock with less system memory or with even
more filesystems.

 - R.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ .
* Re: Memory pressure handling with iSCSI
  2005-07-26 17:35 Memory pressure handling with iSCSI Badari Pulavarty
  2005-07-26 18:04 ` Roland Dreier
@ 2005-07-26 18:11 ` Andrew Morton
  2005-07-26 18:39   ` Badari Pulavarty
  2005-07-26 20:59 ` Rik van Riel
  2 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 18:11 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> After KS & OLS discussions about memory pressure, I wanted to re-do
> iSCSI testing with "dd"s to see if we are throttling writes.
>
> I created 50 10-GB ext3 filesystems on iSCSI luns. Test is simple
> 50 dds (one per filesystem). System seems to throttle memory properly
> and making progress. (Machine doesn't respond very well for anything
> else, but my vmstat keeps running - 100% sys time).

It's important to monitor /proc/meminfo too - the amount of dirty/writeback
pages, etc.

btw, 100% system time is quite appalling.  Are you sure vmstat is telling
the truth?  If so, where's it all being spent?
* Re: Memory pressure handling with iSCSI
  2005-07-26 18:11 ` Andrew Morton
@ 2005-07-26 18:39 ` Badari Pulavarty
  2005-07-26 18:48   ` Andrew Morton
  2005-07-26 19:31   ` Sonny Rao
  0 siblings, 2 replies; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 18:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm

On Tue, 2005-07-26 at 11:11 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > After KS & OLS discussions about memory pressure, I wanted to re-do
> > iSCSI testing with "dd"s to see if we are throttling writes.
> >
> > I created 50 10-GB ext3 filesystems on iSCSI luns. Test is simple
> > 50 dds (one per filesystem). System seems to throttle memory properly
> > and making progress. (Machine doesn't respond very well for anything
> > else, but my vmstat keeps running - 100% sys time).
>
> It's important to monitor /proc/meminfo too - the amount of dirty/writeback
> pages, etc.
>
> btw, 100% system time is quite appalling.  Are you sure vmstat is telling
> the truth?  If so, where's it all being spent?
>

Well, profile doesn't show any time in "default_idle". So I believe
vmstat is telling the truth.
# cat /proc/meminfo
MemTotal:      7143628 kB
MemFree:         43252 kB
Buffers:         16736 kB
Cached:        6683348 kB
SwapCached:       5336 kB
Active:          14460 kB
Inactive:      6686928 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      7143628 kB
LowFree:         43252 kB
SwapTotal:     1048784 kB
SwapFree:      1017920 kB
Dirty:         6225664 kB
Writeback:      447272 kB
Mapped:          10460 kB
Slab:           362136 kB
CommitLimit:   4620596 kB
Committed_AS:   168616 kB
PageTables:       2452 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

# echo 2 > /proc/profile; sleep 5; readprofile -m /usr/src/*12.3/System.map | sort -nr
1634737 total                      0.5464
1468569 shrink_zone              390.5769
  21203 unlock_page              331.2969
  19497 release_pages             46.8678
  19061 __wake_up_bit            397.1042
  17936 page_referenced           53.3810
  10679 lru_add_drain            133.4875
   7348 page_waitqueue            76.5417
   5877 tg3_poll                   2.4007
   4650 cond_resched              41.5179
   4476 copy_user_generic         15.0201
   1973 do_get_write_access        1.2583
   1858 __mod_page_state          38.7083
   1754 tg3_start_xmit             0.9876
   1348 journal_dirty_metadata     2.1063
   1250 __find_get_block           2.7902
   1224 journal_add_journal_head   2.6379
   1082 kmem_cache_free           11.2708
   1077 tcp_sendpage               0.3580
   1076 tcp_ack                    0.1431
   1075 __make_request             0.7999
   1035 tg3_interrupt_tagged       2.5875
   1022 __pagevec_lru_add          4.5625
    928 tcp_transmit_skb           0.4677
    924 kmem_cache_alloc          14.4375
    900 thread_return              3.5294
    819 __ext3_get_inode_loc       0.9307
    754 established_get_next       2.2440
    711 journal_cancel_revoke      1.4335
    684 file_send_actor            7.1250

Thanks,
Badari
* Re: Memory pressure handling with iSCSI
  2005-07-26 18:39 ` Badari Pulavarty
@ 2005-07-26 18:48 ` Andrew Morton
  2005-07-26 19:12   ` Andrew Morton
  2005-07-26 19:31 ` Sonny Rao
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 18:48 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> On Tue, 2005-07-26 at 11:11 -0700, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > After KS & OLS discussions about memory pressure, I wanted to re-do
> > > iSCSI testing with "dd"s to see if we are throttling writes.
> > >
> > > I created 50 10-GB ext3 filesystems on iSCSI luns. Test is simple
> > > 50 dds (one per filesystem). System seems to throttle memory properly
> > > and making progress. (Machine doesn't respond very well for anything
> > > else, but my vmstat keeps running - 100% sys time).
> >
> > It's important to monitor /proc/meminfo too - the amount of dirty/writeback
> > pages, etc.
> >
> > btw, 100% system time is quite appalling.  Are you sure vmstat is telling
> > the truth?  If so, where's it all being spent?
> >
>
> Well, profile doesn't show any time in "default_idle". So
> I believe, vmstat is telling the truth.
>
> # cat /proc/meminfo
> MemTotal:      7143628 kB
> MemFree:         43252 kB
> Buffers:         16736 kB
> Cached:        6683348 kB
> SwapCached:       5336 kB
> Active:          14460 kB
> Inactive:      6686928 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:      7143628 kB
> LowFree:         43252 kB
> SwapTotal:     1048784 kB
> SwapFree:      1017920 kB
> Dirty:         6225664 kB
> Writeback:      447272 kB
> Mapped:          10460 kB
> Slab:           362136 kB
> CommitLimit:   4620596 kB
> Committed_AS:   168616 kB
> PageTables:       2452 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed:      9888 kB
> VmallocChunk: 34359728447 kB
> HugePages_Total:     0
> HugePages_Free:      0
> Hugepagesize:     2048 kB

That is extremely wrong.  dirty memory is *way* too high.

>
> # echo 2 > /proc/profile; sleep 5; readprofile -m /usr/src/*12.3/System.map | sort -nr
> 1634737 total              0.5464
> 1468569 shrink_zone      390.5769
>   21203 unlock_page      331.2969
>   19497 release_pages     46.8678
>   19061 __wake_up_bit    397.1042
>   17936 page_referenced   53.3810
>   10679 lru_add_drain    133.4875

And so page reclaim has gone crazy.  We need to work out why the dirty
memory levels are so high.

Can you please reduce the number of filesystems, see if that reduces the
dirty levels?
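The meminfo numbers quoted in the message above can be sanity-checked with a little arithmetic. The sketch below (Python, purely illustrative) assumes the default dirty_ratio of 40 and ignores the free-page and mapped-page corrections that get_dirty_limits() applies, so the threshold is only approximate:

```python
# Values (in kB) from the /proc/meminfo dump quoted above.
mem_total = 7143628
dirty = 6225664
writeback = 447272

# Fraction of RAM that is dirty or under writeback.
dirty_fraction = (dirty + writeback) / mem_total
print(f"{dirty_fraction:.0%}")  # roughly 93% of RAM

# Where a dirty_ratio of 40 should (approximately) have clamped it.
clamp_kb = mem_total * 40 // 100
print(clamp_kb)  # 2857451 kB, i.e. ~2.7 GB
```

So the box is carrying more than twice the dirty memory the clamp should allow, which is why the reply calls the numbers extremely wrong.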
* Re: Memory pressure handling with iSCSI
  2005-07-26 18:48 ` Andrew Morton
@ 2005-07-26 19:12 ` Andrew Morton
  2005-07-26 20:36   ` Badari Pulavarty
  2005-07-26 21:11   ` Badari Pulavarty
  0 siblings, 2 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 19:12 UTC (permalink / raw)
To: pbadari, linux-kernel, linux-mm

Andrew Morton <akpm@osdl.org> wrote:
>
> Can you please reduce the number of filesystems, see if that reduces the
> dirty levels?

Also, it's conceivable that ext3 is implicated here, so it might be saner
to perform initial investigation on ext2.

(when kjournald writes back a page via its buffers, the page remains
"dirty" as far as the VFS is concerned.  Later, someone tries to do a
writepage() on it and we'll discover the buffers' cleanness and the page
will be cleaned without any I/O being performed.  All the throttling
_should_ work OK in this case.  But ext2 is more straightforward.)
* Re: Memory pressure handling with iSCSI
  2005-07-26 19:12 ` Andrew Morton
@ 2005-07-26 20:36 ` Badari Pulavarty
  2005-07-26 21:11 ` Badari Pulavarty
  1 sibling, 0 replies; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 20:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm

On Tue, 2005-07-26 at 12:12 -0700, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Can you please reduce the number of filesystems, see if that reduces the
> > dirty levels?
>
> Also, it's conceivable that ext3 is implicated here, so it might be saner
> to perform initial investigation on ext2.
>
> (when kjournald writes back a page via its buffers, the page remains
> "dirty" as far as the VFS is concerned.  Later, someone tries to do a
> writepage() on it and we'll discover the buffers' cleanness and the page
> will be cleaned without any I/O being performed.  All the throttling
> _should_ work OK in this case.  But ext2 is more straightforward.)

I will try ext2 next.

- Badari
* Re: Memory pressure handling with iSCSI
  2005-07-26 19:12 ` Andrew Morton
  2005-07-26 20:36 ` Badari Pulavarty
@ 2005-07-26 21:11 ` Badari Pulavarty
  2005-07-26 21:24   ` Andrew Morton
  1 sibling, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 21:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm

[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]

On Tue, 2005-07-26 at 12:12 -0700, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Can you please reduce the number of filesystems, see if that reduces the
> > dirty levels?
>
> Also, it's conceivable that ext3 is implicated here, so it might be saner
> to perform initial investigation on ext2.
>
> (when kjournald writes back a page via its buffers, the page remains
> "dirty" as far as the VFS is concerned.  Later, someone tries to do a
> writepage() on it and we'll discover the buffers' cleanness and the page
> will be cleaned without any I/O being performed.  All the throttling
> _should_ work OK in this case.  But ext2 is more straightforward.)

ext2 is incredibly better.  Machine is very responsive.

# echo 2 > /proc/profile; sleep 5; readprofile -m /usr/src/*12.3/System.map | sort -nr
28671 total                     0.0096
25024 default_idle            521.3333
 1987 shrink_zone               0.5285
  163 tg3_poll                  0.0666
  154 unlock_page               2.4062
  113 page_referenced           0.3363
  106 copy_user_generic         0.3557
   98 __wake_up_bit             2.0417
   74 release_pages             0.1779
   71 page_waitqueue            0.7396
   51 tg3_start_xmit            0.0287
   39 __make_request            0.0290
   36 tcp_ack                   0.0048
   30 tcp_sendpage              0.0100
   30 scsi_request_fn           0.0260
   28 tg3_interrupt_tagged      0.0700
   27 kmem_cache_alloc          0.4219
   23 kmem_cache_free           0.2396
   22 rotate_reclaimable_page   0.0859
   20 established_get_next      0.0595
   20 cond_resched              0.1786
   20 __mod_page_state          0.4167
   16 tcp_transmit_skb          0.0081
   15 memset                    0.0781
   15 __kfree_skb               0.0521
   14 tcp_write_xmit            0.0194
   14 handle_IRQ_event          0.1458
   12 skb_clone                 0.0214
   12 kfree                     0.0500
   12 end_buffer_async_write    0.0469
   11 tcp_v4_rcv                0.0041
   10 test_set_page_writeback   0.0329

Thanks,
Badari

[-- Attachment #2: vmstat-ext2.out --]
[-- Type: text/plain, Size: 1375 bytes --]

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd   free   buff   cache   si  so   bi    bo    in   cs us sy id wa
 1 56    4  33372  12512 6794560    0   0  142  1451 10283 1632  0  7  0 93
 0 56    4  35488  12496 6791996    0   0  131  1762 10335 1583  0  3  0 96
 0 56    4  33132  12540 6794532    0   0    1  1320 10228 2082  0  4  0 96
 0 56    4  33132  12684 6794388    0   0   35  2054 10414 1973  0  7  0 93
 0 56    4  33380  12712 6794876    0   0    0  2676 10635 2739  0  6  0 94
 0 56    4  33132  12672 6793368    0   0    2  6799 10240 2617  0 10  0 90
 0 56    4  33132  12608 6793948    0   0    0 10525 10249 2945  0 10  0 90
 2 56    4  33380  12528 6792996    0   0    1 12566 11081 2813  0 12  0 88
 1 55    4  33380  12368 6793672    0   0    1  9206 10237 2608  0 13  0 87
 0 56    4  33132  12176 6793348    0   0    0 10939 10156 2744  0 17  0 83
 2 59    4  33256  12060 6794496    0   0    5 11706 10464 2746  0 15  0 85
 0 56    4  33504  11844 6794196    0   0    0 12196 10525 2835  0 17  0 83
 0 56    4  33504  11592 6795480    0   0    0  8656 10463 2692  0 10  0 90
 0 56    4  33132  11492 6796612    0   0    1  9022 10222 2496  0 11  0 89
 2 55    4  33256  11384 6796720    0   0    0  9661 10830 2813  0  9  0 91
* Re: Memory pressure handling with iSCSI
  2005-07-26 21:11 ` Badari Pulavarty
@ 2005-07-26 21:24 ` Andrew Morton
  2005-07-26 21:45   ` Badari Pulavarty
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 21:24 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> ext2 is incredibly better. Machine is very responsive.
>

OK.  Please, always monitor and send /proc/meminfo.  I assume that the
dirty-memory clamping is working OK with ext2 and that perhaps it'll work
OK with ext3/data=writeback.

All very odd.  I wonder how to reproduce this.  Maybe 50 ext3 filesystems
on regular old scsi will do it?
* Re: Memory pressure handling with iSCSI
  2005-07-26 21:24 ` Andrew Morton
@ 2005-07-26 21:45 ` Badari Pulavarty
  2005-07-26 22:10   ` Andrew Morton
  0 siblings, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 21:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm

On Tue, 2005-07-26 at 14:24 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > ext2 is incredibly better. Machine is very responsive.
> >
>
> OK.  Please, always monitor and send /proc/meminfo.  I assume that the
> dirty-memory clamping is working OK with ext2 and that perhaps it'll work
> OK with ext3/data=writeback.

Nope. Dirty is still very high..

# cat /proc/meminfo
MemTotal:      7143628 kB
MemFree:         33248 kB
Buffers:          8368 kB
Cached:        6789932 kB
SwapCached:          0 kB
Active:          51316 kB
Inactive:      6769144 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      7143628 kB
LowFree:         33248 kB
SwapTotal:     1048784 kB
SwapFree:      1048780 kB
Dirty:         6605704 kB
Writeback:      168452 kB
Mapped:          49724 kB
Slab:           252200 kB
CommitLimit:   4620596 kB
Committed_AS:   163524 kB
PageTables:       2284 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

Thanks,
Badari
* Re: Memory pressure handling with iSCSI
  2005-07-26 21:45 ` Badari Pulavarty
@ 2005-07-26 22:10 ` Andrew Morton
  2005-07-26 22:48   ` Badari Pulavarty
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 22:10 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> On Tue, 2005-07-26 at 14:24 -0700, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > ext2 is incredibly better. Machine is very responsive.
> > >
> >
> > OK.  Please, always monitor and send /proc/meminfo.  I assume that the
> > dirty-memory clamping is working OK with ext2 and that perhaps it'll work
> > OK with ext3/data=writeback.
>
> Nope. Dirty is still very high..

That's a relief in a way.  Can you please try decreasing the number of
filesystems now?
* Re: Memory pressure handling with iSCSI
  2005-07-26 22:10 ` Andrew Morton
@ 2005-07-26 22:48 ` Badari Pulavarty
  2005-07-26 23:07   ` Andrew Morton
  0 siblings, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 22:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm

[-- Attachment #1: Type: text/plain, Size: 887 bytes --]

On Tue, 2005-07-26 at 15:10 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > On Tue, 2005-07-26 at 14:24 -0700, Andrew Morton wrote:
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > >
> > > > ext2 is incredibly better. Machine is very responsive.
> > > >
> > >
> > > OK.  Please, always monitor and send /proc/meminfo.  I assume that the
> > > dirty-memory clamping is working OK with ext2 and that perhaps it'll work
> > > OK with ext3/data=writeback.
> >
> > Nope. Dirty is still very high..
>
> That's a relief in a way.  Can you please try decreasing the number of
> filesystems now?

Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
every 5 seconds. As you can see, we seem to dirty 6GB of data within 20
seconds of starting the test. I am not sure if it's bad, since we have
lots of free memory..

Thanks,
Badari

[-- Attachment #2: vmstat-5-ext2.out --]
[-- Type: text/plain, Size: 969 bytes --]

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd   free   buff   cache   si  so   bi    bo    in   cs us sy id wa
 2 11  120  32912  10624 6813476    0   0    0  6766 10364  485  0  4  0 96
 0 11  120  33036  10652 6813964    0   0    2  8889 10079  475  0  4  0 96
 0 11  120  33036  10712 6813904    0   0    0  8077  9984  469  0  4  0 96
 0 11  120  32912  10752 6814380    0   0    0 15576 10226  514  0  4  0 95
 0 11  120  33036  10668 6813432    0   0    0 11334 10112  488  0  4  0 96
 0 11  120  33656  10600 6813500    0   0    0 11811 10238  497  0  4  0 96
 0 11  120  33036  10596 6814020    0   0    0 12713 10191  489  0  4  0 96
 0 11  120  33036  10648 6813968    0   0    1 15775 10195  508  0  4  0 96
 0 10  120  33780  10656 6812928    0   0    2  5390 10265  503  0  3  5 92
 0 11  120  33036  10660 6813440    0   0    0  9700 10217  518  0  4  2 94

[-- Attachment #3: meminfo.out --]
[-- Type: text/plain, Size: 3384 bytes --]

MemTotal:      7143628 kB
MemFree:       7001860 kB
Buffers:          5080 kB
Cached:          23300 kB
SwapCached:          0 kB
Active:          48600 kB
Inactive:         5872 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      7143628 kB
LowFree:       7001860 kB
SwapTotal:     1048784 kB
SwapFree:      1048780 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:          45948 kB
Slab:            56348 kB
CommitLimit:   4620596 kB
Committed_AS:   148436 kB
PageTables:       1544 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

MemTotal:      7143628 kB
MemFree:       4871864 kB
Buffers:         14564 kB
Cached:        2091232 kB
SwapCached:          0 kB
Active:          51380 kB
Inactive:      2081780 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      7143628 kB
LowFree:       4871864 kB
SwapTotal:     1048784 kB
SwapFree:      1048780 kB
Dirty:         2070752 kB
Writeback:           0 kB
Mapped:          46368 kB
Slab:           107912 kB
CommitLimit:   4620596 kB
Committed_AS:   148524 kB
PageTables:       1608 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

MemTotal:      7143628 kB
MemFree:        406384 kB
Buffers:         18940 kB
Cached:        6443960 kB
SwapCached:          0 kB
Active:          55688 kB
Inactive:      6435048 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      7143628 kB
LowFree:        406384 kB
SwapTotal:     1048784 kB
SwapFree:      1048780 kB
Dirty:         6144652 kB
Writeback:      252152 kB
Mapped:          46380 kB
Slab:           216580 kB
CommitLimit:   4620596 kB
Committed_AS:   148756 kB
PageTables:       1608 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

MemTotal:      7143628 kB
MemFree:         32772 kB
Buffers:         10028 kB
Cached:        6817680 kB
SwapCached:          4 kB
Active:          48180 kB
Inactive:      6804552 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      7143628 kB
LowFree:         32772 kB
SwapTotal:     1048784 kB
SwapFree:      1048664 kB
Dirty:         6489496 kB
Writeback:      285264 kB
Mapped:          46000 kB
Slab:           228172 kB
CommitLimit:   4620596 kB
Committed_AS:   148756 kB
PageTables:       1608 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

MemTotal:      7143628 kB
MemFree:         32524 kB
Buffers:         10056 kB
Cached:        6816620 kB
SwapCached:          4 kB
Active:          48672 kB
Inactive:      6803212 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      7143628 kB
LowFree:         32524 kB
SwapTotal:     1048784 kB
SwapFree:      1048664 kB
Dirty:         6465124 kB
Writeback:      268876 kB
Mapped:          46008 kB
Slab:           229580 kB
CommitLimit:   4620596 kB
Committed_AS:   148996 kB
PageTables:       1608 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB
* Re: Memory pressure handling with iSCSI
  2005-07-26 22:48 ` Badari Pulavarty
@ 2005-07-26 23:07 ` Andrew Morton
  2005-07-26 23:26   ` Badari Pulavarty
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 23:07 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
> every 5 seconds. As you can see, we seem to dirty 6GB of data in 20
> seconds of starting the test. I am not sure if its bad, since we have
> lots of free memory..

It's bad.  The logic in balance_dirty_pages() should block those write()
callers as soon as we hit 40% dirty memory or whatever is in
/proc/sys/vm/dirty_ratio.  So something is horridly busted.

Can you try reducing the number of filesystems even further?

Either the underlying block driver is doing something most bizarre to the
VFS or something has gone wrong with the arithmetic in page-writeback.c.
If total_pages or ratelimit_pages are totally wrong or if
get_dirty_limits() is returning junk then we'd be seeing something like
this.

It'll be something simple - if you have time, stick some printks in
balance_dirty_pages(), work out why it is not remaining in that `for' loop
until dirty memory has fallen below the 40%.

I'll take a shot at reproducing this on my 4G x86_64 box, but this is so
grossly wrong that I'm sure it would have been noted before now if it was
commonly happening (famous last words).
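The balance_dirty_pages() behaviour described above reduces to a simple model: writers are supposed to block once dirty + writeback pages exceed dirty_ratio percent of total_pages. The sketch below is a hypothetical toy version (the real code in mm/page-writeback.c also applies background thresholds and per-CPU ratelimiting), but it shows how far past the limit the test load was, and why a bogus total_pages would directly move the clamp point:

```python
def dirty_limit(total_pages: int, dirty_ratio: int = 40) -> int:
    """Pages of dirty+writeback memory at which write() callers should block."""
    return total_pages * dirty_ratio // 100

def should_throttle(nr_dirty: int, nr_writeback: int, total_pages: int) -> bool:
    return nr_dirty + nr_writeback > dirty_limit(total_pages)

# Badari's box: 7143628 kB of RAM = 1785907 4-KiB pages.
total = 7143628 // 4

# Dirty/Writeback from his earlier meminfo dump, converted kB -> pages.
print(should_throttle(6225664 // 4, 447272 // 4, total))
# -> True: throttling should have kicked in long before that snapshot.
```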
* Re: Memory pressure handling with iSCSI
  2005-07-26 23:07 ` Andrew Morton
@ 2005-07-26 23:26 ` Badari Pulavarty
  2005-07-27  0:31   ` Andrew Morton
  0 siblings, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 23:26 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm

On Tue, 2005-07-26 at 16:07 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
> > every 5 seconds. As you can see, we seem to dirty 6GB of data in 20
> > seconds of starting the test. I am not sure if its bad, since we have
> > lots of free memory..
>
> It's bad.  The logic in balance_dirty_pages() should block those write()
> callers as soon as we hit 40% dirty memory or whatever is in
> /proc/sys/vm/dirty_ratio.  So something is horridly busted.
>
> Can you try reducing the number of filesystems even further?

Single ext2 filesystem. We still dirty pretty quickly (data collected
every 5 seconds).

# grep Dirty OUT
Dirty:             312 kB
Dirty:         1121852 kB
Dirty:         2896952 kB
Dirty:         4344564 kB
Dirty:         5310856 kB
Dirty:         5507812 kB
Dirty:         5714884 kB
Dirty:         5865132 kB
Dirty:         6004276 kB
Dirty:         6206544 kB
Dirty:         6380524 kB
Dirty:         6583200 kB
Dirty:         6727296 kB
Dirty:         6708564 kB
Dirty:         6733768 kB
Dirty:         6737868 kB

Thanks,
Badari
* Re: Memory pressure handling with iSCSI
  2005-07-26 23:26 ` Badari Pulavarty
@ 2005-07-27  0:31 ` Andrew Morton
  2005-07-27  1:20   ` Martin J. Bligh
  2005-07-27  1:31   ` Badari Pulavarty
  0 siblings, 2 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-27 0:31 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> On Tue, 2005-07-26 at 16:07 -0700, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
> > > every 5 seconds. As you can see, we seem to dirty 6GB of data in 20
> > > seconds of starting the test. I am not sure if its bad, since we have
> > > lots of free memory..
> >
> > It's bad.  The logic in balance_dirty_pages() should block those write()
> > callers as soon as we hit 40% dirty memory or whatever is in
> > /proc/sys/vm/dirty_ratio.  So something is horridly busted.
> >
> > Can you try reducing the number of filesystems even further?
>
> Single ext2 filesystem. We still dirty pretty quickly (data collected
> every 5 seconds).

It happens here, a bit.  My machine goes up to 60% dirty when it should be
clamping at 40%.

The variable `total_pages' in page-writeback.c (from
nr_free_pagecache_pages()) is too high.  I trace it back to here:

On node 0 totalpages: 1572864
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 1568768 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1

This machine only has 4G of memory, so the platform code is overestimating
the number of pages by 50%.  Can you please check your dmesg, see if your
system is also getting this wrong?
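The overestimate described in the message above is easy to verify from the quoted boot log: 1572864 pages at 4 KiB each is 6 GiB on a 4 GiB machine, and an inflated total_pages raises the effective dirty clamp in direct proportion (a quick check, assuming 4-KiB pages):

```python
PAGE_KIB = 4  # x86_64 page size in KiB

claimed_pages = 1572864                 # "On node 0 totalpages" from the boot log
claimed_gib = claimed_pages * PAGE_KIB / (1024 * 1024)
print(claimed_gib)                      # 6.0 GiB reported on a 4 GiB box

# A dirty_ratio of 40% of the (bogus) 6 GiB equals 60% of the real 4 GiB --
# matching the ~60% dirty level observed before clamping kicked in.
actual_pages = 4 * 1024 * 1024 // PAGE_KIB
print(40 * claimed_pages / actual_pages)  # 60.0
```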
* Re: Memory pressure handling with iSCSI
  2005-07-27  0:31 ` Andrew Morton
@ 2005-07-27  1:20 ` Martin J. Bligh
  2005-07-27  1:26   ` Andrew Morton
  2005-07-27  1:31 ` Badari Pulavarty
  1 sibling, 1 reply; 28+ messages in thread
From: Martin J. Bligh @ 2005-07-27 1:20 UTC (permalink / raw)
To: Andrew Morton, Badari Pulavarty; +Cc: linux-kernel, linux-mm

> It happens here, a bit.  My machine goes up to 60% dirty when it should be
> clamping at 40%.
>
> The variable `total_pages' in page-writeback.c (from
> nr_free_pagecache_pages()) is too high.  I trace it back to here:
>
> On node 0 totalpages: 1572864
>   DMA zone: 4096 pages, LIFO batch:1
>   Normal zone: 1568768 pages, LIFO batch:31
>   HighMem zone: 0 pages, LIFO batch:1
>
> This machine only has 4G of memory, so the platform code is overestimating
> the number of pages by 50%.  Can you please check your dmesg, see if your
> system is also getting this wrong?

I think we're repeatedly iterating over the same zones by walking the
zonelists:

static unsigned int nr_free_zone_pages(int offset)
{
	pg_data_t *pgdat;
	unsigned int sum = 0;
	int i;

	for_each_pgdat(pgdat) {
		struct zone *zone;

		for (i = 0; i < MAX_NR_ZONES; i++) {
			unsigned long size, high;

			zone = &pgdat->node_zones[i];
			size = zone->present_pages;
			high = zone->pages_high;

			if (size > high)
				sum += size - high;
		}
	}

	return sum;
}

Does that look more sensible? I'd send you a real patch, except the box
just crashed ;-)

M.
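Martin's hypothesis - that walking the zonelists visits the same zone once per fallback list and therefore over-counts it, while iterating node_zones counts each zone once - can be illustrated with a toy model. The zone names and sizes below are made up for the illustration (the real structures live in include/linux/mmzone.h), and whether this was the actual bug is disputed in the next message:

```python
# Hypothetical per-node zone sizes, in pages.
node_zones = {"DMA": 4096, "Normal": 1044480}

# Each zonelist is a fallback ordering; the Normal list falls back to DMA,
# so the DMA zone appears in two lists.
zonelists = {
    "DMA": ["DMA"],
    "Normal": ["Normal", "DMA"],
}

# Summing over every zonelist counts the DMA zone twice...
over_sum = sum(node_zones[z] for zl in zonelists.values() for z in zl)

# ...while iterating node_zones directly counts each zone exactly once.
correct = sum(node_zones.values())

print(over_sum - correct)  # difference equals the double-counted DMA zone
```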
* Re: Memory pressure handling with iSCSI
  2005-07-27  1:20 ` Martin J. Bligh
@ 2005-07-27  1:26 ` Andrew Morton
  2005-07-27  1:47   ` Martin J. Bligh
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-27 1:26 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: pbadari, linux-kernel, linux-mm

"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>
> > It happens here, a bit.  My machine goes up to 60% dirty when it should be
> > clamping at 40%.
> >
> > The variable `total_pages' in page-writeback.c (from
> > nr_free_pagecache_pages()) is too high.  I trace it back to here:
> >
> > On node 0 totalpages: 1572864
> >   DMA zone: 4096 pages, LIFO batch:1
> >   Normal zone: 1568768 pages, LIFO batch:31
> >   HighMem zone: 0 pages, LIFO batch:1
> >
> > This machine only has 4G of memory, so the platform code is overestimating
> > the number of pages by 50%.  Can you please check your dmesg, see if your
> > system is also getting this wrong?
>
> I think we're repeatedly iterating over the same zones by walking the
> zonelists:
>
> static unsigned int nr_free_zone_pages(int offset)
> {
> 	pg_data_t *pgdat;
> 	unsigned int sum = 0;
> 	int i;
>
> 	for_each_pgdat(pgdat) {
> 		struct zone *zone;
>
> 		for (i = 0; i < MAX_NR_ZONES; i++) {
> 			unsigned long size, high;
>
> 			zone = &pgdat->node_zones[i];
> 			size = zone->present_pages;
> 			high = zone->pages_high;
>
> 			if (size > high)
> 				sum += size - high;
> 		}
> 	}
>
> 	return sum;
> }

I don't think so.  We're getting the wrong answer out of
calculate_zone_totalpages() which is an init-time thing.

Maybe nr_free_zone_pages() is supposed to fix that up post-facto somehow,
but calculate_zone_totalpages() sure as heck shouldn't be putting 1568768
into my ZONE_NORMAL's ->node_present_pages.
* Re: Memory pressure handling with iSCSI
  2005-07-27  1:26 ` Andrew Morton
@ 2005-07-27  1:47   ` Martin J. Bligh
  0 siblings, 0 replies; 28+ messages in thread
From: Martin J. Bligh @ 2005-07-27 1:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: pbadari, linux-kernel, linux-mm

> I don't think so.  We're getting the wrong answer out of
> calculate_zone_totalpages() which is an init-time thing.
>
> Maybe nr_free_zone_pages() is supposed to fix that up post-facto somehow,
> but calculate_zone_totalpages() sure as heck shouldn't be putting 1568768
> into my ZONE_NORMAL's ->node_present_pages.

Humpf.  I'll look at it again later.

nr_free_pagecache_pages -> nr_free_zone_pages -> nr_free_zone_pages

is it not?

M.
* Re: Memory pressure handling with iSCSI
  2005-07-27  0:31 ` Andrew Morton
@ 2005-07-27  1:31   ` Badari Pulavarty
  2005-07-27  1:40     ` Andrew Morton
  1 sibling, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-27 1:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Tue, 2005-07-26 at 17:31 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > On Tue, 2005-07-26 at 16:07 -0700, Andrew Morton wrote:
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > >
> > > > Here is the data with 5 ext2 filesystems.  I also collected /proc/meminfo
> > > > every 5 seconds.  As you can see, we seem to dirty 6GB of data in 20
> > > > seconds of starting the test.  I am not sure if its bad, since we have
> > > > lots of free memory..
> > >
> > > It's bad.  The logic in balance_dirty_pages() should block those write()
> > > callers as soon as we hit 40% dirty memory or whatever is in
> > > /proc/sys/vm/dirty_ratio.  So something is horridly busted.
> > >
> > > Can you try reducing the number of filesystems even further?
> >
> > Single ext2 filesystem.  We still dirty pretty quickly (data collected
> > every 5 seconds).
>
> It happens here, a bit.  My machine goes up to 60% dirty when it should be
> clamping at 40%.
>
> The variable `total_pages' in page-writeback.c (from
> nr_free_pagecache_pages()) is too high.  I trace it back to here:
>
> On node 0 totalpages: 1572864
>   DMA zone: 4096 pages, LIFO batch:1
>   Normal zone: 1568768 pages, LIFO batch:31
>   HighMem zone: 0 pages, LIFO batch:1
>
> This machine only has 4G of memory, so the platform code is overestimating
> the number of pages by 50%.  Can you please check your dmesg, see if your
> system is also getting this wrong?

On node 0 totalpages: 1572863
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 1568767 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 131071
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 131071 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 2 totalpages: 131071
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 131071 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
On node 3 totalpages: 131071
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 131071 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:1
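[Editor's note: a quick sanity check on Badari's per-node counts above, assuming 4KB pages.]

```python
# Sum the reported totalpages per node and convert to a memory size
# (assuming 4KB pages, the x86/x86-64 default).
node_pages = [1_572_863, 131_071, 131_071, 131_071]
total = sum(node_pages)
gib = total * 4 / 2**20  # pages -> KB -> GiB

print(total)            # 1966076 pages
print(round(gib, 1))    # 7.5 GiB-worth of pages reported
```

That is the roughly-7.7GB figure Andrew works out in his reply below.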
* Re: Memory pressure handling with iSCSI
  2005-07-27  1:31 ` Badari Pulavarty
@ 2005-07-27  1:40   ` Andrew Morton
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-27 1:40 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> > This machine only has 4G of memory, so the platform code is overestimating
> > the number of pages by 50%.  Can you please check your dmesg, see if your
> > system is also getting this wrong?
>
> On node 0 totalpages: 1572863
>   DMA zone: 4096 pages, LIFO batch:1
>   Normal zone: 1568767 pages, LIFO batch:31
>   HighMem zone: 0 pages, LIFO batch:1
> On node 1 totalpages: 131071
>   DMA zone: 0 pages, LIFO batch:1
>   Normal zone: 131071 pages, LIFO batch:31
>   HighMem zone: 0 pages, LIFO batch:1
> On node 2 totalpages: 131071
>   DMA zone: 0 pages, LIFO batch:1
>   Normal zone: 131071 pages, LIFO batch:31
>   HighMem zone: 0 pages, LIFO batch:1
> On node 3 totalpages: 131071
>   DMA zone: 0 pages, LIFO batch:1
>   Normal zone: 131071 pages, LIFO batch:31
>   HighMem zone: 0 pages, LIFO batch:1

That's 7.7GB, yes?  On a 6GB machine?  If so, that's a bit off, but not
grossly.

Here's the dopey debug patch which I used:

- boot
- dmesg -s 1000000 | grep total_pages > foo
- kill off syslogd (sudo service syslog stop)
- run the dd command
- wait for it to hit steady state (max dirty memory)
- dmesg -s 1000000 >> foo

diff -puN mm/page-writeback.c~a mm/page-writeback.c
--- 25/mm/page-writeback.c~a	2005-07-26 15:53:46.000000000 -0700
+++ 25-akpm/mm/page-writeback.c	2005-07-26 16:21:55.000000000 -0700
@@ -161,7 +161,8 @@ get_dirty_limits(struct writeback_state
 	dirty_ratio = vm_dirty_ratio;
 	if (dirty_ratio > unmapped_ratio / 2)
 		dirty_ratio = unmapped_ratio / 2;
-
+	printk("vm_dirty_ratio=%d unmapped_ratio=%d dirty_ratio=%d\n",
+		vm_dirty_ratio, unmapped_ratio, dirty_ratio);
 	if (dirty_ratio < 5)
 		dirty_ratio = 5;
 
@@ -171,6 +172,8 @@ get_dirty_limits(struct writeback_state
 	background = (background_ratio * available_memory) / 100;
 	dirty = (dirty_ratio * available_memory) / 100;
+	printk("dirty_ratio=%d available_memory=%lu dirty=%lu\n",
+		dirty_ratio, available_memory, dirty);
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
 		background += background / 4;
@@ -209,6 +212,12 @@ static void balance_dirty_pages(struct a
 		get_dirty_limits(&wbs, &background_thresh,
 				&dirty_thresh, mapping);
 		nr_reclaimable = wbs.nr_dirty + wbs.nr_unstable;
+		printk("background_thresh=%ld dirty_thresh=%ld "
+			"nr_dirty=%ld nr_unstable=%ld "
+			"nr_reclaimable=%ld wbs.nr_writeback=%ld\n",
+			background_thresh, dirty_thresh,
+			wbs.nr_dirty, wbs.nr_unstable,
+			nr_reclaimable, wbs.nr_writeback);
 		if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh)
 			break;
 
@@ -532,6 +541,8 @@ void __init page_writeback_init(void)
 
 	total_pages = nr_free_pagecache_pages();
 
+	printk("total_pages=%ld\n", total_pages);
+
 	correction = (100 * 4 * buffer_pages) / total_pages;
 	if (correction < 100) {
_
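[Editor's note: the arithmetic that the instrumented get_dirty_limits() prints can be sketched in a few lines. This is a back-of-envelope model, not the kernel code: it keeps only the ratio clamping and the threshold computation, and the unmapped_ratio value is illustrative.]

```python
# Simplified model of the dirty-threshold computation instrumented above:
# dirty_ratio is clamped to half the unmapped ratio, floored at 5, then
# applied to available_memory (total_pages). All numbers are illustrative.

def dirty_thresh(total_pages, vm_dirty_ratio=40, unmapped_ratio=90):
    ratio = min(vm_dirty_ratio, unmapped_ratio // 2)
    ratio = max(ratio, 5)
    return ratio * total_pages // 100

actual = 1_048_576    # pages really present on a 4GB box (4KB pages)
inflated = 1_572_864  # what node 0's boot code reported earlier in the thread

print(dirty_thresh(actual))    # 419430 pages, ~40% of real memory
print(dirty_thresh(inflated))  # 629145 pages, ~60% of real memory
```

An inflated total_pages thus lets dirty memory reach about 60% of actual RAM before write() callers are throttled, matching the "60% dirty when it should be clamping at 40%" observation upthread.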
* Re: Memory pressure handling with iSCSI
  2005-07-26 18:39 ` Badari Pulavarty
@ 2005-07-26 19:31   ` Sonny Rao
  2005-07-26 20:37     ` Badari Pulavarty
  1 sibling, 1 reply; 28+ messages in thread
From: Sonny Rao @ 2005-07-26 19:31 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andrew Morton, lkml, linux-mm

On Tue, Jul 26, 2005 at 11:39:11AM -0700, Badari Pulavarty wrote:
> On Tue, 2005-07-26 at 11:11 -0700, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > After KS & OLS discussions about memory pressure, I wanted to re-do
> > > iSCSI testing with "dd"s to see if we are throttling writes.
> > >
> > > I created 50 10-GB ext3 filesystems on iSCSI luns.  Test is simple:
> > > 50 dds (one per filesystem).  System seems to throttle memory properly
> > > and is making progress.  (Machine doesn't respond very well for anything
> > > else, but my vmstat keeps running - 100% sys time).
> >
> > It's important to monitor /proc/meminfo too - the amount of dirty/writeback
> > pages, etc.
> >
> > btw, 100% system time is quite appalling.  Are you sure vmstat is telling
> > the truth?  If so, where's it all being spent?
>
> Well, profile doesn't show any time in "default_idle".  So
> I believe vmstat is telling the truth.

Badari,

You probably covered this, but just to make sure: if you're on a
pentium4 machine, I usually boot w/ "idle=poll" to see proper idle
reporting, because otherwise the chip will throttle itself back and
idle time will be skewed -- at least in oprofile.

Sonny
* Re: Memory pressure handling with iSCSI
  2005-07-26 19:31 ` Sonny Rao
@ 2005-07-26 20:37   ` Badari Pulavarty
  2005-07-26 21:21     ` Andrew Morton
  0 siblings, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 20:37 UTC (permalink / raw)
  To: Sonny Rao; +Cc: Andrew Morton, lkml, linux-mm

On Tue, 2005-07-26 at 15:31 -0400, Sonny Rao wrote:
> On Tue, Jul 26, 2005 at 11:39:11AM -0700, Badari Pulavarty wrote:
> > On Tue, 2005-07-26 at 11:11 -0700, Andrew Morton wrote:
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > >
> > > > After KS & OLS discussions about memory pressure, I wanted to re-do
> > > > iSCSI testing with "dd"s to see if we are throttling writes.
> > > >
> > > > I created 50 10-GB ext3 filesystems on iSCSI luns.  Test is simple:
> > > > 50 dds (one per filesystem).  System seems to throttle memory properly
> > > > and is making progress.  (Machine doesn't respond very well for anything
> > > > else, but my vmstat keeps running - 100% sys time).
> > >
> > > It's important to monitor /proc/meminfo too - the amount of dirty/writeback
> > > pages, etc.
> > >
> > > btw, 100% system time is quite appalling.  Are you sure vmstat is telling
> > > the truth?  If so, where's it all being spent?
> >
> > Well, profile doesn't show any time in "default_idle".  So
> > I believe vmstat is telling the truth.
>
> Badari,
>
> You probably covered this, but just to make sure: if you're on a
> pentium4 machine, I usually boot w/ "idle=poll" to see proper idle
> reporting, because otherwise the chip will throttle itself back and
> idle time will be skewed -- at least in oprofile.

My machine is AMD64.

- Badari
* Re: Memory pressure handling with iSCSI
  2005-07-26 20:37 ` Badari Pulavarty
@ 2005-07-26 21:21   ` Andrew Morton
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 21:21 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: sonny, linux-kernel, linux-mm

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> > You probably covered this, but just to make sure: if you're on a
> > pentium4 machine, I usually boot w/ "idle=poll" to see proper idle
> > reporting, because otherwise the chip will throttle itself back and
> > idle time will be skewed -- at least in oprofile.
>
> My machine is AMD64.

I'd expect the problem to which Sonny refers to occur on many
architectures.

IIRC, the problem is that many (or all) of the counters which oprofile uses
are turned off when the CPU does a halt.  So the profiler ends up thinking
that zero time is spent in the idle handler.  The net effect is that if
your workload spends 90% of its time idle, then all the other profiler hits
are exaggerated by a factor of ten.  Making the CPU busywait in idle()
fixes this.

But you're using the old /proc/profile profiler, which uses a free-running
timer that doesn't get stopped by halt, so it is unaffected by this.
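[Editor's note: the "exaggerated by a factor of ten" effect Andrew describes is simple sampling arithmetic. Here it is with made-up sample counts — the function names are illustrative, not real profile output.]

```python
# If samples taken while the CPU is halted are lost, every surviving
# bucket's percentage is inflated by total/(total - idle).

samples = {"idle": 900, "memcpy": 60, "tcp_sendmsg": 40}  # invented counts

def shares(s):
    total = sum(s.values())
    return {k: 100 * v / total for k, v in s.items()}

# True profile: counters run during halt too.
print(shares(samples)["memcpy"])        # 6.0 (% of all time)

# oprofile-style: idle samples never recorded.
no_idle = {k: v for k, v in samples.items() if k != "idle"}
print(shares(no_idle)["memcpy"])        # 60.0 -- a 10x exaggeration
```

With 90% idle time dropped, the remaining 10% of samples are scaled up tenfold, exactly as described above.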
* Re: Memory pressure handling with iSCSI
  2005-07-26 17:35 Memory pressure handling with iSCSI Badari Pulavarty
  2005-07-26 18:04 ` Roland Dreier
  2005-07-26 18:11 ` Andrew Morton
@ 2005-07-26 20:59 ` Rik van Riel
  2005-07-26 21:05   ` Badari Pulavarty
  2005-07-26 21:12   ` Andrew Morton
  2 siblings, 2 replies; 28+ messages in thread
From: Rik van Riel @ 2005-07-26 20:59 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: lkml, linux-mm, Andrew Morton

On Tue, 26 Jul 2005, Badari Pulavarty wrote:

> After KS & OLS discussions about memory pressure, I wanted to re-do
> iSCSI testing with "dd"s to see if we are throttling writes.

Could you also try with shared writable mmap, to see if that
works ok or triggers a deadlock ?

-- 
The Theory of Escalating Commitment: "The cost of continuing mistakes is
borne by others, while the cost of admitting mistakes is borne by yourself."
  -- Joseph Stiglitz, Nobel Laureate in Economics
* Re: Memory pressure handling with iSCSI
  2005-07-26 20:59 ` Rik van Riel
@ 2005-07-26 21:05   ` Badari Pulavarty
  2005-07-26 21:33     ` Martin J. Bligh
  1 sibling, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 21:05 UTC (permalink / raw)
  To: Rik van Riel; +Cc: lkml, linux-mm, Andrew Morton

On Tue, 2005-07-26 at 16:59 -0400, Rik van Riel wrote:
> On Tue, 26 Jul 2005, Badari Pulavarty wrote:
>
> > After KS & OLS discussions about memory pressure, I wanted to re-do
> > iSCSI testing with "dd"s to see if we are throttling writes.
>
> Could you also try with shared writable mmap, to see if that
> works ok or triggers a deadlock ?

I can, but let's finish addressing one issue at a time.  Last time,
I changed too many things at the same time and got nowhere :(

Thanks,
Badari
* Re: Memory pressure handling with iSCSI
  2005-07-26 21:05 ` Badari Pulavarty
@ 2005-07-26 21:33   ` Martin J. Bligh
  2005-07-26 22:05     ` Adam Litke
  0 siblings, 1 reply; 28+ messages in thread
From: Martin J. Bligh @ 2005-07-26 21:33 UTC (permalink / raw)
  To: Badari Pulavarty, Rik van Riel, agl; +Cc: lkml, linux-mm, Andrew Morton

>> > After KS & OLS discussions about memory pressure, I wanted to re-do
>> > iSCSI testing with "dd"s to see if we are throttling writes.
>>
>> Could you also try with shared writable mmap, to see if that
>> works ok or triggers a deadlock ?
>
> I can, but lets finish addressing one issue at a time. Last time,
> I changed too many things at the same time and got no where :(

Adam is working that one, but not over iSCSI.

M.
* Re: Memory pressure handling with iSCSI
  2005-07-26 21:33 ` Martin J. Bligh
@ 2005-07-26 22:05   ` Adam Litke
  0 siblings, 0 replies; 28+ messages in thread
From: Adam Litke @ 2005-07-26 22:05 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Badari Pulavarty [imap], Rik van Riel, lkml, linux-mm, Andrew Morton

On Tue, 2005-07-26 at 16:33, Martin J. Bligh wrote:
> >> > After KS & OLS discussions about memory pressure, I wanted to re-do
> >> > iSCSI testing with "dd"s to see if we are throttling writes.
> >>
> >> Could you also try with shared writable mmap, to see if that
> >> works ok or triggers a deadlock ?
> >
> > I can, but lets finish addressing one issue at a time. Last time,
> > I changed too many things at the same time and got no where :(
>
> Adam is working that one, but not over iSCSI.

I wrote a simple/ugly C program to demonstrate the MAP_SHARED,PROT_WRITE
case.  I was able to saturate the system with 75% of all memory in dirty
pages before I got bored.

To reproduce:
- Create a 3GB file with dd
- ./map-shared-dirty bigfile <number of chunks>

I break up the mmap & dirty operation into chunks in case the system is
tight on memory.  Choose a large enough number of chunks so the
individual mmaps will be small enough for your system to accommodate.

--

MemTotal:      4092492 kB
MemFree:        786988 kB
Buffers:          6372 kB
Cached:        3211388 kB
SwapCached:          0 kB
Active:        3197428 kB
Inactive:        36696 kB
HighTotal:     3211264 kB
HighFree:         1024 kB
LowTotal:       881228 kB
LowFree:        785964 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:         3117300 kB
Writeback:        3568 kB
Mapped:          24780 kB
Slab:            59316 kB
Committed_AS:    49760 kB
PageTables:        780 kB
VmallocTotal:   114680 kB
VmallocUsed:        32 kB
VmallocChunk:   114648 kB

/*
 * map-shared-dirty.c - Demonstrate a loophole in dirty-ratio when
 * heavily dirtying MAP_SHARED memory.
 *
 * Usage: (I know it's ugly)
 * ./map-shared-dirty <large file> <number of chunks>
 */
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdio.h>

size_t page_size;

void dirty_file(int fd, unsigned long bytes, size_t map_offset)
{
	char *addr;

	addr = mmap(NULL, bytes, PROT_READ|PROT_WRITE, MAP_SHARED,
			fd, map_offset);
	if (addr == MAP_FAILED) {
		fprintf(stderr, "Failed to map file\n");
		fprintf(stderr, "bytes: %lu offset: %zu\n", bytes, map_offset);
		exit(1);
	}

	/* Dirty the pages */
	memset(addr, map_offset % 255, bytes);
	munmap(addr, bytes);
}

int main(int argc, char **argv)
{
	char *filename = argv[1];
	int chunks = atoi(argv[2]);
	int fd;
	unsigned long i, chunk_size, bytes;
	struct stat file_info;

	fd = open(filename, O_RDWR|0100000); /* O_LARGEFILE */
	if (fd <= 0) {
		fprintf(stderr, "Failed to open file\n");
		exit(1);
	}
	fstat(fd, &file_info);
	bytes = file_info.st_size;

	page_size = getpagesize();
	chunk_size = (bytes / chunks) & ~(page_size - 1);
	printf("Chunk size = %lu\n", chunk_size);

	for (i = 0; i < bytes; i += chunk_size)
		dirty_file(fd, chunk_size, i);
	exit(0);
}

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
* Re: Memory pressure handling with iSCSI
  2005-07-26 20:59 ` Rik van Riel
@ 2005-07-26 21:12   ` Andrew Morton
  1 sibling, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 21:12 UTC (permalink / raw)
  To: Rik van Riel; +Cc: pbadari, linux-kernel, linux-mm

Rik van Riel <riel@redhat.com> wrote:
>
> On Tue, 26 Jul 2005, Badari Pulavarty wrote:
>
> > After KS & OLS discussions about memory pressure, I wanted to re-do
> > iSCSI testing with "dd"s to see if we are throttling writes.
>
> Could you also try with shared writable mmap, to see if that
> works ok or triggers a deadlock ?

That'll cause problems for sure, but we need to get `dd' right first :(
end of thread, other threads:[~2005-07-27  1:47 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-07-26 17:35 Memory pressure handling with iSCSI Badari Pulavarty
2005-07-26 18:04 ` Roland Dreier
2005-07-26 18:11 ` Andrew Morton
2005-07-26 18:39 ` Badari Pulavarty
2005-07-26 18:48 ` Andrew Morton
2005-07-26 19:12 ` Andrew Morton
2005-07-26 20:36 ` Badari Pulavarty
2005-07-26 21:11 ` Badari Pulavarty
2005-07-26 21:24 ` Andrew Morton
2005-07-26 21:45 ` Badari Pulavarty
2005-07-26 22:10 ` Andrew Morton
2005-07-26 22:48 ` Badari Pulavarty
2005-07-26 23:07 ` Andrew Morton
2005-07-26 23:26 ` Badari Pulavarty
2005-07-27  0:31 ` Andrew Morton
2005-07-27  1:20 ` Martin J. Bligh
2005-07-27  1:26 ` Andrew Morton
2005-07-27  1:47 ` Martin J. Bligh
2005-07-27  1:31 ` Badari Pulavarty
2005-07-27  1:40 ` Andrew Morton
2005-07-26 19:31 ` Sonny Rao
2005-07-26 20:37 ` Badari Pulavarty
2005-07-26 21:21 ` Andrew Morton
2005-07-26 20:59 ` Rik van Riel
2005-07-26 21:05 ` Badari Pulavarty
2005-07-26 21:33 ` Martin J. Bligh
2005-07-26 22:05 ` Adam Litke
2005-07-26 21:12 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox