* Memory pressure handling with iSCSI
@ 2005-07-26 17:35 Badari Pulavarty
2005-07-26 18:04 ` Roland Dreier
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 17:35 UTC (permalink / raw)
To: lkml, linux-mm; +Cc: akpm
[-- Attachment #1: Type: text/plain, Size: 419 bytes --]
Hi Andrew,
After KS & OLS discussions about memory pressure, I wanted to re-do
iSCSI testing with "dd"s to see if we are throttling writes.
I created 50 10-GB ext3 filesystems on iSCSI LUNs. The test is simple:
50 dds (one per filesystem). The system seems to throttle memory properly
and is making progress. (The machine doesn't respond very well to anything
else, but my vmstat keeps running - 100% sys time.)
Thanks,
Badari
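For anyone wanting to reproduce this, the workload above boils down to one background dd per filesystem. A scaled-down sketch, with plain directories and tiny sizes standing in for the 50 iSCSI-backed ext3 mounts and the 10-GB writes of the real test (all paths and counts here are placeholders):

```shell
# One dd writer per "filesystem"; N=4 and 1 MB per writer here,
# versus 50 mounts and ~10 GB per writer in the real test.
N=4
DIR=$(mktemp -d)
for i in $(seq 1 "$N"); do
    mkdir -p "$DIR/fs$i"
    # real test: dd if=/dev/zero of=<mountpoint>/bigfile bs=1M count=10240
    dd if=/dev/zero of="$DIR/fs$i/bigfile" bs=1M count=1 2>/dev/null &
done
wait
ls "$DIR"/fs*/bigfile | wc -l    # all writers completed
```

While the writers run, vmstat (as attached below) shows whether the VM throttles them.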
[-- Attachment #2: vmstat.out --]
[-- Type: text/plain, Size: 1461 bytes --]
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
38 96 30500 43360 16612 6671064 2 0 103 11079 9860 2960 0 100 0 0
43 94 30500 43872 16704 6670460 0 0 124 11232 10993 3624 0 100 0 0
41 95 30500 44756 16780 6670304 22 0 41 11615 10864 3702 0 100 0 0
43 91 30500 43392 16580 6672096 6 0 11 10885 9736 2528 0 100 0 0
44 88 30500 43268 16468 6672204 6 0 14 12084 10361 1971 0 100 0 0
42 90 30500 43640 16556 6672116 0 0 26 12094 10447 3550 0 100 0 0
45 90 30500 46120 16584 6670016 6 0 22 11546 10690 3815 0 100 0 0
42 89 30500 43516 16560 6672564 11 0 48 12902 9368 3464 0 100 0 0
40 91 30500 43640 16572 6671540 6 0 87 10866 9253 2943 0 100 0 0
37 90 30500 43516 16608 6672040 6 0 25 14411 9374 2595 0 100 0 0
36 99 30500 43268 16568 6672080 0 0 23 14071 9524 2401 0 100 0 0
36 93 30500 43268 16596 6671504 6 0 16 11502 9403 3185 0 100 0 0
33 91 30500 43392 16588 6671540 0 0 11 10191 9837 3374 0 100 0 0
33 91 30500 43392 16552 6672092 0 0 15 11762 9703 2915 0 100 0 0
33 90 30500 43268 16648 6671480 0 0 131 11692 9784 3154 0 100 0 0
33 97 30500 43640 16640 6672004 0 0 18 9253 9491 1998 0 100 0 0
* Re: Memory pressure handling with iSCSI
2005-07-26 17:35 Memory pressure handling with iSCSI Badari Pulavarty
@ 2005-07-26 18:04 ` Roland Dreier
2005-07-26 18:11 ` Andrew Morton
2005-07-26 20:59 ` Rik van Riel
2 siblings, 0 replies; 28+ messages in thread
From: Roland Dreier @ 2005-07-26 18:04 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: lkml, linux-mm, akpm
Thanks, this is a good test. It would be interesting to know if the
system does eventually deadlock with less system memory or with even
more filesystems.
- R.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: Memory pressure handling with iSCSI
2005-07-26 17:35 Memory pressure handling with iSCSI Badari Pulavarty
2005-07-26 18:04 ` Roland Dreier
@ 2005-07-26 18:11 ` Andrew Morton
2005-07-26 18:39 ` Badari Pulavarty
2005-07-26 20:59 ` Rik van Riel
2 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 18:11 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm
Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> After KS & OLS discussions about memory pressure, I wanted to re-do
> iSCSI testing with "dd"s to see if we are throttling writes.
>
> I created 50 10-GB ext3 filesystems on iSCSI luns. Test is simple
> 50 dds (one per filesystem). System seems to throttle memory properly
> and making progress. (Machine doesn't respond very well for anything
> else, but my vmstat keeps running - 100% sys time).
It's important to monitor /proc/meminfo too - the amount of dirty/writeback
pages, etc.
btw, 100% system time is quite appalling. Are you sure vmstat is telling
the truth? If so, where's it all being spent?
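The dirty/writeback numbers being asked for can be sampled alongside vmstat with something like the following (field names as in 2.6-era /proc/meminfo; the interval and iteration count are arbitrary):

```shell
# Print the free/dirty/writeback totals a couple of times, one second
# apart, to watch how they move while the dd workload runs.
for i in 1 2; do
    grep -E '^(MemFree|Dirty|Writeback):' /proc/meminfo
    sleep 1
done
```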
* Re: Memory pressure handling with iSCSI
2005-07-26 18:11 ` Andrew Morton
@ 2005-07-26 18:39 ` Badari Pulavarty
2005-07-26 18:48 ` Andrew Morton
2005-07-26 19:31 ` Sonny Rao
0 siblings, 2 replies; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 18:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Tue, 2005-07-26 at 11:11 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > After KS & OLS discussions about memory pressure, I wanted to re-do
> > iSCSI testing with "dd"s to see if we are throttling writes.
> >
> > I created 50 10-GB ext3 filesystems on iSCSI luns. Test is simple
> > 50 dds (one per filesystem). System seems to throttle memory properly
> > and making progress. (Machine doesn't respond very well for anything
> > else, but my vmstat keeps running - 100% sys time).
>
> It's important to monitor /proc/meminfo too - the amount of dirty/writeback
> pages, etc.
>
> btw, 100% system time is quite appalling. Are you sure vmstat is telling
> the truth? If so, where's it all being spent?
>
>
Well, the profile doesn't show any time in "default_idle", so I
believe vmstat is telling the truth.
# cat /proc/meminfo
MemTotal: 7143628 kB
MemFree: 43252 kB
Buffers: 16736 kB
Cached: 6683348 kB
SwapCached: 5336 kB
Active: 14460 kB
Inactive: 6686928 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 7143628 kB
LowFree: 43252 kB
SwapTotal: 1048784 kB
SwapFree: 1017920 kB
Dirty: 6225664 kB
Writeback: 447272 kB
Mapped: 10460 kB
Slab: 362136 kB
CommitLimit: 4620596 kB
Committed_AS: 168616 kB
PageTables: 2452 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
# echo 2 > /proc/profile; sleep 5; readprofile -m /usr/src/*12.3/System.map | sort -nr
1634737 total 0.5464
1468569 shrink_zone 390.5769
21203 unlock_page 331.2969
19497 release_pages 46.8678
19061 __wake_up_bit 397.1042
17936 page_referenced 53.3810
10679 lru_add_drain 133.4875
7348 page_waitqueue 76.5417
5877 tg3_poll 2.4007
4650 cond_resched 41.5179
4476 copy_user_generic 15.0201
1973 do_get_write_access 1.2583
1858 __mod_page_state 38.7083
1754 tg3_start_xmit 0.9876
1348 journal_dirty_metadata 2.1063
1250 __find_get_block 2.7902
1224 journal_add_journal_head 2.6379
1082 kmem_cache_free 11.2708
1077 tcp_sendpage 0.3580
1076 tcp_ack 0.1431
1075 __make_request 0.7999
1035 tg3_interrupt_tagged 2.5875
1022 __pagevec_lru_add 4.5625
928 tcp_transmit_skb 0.4677
924 kmem_cache_alloc 14.4375
900 thread_return 3.5294
819 __ext3_get_inode_loc 0.9307
754 established_get_next 2.2440
711 journal_cancel_revoke 1.4335
684 file_send_actor 7.1250
Thanks,
Badari
* Re: Memory pressure handling with iSCSI
2005-07-26 18:39 ` Badari Pulavarty
@ 2005-07-26 18:48 ` Andrew Morton
2005-07-26 19:12 ` Andrew Morton
2005-07-26 19:31 ` Sonny Rao
1 sibling, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 18:48 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm
Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> On Tue, 2005-07-26 at 11:11 -0700, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > After KS & OLS discussions about memory pressure, I wanted to re-do
> > > iSCSI testing with "dd"s to see if we are throttling writes.
> > >
> > > I created 50 10-GB ext3 filesystems on iSCSI luns. Test is simple
> > > 50 dds (one per filesystem). System seems to throttle memory properly
> > > and making progress. (Machine doesn't respond very well for anything
> > > else, but my vmstat keeps running - 100% sys time).
> >
> > It's important to monitor /proc/meminfo too - the amount of dirty/writeback
> > pages, etc.
> >
> > btw, 100% system time is quite appalling. Are you sure vmstat is telling
> > the truth? If so, where's it all being spent?
> >
> >
>
> Well, profile doesn't show any time in "default_idle". So
> I believe, vmstat is telling the truth.
>
> # cat /proc/meminfo
> MemTotal: 7143628 kB
> MemFree: 43252 kB
> Buffers: 16736 kB
> Cached: 6683348 kB
> SwapCached: 5336 kB
> Active: 14460 kB
> Inactive: 6686928 kB
> HighTotal: 0 kB
> HighFree: 0 kB
> LowTotal: 7143628 kB
> LowFree: 43252 kB
> SwapTotal: 1048784 kB
> SwapFree: 1017920 kB
> Dirty: 6225664 kB
> Writeback: 447272 kB
> Mapped: 10460 kB
> Slab: 362136 kB
> CommitLimit: 4620596 kB
> Committed_AS: 168616 kB
> PageTables: 2452 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 9888 kB
> VmallocChunk: 34359728447 kB
> HugePages_Total: 0
> HugePages_Free: 0
> Hugepagesize: 2048 kB
>
That is extremely wrong. dirty memory is *way* too high.
>
> # echo 2 > /proc/profile; sleep 5; readprofile -m /usr/src/*12.3/System.map | sort -nr
> 1634737 total 0.5464
> 1468569 shrink_zone 390.5769
> 21203 unlock_page 331.2969
> 19497 release_pages 46.8678
> 19061 __wake_up_bit 397.1042
> 17936 page_referenced 53.3810
> 10679 lru_add_drain 133.4875
And so page reclaim has gone crazy.
We need to work out why the dirty memory levels are so high.
Can you please reduce the number of filesystems, see if that reduces the
dirty levels?
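For context on "way too high": the VM is supposed to clamp dirty memory at roughly vm.dirty_ratio percent of memory (40% was the usual default on kernels of that era), so ~6.2 GB of Dirty on a 7 GB box is well past the expected ceiling. A quick ballpark check on any given machine:

```shell
# Rough expected dirty-memory ceiling: dirty_ratio percent of MemTotal.
# (The kernel's real calculation adjusts for highmem etc.; this is
# only a ballpark figure.)
ratio=$(cat /proc/sys/vm/dirty_ratio)
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "dirty_ratio = ${ratio}%"
echo "expected ceiling ~ $((total_kb * ratio / 100)) kB"
```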
* Re: Memory pressure handling with iSCSI
2005-07-26 18:48 ` Andrew Morton
@ 2005-07-26 19:12 ` Andrew Morton
2005-07-26 20:36 ` Badari Pulavarty
2005-07-26 21:11 ` Badari Pulavarty
0 siblings, 2 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 19:12 UTC (permalink / raw)
To: pbadari, linux-kernel, linux-mm
Andrew Morton <akpm@osdl.org> wrote:
>
> Can you please reduce the number of filesystems, see if that reduces the
> dirty levels?
Also, it's conceivable that ext3 is implicated here, so it might be saner
to perform initial investigation on ext2.
(when kjournald writes back a page via its buffers, the page remains
"dirty" as far as the VFS is concerned. Later, someone tries to do a
writepage() on it and we'll discover the buffers' cleanness and the page
will be cleaned without any I/O being performed. All the throttling
_should_ work OK in this case. But ext2 is more straightforward.)
* Re: Memory pressure handling with iSCSI
2005-07-26 18:39 ` Badari Pulavarty
2005-07-26 18:48 ` Andrew Morton
@ 2005-07-26 19:31 ` Sonny Rao
2005-07-26 20:37 ` Badari Pulavarty
1 sibling, 1 reply; 28+ messages in thread
From: Sonny Rao @ 2005-07-26 19:31 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: Andrew Morton, lkml, linux-mm
On Tue, Jul 26, 2005 at 11:39:11AM -0700, Badari Pulavarty wrote:
> On Tue, 2005-07-26 at 11:11 -0700, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > After KS & OLS discussions about memory pressure, I wanted to re-do
> > > iSCSI testing with "dd"s to see if we are throttling writes.
> > >
> > > I created 50 10-GB ext3 filesystems on iSCSI luns. Test is simple
> > > 50 dds (one per filesystem). System seems to throttle memory properly
> > > and making progress. (Machine doesn't respond very well for anything
> > > else, but my vmstat keeps running - 100% sys time).
> >
> > It's important to monitor /proc/meminfo too - the amount of dirty/writeback
> > pages, etc.
> >
> > btw, 100% system time is quite appalling. Are you sure vmstat is telling
> > the truth? If so, where's it all being spent?
> >
> >
>
> Well, profile doesn't show any time in "default_idle". So
> I believe, vmstat is telling the truth.
Badari,
You probably covered this, but just to make sure, if you're on a
pentium4 machine, I usually boot w/ "idle=poll" to see proper idle
reporting because otherwise the chip will throttle itself back and
idle time will be skewed -- at least on oprofile.
Sonny
* Re: Memory pressure handling with iSCSI
2005-07-26 19:12 ` Andrew Morton
@ 2005-07-26 20:36 ` Badari Pulavarty
2005-07-26 21:11 ` Badari Pulavarty
1 sibling, 0 replies; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 20:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Tue, 2005-07-26 at 12:12 -0700, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Can you please reduce the number of filesystems, see if that reduces the
> > dirty levels?
>
> Also, it's conceivable that ext3 is implicated here, so it might be saner
> to perform initial investigation on ext2.
>
> (when kjournald writes back a page via its buffers, the page remains
> "dirty" as far as the VFS is concerned. Later, someone tries to do a
> writepage() on it and we'll discover the buffers' cleanness and the page
> will be cleaned without any I/O being performed. All the throttling
> _should_ work OK in this case. But ext2 is more straightforward.)
I will try ext2 next.
- Badari
* Re: Memory pressure handling with iSCSI
2005-07-26 19:31 ` Sonny Rao
@ 2005-07-26 20:37 ` Badari Pulavarty
2005-07-26 21:21 ` Andrew Morton
0 siblings, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 20:37 UTC (permalink / raw)
To: Sonny Rao; +Cc: Andrew Morton, lkml, linux-mm
On Tue, 2005-07-26 at 15:31 -0400, Sonny Rao wrote:
> On Tue, Jul 26, 2005 at 11:39:11AM -0700, Badari Pulavarty wrote:
> > On Tue, 2005-07-26 at 11:11 -0700, Andrew Morton wrote:
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > >
> > > > After KS & OLS discussions about memory pressure, I wanted to re-do
> > > > iSCSI testing with "dd"s to see if we are throttling writes.
> > > >
> > > > I created 50 10-GB ext3 filesystems on iSCSI luns. Test is simple
> > > > 50 dds (one per filesystem). System seems to throttle memory properly
> > > > and making progress. (Machine doesn't respond very well for anything
> > > > else, but my vmstat keeps running - 100% sys time).
> > >
> > > It's important to monitor /proc/meminfo too - the amount of dirty/writeback
> > > pages, etc.
> > >
> > > btw, 100% system time is quite appalling. Are you sure vmstat is telling
> > > the truth? If so, where's it all being spent?
> > >
> > >
> >
> > Well, profile doesn't show any time in "default_idle". So
> > I believe, vmstat is telling the truth.
>
> Badari,
>
> You probably covered this, but just to make sure, if you're on a
> pentium4 machine, I usually boot w/ "idle=poll" to see proper idle
> reporting because otherwise the chip will throttle itself back and
> idle time will be skewed -- at least on oprofile.
>
My machine is AMD64.
- Badari
* Re: Memory pressure handling with iSCSI
2005-07-26 17:35 Memory pressure handling with iSCSI Badari Pulavarty
2005-07-26 18:04 ` Roland Dreier
2005-07-26 18:11 ` Andrew Morton
@ 2005-07-26 20:59 ` Rik van Riel
2005-07-26 21:05 ` Badari Pulavarty
2005-07-26 21:12 ` Andrew Morton
2 siblings, 2 replies; 28+ messages in thread
From: Rik van Riel @ 2005-07-26 20:59 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: lkml, linux-mm, Andrew Morton
On Tue, 26 Jul 2005, Badari Pulavarty wrote:
> After KS & OLS discussions about memory pressure, I wanted to re-do
> iSCSI testing with "dd"s to see if we are throttling writes.
Could you also try with shared writable mmap, to see if that
works ok or triggers a deadlock ?
--
The Theory of Escalating Commitment: "The cost of continuing mistakes is
borne by others, while the cost of admitting mistakes is borne by yourself."
-- Joseph Stiglitz, Nobel Laureate in Economics
* Re: Memory pressure handling with iSCSI
2005-07-26 20:59 ` Rik van Riel
@ 2005-07-26 21:05 ` Badari Pulavarty
2005-07-26 21:33 ` Martin J. Bligh
2005-07-26 21:12 ` Andrew Morton
1 sibling, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 21:05 UTC (permalink / raw)
To: Rik van Riel; +Cc: lkml, linux-mm, Andrew Morton
On Tue, 2005-07-26 at 16:59 -0400, Rik van Riel wrote:
> On Tue, 26 Jul 2005, Badari Pulavarty wrote:
>
> > After KS & OLS discussions about memory pressure, I wanted to re-do
> > iSCSI testing with "dd"s to see if we are throttling writes.
>
> Could you also try with shared writable mmap, to see if that
> works ok or triggers a deadlock ?
I can, but let's finish addressing one issue at a time. Last time,
I changed too many things at the same time and got nowhere :(
Thanks,
Badari
* Re: Memory pressure handling with iSCSI
2005-07-26 19:12 ` Andrew Morton
2005-07-26 20:36 ` Badari Pulavarty
@ 2005-07-26 21:11 ` Badari Pulavarty
2005-07-26 21:24 ` Andrew Morton
1 sibling, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 21:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]
On Tue, 2005-07-26 at 12:12 -0700, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Can you please reduce the number of filesystems, see if that reduces the
> > dirty levels?
>
> Also, it's conceivable that ext3 is implicated here, so it might be saner
> to perform initial investigation on ext2.
>
> (when kjournald writes back a page via its buffers, the page remains
> "dirty" as far as the VFS is concerned. Later, someone tries to do a
> writepage() on it and we'll discover the buffers' cleanness and the page
> will be cleaned without any I/O being performed. All the throttling
> _should_ work OK in this case. But ext2 is more straightforward.)
ext2 is incredibly better. Machine is very responsive.
# echo 2 > /proc/profile; sleep 5; readprofile -m /usr/src/*12.3/System.map | sort -nr
28671 total 0.0096
25024 default_idle 521.3333
1987 shrink_zone 0.5285
163 tg3_poll 0.0666
154 unlock_page 2.4062
113 page_referenced 0.3363
106 copy_user_generic 0.3557
98 __wake_up_bit 2.0417
74 release_pages 0.1779
71 page_waitqueue 0.7396
51 tg3_start_xmit 0.0287
39 __make_request 0.0290
36 tcp_ack 0.0048
30 tcp_sendpage 0.0100
30 scsi_request_fn 0.0260
28 tg3_interrupt_tagged 0.0700
27 kmem_cache_alloc 0.4219
23 kmem_cache_free 0.2396
22 rotate_reclaimable_page 0.0859
20 established_get_next 0.0595
20 cond_resched 0.1786
20 __mod_page_state 0.4167
16 tcp_transmit_skb 0.0081
15 memset 0.0781
15 __kfree_skb 0.0521
14 tcp_write_xmit 0.0194
14 handle_IRQ_event 0.1458
12 skb_clone 0.0214
12 kfree 0.0500
12 end_buffer_async_write 0.0469
11 tcp_v4_rcv 0.0041
10 test_set_page_writeback 0.0329
Thanks,
Badari
[-- Attachment #2: vmstat-ext2.out --]
[-- Type: text/plain, Size: 1375 bytes --]
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 56 4 33372 12512 6794560 0 0 142 1451 10283 1632 0 7 0 93
0 56 4 35488 12496 6791996 0 0 131 1762 10335 1583 0 3 0 96
0 56 4 33132 12540 6794532 0 0 1 1320 10228 2082 0 4 0 96
0 56 4 33132 12684 6794388 0 0 35 2054 10414 1973 0 7 0 93
0 56 4 33380 12712 6794876 0 0 0 2676 10635 2739 0 6 0 94
0 56 4 33132 12672 6793368 0 0 2 6799 10240 2617 0 10 0 90
0 56 4 33132 12608 6793948 0 0 0 10525 10249 2945 0 10 0 90
2 56 4 33380 12528 6792996 0 0 1 12566 11081 2813 0 12 0 88
1 55 4 33380 12368 6793672 0 0 1 9206 10237 2608 0 13 0 87
0 56 4 33132 12176 6793348 0 0 0 10939 10156 2744 0 17 0 83
2 59 4 33256 12060 6794496 0 0 5 11706 10464 2746 0 15 0 85
0 56 4 33504 11844 6794196 0 0 0 12196 10525 2835 0 17 0 83
0 56 4 33504 11592 6795480 0 0 0 8656 10463 2692 0 10 0 90
0 56 4 33132 11492 6796612 0 0 1 9022 10222 2496 0 11 0 89
2 55 4 33256 11384 6796720 0 0 0 9661 10830 2813 0 9 0 91
* Re: Memory pressure handling with iSCSI
2005-07-26 20:59 ` Rik van Riel
2005-07-26 21:05 ` Badari Pulavarty
@ 2005-07-26 21:12 ` Andrew Morton
1 sibling, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 21:12 UTC (permalink / raw)
To: Rik van Riel; +Cc: pbadari, linux-kernel, linux-mm
Rik van Riel <riel@redhat.com> wrote:
>
> On Tue, 26 Jul 2005, Badari Pulavarty wrote:
>
> > After KS & OLS discussions about memory pressure, I wanted to re-do
> > iSCSI testing with "dd"s to see if we are throttling writes.
>
> Could you also try with shared writable mmap, to see if that
> works ok or triggers a deadlock ?
>
That'll cause problems for sure, but we need to get `dd' right first :(
* Re: Memory pressure handling with iSCSI
2005-07-26 20:37 ` Badari Pulavarty
@ 2005-07-26 21:21 ` Andrew Morton
0 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 21:21 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: sonny, linux-kernel, linux-mm
Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> > You probably covered this, but just to make sure, if you're on a
> > pentium4 machine, I usually boot w/ "idle=poll" to see proper idle
> > reporting because otherwise the chip will throttle itself back and
> > idle time will be skewed -- at least on oprofile.
> >
>
> My machine is AMD64.
I'd expect the problem to which Sonny refers will occur on many
architectures.
IIRC, the problem is that many (or all) of the counters which oprofile uses
are turned off when the CPU does a halt. So the profiler ends up thinking
that zero time is spent in the idle handler. The net effect is that if
your workload spends 90% of its time idle then all the other profiler hits
are exaggerated by a factor of ten. Making the CPU busywait in idle()
fixes this.
But you're using the old /proc/profile profiler which uses a free-running
timer which doesn't get stopped by halt, so it is unaffected by this.
* Re: Memory pressure handling with iSCSI
2005-07-26 21:11 ` Badari Pulavarty
@ 2005-07-26 21:24 ` Andrew Morton
2005-07-26 21:45 ` Badari Pulavarty
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 21:24 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm
Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> ext2 is incredibly better. Machine is very responsive.
>
OK. Please always monitor and send /proc/meminfo. I assume that the
dirty-memory clamping is working OK with ext2 and that perhaps it'll work
OK with ext3/data=writeback.
All very odd. I wonder how to reproduce this. Maybe 50 ext3 filesystems
on regular old scsi will do it?
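The ext3/data=writeback variant suggested here would be mounted along these lines (device and mount point are placeholders; in writeback mode ext3 journals only metadata, so data pages are handled much as on ext2):

```shell
# Hypothetical mount commands for the suggested follow-up test:
mount -t ext3 -o data=writeback /dev/sdX1 /mnt/test1
# or switch an existing mount in place:
mount -o remount,data=writeback /mnt/test1
```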
* Re: Memory pressure handling with iSCSI
2005-07-26 21:05 ` Badari Pulavarty
@ 2005-07-26 21:33 ` Martin J. Bligh
2005-07-26 22:05 ` Adam Litke
0 siblings, 1 reply; 28+ messages in thread
From: Martin J. Bligh @ 2005-07-26 21:33 UTC (permalink / raw)
To: Badari Pulavarty, Rik van Riel, agl; +Cc: lkml, linux-mm, Andrew Morton
>> > After KS & OLS discussions about memory pressure, I wanted to re-do
>> > iSCSI testing with "dd"s to see if we are throttling writes.
>>
>> Could you also try with shared writable mmap, to see if that
>> works ok or triggers a deadlock ?
>
>
> I can, but lets finish addressing one issue at a time. Last time,
> I changed too many things at the same time and got no where :(
Adam is working on that one, but not over iSCSI.
M.
* Re: Memory pressure handling with iSCSI
2005-07-26 21:24 ` Andrew Morton
@ 2005-07-26 21:45 ` Badari Pulavarty
2005-07-26 22:10 ` Andrew Morton
0 siblings, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 21:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Tue, 2005-07-26 at 14:24 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > ext2 is incredibly better. Machine is very responsive.
> >
>
> OK. Please, always monitor and send /proc/meminfo. I assume that the
> dirty-memory clamping is working OK with ext2 and that perhaps it'll work
> OK with ext3/data=writeback.
Nope. Dirty is still very high..
# cat /proc/meminfo
MemTotal: 7143628 kB
MemFree: 33248 kB
Buffers: 8368 kB
Cached: 6789932 kB
SwapCached: 0 kB
Active: 51316 kB
Inactive: 6769144 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 7143628 kB
LowFree: 33248 kB
SwapTotal: 1048784 kB
SwapFree: 1048780 kB
Dirty: 6605704 kB
Writeback: 168452 kB
Mapped: 49724 kB
Slab: 252200 kB
CommitLimit: 4620596 kB
Committed_AS: 163524 kB
PageTables: 2284 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
Thanks,
Badari
* Re: Memory pressure handling with iSCSI
2005-07-26 21:33 ` Martin J. Bligh
@ 2005-07-26 22:05 ` Adam Litke
0 siblings, 0 replies; 28+ messages in thread
From: Adam Litke @ 2005-07-26 22:05 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Badari Pulavarty [imap], Rik van Riel, lkml, linux-mm, Andrew Morton
On Tue, 2005-07-26 at 16:33, Martin J. Bligh wrote:
> >> > After KS & OLS discussions about memory pressure, I wanted to re-do
> >> > iSCSI testing with "dd"s to see if we are throttling writes.
> >>
> >> Could you also try with shared writable mmap, to see if that
> >> works ok or triggers a deadlock ?
> >
> >
> > I can, but lets finish addressing one issue at a time. Last time,
> > I changed too many things at the same time and got no where :(
>
> Adam is working that one, but not over iSCSI.
I wrote a simple/ugly C program to demonstrate the MAP_SHARED,PROT_WRITE
case. I was able to saturate the system with 75% of all memory in dirty
pages before I got bored.
To reproduce:
- Create a 3GB file with dd
- ./map-shared-dirty bigfile <number of chunks>
I break up the mmap & dirty operation into chunks in case the system is
tight on memory. Choose a large enough number of chunks so that the
individual mmaps will be small enough for your system to accommodate.
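The chunking described above divides the file size by the chunk count and rounds the result down to a page boundary, since mmap offsets must be page-aligned. With the suggested 3 GB file and, say, 32 chunks (an illustrative number), the arithmetic works out as:

```shell
# Chunk-size calculation mirroring the program below:
# (file_size / chunks) rounded down to a multiple of the page size.
page_size=4096                            # typical x86 page size
file_bytes=$((3 * 1024 * 1024 * 1024))    # the 3 GB test file
chunks=32
chunk_size=$(( (file_bytes / chunks) & ~(page_size - 1) ))
echo "chunk_size = $chunk_size bytes"     # prints 100663296
```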
--
MemTotal: 4092492 kB
MemFree: 786988 kB
Buffers: 6372 kB
Cached: 3211388 kB
SwapCached: 0 kB
Active: 3197428 kB
Inactive: 36696 kB
HighTotal: 3211264 kB
HighFree: 1024 kB
LowTotal: 881228 kB
LowFree: 785964 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 3117300 kB
Writeback: 3568 kB
Mapped: 24780 kB
Slab: 59316 kB
Committed_AS: 49760 kB
PageTables: 780 kB
VmallocTotal: 114680 kB
VmallocUsed: 32 kB
VmallocChunk: 114648 kB
/*
 * map-shared-dirty.c - Demonstrate a loophole in dirty-ratio when
 * heavily dirtying MAP_SHARED memory.
 *
 * Usage: (I know it's ugly)
 *	./map-shared-dirty <large file> <number of chunks>
 */
#define _FILE_OFFSET_BITS 64	/* 64-bit off_t; replaces open(..., O_LARGEFILE) */
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

size_t page_size;

static void dirty_file(int fd, size_t bytes, off_t map_offset)
{
	char *addr;

	addr = mmap(NULL, bytes, PROT_READ|PROT_WRITE, MAP_SHARED, fd,
		    map_offset);
	if (addr == MAP_FAILED) {
		fprintf(stderr, "Failed to map file\n");
		fprintf(stderr, "bytes: %zu offset: %lld\n",
			bytes, (long long)map_offset);
		exit(1);
	}

	/* Dirty the pages */
	memset(addr, map_offset % 255, bytes);
	munmap(addr, bytes);
}

int main(int argc, char **argv)
{
	char *filename;
	int fd, chunks;
	off_t i, chunk_size, bytes;
	struct stat file_info;

	if (argc != 3) {
		fprintf(stderr, "Usage: %s <large file> <number of chunks>\n",
			argv[0]);
		exit(1);
	}
	filename = argv[1];
	chunks = atoi(argv[2]);
	if (chunks <= 0) {
		fprintf(stderr, "Bad chunk count\n");
		exit(1);
	}

	fd = open(filename, O_RDWR);
	if (fd < 0) {
		fprintf(stderr, "Failed to open file\n");
		exit(1);
	}
	fstat(fd, &file_info);
	bytes = file_info.st_size;

	page_size = getpagesize();
	chunk_size = (bytes / chunks) & ~((off_t)page_size - 1);
	if (chunk_size == 0) {
		fprintf(stderr, "Too many chunks for this file size\n");
		exit(1);
	}
	printf("Chunk size = %lld\n", (long long)chunk_size);

	for (i = 0; i < bytes; i += chunk_size) {
		/* Cap the final, partial chunk so we never map past EOF */
		size_t len = (bytes - i < chunk_size) ?
				(size_t)(bytes - i) : (size_t)chunk_size;
		dirty_file(fd, len, i);
	}
	exit(0);
}
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
* Re: Memory pressure handling with iSCSI
2005-07-26 21:45 ` Badari Pulavarty
@ 2005-07-26 22:10 ` Andrew Morton
2005-07-26 22:48 ` Badari Pulavarty
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 22:10 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm
Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> On Tue, 2005-07-26 at 14:24 -0700, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > ext2 is incredibly better. Machine is very responsive.
> > >
> >
> > OK. Please, always monitor and send /proc/meminfo. I assume that the
> > dirty-memory clamping is working OK with ext2 and that perhaps it'll work
> > OK with ext3/data=writeback.
>
> Nope. Dirty is still very high..
That's a relief in a way. Can you please try decreasing the number of
filesystems now?
* Re: Memory pressure handling with iSCSI
2005-07-26 22:10 ` Andrew Morton
@ 2005-07-26 22:48 ` Badari Pulavarty
2005-07-26 23:07 ` Andrew Morton
0 siblings, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 22:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
[-- Attachment #1: Type: text/plain, Size: 887 bytes --]
On Tue, 2005-07-26 at 15:10 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > On Tue, 2005-07-26 at 14:24 -0700, Andrew Morton wrote:
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > >
> > > > ext2 is incredibly better. Machine is very responsive.
> > > >
> > >
> > > OK. Please, always monitor and send /proc/meminfo. I assume that the
> > > dirty-memory clamping is working OK with ext2 and that perhaps it'll work
> > > OK with ext3/data=writeback.
> >
> > Nope. Dirty is still very high..
>
> That's a relief in a way. Can you please try decreasing the number of
> filesystems now?
Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
every 5 seconds. As you can see, we seem to dirty 6GB of data within 20
seconds of starting the test. I am not sure if it's bad, since we have
lots of free memory..
Thanks,
Badari
[-- Attachment #2: vmstat-5-ext2.out --]
[-- Type: text/plain, Size: 969 bytes --]
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 11 120 32912 10624 6813476 0 0 0 6766 10364 485 0 4 0 96
0 11 120 33036 10652 6813964 0 0 2 8889 10079 475 0 4 0 96
0 11 120 33036 10712 6813904 0 0 0 8077 9984 469 0 4 0 96
0 11 120 32912 10752 6814380 0 0 0 15576 10226 514 0 4 0 95
0 11 120 33036 10668 6813432 0 0 0 11334 10112 488 0 4 0 96
0 11 120 33656 10600 6813500 0 0 0 11811 10238 497 0 4 0 96
0 11 120 33036 10596 6814020 0 0 0 12713 10191 489 0 4 0 96
0 11 120 33036 10648 6813968 0 0 1 15775 10195 508 0 4 0 96
0 10 120 33780 10656 6812928 0 0 2 5390 10265 503 0 3 5 92
0 11 120 33036 10660 6813440 0 0 0 9700 10217 518 0 4 2 94
[-- Attachment #3: meminfo.out --]
[-- Type: text/plain, Size: 3384 bytes --]
MemTotal: 7143628 kB
MemFree: 7001860 kB
Buffers: 5080 kB
Cached: 23300 kB
SwapCached: 0 kB
Active: 48600 kB
Inactive: 5872 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 7143628 kB
LowFree: 7001860 kB
SwapTotal: 1048784 kB
SwapFree: 1048780 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 45948 kB
Slab: 56348 kB
CommitLimit: 4620596 kB
Committed_AS: 148436 kB
PageTables: 1544 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
MemTotal: 7143628 kB
MemFree: 4871864 kB
Buffers: 14564 kB
Cached: 2091232 kB
SwapCached: 0 kB
Active: 51380 kB
Inactive: 2081780 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 7143628 kB
LowFree: 4871864 kB
SwapTotal: 1048784 kB
SwapFree: 1048780 kB
Dirty: 2070752 kB
Writeback: 0 kB
Mapped: 46368 kB
Slab: 107912 kB
CommitLimit: 4620596 kB
Committed_AS: 148524 kB
PageTables: 1608 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
MemTotal: 7143628 kB
MemFree: 406384 kB
Buffers: 18940 kB
Cached: 6443960 kB
SwapCached: 0 kB
Active: 55688 kB
Inactive: 6435048 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 7143628 kB
LowFree: 406384 kB
SwapTotal: 1048784 kB
SwapFree: 1048780 kB
Dirty: 6144652 kB
Writeback: 252152 kB
Mapped: 46380 kB
Slab: 216580 kB
CommitLimit: 4620596 kB
Committed_AS: 148756 kB
PageTables: 1608 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
MemTotal: 7143628 kB
MemFree: 32772 kB
Buffers: 10028 kB
Cached: 6817680 kB
SwapCached: 4 kB
Active: 48180 kB
Inactive: 6804552 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 7143628 kB
LowFree: 32772 kB
SwapTotal: 1048784 kB
SwapFree: 1048664 kB
Dirty: 6489496 kB
Writeback: 285264 kB
Mapped: 46000 kB
Slab: 228172 kB
CommitLimit: 4620596 kB
Committed_AS: 148756 kB
PageTables: 1608 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
MemTotal: 7143628 kB
MemFree: 32524 kB
Buffers: 10056 kB
Cached: 6816620 kB
SwapCached: 4 kB
Active: 48672 kB
Inactive: 6803212 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 7143628 kB
LowFree: 32524 kB
SwapTotal: 1048784 kB
SwapFree: 1048664 kB
Dirty: 6465124 kB
Writeback: 268876 kB
Mapped: 46008 kB
Slab: 229580 kB
CommitLimit: 4620596 kB
Committed_AS: 148996 kB
PageTables: 1608 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9888 kB
VmallocChunk: 34359728447 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
* Re: Memory pressure handling with iSCSI
2005-07-26 22:48 ` Badari Pulavarty
@ 2005-07-26 23:07 ` Andrew Morton
2005-07-26 23:26 ` Badari Pulavarty
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-26 23:07 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm
Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
> every 5 seconds. As you can see, we seem to dirty 6GB of data within 20
> seconds of starting the test. I am not sure if it's bad, since we have
> lots of free memory..
It's bad. The logic in balance_dirty_pages() should block those write()
callers as soon as we hit 40% dirty memory or whatever is in
/proc/sys/vm/dirty_ratio. So something is horridly busted.
Can you try reducing the number of filesystems even further?
Either the underlying block driver is doing something most bizarre to the
VFS or something has gone wrong with the arithmetic in page-writeback.c.
If total_pages or ratelimit_pages are totally wrong or if
get_dirty_limits() is returning junk then we'd be seeing something like
this.
It'll be something simple - if you have time, stick some printks in
balance_dirty_pages(), work out why it is not remaining in that `for' loop
until dirty memory has fallen below the 40%.
I'll take a shot at reproducing this on my 4G x86_64 box, but this is so
grossly wrong that I'm sure it would have been noted before now if it was
commonly happening (famous last words).
* Re: Memory pressure handling with iSCSI
2005-07-26 23:07 ` Andrew Morton
@ 2005-07-26 23:26 ` Badari Pulavarty
2005-07-27 0:31 ` Andrew Morton
0 siblings, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-26 23:26 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Tue, 2005-07-26 at 16:07 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
> > every 5 seconds. As you can see, we seem to dirty 6GB of data within 20
> > seconds of starting the test. I am not sure if it's bad, since we have
> > lots of free memory..
>
> It's bad. The logic in balance_dirty_pages() should block those write()
> callers as soon as we hit 40% dirty memory or whatever is in
> /proc/sys/vm/dirty_ratio. So something is horridly busted.
>
> Can you try reducing the number of filesystems even further?
Single ext2 filesystem. We still dirty pretty quickly (data collected
every 5 seconds).
# grep Dirty OUT
Dirty: 312 kB
Dirty: 1121852 kB
Dirty: 2896952 kB
Dirty: 4344564 kB
Dirty: 5310856 kB
Dirty: 5507812 kB
Dirty: 5714884 kB
Dirty: 5865132 kB
Dirty: 6004276 kB
Dirty: 6206544 kB
Dirty: 6380524 kB
Dirty: 6583200 kB
Dirty: 6727296 kB
Dirty: 6708564 kB
Dirty: 6733768 kB
Dirty: 6737868 kB
Thanks,
Badari
* Re: Memory pressure handling with iSCSI
2005-07-26 23:26 ` Badari Pulavarty
@ 2005-07-27 0:31 ` Andrew Morton
2005-07-27 1:20 ` Martin J. Bligh
2005-07-27 1:31 ` Badari Pulavarty
0 siblings, 2 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-27 0:31 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm
Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> On Tue, 2005-07-26 at 16:07 -0700, Andrew Morton wrote:
> > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > >
> > > Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
> > > every 5 seconds. As you can see, we seem to dirty 6GB of data within 20
> > > seconds of starting the test. I am not sure if it's bad, since we have
> > > lots of free memory..
> >
> > It's bad. The logic in balance_dirty_pages() should block those write()
> > callers as soon as we hit 40% dirty memory or whatever is in
> > /proc/sys/vm/dirty_ratio. So something is horridly busted.
> >
> > Can you try reducing the number of filesystems even further?
>
> Single ext2 filesystem. We still dirty pretty quickly (data collected
> every 5 seconds).
It happens here, a bit. My machine goes up to 60% dirty when it should be
clamping at 40%.
The variable `total_pages' in page-writeback.c (from
nr_free_pagecache_pages()) is too high. I trace it back to here:
On node 0 totalpages: 1572864
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 1568768 pages, LIFO batch:31
HighMem zone: 0 pages, LIFO batch:1
This machine only has 4G of memory, so the platform code is overestimating
the number of pages by 50%. Can you please check your dmesg, see if your
system is also getting this wrong?
* Re: Memory pressure handling with iSCSI
2005-07-27 0:31 ` Andrew Morton
@ 2005-07-27 1:20 ` Martin J. Bligh
2005-07-27 1:26 ` Andrew Morton
2005-07-27 1:31 ` Badari Pulavarty
1 sibling, 1 reply; 28+ messages in thread
From: Martin J. Bligh @ 2005-07-27 1:20 UTC (permalink / raw)
To: Andrew Morton, Badari Pulavarty; +Cc: linux-kernel, linux-mm
> It happens here, a bit. My machine goes up to 60% dirty when it should be
> clamping at 40%.
>
> The variable `total_pages' in page-writeback.c (from
> nr_free_pagecache_pages()) is too high. I trace it back to here:
>
> On node 0 totalpages: 1572864
> DMA zone: 4096 pages, LIFO batch:1
> Normal zone: 1568768 pages, LIFO batch:31
> HighMem zone: 0 pages, LIFO batch:1
>
> This machine only has 4G of memory, so the platform code is overestimating
> the number of pages by 50%. Can you please check your dmesg, see if your
> system is also getting this wrong?
I think we're repeatedly iterating over the same zones by walking the
zonelists:
static unsigned int nr_free_zone_pages(int offset)
{
	pg_data_t *pgdat;
	unsigned int sum = 0;
	int i;

	for_each_pgdat(pgdat) {
		struct zone *zone;

		for (i = 0; i < MAX_NR_ZONES; i++) {
			unsigned long size, high;

			zone = &pgdat->node_zones[i];
			size = zone->present_pages;
			high = zone->pages_high;

			if (size > high)
				sum += size - high;
		}
	}
	return sum;
}
Does that look more sensible? I'd send you a real patch, except the
box just crashed ;-)
M.
* Re: Memory pressure handling with iSCSI
2005-07-27 1:20 ` Martin J. Bligh
@ 2005-07-27 1:26 ` Andrew Morton
2005-07-27 1:47 ` Martin J. Bligh
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2005-07-27 1:26 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: pbadari, linux-kernel, linux-mm
"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>
>
> > It happens here, a bit. My machine goes up to 60% dirty when it should be
> > clamping at 40%.
> >
> > The variable `total_pages' in page-writeback.c (from
> > nr_free_pagecache_pages()) is too high. I trace it back to here:
> >
> > On node 0 totalpages: 1572864
> > DMA zone: 4096 pages, LIFO batch:1
> > Normal zone: 1568768 pages, LIFO batch:31
> > HighMem zone: 0 pages, LIFO batch:1
> >
> > This machine only has 4G of memory, so the platform code is overestimating
> > the number of pages by 50%. Can you please check your dmesg, see if your
> > system is also getting this wrong?
>
> I think we're repeatedly iterating over the same zones by walking the
> zonelists:
>
> static unsigned int nr_free_zone_pages(int offset)
> {
> 	pg_data_t *pgdat;
> 	unsigned int sum = 0;
> 	int i;
>
> 	for_each_pgdat(pgdat) {
> 		struct zone *zone;
>
> 		for (i = 0; i < MAX_NR_ZONES; i++) {
> 			unsigned long size, high;
>
> 			zone = &pgdat->node_zones[i];
> 			size = zone->present_pages;
> 			high = zone->pages_high;
>
> 			if (size > high)
> 				sum += size - high;
> 		}
> 	}
> 	return sum;
> }
I don't think so. We're getting the wrong answer out of
calculate_zone_totalpages() which is an init-time thing.
Maybe nr_free_zone_pages() is supposed to fix that up post-facto somehow,
but calculate_zone_totalpages() sure as heck shouldn't be putting 1568768
into my ZONE_NORMAL's ->node_present_pages.
* Re: Memory pressure handling with iSCSI
2005-07-27 0:31 ` Andrew Morton
2005-07-27 1:20 ` Martin J. Bligh
@ 2005-07-27 1:31 ` Badari Pulavarty
2005-07-27 1:40 ` Andrew Morton
1 sibling, 1 reply; 28+ messages in thread
From: Badari Pulavarty @ 2005-07-27 1:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, linux-mm
On Tue, 2005-07-26 at 17:31 -0700, Andrew Morton wrote:
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> >
> > On Tue, 2005-07-26 at 16:07 -0700, Andrew Morton wrote:
> > > Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > > >
> > > > Here is the data with 5 ext2 filesystems. I also collected /proc/meminfo
> > > > every 5 seconds. As you can see, we seem to dirty 6GB of data within 20
> > > > seconds of starting the test. I am not sure if it's bad, since we have
> > > > lots of free memory..
> > >
> > > It's bad. The logic in balance_dirty_pages() should block those write()
> > > callers as soon as we hit 40% dirty memory or whatever is in
> > > /proc/sys/vm/dirty_ratio. So something is horridly busted.
> > >
> > > Can you try reducing the number of filesystems even further?
> >
> > Single ext2 filesystem. We still dirty pretty quickly (data collected
> > every 5 seconds).
>
> It happens here, a bit. My machine goes up to 60% dirty when it should be
> clamping at 40%.
>
> The variable `total_pages' in page-writeback.c (from
> nr_free_pagecache_pages()) is too high. I trace it back to here:
>
> On node 0 totalpages: 1572864
> DMA zone: 4096 pages, LIFO batch:1
> Normal zone: 1568768 pages, LIFO batch:31
> HighMem zone: 0 pages, LIFO batch:1
>
> This machine only has 4G of memory, so the platform code is overestimating
> the number of pages by 50%. Can you please check your dmesg, see if your
> system is also getting this wrong?
On node 0 totalpages: 1572863
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 1568767 pages, LIFO batch:31
HighMem zone: 0 pages, LIFO batch:1
On node 1 totalpages: 131071
DMA zone: 0 pages, LIFO batch:1
Normal zone: 131071 pages, LIFO batch:31
HighMem zone: 0 pages, LIFO batch:1
On node 2 totalpages: 131071
DMA zone: 0 pages, LIFO batch:1
Normal zone: 131071 pages, LIFO batch:31
HighMem zone: 0 pages, LIFO batch:1
On node 3 totalpages: 131071
DMA zone: 0 pages, LIFO batch:1
Normal zone: 131071 pages, LIFO batch:31
HighMem zone: 0 pages, LIFO batch:1
* Re: Memory pressure handling with iSCSI
2005-07-27 1:31 ` Badari Pulavarty
@ 2005-07-27 1:40 ` Andrew Morton
0 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2005-07-27 1:40 UTC (permalink / raw)
To: Badari Pulavarty; +Cc: linux-kernel, linux-mm
Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> > This machine only has 4G of memory, so the platform code is overestimating
> > the number of pages by 50%. Can you please check your dmesg, see if your
> > system is also getting this wrong?
>
>
>
> On node 0 totalpages: 1572863
> DMA zone: 4096 pages, LIFO batch:1
> Normal zone: 1568767 pages, LIFO batch:31
> HighMem zone: 0 pages, LIFO batch:1
> On node 1 totalpages: 131071
> DMA zone: 0 pages, LIFO batch:1
> Normal zone: 131071 pages, LIFO batch:31
> HighMem zone: 0 pages, LIFO batch:1
> On node 2 totalpages: 131071
> DMA zone: 0 pages, LIFO batch:1
> Normal zone: 131071 pages, LIFO batch:31
> HighMem zone: 0 pages, LIFO batch:1
> On node 3 totalpages: 131071
> DMA zone: 0 pages, LIFO batch:1
> Normal zone: 131071 pages, LIFO batch:31
> HighMem zone: 0 pages, LIFO batch:1
That's 7.7GB, yes? On a 6GB machine?
If so, that's a bit off, but not grossly.
Here's the dopey debug patch which I used:
- boot
- dmesg -s 1000000 | grep total_pages > foo
- kill off syslogd (sudo service syslog stop)
- run the dd command
- wait for it to hit steady state (max dirty memory)
- dmesg -s 1000000 >> foo
diff -puN mm/page-writeback.c~a mm/page-writeback.c
--- 25/mm/page-writeback.c~a	2005-07-26 15:53:46.000000000 -0700
+++ 25-akpm/mm/page-writeback.c	2005-07-26 16:21:55.000000000 -0700
@@ -161,7 +161,8 @@ get_dirty_limits(struct writeback_state
 	dirty_ratio = vm_dirty_ratio;
 	if (dirty_ratio > unmapped_ratio / 2)
 		dirty_ratio = unmapped_ratio / 2;
-
+	printk("vm_dirty_ratio=%d unmapped_ratio=%d dirty_ratio=%d\n",
+		vm_dirty_ratio, unmapped_ratio, dirty_ratio);
 	if (dirty_ratio < 5)
 		dirty_ratio = 5;
@@ -171,6 +172,8 @@ get_dirty_limits(struct writeback_state
 	background = (background_ratio * available_memory) / 100;
 	dirty = (dirty_ratio * available_memory) / 100;
+	printk("dirty_ratio=%d available_memory=%lu dirty=%lu\n",
+		dirty_ratio, available_memory, dirty);
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
 		background += background / 4;
@@ -209,6 +212,12 @@ static void balance_dirty_pages(struct a
 		get_dirty_limits(&wbs, &background_thresh,
 				&dirty_thresh, mapping);
 		nr_reclaimable = wbs.nr_dirty + wbs.nr_unstable;
+		printk("background_thresh=%ld dirty_thresh=%ld "
+			"nr_dirty=%ld nr_unstable=%ld "
+			"nr_reclaimable=%ld wbs.nr_writeback=%ld\n",
+			background_thresh, dirty_thresh,
+			wbs.nr_dirty, wbs.nr_unstable,
+			nr_reclaimable, wbs.nr_writeback);
 		if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh)
 			break;
@@ -532,6 +541,8 @@ void __init page_writeback_init(void)
 	total_pages = nr_free_pagecache_pages();
+	printk("total_pages=%ld\n", total_pages);
+
 	correction = (100 * 4 * buffer_pages) / total_pages;
 	if (correction < 100) {
_
* Re: Memory pressure handling with iSCSI
2005-07-27 1:26 ` Andrew Morton
@ 2005-07-27 1:47 ` Martin J. Bligh
0 siblings, 0 replies; 28+ messages in thread
From: Martin J. Bligh @ 2005-07-27 1:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: pbadari, linux-kernel, linux-mm
> I don't think so. We're getting the wrong answer out of
> calculate_zone_totalpages() which is an init-time thing.
>
> Maybe nr_free_zone_pages() is supposed to fix that up post-facto somehow,
> but calculate_zone_totalpages() sure as heck shouldn't be putting 1568768
> into my ZONE_NORMAL's ->node_present_pages.
Humpf. I'll look at it again later.
nr_free_pagecache_pages -> nr_free_zone_pages
is it not?
M.
end of thread, other threads:[~2005-07-27 1:47 UTC | newest]
Thread overview: 28+ messages
2005-07-26 17:35 Memory pressure handling with iSCSI Badari Pulavarty
2005-07-26 18:04 ` Roland Dreier
2005-07-26 18:11 ` Andrew Morton
2005-07-26 18:39 ` Badari Pulavarty
2005-07-26 18:48 ` Andrew Morton
2005-07-26 19:12 ` Andrew Morton
2005-07-26 20:36 ` Badari Pulavarty
2005-07-26 21:11 ` Badari Pulavarty
2005-07-26 21:24 ` Andrew Morton
2005-07-26 21:45 ` Badari Pulavarty
2005-07-26 22:10 ` Andrew Morton
2005-07-26 22:48 ` Badari Pulavarty
2005-07-26 23:07 ` Andrew Morton
2005-07-26 23:26 ` Badari Pulavarty
2005-07-27 0:31 ` Andrew Morton
2005-07-27 1:20 ` Martin J. Bligh
2005-07-27 1:26 ` Andrew Morton
2005-07-27 1:47 ` Martin J. Bligh
2005-07-27 1:31 ` Badari Pulavarty
2005-07-27 1:40 ` Andrew Morton
2005-07-26 19:31 ` Sonny Rao
2005-07-26 20:37 ` Badari Pulavarty
2005-07-26 21:21 ` Andrew Morton
2005-07-26 20:59 ` Rik van Riel
2005-07-26 21:05 ` Badari Pulavarty
2005-07-26 21:33 ` Martin J. Bligh
2005-07-26 22:05 ` Adam Litke
2005-07-26 21:12 ` Andrew Morton