* xmm2 - monitor Linux MM active/inactive lists graphically
From: Zlatko Calusic @ 2001-10-24 10:42 UTC
To: linux-mm, linux-kernel
A new version is out and can be found at the same URL:
<URL:http://linux.inet.hr/>
As Linus' MM dropped the inactive dirty/clean lists in favour of just one
inactive list, the application needed to be modified to support that.
You can still use the older version for kernels <= 2.4.9 and/or Alan's
(-ac) kernels, which continue to use Rik's older VM system.
Enjoy and, as usual, all comments welcome!
--
Zlatko
P.S. BTW, 2.4.13 still has very suboptimal writeout performance and
andrea@suse.de is redirected to /dev/null. <g>
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Marcelo Tosatti @ 2001-10-24 14:26 UTC
To: Zlatko Calusic, Linus Torvalds; +Cc: linux-mm, lkml
On 24 Oct 2001, Zlatko Calusic wrote:
> P.S. BTW, 2.4.13 still has very suboptimal writeout performance and
> andrea@suse.de is redirected to /dev/null. <g>
Zlatko,
Could you please show us your case of bad writeout performance?
Thanks
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Zlatko Calusic @ 2001-10-25 0:25 UTC
To: Marcelo Tosatti; +Cc: Linus Torvalds, linux-mm, lkml
Marcelo Tosatti <marcelo@conectiva.com.br> writes:
> On 24 Oct 2001, Zlatko Calusic wrote:
>
> > P.S. BTW, 2.4.13 still has very suboptimal writeout performance and
> > andrea@suse.de is redirected to /dev/null. <g>
>
> Zlatko,
>
> Could you please show us your case of bad writeout performance ?
>
> Thanks
>
Sure. Output of 'vmstat 1' follows:
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 0 254552 5120 183476 0 0 12 24 178 438 2 37 60
0 1 0 0 137296 5232 297760 0 0 4 5284 195 440 3 43 54
1 0 0 0 126520 5244 308260 0 0 0 10588 215 230 0 3 96
0 2 0 0 117488 5252 317064 0 0 0 8796 176 139 1 3 96
0 2 0 0 107556 5264 326744 0 0 0 9704 174 78 0 3 97
0 2 0 0 99552 5268 334548 0 0 0 7880 174 67 0 3 97
0 2 0 0 89448 5280 344392 0 0 0 9804 175 76 0 4 96
0 1 0 0 79352 5288 354236 0 0 0 9852 176 87 0 5 95
0 1 0 0 71220 5300 362156 0 0 4 7884 170 120 0 4 96
0 1 0 0 63088 5308 370084 0 0 0 7936 174 76 0 3 97
0 2 0 0 52988 5320 379924 0 0 0 9920 175 77 0 4 96
0 2 0 0 43148 5328 389516 0 0 0 9548 174 97 0 4 95
0 2 0 0 35144 5336 397316 0 0 0 7820 176 73 0 3 97
0 2 0 0 25172 5344 407036 0 0 0 9724 188 183 0 4 96
0 2 1 0 17300 5352 414708 0 0 0 7744 174 78 0 4 96
0 1 0 0 7068 5360 424684 0 0 0 9920 175 93 0 3 97
0 1 0 0 3128 4132 430132 0 0 0 9920 174 81 0 4 96
Notice how there's plenty of RAM. I'm writing sequentially to a file
on the ext2 filesystem. The disk I'm writing on is a 7200rpm IDE,
capable of ~ 22 MB/s and I'm still getting only ~ 9 MB/s. Weird!
--
Zlatko
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-25 4:19 UTC
To: Zlatko Calusic; +Cc: Marcelo Tosatti, linux-mm, lkml
On 25 Oct 2001, Zlatko Calusic wrote:
>
> Sure. Output of 'vmstat 1' follows:
>
> 1 0 0 0 254552 5120 183476 0 0 12 24 178 438 2 37 60
> 0 1 0 0 137296 5232 297760 0 0 4 5284 195 440 3 43 54
> 1 0 0 0 126520 5244 308260 0 0 0 10588 215 230 0 3 96
> 0 2 0 0 117488 5252 317064 0 0 0 8796 176 139 1 3 96
> 0 2 0 0 107556 5264 326744 0 0 0 9704 174 78 0 3 97
This does not look like a VM issue at all - at this point you're already
getting only 10MB/s, yet the VM isn't even involved (there's definitely no
VM pressure here).
> Notice how there's plenty of RAM. I'm writing sequentially to a file
> on the ext2 filesystem. The disk I'm writing on is a 7200rpm IDE,
> capable of ~ 22 MB/s and I'm still getting only ~ 9 MB/s. Weird!
Are you sure you haven't lost some DMA setting or something?
Linus
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-25 4:57 UTC
To: Zlatko Calusic; +Cc: Marcelo Tosatti, linux-mm, lkml
On Wed, 24 Oct 2001, Linus Torvalds wrote:
>
> On 25 Oct 2001, Zlatko Calusic wrote:
> >
> > Sure. Output of 'vmstat 1' follows:
> >
> > 1 0 0 0 254552 5120 183476 0 0 12 24 178 438 2 37 60
> > 0 1 0 0 137296 5232 297760 0 0 4 5284 195 440 3 43 54
> > 1 0 0 0 126520 5244 308260 0 0 0 10588 215 230 0 3 96
> > 0 2 0 0 117488 5252 317064 0 0 0 8796 176 139 1 3 96
> > 0 2 0 0 107556 5264 326744 0 0 0 9704 174 78 0 3 97
>
> This does not look like a VM issue at all - at this point you're already
> getting only 10MB/s, yet the VM isn't even involved (there's definitely no
> VM pressure here).
I wonder if you're getting screwed by bdflush().. You do have a lot of
context switching going on, and you do have a clear pattern: once the
write-out gets going, you're filling new cached pages at about the same
pace that you're writing them out, which definitely means that the dirty
buffer balancing is nice and active.
So the problem is that you're obviously not actually getting the
throughput you should - it's not the VM, as the page cache grows nicely at
the same rate you're writing.
Try something for me: in fs/buffer.c make "balance_dirty_state()" never
return > 0, ie make the "return 1" be a "return 0" instead.
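A minimal sketch of that experiment, paraphrasing the shape of 2.4-era
fs/buffer.c rather than quoting it; the two helpers below are hypothetical
stand-ins for the real dirty-buffer accounting:

extern unsigned long dirty_buffer_pages(void);	/* hypothetical helper */
extern unsigned long soft_dirty_limit(void);	/* hypothetical, the ~40% mark */

/* > 0 means the caller also wakes bdflush and waits synchronously;
 * 0 means it only starts async write-out; -1 means do nothing. */
static int balance_dirty_state(void)
{
	unsigned long dirty = dirty_buffer_pages();
	unsigned long soft = soft_dirty_limit();

	if (dirty > 2 * soft)
		return 0;	/* the experiment: this was "return 1" */
	if (dirty > soft)
		return 0;
	return -1;
}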
That will cause us to not wake up bdflush at all, and if you're just on
the "border" of 40% dirty buffer usage you'll have bdflush work in
lock-step with you, alternately writing out buffers and waiting for them.
Quite frankly, just the act of doing the "write_some_buffers()" in
balance_dirty() should cause us to block much better than the synchronous
waiting anyway, because then we will block when the request queue fills
up, not at random points.
Even so, considering that you have such a steady 9-10MB/s, please
double-check that it's not something even simpler and embarrassing, like
just having forgotten to enable auto-DMA in the kernel config ;)
Linus
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Zlatko Calusic @ 2001-10-25 9:07 UTC
To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-mm, lkml
Linus Torvalds <torvalds@transmeta.com> writes:
> On 25 Oct 2001, Zlatko Calusic wrote:
> >
> > Sure. Output of 'vmstat 1' follows:
> >
> > 1 0 0 0 254552 5120 183476 0 0 12 24 178 438 2 37 60
> > 0 1 0 0 137296 5232 297760 0 0 4 5284 195 440 3 43 54
> > 1 0 0 0 126520 5244 308260 0 0 0 10588 215 230 0 3 96
> > 0 2 0 0 117488 5252 317064 0 0 0 8796 176 139 1 3 96
> > 0 2 0 0 107556 5264 326744 0 0 0 9704 174 78 0 3 97
>
> This does not look like a VM issue at all - at this point you're already
> getting only 10MB/s, yet the VM isn't even involved (there's definitely no
> VM pressure here).
That's true, I'll admit. Anyway, -ac kernels don't have the problem,
and I was misled by the fact that only the VM implementation differs
between those two branches (at least I think so).
>
> > Notice how there's plenty of RAM. I'm writing sequentially to a file
> > on the ext2 filesystem. The disk I'm writing on is a 7200rpm IDE,
> > capable of ~ 22 MB/s and I'm still getting only ~ 9 MB/s. Weird!
>
> Are you sure you haven't lost some DMA setting or something?
>
No. Setup is fine. I wouldn't make such a mistake. :)
If the disk were in some PIO mode, CPU usage would be much higher, but
it isn't.
This all definitely looks like a problem either in the bdflush daemon,
or the request queue/elevator, but unfortunately I don't have enough
knowledge of those areas to pinpoint it more precisely.
--
Zlatko
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Zlatko Calusic @ 2001-10-25 12:48 UTC
To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-mm, lkml
Linus Torvalds <torvalds@transmeta.com> writes:
> I wonder if you're getting screwed by bdflush().. You do have a lot of
> context switching going on, and you do have a clear pattern: once the
> write-out gets going, you're filling new cached pages at about the same
> pace that you're writing them out, which definitely means that the dirty
> buffer balancing is nice and active.
>
Yes, but things are similar when I finally allocate the whole memory and
kswapd kicks in. Everything behaves in the same way, so it is
definitely not the VM, as you pointed out.
> So the problem is that you're obviously not actually getting the
> throughput you should - it's not the VM, as the page cache grows nicely at
> the same rate you're writing.
>
Yes.
> Try something for me: in fs/buffer.c make "balance_dirty_state()" never
> return > 0, ie make the "return 1" be a "return 0" instead.
>
Sure. I recompiled a fresh 2.4.13 at work and reran the tests. This time
on a different setup, so the numbers are even smaller (the tests were
performed on the last partition of the disk, where the disk is capable of
~13 MB/s):
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 0 6308 600 441592 0 0 0 7788 159 132 0 7 93
0 1 0 0 3692 580 444272 0 0 0 5748 169 197 1 4 95
0 1 0 0 3180 556 444804 0 0 0 5632 228 408 1 5 94
0 1 0 0 3720 556 444284 0 0 0 7672 226 418 3 4 93
0 1 0 0 3836 556 444148 0 0 0 5928 249 509 0 8 92
0 1 0 0 3204 388 444952 0 0 0 7828 156 139 0 6 94
1 1 0 0 3456 392 444692 0 0 0 5952 157 139 0 5 95
0 1 0 0 3728 400 444428 0 0 0 7840 312 750 0 7 93
0 1 0 0 3968 404 444168 0 0 0 5952 216 364 0 5 95
> That will cause us to not wake up bdflush at all, and if you're just on
> the "border" of 40% dirty buffer usage you'll have bdflush work in
> lock-step with you, alternately writing out buffers and waiting for them.
>
> Quite frankly, just the act of doing the "write_some_buffers()" in
> balance_dirty() should cause us to block much better than the synchronous
> waiting anyway, because then we will block when the request queue fills
> up, not at random points.
>
> Even so, considering that you have such a steady 9-10MB/s, please
> double-check that it's not something even simpler and embarrassing, like
> just having forgotten to enable auto-DMA in the kernel config ;)
>
Yes, I definitely have DMA turned ON. All parameters are OK. :)
# hdparm /dev/hda
/dev/hda:
multcount = 16 (on)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 1650/255/63, sectors = 26520480, start = 0
--
Zlatko
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-25 16:31 UTC
To: Zlatko Calusic; +Cc: Marcelo Tosatti, linux-mm, lkml
On 25 Oct 2001, Zlatko Calusic wrote:
>
> Yes, I definitely have DMA turned ON. All parameters are OK. :)
I suspect it may just be that "queue_nr_requests"/"batch_count" is
different in -ac: what happens if you tweak them to the same values?
(See drivers/block/ll_rw_blk.c)
I think -ac made the queues a bit deeper: the regular kernel does 128
requests and a batch-count of 16; I _think_ -ac does something like "2
requests per megabyte" and batch_count=32, so if you have 512MB you should
try with
queue_nr_requests = 1024
batch_count = 32
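A sketch of where those knobs live, assuming only what this thread shows
(both are set at boot in drivers/block/ll_rw_blk.c, and mainline spells the
batch variable batch_requests; this is not verbatim 2.4.13 source):

/* drivers/block/ll_rw_blk.c -- pin the values for the experiment
 * instead of the boot-time computed defaults (sketch only): */
static int queue_nr_requests = 1024;	/* slots per queue, half READ / half WRITE */
static int batch_requests = 32;		/* free slots released per wakeup batch */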
Does that help?
Linus
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Jens Axboe @ 2001-10-25 17:33 UTC
To: Linus Torvalds; +Cc: Zlatko Calusic, Marcelo Tosatti, linux-mm, lkml
On Thu, Oct 25 2001, Linus Torvalds wrote:
>
> On 25 Oct 2001, Zlatko Calusic wrote:
> >
> > Yes, I definitely have DMA turned ON. All parameters are OK. :)
>
> I suspect it may just be that "queue_nr_requests"/"batch_count" is
> different in -ac: what happens if you tweak them to the same values?
>
> (See drivers/block/ll_rw_blk.c)
>
> I think -ac made the queues a bit deeper: the regular kernel does 128
> requests and a batch-count of 16; I _think_ -ac does something like "2
> requests per megabyte" and batch_count=32, so if you have 512MB you should
> try with
>
> queue_nr_requests = 1024
> batch_count = 32
Right, -ac keeps the elevator flow control and proper queue sizes.
--
Jens Axboe
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Zlatko Calusic @ 2001-10-26 9:45 UTC
To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-mm, lkml
Linus Torvalds <torvalds@transmeta.com> writes:
> On 25 Oct 2001, Zlatko Calusic wrote:
> >
> > Yes, I definitely have DMA turned ON. All parameters are OK. :)
>
> I suspect it may just be that "queue_nr_requests"/"batch_count" is
> different in -ac: what happens if you tweak them to the same values?
>
> (See drivers/block/ll_rw_blk.c)
>
> I think -ac made the queues a bit deeper: the regular kernel does 128
> requests and a batch-count of 16; I _think_ -ac does something like "2
> requests per megabyte" and batch_count=32, so if you have 512MB you should
> try with
>
> queue_nr_requests = 1024
> batch_count = 32
>
> Does that help?
>
Unfortunately not. It makes the machine quite unresponsive while it's
writing to disk, and vmstat 1 reveals strange "spiky" behaviour. Average
throughput is ~8 MB/s (the disk is capable of ~13 MB/s):
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 0 3840 528 441900 0 0 0 34816 188 594 2 34 64
0 1 0 0 3332 536 442384 0 0 4 10624 187 519 2 8 90
0 1 0 0 3324 536 442384 0 0 0 0 182 499 0 0 100
2 1 0 0 3300 536 442384 0 0 0 0 198 486 0 1 99
1 1 0 0 3304 536 442384 0 0 0 0 186 513 0 0 100
0 1 1 0 3304 536 442384 0 0 0 0 193 473 0 1 99
0 1 1 0 3304 536 442384 0 0 0 0 191 508 1 1 98
0 1 0 0 3884 536 441840 0 0 4 44672 189 590 4 40 56
0 1 0 0 3860 536 441840 0 0 0 0 186 526 0 1 99
0 1 0 0 3852 536 441840 0 0 0 0 191 500 0 0 100
0 1 0 0 3844 536 441840 0 0 0 0 193 482 1 0 99
0 1 0 0 3844 536 441840 0 0 0 0 187 511 0 1 99
0 2 1 0 3832 540 441844 0 0 4 0 305 1004 3 2 95
0 3 1 0 3824 544 441844 0 0 4 0 410 1340 2 2 96
0 3 0 0 3764 552 441916 0 0 12 47360 346 915 6 41 53
0 3 0 0 3764 552 441916 0 0 0 0 373 887 0 0 100
0 3 0 0 3764 552 441916 0 0 0 0 278 692 1 2 97
1 3 0 0 3764 552 441916 0 0 0 0 221 579 0 3 97
0 3 0 0 3764 552 441916 0 0 0 0 286 704 0 2 98
I'll now test "batch_count = queue_nr_requests / 3", which I found in
2.4.14-pre2, but with queue_nr_requests still left at 1024, and report
the results after that.
--
Zlatko
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Zlatko Calusic @ 2001-10-26 10:08 UTC
To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-mm, lkml
Linus Torvalds <torvalds@transmeta.com> writes:
> On 25 Oct 2001, Zlatko Calusic wrote:
> >
> > Yes, I definitely have DMA turned ON. All parameters are OK. :)
>
> I suspect it may just be that "queue_nr_requests"/"batch_count" is
> different in -ac: what happens if you tweak them to the same values?
>
Next test:
block: 1024 slots per queue, batch=341
Wrote 600.00 MB in 71 seconds -> 8.39 MB/s (7.5 %CPU)
Still very spiky, and during the write the disk is incapable of doing any
reads. IOW, no serious application can be started before the writing has
finished. Shouldn't we favour reads over writes? Or is it just that
the elevator is not doing its job right, so reads suffer?
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 0 3600 424 453416 0 0 0 0 190 510 2 1 97
0 1 1 0 3596 424 453416 0 0 0 40468 189 508 2 2 96
0 1 1 0 3592 424 453416 0 0 0 0 189 541 1 0 99
0 1 1 0 3592 424 453416 0 0 0 0 190 513 1 0 99
1 1 1 0 3592 424 453416 0 0 0 0 192 511 0 1 99
0 1 1 0 3596 424 453416 0 0 0 0 188 528 0 0 100
0 1 1 0 3592 424 453416 0 0 0 0 188 510 1 0 99
0 1 1 0 3592 424 453416 0 0 0 41444 195 507 0 2 98
0 1 1 0 3592 424 453416 0 0 0 0 190 514 1 1 98
1 1 1 0 3588 424 453416 0 0 0 0 192 554 0 2 98
0 1 1 0 3584 424 453416 0 0 0 0 191 506 0 1 99
0 1 1 0 3584 424 453416 0 0 0 0 186 514 0 0 100
0 1 1 0 3584 424 453416 0 0 0 0 186 515 0 0 100
1 1 1 0 3576 424 453416 0 0 0 0 434 1493 3 2 95
1 1 1 0 3564 424 453416 0 0 0 40560 301 936 3 1 96
0 1 1 0 3564 424 453416 0 0 0 0 338 1050 1 2 97
0 1 1 0 3560 424 453416 0 0 0 0 286 893 1 2 97
--
Zlatko
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Jens Axboe @ 2001-10-26 14:39 UTC
To: Zlatko Calusic; +Cc: Linus Torvalds, Marcelo Tosatti, linux-mm, lkml
On Fri, Oct 26 2001, Zlatko Calusic wrote:
> Linus Torvalds <torvalds@transmeta.com> writes:
>
> > On 25 Oct 2001, Zlatko Calusic wrote:
> > >
> > > Yes, I definitely have DMA turned ON. All parameters are OK. :)
> >
> > I suspect it may just be that "queue_nr_requests"/"batch_count" is
> > different in -ac: what happens if you tweak them to the same values?
> >
>
> Next test:
>
> block: 1024 slots per queue, batch=341
That's way too much, batch should just stay around 32, that is fine.
> Still very spiky, and during the write the disk is incapable of doing any
> reads. IOW, no serious application can be started before the writing has
> finished. Shouldn't we favour reads over writes? Or is it just that
> the elevator is not doing its job right, so reads suffer?
You are probably just seeing starvation due to the very long queues.
--
Jens Axboe
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Zlatko Calusic @ 2001-10-26 14:57 UTC
To: Jens Axboe; +Cc: Linus Torvalds, Marcelo Tosatti, linux-mm, lkml
Jens Axboe <axboe@suse.de> writes:
> On Fri, Oct 26 2001, Zlatko Calusic wrote:
> > Linus Torvalds <torvalds@transmeta.com> writes:
> >
> > > On 25 Oct 2001, Zlatko Calusic wrote:
> > > >
> > > > Yes, I definitely have DMA turned ON. All parameters are OK. :)
> > >
> > > I suspect it may just be that "queue_nr_requests"/"batch_count" is
> > > different in -ac: what happens if you tweak them to the same values?
> > >
> >
> > Next test:
> >
> > block: 1024 slots per queue, batch=341
>
> That's way too much, batch should just stay around 32, that is fine.
OK. Anyway, neither configuration works well, so the problem might be
somewhere else.
While we're at it, could you give a short explanation of those two parameters?
>
> > Still very spiky, and during the write the disk is incapable of doing any
> > reads. IOW, no serious application can be started before the writing has
> > finished. Shouldn't we favour reads over writes? Or is it just that
> > the elevator is not doing its job right, so reads suffer?
>
> You are probably just seeing starvation due to the very long queues.
>
Is there anything we could do about that? I remember Linux once
favoured reads, but I'm not sure if we still do that these days.
When I find some time, I'll dig around that code. It is a very
interesting part of the kernel, I'm sure; I just haven't had enough
time so far to spend hacking on it.
--
Zlatko
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Jens Axboe @ 2001-10-26 15:01 UTC
To: Zlatko Calusic; +Cc: Linus Torvalds, Marcelo Tosatti, linux-mm, lkml
On Fri, Oct 26 2001, Zlatko Calusic wrote:
> > On Fri, Oct 26 2001, Zlatko Calusic wrote:
> > > Linus Torvalds <torvalds@transmeta.com> writes:
> > >
> > > > On 25 Oct 2001, Zlatko Calusic wrote:
> > > > >
> > > > > Yes, I definitely have DMA turned ON. All parameters are OK. :)
> > > >
> > > > I suspect it may just be that "queue_nr_requests"/"batch_count" is
> > > > different in -ac: what happens if you tweak them to the same values?
> > > >
> > >
> > > Next test:
> > >
> > > block: 1024 slots per queue, batch=341
> >
> > That's way too much, batch should just stay around 32, that is fine.
>
> OK. Anyway, neither configuration works well, so the problem might be
> somewhere else.
Most likely, yes.
> While we're at it, could you give a short explanation of those two parameters?
Sure. queue_nr_requests is the total number of free request slots per
queue; there are queue_nr_requests / 2 free slots each for READ and WRITE.
Each request can hold anywhere from the fs block size up to 127kB of data
by default. batch only matters once the request free list has been
depleted: in order to give the elevator some input to work with, we free
request slots in batches of 'batch' to get decent merging etc. That's
why numbers bigger than ~32 would not be such a good idea and would only
add latency.
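To put numbers on this, a small stand-alone sketch using figures from this
thread (the 127kB per-request cap Jens just mentioned; 128 slots is the
stock queue, 341 is the batch value Zlatko tested):

#include <stdio.h>

int main(void)
{
	long per_request = 127 * 1024L;	/* max merged data per slot */

	/* stock kernel: 128 slots per queue, half of them for WRITE */
	printf("stock : %ld kB of writes in flight\n",
	       (128 / 2) * per_request / 1024);

	/* tested setup: batch=341 means a blocked writer waits until
	 * 341 slots have drained -- roughly the ~40MB write bursts
	 * visible in the vmstat output above */
	printf("tested: %ld kB per batch release\n",
	       341 * per_request / 1024);
	return 0;
}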
> > > Still very spiky, and during the write the disk is incapable of doing any
> > > reads. IOW, no serious application can be started before the writing has
> > > finished. Shouldn't we favour reads over writes? Or is it just that
> > > the elevator is not doing its job right, so reads suffer?
> >
> > You are probably just seeing starvation due to the very long queues.
> >
>
> Is there anything we could do about that? I remember Linux once
> favoured reads, but I'm not sure if we still do that these days.
It still favors reads; take a look at the initial sequence numbers given
to reads and writes. We used to favor reads in the request slots too --
you could try changing the blk_init_freelist split so that you get a
1/3 - 2/3 ratio between WRITEs and READs and see if that makes the
system smoother.
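A hedged sketch of that suggestion, modeled on the freelist init loop
visible in Linus's patch later in this thread (alloc_request() is a
hypothetical stand-in; i, q and queue_nr_requests are as in that init
function):

/* Instead of alternating slots 50/50 via (i & 1), give READ two of
 * every three slots -- a 1/3 WRITE, 2/3 READ split: */
for (i = 0; i < queue_nr_requests; i++) {
	struct request *rq = alloc_request();	/* hypothetical helper */
	int rw = (i % 3) ? READ : WRITE;	/* i%3 != 0 -> READ */
	rq->rq_status = RQ_INACTIVE;
	list_add(&rq->queue, &q->rq[rw].free);
	q->rq[rw].count++;
}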
> When I find some time, I'll dig around that code. It is a very
> interesting part of the kernel, I'm sure; I just haven't had enough
> time so far to spend hacking on it.
Indeed it is.
--
Jens Axboe
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-26 16:04 UTC
To: Zlatko Calusic; +Cc: Jens Axboe, Marcelo Tosatti, linux-mm, lkml
On 26 Oct 2001, Zlatko Calusic wrote:
>
> OK. Anyway, neither configuration works well, so the problem might be
> somewhere else.
>
> While we're at it, could you give a short explanation of those two parameters?
Did you try the ones 2.4.14-2 does?
Basically, the "queue_nr_requests" means how many requests there can be
for this queue. Half of them are allocated to reads, half of them are
allocated to writes.
The "batch_requests" thing is something that kicks in when the queue has
emptied - we don't want to "trickle" requests to users, because if we do,
a new large write will not be able to merge its new
requests sanely because it basically has to do them one at a time. So when
we run out of requests (ie "queue_nr_requests" isn't enough), we start
putting the freed-up requests on a "pending" list, and we release them
only when the pending list is bigger than "batch_requests".
Now, one thing to remember is that "queue_nr_requests" is for the whole
queue (half of them for reads, half for writes), and "batch_requests" is a
per-type thing (ie we batch reads and writes separately). So
"batch_requests" must be less than half of "queue_nr_requests", or we will
never release anything at all.
Now, in Alan's tree, there is a separate tuning thing, which is the "max
nr of _sectors_ in flight", which in my opinion is pretty bogus. It's
really a memory-management thing, but it also does something else: it has
low-and-high water-marks, and those might well be a good idea. It is
possible that we should just ditch the "batch_requests" thing, and use the
watermarks instead.
Side note: all of this is relevant really only for writes - reads pretty
much only care about the maximum queue-size, and it's very hard to get a
_huge_ queue-size with reads unless you do tons of read-ahead.
Now, the "batching" is technically equivalent with water-marking if there
is _one_ writer. But if there are multiple writers, water-marking may
actually has some advantages: it might allow the other writer to make some
progress when the first one has stopped, while the batching will stop
everybody until the batch is released. Who knows.
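A compact sketch of the two release policies being contrasted here -
illustrative only; the field names follow the patch that appears later in
this thread, and the wakeup helpers are hypothetical:

struct rq_list { int count; };		/* free request slots */
extern void wake_all_sleepers(void);	/* hypothetical */
extern void wake_one_sleeper(void);	/* hypothetical */
static const int batch_requests = 32;

/* Batching: completed requests pile up on a pending list and nobody
 * is woken until a whole batch has accumulated -- so every sleeper,
 * including an unrelated second writer, waits for the full batch. */
static void free_request_batched(struct rq_list *pending, struct rq_list *freelist)
{
	if (++pending->count >= batch_requests) {
		freelist->count += pending->count;	/* splice pending -> free */
		pending->count = 0;
		wake_all_sleepers();
	}
}

/* Watermarking: the request goes straight back on the free list, and
 * sleepers are woken once the free count passes the mark -- so another
 * writer can make progress while the first is still throttled. */
static void free_request_watermark(struct rq_list *freelist)
{
	if (++freelist->count >= batch_requests)
		wake_one_sleeper();
}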
Anyway, the reason I think Alan's "max nr of sectors" is bogus is because:
- it's a global count, and if you have 10 controllers and want to write
to all 10, you _should_ be able to - you can write 10 times as many
requests in the same latency, so there is nothing "global" about it.
(It turns out that one advantage of the globalism is that it ends up
limiting MM write-outs, but I personally think that is a _MM_ thing, ie
we might want to have a "we have half of all our pages in flight, we
have to throttle now" thing in "writepage()", not in the queue)
- "nr of sectors" has very little to do with request latency on most
hardware. You can do 255 sectors (ie one request) almost as fast as you
can do just one, if you do them in one request. While just _two_
sectors might be much slower than the 255, if they are in separate
requests and cause seeking.
So from a latency standpoint, the "request" is a much better number.
So Alan almost never throttles on requests (on big machines, the -ac tree
allows thousands of requests in flight per queue), while he _does_ have
this water-marking for sectors.
So I have two suspicions:
- 128 requests (ie 64 for writes) like the default kernel should be
_plenty_ enough to keep the disks busy, especially for streaming
writes. It's small enough that you don't get the absolutely _huge_
spikes you get with thousands of requests, while being large enough for
fast writers that even if they _do_ block for 32 of the 64 requests,
they'll have time to refill the next 32 long before the 32 pending ones
have finished.
Also: limiting the write queue to 128 requests means that you can
pretty much guarantee that you can get at least a few read requests
per second, even if the write queue is constantly full, and even if
your reader is serialized.
BUT:
- the hard "batch" count is too harsh. It works as a watermark in the
degenerate case, but doesn't allow a second writer to use up _some_ of
the requests while the first writer is blocked due to watermarking.
So with batching, when the queue is full and another process wants
memory, that _OTHER_ process will also always block until the queue has
emptied.
With watermarks, when the writer has filled up the queue and starts
waiting, other processes can still do some writing as long as they
don't fill up the queue again. So if you have MM pressure but the
writer is blocked (and some requests _have_ completed, but the writer
waits for the low-water-mark), you can still push out requests.
That's also likely to be a lot more fair - batching tends to give the
whole batch to the big writer, while watermarking automatically allows
others to get a look at the queue.
I'll whip up a patch for testing (2.4.14-2 made the batching slightly
saner, but the same "hard" behaviour is pretty much unavoidable with
batching).
Linus
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-26 16:57 UTC
To: Zlatko Calusic; +Cc: Jens Axboe, Marcelo Tosatti, linux-mm, lkml
On 26 Oct 2001, Zlatko Calusic wrote:
>
> When I find some time, I'll dig around that code. It is a very
> interesting part of the kernel, I'm sure; I just haven't had enough
> time so far to spend hacking on it.
Attached is a very untested patch (but hey, it compiles, so it must work,
right?) against 2.4.14-pre2, which makes the batching be a high/low
watermark thing instead. It actually simplified the code, but that is, of
course, assuming that it works at all ;)
(If I got the comparisons wrong, or if I update the counts wrong, your IO
queue will probably stop cold. So be careful. The code is obvious
enough, but typos and thinkos happen).
Linus
diff -u --recursive pre2/linux/drivers/block/ll_rw_blk.c linux/drivers/block/ll_rw_blk.c
--- pre2/linux/drivers/block/ll_rw_blk.c Fri Oct 26 09:48:25 2001
+++ linux/drivers/block/ll_rw_blk.c Fri Oct 26 09:53:54 2001
@@ -140,21 +140,23 @@
return &blk_dev[MAJOR(dev)].request_queue;
}
-static int __blk_cleanup_queue(struct list_head *head)
+static int __blk_cleanup_queue(struct request_list *list)
{
+ struct list_head *head = &list->free;
struct request *rq;
int i = 0;
- if (list_empty(head))
- return 0;
-
- do {
+ while (!list_empty(head)) {
rq = list_entry(head->next, struct request, queue);
list_del(&rq->queue);
kmem_cache_free(request_cachep, rq);
i++;
- } while (!list_empty(head));
+ };
+ if (i != list->count)
+ printk("request list leak!\n");
+
+ list->count = 0;
return i;
}
@@ -176,10 +178,8 @@
{
int count = queue_nr_requests;
- count -= __blk_cleanup_queue(&q->request_freelist[READ]);
- count -= __blk_cleanup_queue(&q->request_freelist[WRITE]);
- count -= __blk_cleanup_queue(&q->pending_freelist[READ]);
- count -= __blk_cleanup_queue(&q->pending_freelist[WRITE]);
+ count -= __blk_cleanup_queue(&q->rq[READ]);
+ count -= __blk_cleanup_queue(&q->rq[WRITE]);
if (count)
printk("blk_cleanup_queue: leaked requests (%d)\n", count);
@@ -331,11 +331,10 @@
struct request *rq;
int i;
- INIT_LIST_HEAD(&q->request_freelist[READ]);
- INIT_LIST_HEAD(&q->request_freelist[WRITE]);
- INIT_LIST_HEAD(&q->pending_freelist[READ]);
- INIT_LIST_HEAD(&q->pending_freelist[WRITE]);
- q->pending_free[READ] = q->pending_free[WRITE] = 0;
+ INIT_LIST_HEAD(&q->rq[READ].free);
+ INIT_LIST_HEAD(&q->rq[WRITE].free);
+ q->rq[READ].count = 0;
+ q->rq[WRITE].count = 0;
/*
* Divide requests in half between read and write
@@ -349,7 +348,8 @@
}
memset(rq, 0, sizeof(struct request));
rq->rq_status = RQ_INACTIVE;
- list_add(&rq->queue, &q->request_freelist[i & 1]);
+ list_add(&rq->queue, &q->rq[i&1].free);
+ q->rq[i&1].count++;
}
init_waitqueue_head(&q->wait_for_request);
@@ -423,10 +423,12 @@
static inline struct request *get_request(request_queue_t *q, int rw)
{
struct request *rq = NULL;
+ struct request_list *rl = q->rq + rw;
- if (!list_empty(&q->request_freelist[rw])) {
- rq = blkdev_free_rq(&q->request_freelist[rw]);
+ if (!list_empty(&rl->free)) {
+ rq = blkdev_free_rq(&rl->free);
list_del(&rq->queue);
+ rl->count--;
rq->rq_status = RQ_ACTIVE;
rq->special = NULL;
rq->q = q;
@@ -443,17 +445,13 @@
register struct request *rq;
DECLARE_WAITQUEUE(wait, current);
+ generic_unplug_device(q);
add_wait_queue_exclusive(&q->wait_for_request, &wait);
- for (;;) {
- __set_current_state(TASK_UNINTERRUPTIBLE);
- spin_lock_irq(&io_request_lock);
- rq = get_request(q, rw);
- spin_unlock_irq(&io_request_lock);
- if (rq)
- break;
- generic_unplug_device(q);
- schedule();
- }
+ do {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (q->rq[rw].count < batch_requests)
+ schedule();
+ } while ((rq = get_request(q,rw)) == NULL);
remove_wait_queue(&q->wait_for_request, &wait);
current->state = TASK_RUNNING;
return rq;
@@ -542,15 +540,6 @@
list_add(&req->queue, insert_here);
}
-inline void blk_refill_freelist(request_queue_t *q, int rw)
-{
- if (q->pending_free[rw]) {
- list_splice(&q->pending_freelist[rw], &q->request_freelist[rw]);
- INIT_LIST_HEAD(&q->pending_freelist[rw]);
- q->pending_free[rw] = 0;
- }
-}
-
/*
* Must be called with io_request_lock held and interrupts disabled
*/
@@ -564,28 +553,12 @@
/*
* Request may not have originated from ll_rw_blk. if not,
- * asumme it has free buffers and check waiters
+ * assume it has free buffers and check waiters
*/
if (q) {
- /*
- * If nobody is waiting for requests, don't bother
- * batching up.
- */
- if (!list_empty(&q->request_freelist[rw])) {
- list_add(&req->queue, &q->request_freelist[rw]);
- return;
- }
-
- /*
- * Add to pending free list and batch wakeups
- */
- list_add(&req->queue, &q->pending_freelist[rw]);
-
- if (++q->pending_free[rw] >= batch_requests) {
- int wake_up = q->pending_free[rw];
- blk_refill_freelist(q, rw);
- wake_up_nr(&q->wait_for_request, wake_up);
- }
+ list_add(&req->queue, &q->rq[rw].free);
+ if (++q->rq[rw].count >= batch_requests && waitqueue_active(&q->wait_for_request))
+ wake_up(&q->wait_for_request);
}
}
@@ -1144,7 +1117,7 @@
/*
* Batch frees according to queue length
*/
- batch_requests = queue_nr_requests/3;
+ batch_requests = queue_nr_requests/4;
printk("block: %d slots per queue, batch=%d\n", queue_nr_requests, batch_requests);
#ifdef CONFIG_AMIGA_Z2RAM
diff -u --recursive pre2/linux/include/linux/blkdev.h linux/include/linux/blkdev.h
--- pre2/linux/include/linux/blkdev.h Tue Oct 23 22:01:01 2001
+++ linux/include/linux/blkdev.h Fri Oct 26 09:36:41 2001
@@ -66,14 +66,17 @@
*/
#define QUEUE_NR_REQUESTS 8192
+struct request_list {
+ unsigned int count;
+ struct list_head free;
+};
+
struct request_queue
{
/*
* the queue request freelist, one for reads and one for writes
*/
- struct list_head request_freelist[2];
- struct list_head pending_freelist[2];
- int pending_free[2];
+ struct request_list rq[2];
/*
* Together with queue_head for cacheline sharing
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-26 17:19 UTC
To: Zlatko Calusic; +Cc: Jens Axboe, Marcelo Tosatti, linux-mm, lkml
On Fri, 26 Oct 2001, Linus Torvalds wrote:
>
> Attached is a very untested patch (but hey, it compiles, so it must work,
> right?)
And it actually does seem to.
Zlatko, does this make a difference for your disk?
Linus
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Giuliano Pochini @ 2001-10-27 13:14 UTC
To: zlatko.calusic; +Cc: Linus Torvalds, Marcelo Tosatti, linux-mm, lkml
> block: 1024 slots per queue, batch=341
>
> Wrote 600.00 MB in 71 seconds -> 8.39 MB/s (7.5 %CPU)
>
> Still very spiky, and during the write the disk is incapable of doing any
> reads. IOW, no serious application can be started before the writing has
> finished. Shouldn't we favour reads over writes? Or is it just that
> the elevator is not doing its job right, so reads suffer?
>
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
> 0 1 1 0 3596 424 453416 0 0 0 40468 189 508 2 2 96
341*127K = ~40M.
Batch is too high. It doesn't explain why reads get delayed so much, anyway.
Bye.
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Mike Fedyk @ 2001-10-28 5:05 UTC
To: Giuliano Pochini
Cc: zlatko.calusic, Linus Torvalds, Marcelo Tosatti, linux-mm, lkml
On Sat, Oct 27, 2001 at 03:14:44PM +0200, Giuliano Pochini wrote:
>
> > block: 1024 slots per queue, batch=341
> >
> > Wrote 600.00 MB in 71 seconds -> 8.39 MB/s (7.5 %CPU)
> >
> > Still very spiky, and during the write the disk is incapable of doing any
> > reads. IOW, no serious application can be started before the writing has
> > finished. Shouldn't we favour reads over writes? Or is it just that
> > the elevator is not doing its job right, so reads suffer?
> >
> > procs memory swap io system cpu
> > r b w swpd free buff cache si so bi bo in cs us sy id
> > 0 1 1 0 3596 424 453416 0 0 0 40468 189 508 2 2 96
>
> 341*127K = ~40M.
>
> Batch is too high. It doesn't explain why reads get delayed so much, anyway.
>
Try modifying the elevator queue length with elvtune.
BTW, 2.2.19 has the queue lengths in the hundreds, and 2.4.xx has them in
the thousands. I've set 2.4 kernels back to the 2.2 defaults, and
interactive performance has gone up considerably. These are subjective
tests, though.
Mike
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Zlatko Calusic @ 2001-10-28 17:30 UTC
To: Linus Torvalds; +Cc: Jens Axboe, Marcelo Tosatti, linux-mm, lkml
Linus Torvalds <torvalds@transmeta.com> writes:
> On Fri, 26 Oct 2001, Linus Torvalds wrote:
> >
> > Attached is a very untested patch (but hey, it compiles, so it must work,
> > right?)
>
> And it actually does seem to.
>
> Zlatko, does this make a difference for your disk?
>
First, sorry for such a delay in answering, I was busy.
I compiled 2.4.14-pre3 as it seems to be identical to your p2p3 patch,
with regard to queue processing.
Unfortunately, things didn't change on my first disk (IBM 7200rpm
@home). I'm still getting low numbers; check the vmstat output at the
end of the email.
But now I found something interesting: the other two disks, which are
on the standard IDE controller, work correctly (writing is at 17-22
MB/sec). The disk which doesn't work well is on the HPT366 interface,
so that may be our culprit. Now I got the idea to check patches going
backwards to see where it started behaving poorly.
Also, one more thing: I'm pretty sure that under strange circumstances
(a specific alignment of the stars) it behaves well (with the
appropriate writing speed). I just haven't yet pinpointed what needs to
be done to get to that point.
I know I haven't supplied you with a lot of information, but I'll keep
investigating until I have some more solid data on the problem.
BTW, thank you and Jens for the nice explanations of the numbers - very
good reading.
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 2 0 13208 2924 516 450716 0 0 0 11808 179 113 0 6 93
0 1 0 13208 2656 524 450964 0 0 0 8432 174 86 1 6 93
0 1 0 13208 3676 532 449924 0 0 0 8432 174 91 1 4 95
0 1 0 13208 3400 540 450172 0 0 0 8432 231 343 1 4 94
0 2 0 13208 3520 548 450036 0 0 0 8440 180 179 2 5 93
0 1 0 20216 3544 728 456976 32 0 32 8432 175 94 0 4 95
0 2 0 20212 3280 728 457232 0 0 0 8440 174 88 0 5 95
0 2 0 20208 3032 728 457480 0 0 0 8364 174 84 1 4 95
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 2 0 20208 3412 732 457092 0 0 0 6964 175 111 0 4 96
0 2 0 20208 3272 728 457224 0 0 0 1216 207 89 0 1 99
0 2 0 20208 3164 728 457352 0 0 0 1300 256 77 1 2 97
0 2 1 20208 2928 732 457604 0 0 0 1444 283 77 1 0 99
0 2 1 20208 2764 732 457732 0 0 0 1316 278 73 1 1 98
0 2 1 20208 3420 728 457096 0 0 0 1652 273 117 0 1 99
0 2 1 20208 3180 732 457348 0 0 0 1404 240 90 0 0 99
0 2 1 20208 3696 728 456840 0 0 0 1784 247 80 0 1 98
0 2 1 20204 3432 728 457096 0 0 0 1404 237 77 1 0 99
0 2 1 20204 2896 732 457604 0 0 0 1672 255 77 1 1 98
0 1 0 20204 3284 728 457224 0 0 0 1976 257 112 0 2 98
0 1 0 20204 2772 728 457736 0 0 0 7628 260 100 0 4 96
0 1 0 20204 3540 728 456968 0 0 0 8492 178 83 1 4 95
0 2 0 20204 3584 736 456916 0 0 4 4848 175 88 0 2 97
Regards,
--
Zlatko
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-28 17:34 UTC
To: Zlatko Calusic; +Cc: Jens Axboe, Marcelo Tosatti, linux-mm, lkml
On 28 Oct 2001, Zlatko Calusic wrote:
>
> But now I found something interesting: the other two disks, which are
> on the standard IDE controller, work correctly (writing is at 17-22
> MB/sec). The disk which doesn't work well is on the HPT366 interface,
> so that may be our culprit. Now I got the idea to check patches going
> backwards to see where it started behaving poorly.
Ok. That _is_ indeed a big clue.
Do the -ac patches have any hpt366-specific stuff? Although I suspect
you're right, and that it's just the driver (or controller itself) being
very, very sensitive to some random alignment of the stars, rather than
anything in the code itself.
> 0 2 0 13208 2924 516 450716 0 0 0 11808 179 113 0 6 93
> 0 1 0 13208 2656 524 450964 0 0 0 8432 174 86 1 6 93
> 0 1 0 13208 3676 532 449924 0 0 0 8432 174 91 1 4 95
> 0 1 0 13208 3400 540 450172 0 0 0 8432 231 343 1 4 94
> 0 2 0 13208 3520 548 450036 0 0 0 8440 180 179 2 5 93
> 0 1 0 20216 3544 728 456976 32 0 32 8432 175 94 0 4 95
> 0 2 0 20212 3280 728 457232 0 0 0 8440 174 88 0 5 95
> 0 2 0 20208 3032 728 457480 0 0 0 8364 174 84 1 4 95
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
> 0 2 0 20208 3412 732 457092 0 0 0 6964 175 111 0 4 96
> 0 2 0 20208 3272 728 457224 0 0 0 1216 207 89 0 1 99
> 0 2 0 20208 3164 728 457352 0 0 0 1300 256 77 1 2 97
> 0 2 1 20208 2928 732 457604 0 0 0 1444 283 77 1 0 99
> 0 2 1 20208 2764 732 457732 0 0 0 1316 278 73 1 1 98
So it actually slows down to just 1.5MB/s at times? That's just
disgusting. I wonder what the driver is doing..
Linus
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Alan Cox @ 2001-10-28 17:48 UTC
To: Linus Torvalds
Cc: Zlatko Calusic, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
> Do the -ac patches have any hpt366-specific stuff? Although I suspect
> you're right, and that it's just the driver (or controller itself) being
The IDE code matches between the two. It isn't a driver change.
Alan
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-28 17:59 UTC
To: Alan Cox; +Cc: Zlatko Calusic, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
On Sun, 28 Oct 2001, Alan Cox wrote:
>
> > Do the -ac patches have any hpt366-specific stuff? Although I suspect
> > you're right, and that it's just the driver (or controller itself) being
>
> The IDE code matches between the two. It isn't a driver change.
It might, of course, just be timing, but that sounds like a bit _too_ easy
an explanation. Even if it could easily be true.
The fact that -ac gets higher speeds, and -ac has a very different
request watermark strategy makes me suspect that that might be the cause.
In particular, the standard kernel _requires_ that in order to get good
performance you can merge many bh's onto one request. That's a very
reasonable assumption: it basically says that any high-performance driver
has to accept merging, because that in turn is required for the elevator
overhead to not grow without bounds. And if the driver doesn't accept big
requests, that driver cannot perform well because it won't have many
requests pending.
In contrast, the -ac logic says roughly "Who the hell cares if the driver
can merge requests or not, we can just give it thousands of small requests
instead, and cap the total number of _sectors_ instead of capping the
total number of requests earlier".
In my opinion, the -ac logic is really bad, but one thing it does allow is
for stupid drivers that look like high-performance drivers. Which may be
why it got implemented.
And it may be that the hpt366 IDE driver has always had this braindamage,
which the -ac code hides. Or something like this.
Does anybody know the hpt driver? Does it, for example, limit the maximum
number of sectors per merge somehow for some reason?
Jens?
Linus
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Alan Cox @ 2001-10-28 18:22 UTC
To: Linus Torvalds
Cc: Alan Cox, Zlatko Calusic, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
> In contrast, the -ac logic says roughly "Who the hell cares if the driver
> can merge requests or not, we can just give it thousands of small requests
> instead, and cap the total number of _sectors_ instead of capping the
> total number of requests earlier"
If you think about it, the major resource constraint is sectors - or,
another way to think of it, "the number of pinned pages the VM cannot
rescue until the I/O is done". We also have many devices where the latency
is horribly important - IDE is one because it lacks sensible overlapping
I/O. I'm less sure what the latency trade-offs are. Fewer commands means
fewer turnarounds, so there is a counterbalance.
In the case of IDE the -ac tree will do basically the same merging - the
limitations on IDE DMA are pretty reasonable. DMA IDE has scatter-gather
tables and is actually smarter than many older SCSI controllers. The IDE
layer supports up to 128 chunks of up to just under 64Kb (it should be
64K, but some chipsets get 64K = 0 wrong and it's not pretty).
> In my opinion, the -ac logic is really bad, but one thing it does allow is
> for stupid drivers that look like high-performance drivers. Which may be
> why it got implemented.
Well I'm all for making dumb hardware go as fast as smart stuff but that
wasn't the original goal - the original goal was to fix the bad behaviour
with the base kernel and large I/O queues to slow devices like M/O disks.
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Linus Torvalds @ 2001-10-28 18:46 UTC
To: Alan Cox; +Cc: Zlatko Calusic, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
On Sun, 28 Oct 2001, Alan Cox wrote:
>
> > In contrast, the -ac logic says roughly "Who the hell cares if the driver
> > can merge requests or not, we can just give it thousands of small requests
> > instead, and cap the total number of _sectors_ instead of capping the
> > total number of requests earlier"
>
> If you think about it, the major resource constraint is sectors - or,
> another way to think of it, "the number of pinned pages the VM cannot
> rescue until the I/O is done".
Yes. But that's a VM decision, and that's a decision the VM _can_ and does
make. At least in newer VM's.
So counting sectors is only hiding problems at a higher level, and it's
hiding problems that the higher level can know about.
In contrast, one thing that the higher level _cannot_ know about is the
latency of the request queue, because that latency depends on the layout
of the requests. Contiguous requests are fast, seeks are slow. So the
number of requests (as long as they aren't infinitely sized) fairly well
approximates the latency.
Note that you are certainly right that the Linux VM system did not use to
be very good at throttling, and you could make it try to write out all of
memory on small machines. But that's really a VM issue.
(And have we had VM's that tried to push all of memory onto the disk, and
then returned Out-of-Memory when all pages were locked? Sure we have. But
I know mine doesn't, don't know about yours).
> We also have many devices where the latency is horribly important - IDE
> is one because it lacks sensible overlapping I/O. I'm less sure what the
> latency trade-offs are. Fewer commands means fewer turnarounds, so there
> is a counterbalance.
Note that from a latency standpoint, you only need to have enough requests
to fill the queue - and right now we have a total of 128 requests, of
which half a for reads, and half are for the watermarking, so you end up
having 32 requests "in flight" while you refill the queue.
Which is _plenty_. Because each request can be 255 sectors (or 128,
depending on where the limit is today ;), which means that if you actually
have something throughput-limited, you can certainly keep the disk busy.
(And if the requests aren't localized enough to coalesce well, you cannot
keep the disk at platter-speed _anyway_, plus the requests will take
longer to process, so you'll have even more time to fill the queue).
The important part for real throughput is not to have thousands of
requests in flight, but to have _big_enough_ requests in flight. You can
keep even a fast disk busy with just a few requests, if you just keep
refilling them quickly enough and if they are _big_ enough.
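A back-of-envelope check of that claim, using numbers from this thread
(255 sectors of 512 bytes per fully-merged request; ~22 MB/s is the
platter speed of Zlatko's drive at home):

#include <stdio.h>

int main(void)
{
	double in_flight = 32 * 255 * 512.0;	/* 32 requests, fully merged */
	double platter = 22e6;			/* bytes/sec, streaming writes */

	printf("data in flight: %.1f MB\n", in_flight / 1e6);	/* ~4.2 MB */
	printf("time to drain : %.0f ms\n",			/* ~190 ms */
	       1000 * in_flight / platter);
	return 0;
}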
> In the case of IDE the -ac tree will do basically the same merging - the
> limitations on IDE DMA are pretty reasonable. DMA IDE has scatter-gather
> tables and is actually smarter than many older SCSI controllers. The IDE
> layer supports up to 128 chunks of up to just under 64Kb (it should be
> 64K, but some chipsets get 64K = 0 wrong and it's not pretty).
Yes. My question is more: does the hpt366 thing limit the queueing in
some way?
> Well I'm all for making dumb hardware go as fast as smart stuff but that
> wasn't the original goal - the original goal was to fix the bad behaviour
> with the base kernel and large I/O queues to slow devices like M/O disks.
Now, that's a _latency_ issue, and should be fixed by having the max
number of requests (and the max _size_ of a request too) be a per-queue
thing.
But notice how that actually doesn't have anything to do with memory size,
and makes your "scale by max memory" thing illogical.
Linus
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
From: Andrew Morton @ 2001-10-28 18:56 UTC
To: Linus Torvalds
Cc: Alan Cox, Zlatko Calusic, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
Linus Torvalds wrote:
>
> And it may be that the hpt366 IDE driver has always had this braindamage,
> which the -ac code hides. Or something like this.
>
My hpt366, running stock 2.4.14-pre3, performs OK.
time ( dd if=/dev/zero of=foo bs=10240k count=100 ; sync )
takes 35 seconds (30 megs/sec). It's the same on current -ac kernels.
Maybe Zlatko's drive stopped doing DMA?
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
2001-10-28 17:30 ` Zlatko Calusic
2001-10-28 17:34 ` Linus Torvalds
@ 2001-10-28 19:13 ` Barry K. Nathan
2001-10-28 21:42 ` Jonathan Morton
2001-11-02 5:52 ` Zlatko's I/O slowdown status Andrea Arcangeli
2 siblings, 1 reply; 37+ messages in thread
From: Barry K. Nathan @ 2001-10-28 19:13 UTC (permalink / raw)
To: zlatko.calusic
Cc: Linus Torvalds, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
> Unfortunately, things didn't change on my first disk (IBM 7200rpm
> @home). I'm still getting low numbers, check the vmstat output at the
> end of the email.
>
> But now I found something interesting: the other two disks, which are on
> the standard IDE controller, work correctly (writing at 17-22
> MB/sec). The disk which doesn't work well is on the HPT366 interface,
> so that may be our culprit. Now I got the idea to walk back through the
> patches to see where it started behaving poorly.
>
> Also, one more thing, I'm pretty sure that under strange circumstances
> (specific alignment of stars) it behaves well (with appropriate
> writing speed). I just haven't yet pinpointed what needs to be done to
> get to that point.
I didn't read the entire thread, so this is a bit of a stab in the dark,
but:
This really reminds me of a problem I once had with a hard drive of
mine. It would usually go at 15-20MB/sec, but sometimes (under both
Linux and Windows) would slow down to maybe 350KB/sec. The slowdown, or
lack thereof, did seem to depend on the alignment of the stars. I lived
with it for a number of months, then started getting intermittent I/O
errors as well, as if the drive had bad sectors on disk.
The problem turned out to be insufficient ventilation for the controller
board on the bottom of the drive -- it was in the lowest 3.5" drive bay
in my case, so the bottom of the drive was snuggled next to a piece of
metal with ventilation holes. The holes were rather large (maybe 0.5"
diameter) -- and so were the areas without holes. Guess where one of the
drive's controller chips happened to be positioned, relative to the
holes? :( Moving the drive up a bit in the case, so as to allow 0.5"-1"
of space for air beneath the drive, fixed the problem (both the slowdown
and the I/O errors).
I don't know if this is your problem, but I'm mentioning it just in
case it is...
-Barry K. Nathan <barryn@pobox.com>
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
2001-10-28 18:46 ` Linus Torvalds
@ 2001-10-28 19:29 ` Alan Cox
0 siblings, 0 replies; 37+ messages in thread
From: Alan Cox @ 2001-10-28 19:29 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, Zlatko Calusic, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
> Yes. My question is more: does the hpt366 thing limit the queueing in some
> way?
Nope. The HPT366 is a bog-standard DMA IDE controller. At least, unless Andre
can point out something I've forgotten, any behaviour seen on it should be
the same as seen on any other IDE controller with DMA support.
In practical terms, that means you should be able to observe the same HPT366
problem he does on whatever random IDE controller is on your desktop box.
> But notice how that actually doesn't have anything to do with memory size,
> and makes your "scale by max memory" thing illogical.
When you are dealing with the VM limit the limiter was originally
added for, it makes a lot of sense. When you want to use it solely for
other purposes, it doesn't.
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
2001-10-28 19:13 ` Barry K. Nathan
@ 2001-10-28 21:42 ` Jonathan Morton
0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Morton @ 2001-10-28 21:42 UTC (permalink / raw)
To: barryn, zlatko.calusic
Cc: Linus Torvalds, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
>> Unfortunately, things didn't change on my first disk (IBM 7200rpm
>> @home). I'm still getting low numbers, check the vmstat output at the
>> end of the email.
>>
>> But now I found something interesting: the other two disks, which are on
>> the standard IDE controller, work correctly (writing at 17-22
>> MB/sec). The disk which doesn't work well is on the HPT366 interface,
>> so that may be our culprit. Now I got the idea to walk back through the
>> patches to see where it started behaving poorly.
>This really reminds me of a problem I once had with a hard drive of
>mine. It would usually go at 15-20MB/sec, but sometimes (under both
>Linux and Windows) would slow down to maybe 350KB/sec. The slowdown, or
>lack thereof, did seem to depend on the alignment of the stars. I lived
>with it for a number of months, then started getting intermittent I/O
>errors as well, as if the drive had bad sectors on disk.
>
>The problem turned out to be insufficient ventilation for the controller
>board on the bottom of the drive
As an extra data point, my IBM Deskstar 60GXP (40GB version) writes
slightly slower than it reads. This is on a VIA 686a controller,
UDMA/66 active. The drive also has plenty of air around it, sitting
in a 5.25" bracket with fans in front.
Writing 1GB from /dev/zero takes 34.27s = 29.88MB/sec, 19% CPU
Reading 1GB from test file takes 29.64s = 34.58MB/sec, 18% CPU
Hmm, that's almost as fast as the 10000rpm Ultrastar sitting just above
it, but with higher CPU usage. The Ultrastar gets 36MB/sec on reading
with hdparm; I haven't tested write performance due to probable
fragmentation.
Both tests were conducted using 'dd bs=1k' on my 1GHz Athlon with 256MB
RAM. The test file is on a freshly-created ext2 filesystem starting
10GB into the 40GB drive (knowing IBM's recent trend, this'll still
be fairly close to the outer rim). The write test includes a sync at the
end. Kernel is Linus 2.4.9, no relevant patches.
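Roughly reconstructing the tests as commands (file name assumed; 1GB here
means 2^20 1k blocks):
# write test: 1GB from /dev/zero, timed including the final sync
time sh -c 'dd if=/dev/zero of=testfile bs=1k count=1048576 && sync'
# read test: the same file back to /dev/null
time dd if=testfile of=/dev/null bs=1k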
--
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@cyberspace.org (not for attachments)
website: http://www.chromatix.uklinux.net/vnc/
geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$
V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
tagline: The key to knowledge is not to rely on people to teach you it.
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
2001-10-28 17:59 ` Linus Torvalds
2001-10-28 18:22 ` Alan Cox
2001-10-28 18:56 ` Andrew Morton
@ 2001-10-30 8:56 ` Jens Axboe
2001-10-30 9:26 ` Zlatko Calusic
2 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2001-10-30 8:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, Zlatko Calusic, Marcelo Tosatti, linux-mm, lkml
On Sun, Oct 28 2001, Linus Torvalds wrote:
>
> On Sun, 28 Oct 2001, Alan Cox wrote:
> >
> > > Do the -ac patches have any hpt366-specific stuff? Although I suspect
> > > you're right, and that it's just the driver (or controller itself) being
> >
> > The IDE code matches between the two. It isn't a driver change
>
> It might, of course, just be timing, but that sounds like a bit _too_ easy
> an explanation. Even if it could easily be true.
>
> The fact that -ac gets higher speeds, and -ac has a very different
> request watermark strategy makes me suspect that that might be the cause.
>
> In particular, the standard kernel _requires_ that in order to get good
> performance you can merge many bh's onto one request. That's a very
> reasonable assumption: it basically says that any high-performance driver
> has to accept merging, because that in turn is required for the elevator
> overhead to not grow without bounds. And if the driver doesn't accept big
> requests, that driver cannot perform well because it won't have many
> requests pending.
Nod
> In contrast, the -ac logic says roughly "Who the hell cares if the driver
> can merge requests or not, we can just give it thousands of small requests
> instead, and cap the total number of _sectors_ instead of capping the
> total number of requests earlier".
Not true, that was not the intended goal. We always want the driver to
get merged requests, even if we can have ridiculously large queue
lengths. The large queues were a benchmark win (blush), since they allowed
the elevator to reorder seeks across a big benchmark run efficiently. I've
since done more real-life testing, and I don't think it matters too much
here; in fact it only seems to incur greater latency and starvation.
> In my opinion, the -ac logic is really bad, but one thing it does allow is
> for stupid drivers that look like high-performance drivers. Which may be
> why it got implemented.
Don't confuse the larger queues with an unwillingness to merge; that is
not the case.
> And it may be that the hpt366 IDE driver has always had this braindamage,
> which the -ac code hides. Or something like this.
>
> Does anybody know the hpt driver? Does it, for example, limit the maximum
> number of sectors per merge somehow for some reason?
The hpt366 driver has no special workarounds and doesn't disable anything,
so it can't be anything like that.
--
Jens Axboe
* Re: xmm2 - monitor Linux MM active/inactive lists graphically
2001-10-30 8:56 ` Jens Axboe
@ 2001-10-30 9:26 ` Zlatko Calusic
0 siblings, 0 replies; 37+ messages in thread
From: Zlatko Calusic @ 2001-10-30 9:26 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Alan Cox, Marcelo Tosatti, linux-mm, lkml
Jens Axboe <axboe@suse.de> writes:
> The hpt366 driver has no special workarounds and doesn't disable anything,
> so it can't be anything like that.
>
A followup on the problem. Yesterday I was upgrading my Debian Linux. To
do that I had to remount /usr read-write. After the update finished,
I tested the disk writing speed once again. And there it was: a full
22MB/sec (on the same partition). And once I get to that point, the
disk stays fast. So I thought (poor man's logic) that the poor
performance might have something to do with my /usr being mounted
read-only (BTW, it's on the same disk I'm having problems with).
Quick test: reboot (/usr is ro), check speed -> only 8MB/sec; remount
/usr rw - but unfortunately that didn't help, the writing speed stays low.
So it was just an idea. I still don't know what can be done to return
the speed to normal. I don't know if I have mentioned it, but reading
from the same disk always goes at full speed.
Also, I'm pretty sure that I have the same problem on a completely
different machine (at work) which doesn't use the HPT366, but a standard
controller (BX chipset).
So, something might be wrong with my setup, but I'm still unable to
find out what.
I'm compiling with gcc 2.95.4 20011006 (Debian prerelease) from the Debian
unstable distribution. The kernel is completely monolithic (no modules).
Attached is the _relevant_ part of the IDE configuration.
#
# ATA/IDE/MFM/RLL support
#
CONFIG_IDE=y
#
# IDE, ATA and ATAPI Block devices
#
CONFIG_BLK_DEV_IDE=y
#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_HD_IDE is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
# CONFIG_BLK_DEV_IDEDISK_VENDOR is not set
# CONFIG_BLK_DEV_IDEDISK_FUJITSU is not set
# CONFIG_BLK_DEV_IDEDISK_IBM is not set
# CONFIG_BLK_DEV_IDEDISK_MAXTOR is not set
# CONFIG_BLK_DEV_IDEDISK_QUANTUM is not set
# CONFIG_BLK_DEV_IDEDISK_SEAGATE is not set
# CONFIG_BLK_DEV_IDEDISK_WD is not set
# CONFIG_BLK_DEV_COMMERIAL is not set
# CONFIG_BLK_DEV_TIVO is not set
# CONFIG_BLK_DEV_IDECS is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
#
# IDE chipset support/bugfixes
#
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
# CONFIG_BLK_DEV_ISAPNP is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_ADMA=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_PCI_WIP=y
# CONFIG_IDEDMA_NEW_DRIVE_LISTINGS is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_AEC62XX_TUNING is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_WDC_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_AMD74XX_OVERRIDE is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_HPT34X_AUTODMA is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_PIIX_TUNING is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_PDC202XX is not set
# CONFIG_PDC202XX_BURST is not set
# CONFIG_PDC202XX_FORCE is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_CHIPSETS is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_IDEDMA_IVB is not set
# CONFIG_DMA_NONPCI is not set
# CONFIG_BLK_DEV_IDE_MODES is not set
# CONFIG_BLK_DEV_ATARAID is not set
# CONFIG_BLK_DEV_ATARAID_PDC is not set
# CONFIG_BLK_DEV_ATARAID_HPT is not set
--
Zlatko
* Zlatko's I/O slowdown status
2001-10-28 17:30 ` Zlatko Calusic
2001-10-28 17:34 ` Linus Torvalds
2001-10-28 19:13 ` Barry K. Nathan
@ 2001-11-02 5:52 ` Andrea Arcangeli
2001-11-02 20:14 ` Zlatko Calusic
2 siblings, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2001-11-02 5:52 UTC (permalink / raw)
To: Zlatko Calusic
Cc: Linus Torvalds, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
Hello Zlatko,
I'm not sure how the email thread ended, but I noticed different
unplugging of the I/O queues in mainline (mainline was a little more
overkill than -ac) and also wrong bdflush hysteresis (the pre-wakeup of
bdflush - to avoid blocking when the write flood could be sustained by
the bandwidth of the HD - was missing, for example).
So you may want to give pre6aa1 a spin and see if it makes any
difference; if it does, I'll know what your problem is
(see the buffer.c part of the vm-10 patch in pre6aa1 for more details).
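(If you want to watch the bdflush thresholds while testing, they are all
visible in one place on 2.4 kernels - the field meanings are documented
in Documentation/sysctl/vm.txt:)
# the bdflush tunables, nfract/ndirty/etc., as one line of numbers
cat /proc/sys/vm/bdflush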
thanks,
Andrea
* Re: Zlatko's I/O slowdown status
2001-11-02 5:52 ` Zlatko's I/O slowdown status Andrea Arcangeli
@ 2001-11-02 20:14 ` Zlatko Calusic
2001-11-02 20:26 ` Rik van Riel
` (2 more replies)
0 siblings, 3 replies; 37+ messages in thread
From: Zlatko Calusic @ 2001-11-02 20:14 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Linus Torvalds, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
Andrea Arcangeli <andrea@suse.de> writes:
> Hello Zlatko,
>
> I'm not sure how the email thread ended, but I noticed different
> unplugging of the I/O queues in mainline (mainline was a little more
> overkill than -ac) and also wrong bdflush hysteresis (the pre-wakeup of
> bdflush - to avoid blocking when the write flood could be sustained by
> the bandwidth of the HD - was missing, for example).
Thank God, today it is finally solved. Just two days ago, I was pretty
sure the disk had started dying on me, and I didn't know of any
solution for that. Today, while I was about to try your patch, I got
another idea and finally pinpointed the problem.
It was write caching. Somehow the disk was running with the write cache
turned off, and I was getting abysmal write performance. Then I found
'hdparm -W0 /proc/ide/hd*' in /etc/init.d/umountfs, which is run during
shutdown - but I don't understand how it survived reboots and restarts!
Nor why only two of the four disks I'm dealing with got confused by the
command. And finally, I don't understand how I could still get full
speed occasionally. Weird!
I would advise users of Debian unstable to comment out that part; I'm
sure it's useless on most if not all setups. You might be pleasantly
surprised by the performance gains (write speed doubles).
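For anyone who wants to check their own drives first, something along
these lines should work (device name assumed, and -I needs a reasonably
recent hdparm):
hdparm -I /dev/hda | grep -i 'write cache'   # is the write cache enabled?
hdparm -W1 /dev/hda                          # re-enable it if not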
>
> So you may want to give pre6aa1 a spin and see if it makes any
> difference; if it does, I'll know what your problem is
> (see the buffer.c part of the vm-10 patch in pre6aa1 for more details).
>
Thanks for your concern. Eventually I compiled aa1 and it is running
correctly (a whole day at work, and the last hour at home - SMP), although
I don't see any performance improvements now.
I would like to thank all the others who spent time helping me,
especially Linus, Jens and Marcelo - sorry for taking up your time, guys.
--
Zlatko
* Re: Zlatko's I/O slowdown status
2001-11-02 20:14 ` Zlatko Calusic
@ 2001-11-02 20:26 ` Rik van Riel
2001-11-02 21:22 ` Zlatko Calusic
2001-11-02 20:57 ` Andrea Arcangeli
2001-11-02 23:23 ` Simon Kirby
2 siblings, 1 reply; 37+ messages in thread
From: Rik van Riel @ 2001-11-02 20:26 UTC (permalink / raw)
To: Zlatko Calusic
Cc: Andrea Arcangeli, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
On 2 Nov 2001, Zlatko Calusic wrote:
> It was write caching. Somehow the disk was running with the write cache
> turned off, and I was getting abysmal write performance. Then I found
> 'hdparm -W0 /proc/ide/hd*' in /etc/init.d/umountfs, which is run during
> shutdown
>
> I would advise users of Debian unstable to comment out that part,
Why do you want Debian users to lose their data? ;)
The 'hdparm -W0' is useful for getting the drive to flush
the data out to disk instead of letting it linger around
in the drive cache.
regards,
Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/
http://www.surriel.com/ http://distro.conectiva.com/
* Re: Zlatko's I/O slowdown status
2001-11-02 20:14 ` Zlatko Calusic
2001-11-02 20:26 ` Rik van Riel
@ 2001-11-02 20:57 ` Andrea Arcangeli
2001-11-02 23:23 ` Simon Kirby
2 siblings, 0 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2001-11-02 20:57 UTC (permalink / raw)
To: Zlatko Calusic
Cc: Linus Torvalds, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
On Fri, Nov 02, 2001 at 09:14:14PM +0100, Zlatko Calusic wrote:
> It was write caching. Somehow disk was running with write cache turned
Ah, I was going to ask you to try:
/sbin/hdparm -d1 -u1 -W1 -c1 /dev/hda
(my settings - of course not safe for a journaling fs, only safe to use
with ext2, and I set -W0 back during /etc/init.d/halt) but I assumed you
were using the same hdparm settings in -ac and mainline. Never mind, good
that it's solved now :).
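For reference, those flags break down as follows (device name assumed;
run them separately or combined, the effect is the same):
hdparm -d1 /dev/hda   # enable DMA for the drive
hdparm -u1 /dev/hda   # unmask other interrupts during disk I/O
hdparm -W1 /dev/hda   # turn the drive's write cache on
hdparm -c1 /dev/hda   # enable 32-bit I/O support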
Andrea
* Re: Zlatko's I/O slowdown status
2001-11-02 20:26 ` Rik van Riel
@ 2001-11-02 21:22 ` Zlatko Calusic
0 siblings, 0 replies; 37+ messages in thread
From: Zlatko Calusic @ 2001-11-02 21:22 UTC (permalink / raw)
To: Rik van Riel
Cc: Andrea Arcangeli, Jens Axboe, Marcelo Tosatti, linux-mm, lkml
Rik van Riel <riel@conectiva.com.br> writes:
> On 2 Nov 2001, Zlatko Calusic wrote:
>
> > It was write caching. Somehow disk was running with write cache turned
> > off and I was getting abysmal write performance. Then I found hdparm
> > -W0 /proc/ide/hd* in /etc/init.d/umountfs which is ran during shutdown
> >
> > I would advise users of Debian unstable to comment that part,
>
> Why do you want Debian users to loose their data ? ;)
Those few lines of code are a recent addition to Debian. They never
existed before, so do you mean Debian was buggy for years
and people lost massive amounts of data because of it? :)
No, really, I use poweroff on my computer, and not once have I had a
problem with losing data after a poweroff (and I'm talking about
thousands of poweroffs). But I do have a problem with bad performance. :)
>
> The 'hdparm -W0' is useful in getting the drive to flush
> out the data to disk instead of having it linger around
> in the drive cache.
>
Yes, I know, but it's not THAT important - otherwise it wouldn't have
been missing from the init script for so many years.
Anyway, this whole debate probably points to the real problem: a missing
hdparm -W1 in the startup init script. IDE drives really behave
poorly without write caching, and there's nothing we can do about
that besides turning it on and praying to God we don't have too many
power outages. :)
--
Zlatko
* Re: Zlatko's I/O slowdown status
2001-11-02 20:14 ` Zlatko Calusic
2001-11-02 20:26 ` Rik van Riel
2001-11-02 20:57 ` Andrea Arcangeli
@ 2001-11-02 23:23 ` Simon Kirby
2 siblings, 0 replies; 37+ messages in thread
From: Simon Kirby @ 2001-11-02 23:23 UTC (permalink / raw)
To: Zlatko Calusic
Cc: Andrea Arcangeli, Linus Torvalds, Jens Axboe, Marcelo Tosatti,
linux-mm, lkml
On Fri, Nov 02, 2001 at 09:14:14PM +0100, Zlatko Calusic wrote:
> Thank God, today it is finally solved. Just two days ago, I was pretty
> sure the disk had started dying on me, and I didn't know of any
> solution for that. Today, while I was about to try your patch, I got
> another idea and finally pinpointed the problem.
>
> It was write caching. Somehow the disk was running with the write cache
> turned off, and I was getting abysmal write performance. Then I found
> 'hdparm -W0 /proc/ide/hd*' in /etc/init.d/umountfs, which is run during
> shutdown - but I don't understand how it survived reboots and restarts!
> Nor why only two of the four disks I'm dealing with got confused by the
> command. And finally, I don't understand how I could still get full
> speed occasionally. Weird!
>
> I would advise users of Debian unstable to comment out that part; I'm
> sure it's useless on most if not all setups. You might be pleasantly
> surprised by the performance gains (write speed doubles).
Aha! That would explain why I was seeing it as well... and why I was
seeing errors from hdparm for /dev/hdc and /dev/hdd, which are CDROMs.
Argh. :)
If they run hdparm -W 0 at shutdown, there should be a matching -W 1
at startup.
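A minimal boot-time fragment along those lines might look like this (a
sketch only - the device list is assumed, and this is not Debian's actual
script):
#!/bin/sh
# re-enable the on-drive write cache for each disk at startup;
# errors from non-disks (e.g. CDROMs on hdc/hdd) are discarded
for dev in /dev/hda /dev/hdb /dev/hdc /dev/hdd; do
    [ -b "$dev" ] && hdparm -W1 "$dev" 2>/dev/null
done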
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
Thread overview: 37+ messages
2001-10-24 10:42 xmm2 - monitor Linux MM active/inactive lists graphically Zlatko Calusic
2001-10-24 14:26 ` Marcelo Tosatti
2001-10-25 0:25 ` Zlatko Calusic
2001-10-25 4:19 ` Linus Torvalds
2001-10-25 4:57 ` Linus Torvalds
2001-10-25 12:48 ` Zlatko Calusic
2001-10-25 16:31 ` Linus Torvalds
2001-10-25 17:33 ` Jens Axboe
2001-10-26 9:45 ` Zlatko Calusic
2001-10-26 10:08 ` Zlatko Calusic
2001-10-26 14:39 ` Jens Axboe
2001-10-26 14:57 ` Zlatko Calusic
2001-10-26 15:01 ` Jens Axboe
2001-10-26 16:04 ` Linus Torvalds
2001-10-26 16:57 ` Linus Torvalds
2001-10-26 17:19 ` Linus Torvalds
2001-10-28 17:30 ` Zlatko Calusic
2001-10-28 17:34 ` Linus Torvalds
2001-10-28 17:48 ` Alan Cox
2001-10-28 17:59 ` Linus Torvalds
2001-10-28 18:22 ` Alan Cox
2001-10-28 18:46 ` Linus Torvalds
2001-10-28 19:29 ` Alan Cox
2001-10-28 18:56 ` Andrew Morton
2001-10-30 8:56 ` Jens Axboe
2001-10-30 9:26 ` Zlatko Calusic
2001-10-28 19:13 ` Barry K. Nathan
2001-10-28 21:42 ` Jonathan Morton
2001-11-02 5:52 ` Zlatko's I/O slowdown status Andrea Arcangeli
2001-11-02 20:14 ` Zlatko Calusic
2001-11-02 20:26 ` Rik van Riel
2001-11-02 21:22 ` Zlatko Calusic
2001-11-02 20:57 ` Andrea Arcangeli
2001-11-02 23:23 ` Simon Kirby
2001-10-27 13:14 ` xmm2 - monitor Linux MM active/inactive lists graphically Giuliano Pochini
2001-10-28 5:05 ` Mike Fedyk
2001-10-25 9:07 ` Zlatko Calusic