From: Andrew Morton <akpm@digeo.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: wli@holomorphy.com, mbligh@aracnet.com, mingo@elte.hu,
hugh@veritas.com, dmccr@us.ibm.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: objrmap and vmtruncate
Date: Sat, 5 Apr 2003 17:39:34 -0800
Message-ID: <20030405173934.766b1e85.akpm@digeo.com>
In-Reply-To: <20030406001407.GL1326@dualathlon.random>
Andrea Arcangeli <andrea@suse.de> wrote:
>
> 2.4-aa is outperforming 2.5 in almost all tiobench results, so I doubt
> the elevator is that bad and could explain such a drop in performance.
Well. tiobench doesn't measure concurrent reads and writes.
A quick test shows the anticipatory scheduler runs `tiobench --threads 16'
1.5x faster on reads and 1.15x faster on writes. But that's a damn good
result for a 2.4 elevator.
It has a starvation problem though.
Running this:
while true
do
	# keep overwriting the same 300MB file in place (conv=notrunc: don't truncate it first)
	dd if=/dev/zero of=x bs=1M count=300 conv=notrunc
done
in parallel with five reads from five 200M files shows writes getting stalled
for 20-second periods.
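(The reader side of the test isn't shown; a rough sketch of it, assuming five
pre-existing 200M files - the read1..read5 names are made up:)

for i in 1 2 3 4 5
do
	# stream a 200M file sequentially, in parallel with the writer loop above
	dd if=read$i of=/dev/null bs=1M &
done
wait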
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 8 0 3980 1800 223056 0 0 11688 10168 450 351 0 1 99 0
0 7 0 3516 1792 223548 0 0 20036 4136 491 476 0 5 95 0
2 5 0 3864 1792 223200 0 0 23444 0 469 727 1 2 97 0
0 7 0 3384 1792 223684 0 0 21952 0 456 639 0 6 94 0
1 6 0 3468 1792 223596 0 0 23436 0 475 680 0 2 98 0
0 7 0 4172 1792 222896 0 0 22824 0 469 597 0 2 98 0
0 7 0 3472 1792 223592 0 0 24376 0 493 599 0 5 95 0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
4 4 0 3352 1792 223712 0 0 24680 0 496 574 1 11 88 0
0 7 0 3920 1792 223144 0 0 24120 0 482 708 1 5 94 0
0 7 0 3920 1792 223144 0 0 23536 0 474 556 1 4 95 0
0 7 0 3912 1792 223152 0 0 22524 0 468 502 0 5 95 0
0 7 0 3564 1792 223500 0 0 23120 0 471 510 0 4 96 0
0 7 0 3324 1792 223740 0 0 21732 0 449 657 0 4 96 0
0 7 0 3656 1792 223408 0 0 24236 0 484 554 1 3 96 0
0 7 0 4256 1792 222808 0 0 23076 0 474 561 0 8 92 0
0 7 0 3436 1792 223628 0 0 22312 0 455 501 0 1 99 0
0 7 0 3384 1792 223680 0 0 23588 0 476 611 1 1 98 0
0 8 0 3408 1792 223656 0 0 21464 1312 474 615 0 7 93 0
0 8 0 3328 1792 223736 0 0 13772 10300 478 467 0 1 99 0
0 8 0 3492 1656 223712 0 0 11988 12612 497 409 0 1 99 0
0 8 0 3976 796 224088 0 0 14748 5952 432 367 0 2 98 0
0 8 0 3812 728 224324 0 0 15636 8064 476 449 1 2 97 0
0 8 0 3768 732 224364 0 0 12328 10880 469 361 0 1 99 0
0 8 0 3504 752 224608 0 0 12548 8452 435 354 0 2 98 0
0 8 0 4180 760 223924 0 0 14676 9920 492 419 0 4 96 0
0 8 0 3616 776 224472 0 0 12976 9660 462 367 0 3 97 0
0 7 0 3784 792 224288 0 0 15312 8864 483 401 0 4 96 0
0 7 0 3328 808 224728 0 0 21060 0 443 468 0 2 98 0
3 4 0 3324 832 224708 0 0 21752 0 449 470 0 4 96 0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 7 0 3496 852 224516 0 0 21584 0 447 427 0 5 95 0
0 7 0 3760 876 224228 0 0 22772 0 458 526 1 3 96 0
0 7 0 4240 896 223728 0 0 22172 0 460 448 0 4 96 0
0 7 0 3308 920 224636 0 0 22428 0 461 471 0 5 95 0
0 7 0 3292 944 224628 0 0 22660 0 463 493 0 5 95 0
5 2 0 3592 960 224312 0 0 21328 0 445 459 0 1 99 0
1 6 0 4120 980 223764 0 0 22508 0 453 435 1 1 98 0
1 6 0 3568 1008 224288 0 0 23332 0 475 516 1 5 94 0
0 7 0 3484 1024 224356 0 0 21968 0 457 476 1 4 95 0
1 6 0 3928 1048 223888 0 0 20284 0 427 494 0 3 97 0
0 7 0 3996 1072 223796 0 0 22584 0 461 538 0 3 97 0
0 7 0 3892 1088 223884 0 0 21728 0 455 470 1 5 94 0
0 7 0 3916 728 224220 0 0 22884 0 463 503 1 7 92 0
0 7 0 4340 752 223772 0 0 23000 0 473 502 0 6 94 0
0 8 0 3600 768 224496 0 0 20692 1124 447 519 0 3 97 0
> I suspect it must be something along the lines of the filesystem doing
> synchronous I/O for some reason inside writepage, like doing a
> wait_on_buffer for every writepage, generating the above fake results.
> Note the 0% CPU time. You're not benchmarking the VM here. In fact I
> would be interested to see the above repeated on ext2.
>
> It's not true that ext3 shares the same writepage as ext2, as you
> said in an earlier email; the ext3 writepage starts like this:
No, that code's all just fluff. These pages get a disk mapping very early
(I've just added an msync to make sure, though). So in this test
ext3_writepage() amounts to nothing more than block_write_full_page(). The
journalling system hardly gets involved at all with overwrites of
already-allocated blocks on an ordered-data filesystem.
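(For reference, data=ordered is ext3's default journalling mode; mounting with
it explicitly would look like this - device and mount point are invented:)

# ext3 ordered mode (the default): only metadata goes through the journal;
# data blocks of plain overwrites are written in place, not journalled
mount -t ext3 -o data=ordered /dev/hda3 /mnt/test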
> and even the ext2 writepage can be synchronous if it has to call
> get_block. In fact I would recommend filling the "foo" file with zeros
> so it has no holes, just to avoid that extra synchronous fs overhead
> and keep the inode map lookup as the only synchronous part.
Yup, I did that. It doesn't make any difference.
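(For anyone reproducing this, the pre-fill Andrea suggests amounts to something
like the sketch below - "foo" is the name from his mail, the 300M size is just
a placeholder:)

# allocate every block of the file up front, so later overwrites never take
# the synchronous get_block()/hole-filling path inside writepage
dd if=/dev/zero of=foo bs=1M count=300
sync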
But you're right, the problem does not occur on 2.4.21-pre5aa2+ext2. Nor
does it occur on 2.5+ext3 or on 2.4.21-pre5+ext3. It is something specific
to aa+ext3.
I don't know what's gone wrong. It's just stuck in filemap_nopage->lock_page
all the time, seeking all over the disk. It smells like a VM/VFS problem.
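(Roughly how that can be seen - this assumes CONFIG_MAGIC_SYSRQ is enabled and
that the readers are the dd processes from the test above:)

# dump every task's kernel stack into the log, then look for the readers
# sleeping in filemap_nopage -> lock_page
echo t > /proc/sysrq-trigger
dmesg | less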
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 6380 4064 5836 232120 0 0 2940 0 214 231 0 2 98 0
1 0 6380 3196 5836 233032 0 0 3164 0 217 239 0 0 100 0
0 1 6380 3516 5836 232752 0 0 3136 0 216 234 0 0 100 0
0 1 6380 3764 5836 232612 0 0 3080 0 214 231 0 2 98 0
0 1 6380 4028 5836 232432 0 0 3080 0 214 231 0 1 99 0
1 0 6380 3224 5836 233292 0 0 3108 0 215 231 0 1 99 0
1 0 6380 3396 5836 233176 0 0 3164 0 216 238 0 0 100 0
0 1 6380 3600 5836 233048 0 0 3248 0 220 243 0 3 97 0
0 1 6380 3732 5836 232968 0 0 3192 0 218 239 0 1 99 0
It should look like:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 2 8656 3200 588 241892 0 0 6440 11928 548 545 0 1 99 0
1 0 8656 2272 588 242896 0 0 13804 7172 817 1173 0 3 97 0
0 1 8656 2240 588 243036 20 60 16344 14572 924 1236 0 4 96 0
0 1 8656 2380 528 243188 0 0 16060 16044 965 1183 0 9 91 0
0 2 8656 3192 544 242068 312 4 13104 13316 801 1038 0 5 95 0
0 2 8656 2208 564 243204 0 0 14676 17920 929 1086 0 5 95 0
0 1 8656 2264 576 243160 20 0 16160 11836 892 1231 0 3 97 0
1 0 8656 4444 584 240956 0 0 10200 15640 740 829 0 3 97 0
File layout is OK, same as ext2:
79895-79895: 0-0 (1)
79896-79902: 260848-260854 (7)
79903-79903: 0-0 (1)
79904-79910: 32607-32613 (7)
79911-79911: 0-0 (1)
79912-79918: 260904-260910 (7)
79919-79919: 0-0 (1)
79920-79926: 32614-32620 (7)
79927-79927: 0-0 (1)
79928-79934: 260960-260966 (7)
79935-79935: 0-0 (1)
79936-79942: 32621-32627 (7)
79943-79943: 0-0 (1)
79944-79950: 261016-261022 (7)
79951-79951: 0-0 (1)
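(The listing is logical block range: physical block range (length); the tool
that produced it isn't shown. A roughly comparable dump can be had with
e2fsprogs' filefrag - assuming "x", the writer's file, is the one of interest:)

# print the file's extent layout; needs root if filefrag has to fall back
# to per-block FIBMAP queries
filefrag -v x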
I applied the -aa ext3 patches to 2.4.21-pre5 and that ran OK.
It's almost like the VM is refusing to call ext3_writepage() for some reason,
or is only reclaiming clean pagecache, or the filemap_nopage() readaround
isn't working. Very odd.