* 2.5.59-mm5
From: Andrew Morton @ 2003-01-24 3:50 UTC
To: linux-kernel, linux-mm
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/
. -mm3 and -mm4 were not announced - they were sync-up patches as we
worked on the I/O scheduler.
. -mm5 has the first cut of Nick Piggin's anticipatory I/O scheduler.
Here's the scoop:
The problem being addressed here is (mainly) kernel behaviour when there
is a stream of writeout happening, and someone submits a read.
In 2.4.x, the disk queues contain up to 30 megabytes of writes (say, one
second's worth). When a read is submitted the 2.4 I/O scheduler will try
to insert that at the right place between the writes. Usually, there is no
right place and the read is appended to the queue. That is: it will be
serviced in one second.
But the problem with reads is that they are dependent - neither the
application nor the kernel can submit read #N until read #N-1 has
completed. So something as simple as
cat /usr/src/linux/kernel/*.c > /dev/null
requires several hundred dependent reads. And in the presence of a
streaming write, each and every one of those reads gets stuck at the end of
the queue, and takes a second to propagate to the head. The `cat' takes
hundreds of seconds.
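
To put rough numbers on the paragraph above, here is a minimal sketch of the
arithmetic. The count of 300 reads is an assumed stand-in for "several
hundred", and the one-second queue delay is the figure quoted above; the
function name is purely illustrative.

```python
def dependent_read_time(num_reads, queue_delay_s, service_s=0.0):
    """Wall time for a chain of reads where read N cannot be submitted
    until read N-1 has completed, and every read first waits
    queue_delay_s behind the writes already in the queue."""
    return num_reads * (queue_delay_s + service_s)


# Assumed workload: 300 dependent reads, each stuck behind ~1 second of writes.
print(dependent_read_time(300, 1.0))  # 300.0 seconds
```

Because the reads cannot overlap, the queue delay is paid once per read
rather than once for the whole `cat`.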
The celebrated read-latency2 patch recognises the fact that appending a
read to a tail of writes is dumb, and puts the read near the head of the
queue of writes. It provides an improvement of up to 30x. The deadline
I/O scheduler in 2.5 does the same thing: if reads are queued up, promote
them past writes, even if those writes have been waiting longer.
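
The promotion rule just described can be sketched as a toy dispatcher. This
is only an illustration of "reads go first"; it is not the read-latency2 or
deadline scheduler code, and it leaves out the expiry times and batching
that a real scheduler needs. `pick_next` is a hypothetical name.

```python
from collections import deque


def pick_next(reads, writes):
    """Return the next request to dispatch, always preferring a queued
    read over any write, however long that write has been waiting."""
    if reads:
        return reads.popleft()
    if writes:
        return writes.popleft()
    return None


reads = deque(["R0"])
writes = deque(["W0", "W1", "W2"])
order = []
request = pick_next(reads, writes)
while request is not None:
    order.append(request)
    request = pick_next(reads, writes)
print(order)  # ['R0', 'W0', 'W1', 'W2']
```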
So far so good, but these fixes are still dumb. Because we're solving
the dependent read problem by creating a seek storm. Every time someone
submits a read, we stop writing, seek over and service the read, and then
*immediately* seek back and start servicing writes again.
But in the common case, the application which submitted a read is about
to go and submit another one, closeby on-disk to the first. So whoops, we
have to seek back to service that one as well.
So what anticipatory scheduling does is very simple: if an application
has performed a read, do *nothing at all* for a few milliseconds. Just
return to userspace (or to the filesystem) in the expectation that the
application or filesystem will quickly submit another read which is
closeby.
If the application _does_ submit the read then fine - we service that
quickly. If it does not submit a read then we lose. Time out and go back
to doing writes.
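
The difference in seeking can be sketched with a small counting model. It
assumes one reader issuing dependent reads in one disk region while writes
go to another, and it counts only the seeks between those two regions. The
5 ms window and the gap values are illustrative assumptions, not numbers
from the patch.

```python
def seeks_read_promotion(num_reads):
    """Promote each read, then go straight back to writing:
    one seek over to the read and one seek back, every time."""
    return 2 * num_reads


def seeks_anticipatory(gaps_ms, window_ms):
    """gaps_ms[i] is the time between read i completing and read i+1
    being submitted. The disk idles for up to window_ms after a read."""
    seeks = 1                      # seek over for the first read
    for gap in gaps_ms:
        if gap > window_ms:
            seeks += 2             # timed out: back to writes, then over again
    seeks += 1                     # after the last read, return to the writes
    return seeks


# 300 dependent reads, each follow-up arriving 1 ms after the previous one.
print(seeks_read_promotion(300))            # 600
print(seeks_anticipatory([1] * 299, 5))     # 2
```

When the follow-up read misses the window, the model degrades to the
promotion behaviour for that read, which is the "we lose" case above.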
The end result is a large reduction in seeking - decreased read latency,
increased read bandwidth and increased write bandwidth.
The code as-is has rough spots and still needs quite some work. But it
appears to be stable. The test which I have concentrated on is "how long
does my laptop take to compile util-linux when there is a continuous write
happening". On ext2, mounted noatime:
    2.4.20:              538 seconds
    2.5.59:              400 seconds
    2.5.59-mm5:           70 seconds
    No streaming write:   48 seconds
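
For reference, the ratios implied by these timings can be worked out
directly. The numbers are the ones reported above; the helper name `ratio`
is only for illustration.

```python
timings_s = {"2.4.20": 538, "2.5.59": 400, "2.5.59-mm5": 70}
no_write_s = 48  # same compile with no streaming write


def ratio(a_s, b_s):
    """How many times longer a_s is than b_s, to one decimal place."""
    return round(a_s / b_s, 1)


for kernel, seconds in timings_s.items():
    print(f"{kernel}: {ratio(seconds, no_write_s)}x the no-write time")

print(ratio(538, 70))  # 7.7: 2.4.20 relative to -mm5
print(ratio(400, 70))  # 5.7: 2.5.59 relative to -mm5
```

So under a streaming write -mm5 takes about 1.5x the no-write time, where
2.4.20 takes about 11.2x and 2.5.59 about 8.3x.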
A couple of VFS changes were needed as well.
More details on anticipatory scheduling may be found at
http://www.cs.rice.edu/~ssiyer/r/antsched/
Changes since 2.5.59-mm2:
+preempt-locking.patch
Speed up the smp preempt locking.
+ext2-allocation-failure-fix.patch
ext2 ENOSPC crash fix
+ext2_new_block-fixes.patch
ext2 cleanups
+hangcheck-timer.patch
A form of software watchdog
+slab-irq-fix.patch
Fix a BUG() in slab when memory exhaustion happens at a bad time.
+sendfile-security-hooks.patch
Reinstate lost security hooks around sendfile()
+buffer-io-accounting.patch
Fix IO-wait accounting
+aic79xx-linux-2.5.59-20030122.patch
aic7xxx driver update
+topology-remove-underbars.patch
cleanup
+mandlock-oops-fix.patch
file locking fix
+reiserfs_file_write.patch
reworked reiserfs write code.
-exit_mmap-fix2.patch
Dropped
+generic_file_readonly_mmap-fix.patch
Fix MAP_PRIVATE mmaps for filesystems which don't support ->writepage()
+seq_file-page-defn.patch
Compile fix
+exit_mmap-fix-ppc64.patch
+exit_mmap-ia64-fix.patch
Fix the exit_mmap() problem in arch code.
+show_task-fix.patch
Fix oops in show_task()
+scsi-iothread.patch
software suspend fix
+numaq-ioapic-fix2.patch
NUMAQ stuff
+misc.patch
Random fixes
+writeback-sync-cleanup.patch
remove some junk from fs-writeback.c
+dont-wait-on-inode.patch
Fix large delays in the writeback path
+unlink-latency-fix.patch
Fix large delays in unlink()
+anticipatory_io_scheduling-2_5_59-mm3.patch
Anticipatory scheduling implementation
All 65 patches:
kgdb.patch
devfs-fix.patch
deadline-np-42.patch
(undescribed patch)
deadline-np-43.patch
(undescribed patch)
setuid-exec-no-lock_kernel.patch
remove lock_kernel() from exec of setuid apps
buffer-debug.patch
buffer.c debugging
warn-null-wakeup.patch
reiserfs-readpages.patch
reiserfs v3 readpages support
fadvise.patch
implement posix_fadvise64()
ext3-scheduling-storm.patch
ext3: fix scheduling storm and lockups
auto-unplug.patch
self-unplugging request queues
less-unplugging.patch
Remove most of the blk_run_queues() calls
lockless-current_kernel_time.patch
Lockless current_kernel_time()
scheduler-tunables.patch
scheduler tunables
htlb-2.patch
hugetlb: fix MAP_FIXED handling
kirq.patch
kirq-up-fix.patch
Subject: Re: 2.5.59-mm1
ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages
prune-icache-stats.patch
add stats for page reclaim via inode freeing
vma-file-merge.patch
mmap-whitespace.patch
read_cache_pages-cleanup.patch
cleanup in read_cache_pages()
remove-GFP_HIGHIO.patch
remove __GFP_HIGHIO
quota-lockfix.patch
quota locking fix
quota-offsem.patch
quota semaphore fix
oprofile-p4.patch
oprofile_cpu-as-string.patch
oprofile cpu-as-string
preempt-locking.patch
Subject: spinlock efficiency problem [was 2.5.57 IO slowdown with CONFIG_PREEMPT enabled)
wli-11_pgd_ctor.patch
(undescribed patch)
wli-11_pgd_ctor-update.patch
pgd_ctor update
stack-overflow-fix.patch
stack overflow checking fix
ext2-allocation-failure-fix.patch
Subject: [PATCH] ext2 allocation failures
ext2_new_block-fixes.patch
ext2_new_block cleanups and fixes
hangcheck-timer.patch
hangcheck-timer
slab-irq-fix.patch
slab IRQ fix
Richard_Henderson_for_President.patch
Subject: [PATCH] Richard Henderson for President!
parenthesise-pgd_index.patch
Subject: i386 pgd_index() doesn't parenthesize its arg
sendfile-security-hooks.patch
Subject: [RFC][PATCH] Restore LSM hook calls to sendfile
macro-double-eval-fix.patch
Subject: Re: i386 pgd_index() doesn't parenthesize its arg
mmzone-parens.patch
asm-i386/mmzone.h macro paren/eval fixes
blkdev-fixes.patch
blkdev.h fixes
remove-will_become_orphaned_pgrp.patch
remove will_become_orphaned_pgrp()
buffer-io-accounting.patch
correct wait accounting in wait_on_buffer()
aic79xx-linux-2.5.59-20030122.patch
aic7xxx update
MAX_IO_APICS-ifdef.patch
MAX_IO_APICS #ifdef'd wrongly
dac960-error-retry.patch
Subject: [PATCH] linux2.5.56 patch to DAC960 driver for error retry
topology-remove-underbars.patch
Remove __ from topology macros
mandlock-oops-fix.patch
ftruncate/truncate oopses with mandatory locking
put_user-warning-fix.patch
Subject: Re: Linux 2.5.59
reiserfs_file_write.patch
Subject: reiserfs file_write patch
vmlinux-fix.patch
vmlinux fix
smalldevfs.patch
smalldevfs
sound-firmware-load-fix.patch
soundcore.c referenced non-existent errno variable
generic_file_readonly_mmap-fix.patch
Fix generic_file_readonly_mmap()
seq_file-page-defn.patch
Include <asm/page.h> in fs/seq_file.c, as it uses PAGE_SIZE
exit_mmap-fix-ppc64.patch
exit_mmap-ia64-fix.patch
Fix ia64's 64bit->32bit app switching
show_task-fix.patch
Subject: [PATCH] 2.5.59: show_task() oops
scsi-iothread.patch
scsi_eh_* needs to run even during suspend
numaq-ioapic-fix2.patch
NUMAQ io_apic programming fix
misc.patch
misc fixes
writeback-sync-cleanup.patch
dont-wait-on-inode.patch
unlink-latency-fix.patch
anticipatory_io_scheduling-2_5_59-mm3.patch
Subject: [PATCH] 2.5.59-mm3 antic io sched
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: 2.5.59-mm5
From: Alex Bligh - linux-kernel @ 2003-01-24 11:03 UTC
To: Andrew Morton, linux-kernel, linux-mm; +Cc: Alex Bligh - linux-kernel

--On 23 January 2003 19:50 -0800 Andrew Morton <akpm@digeo.com> wrote:

> So what anticipatory scheduling does is very simple: if an application
> has performed a read, do *nothing at all* for a few milliseconds. Just
> return to userspace (or to the filesystem) in the expectation that the
> application or filesystem will quickly submit another read which is
> closeby.

I'm sure this is a really dumb question, as I've never played
with this subsystem, in which case I apologize in advance.

Why not follow (by default) the old system where you put the reads
effectively at the back of the queue. Then rather than doing nothing
for a few milliseconds, you carry on with doing the writes. However,
promote the reads to the front of the queue when you have a "good
lump" of them. If you get further reads while you are processing
a lump of them, put them behind the lump. Switch back to the putting
reads at the end when we have done "a few lumps worth" of
reads, or exhausted the reads at the start of the queue (or
perhaps are short of memory).

IE (with a "lump" = 20) and "a few" = 3.

W0 W1 W2 ... W50 W51

[Read arrives, we process some writes]

W5 ... W50 W51 R0

[More reads arrive, more writes processed]

W10 ... W50 W51 R0 R1 R2 .. R7

[Haven't got a big enough lump, but a write arrives]

W12 W13... W50 W51 W52 R0 R1 R2 .. R7

[More reads arrive, more writes processed]

W14 W15 ... W50 W51 W52 R0 R1 R2 .. R7 R8 R9.. R19

[Another read arrives, after 4 more writes have been
processed, and we move the lump to the front]

R0 R1 R2 .. R7 R8 R9.. R19 R20 W18 W19 ... W50 W51 W52

[Some reads are processed, and some more arrive, which
we insert into our lump at the front]

R0 R1 R2 .. R7 R8 R9.. R19 R20 R21 R22 W18 W19 ... W50 W51 W52

Then either if the reads are processed at the front of the queue
faster than they arrive, and the "lump" disappears, or if we've
processed 3 x 20 = 60 reads, we revert to sticking reads back
at the end.

All this does is lump between 20 and 60 reads together.

The advantage being that you don't "do nothing" for a few
milliseconds, and can attract larger lumps, than by waiting
without incurring additional latency.

Now of course you have the ordering problem (in that I've
assumed you can insert things into the queue at will), but
you have that anyway.

--
Alex Bligh
* Re: 2.5.59-mm5
From: Andrew Morton @ 2003-01-24 11:16 UTC
To: Alex Bligh - linux-kernel; +Cc: linux-kernel, linux-mm

Alex Bligh - linux-kernel <linux-kernel@alex.org.uk> wrote:
>
> --On 23 January 2003 19:50 -0800 Andrew Morton <akpm@digeo.com> wrote:
>
> > So what anticipatory scheduling does is very simple: if an application
> > has performed a read, do *nothing at all* for a few milliseconds. Just
> > return to userspace (or to the filesystem) in the expectation that the
> > application or filesystem will quickly submit another read which is
> > closeby.
>
> I'm sure this is a really dumb question, as I've never played
> with this subsystem, in which case I apologize in advance.
>
> Why not follow (by default) the old system where you put the reads
> effectively at the back of the queue. Then rather than doing nothing
> for a few milliseconds, you carry on with doing the writes. However,
> promote the reads to the front of the queue when you have a "good
> lump" of them.

That is the problem. Reads do not come in "lumps". They are dependent.
Consider the case of reading a file:

1: Read the directory.

   This is a single read, and we cannot do anything until it has
   completed.

2: The directory told us where the inode is. Go read the inode.

   This is a single read, and we cannot do anything until it has
   completed.

3: Go read the first 12 blocks of the file and the first indirect.

   This is a single read, and we cannot do anything until it has
   completed.

The above process can take up to three trips through the request queue.

In this very common scenario, the only way we'll ever get "lumps" of reads is
if some other processes come in and happen to want to read nearby sectors. In
the best case, the size of the lump is proportional to the number of
processes which are concurrently trying to read something. This just doesn't
happen enough to be significant or interesting.

But writes are completely different. There is no dependency between them and
at any point in time we know where on-disk a lot of writes will be placed.
We don't know that for reads, which is why we need to twiddle thumbs until the
application or filesystem makes up its mind.
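
The point about lump size can be sketched with a toy queue model. It
assumes every reader has at most one read outstanding (the next read
depends on the previous one) and that the disk completes one read per step;
both are simplifications, and `max_read_lump` is a hypothetical name.

```python
from collections import deque


def max_read_lump(num_readers, reads_per_reader):
    """Largest number of reads ever queued at once when each reader
    must wait for its previous read before submitting the next."""
    remaining = [reads_per_reader] * num_readers
    outstanding = [False] * num_readers
    queue = deque()
    largest = 0
    while any(remaining) or queue:
        # Every reader that is not waiting on a read submits its next one.
        for pid in range(num_readers):
            if remaining[pid] and not outstanding[pid]:
                queue.append(pid)
                outstanding[pid] = True
                remaining[pid] -= 1
        largest = max(largest, len(queue))
        # The disk services one read; that reader may now continue.
        done = queue.popleft()
        outstanding[done] = False
    return largest


print(max_read_lump(1, 100))  # 1
print(max_read_lump(4, 100))  # 4
```

However many reads each process eventually issues, the queue never holds
more reads than there are processes, so a "lump" of 20 needs 20 concurrent
readers.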
* Re: 2.5.59-mm5
From: Alex Tomas @ 2003-01-24 11:23 UTC
To: Andrew Morton; +Cc: Alex Bligh - linux-kernel, linux-kernel, linux-mm

>>>>> Andrew Morton (AM) writes:

 AM> But writes are completely different. There is no dependency
 AM> between them and at any point in time we know where on-disk a lot
 AM> of writes will be placed. We don't know that for reads, which is
 AM> why we need to twiddle thumbs until the application or filesystem
 AM> makes up its mind.

it's significant that application doesn't want to wait read completion
long and doesn't wait for write completion in most cases.
* Re: 2.5.59-mm5
From: Andrew Morton @ 2003-01-24 11:50 UTC
To: Alex Tomas; +Cc: linux-kernel, linux-kernel, linux-mm

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> >>>>> Andrew Morton (AM) writes:
>
>  AM> But writes are completely different. There is no dependency
>  AM> between them and at any point in time we know where on-disk a lot
>  AM> of writes will be placed. We don't know that for reads, which is
>  AM> why we need to twiddle thumbs until the application or filesystem
>  AM> makes up its mind.
>
>
> it's significant that application doesn't want to wait read completion
> long and doesn't wait for write completion in most cases.

That's correct. Reads are usually synchronous and writes are rarely
synchronous.

The most common place where the kernel forces a user process to wait on
completion of a write is actually in unlink (truncate, really). Because
truncate must wait for in-progress I/O to complete before allowing the
filesystem to free (and potentially reuse) the affected blocks.

If there's a lot of writeout happening then truncate can take _ages_. Hence
this patch:


Truncates can take a very long time. Especially if there is a lot of
writeout happening, because truncate must wait on in-progress I/O.

And sys_unlink() is performing that truncate while holding the parent
directory's i_sem. This basically shuts down new accesses to the entire
directory until the synchronous I/O completes.

In the testing I've been doing, that directory is /tmp, and this hurts.

So change sys_unlink() to perform the actual truncate outside i_sem.

When there is a continuous streaming write to the same disk, this patch
reduces the time for `make -j4 bzImage' from 370 seconds to 220.


 namei.c |   12 ++++++++++++
 1 files changed, 12 insertions(+)

diff -puN fs/namei.c~unlink-latency-fix fs/namei.c
--- 25/fs/namei.c~unlink-latency-fix	2003-01-24 02:41:04.000000000 -0800
+++ 25-akpm/fs/namei.c	2003-01-24 02:47:36.000000000 -0800
@@ -1659,12 +1659,19 @@ int vfs_unlink(struct inode *dir, struct
 	return error;
 }
 
+/*
+ * Make sure that the actual truncation of the file will occur outside its
+ * diretory's i_sem. truncate can take a long time if there is a lot of
+ * writeout happening, and we don't want to prevent access to the directory
+ * while waiting on the I/O.
+ */
 asmlinkage long sys_unlink(const char * pathname)
 {
 	int error = 0;
 	char * name;
 	struct dentry *dentry;
 	struct nameidata nd;
+	struct inode *inode = NULL;
 
 	name = getname(pathname);
 	if(IS_ERR(name))
@@ -1683,6 +1690,9 @@ asmlinkage long sys_unlink(const char *
 		/* Why not before? Because we want correct error value */
 		if (nd.last.name[nd.last.len])
 			goto slashes;
+		inode = dentry->d_inode;
+		if (inode)
+			inode = igrab(inode);
 		error = vfs_unlink(nd.dentry->d_inode, dentry);
 	exit2:
 		dput(dentry);
@@ -1693,6 +1703,8 @@ exit1:
 exit:
 	putname(name);
 
+	if (inode)
+		iput(inode);	/* truncate the inode here */
 	return error;
 
 slashes:

_
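
The idea behind the patch can be sketched with a toy reference-count model.
This is not kernel code: `grab` and `put` are only loosely modelled on
igrab() and iput(), the lock stands in for the parent directory's i_sem,
and the model simply records whether the expensive final release happened
while the lock was held.

```python
import threading

dir_lock = threading.Lock()   # stands in for the parent directory's i_sem


class Inode:
    def __init__(self):
        self.refcount = 1                  # the reference held via the dentry
        self.truncated_under_lock = None   # set when the final put truncates

    def grab(self):                        # loosely modelled on igrab()
        self.refcount += 1
        return self

    def put(self):                         # loosely modelled on iput()
        self.refcount -= 1
        if self.refcount == 0:
            # The slow truncate would happen here; note whether the
            # directory lock is held while it runs.
            self.truncated_under_lock = dir_lock.locked()


def unlink_before(inode):
    with dir_lock:
        inode.put()                        # last reference: truncate under lock
    return inode.truncated_under_lock


def unlink_after(inode):
    with dir_lock:
        held = inode.grab()                # extra reference taken first
        inode.put()                        # no longer the last reference
    held.put()                             # final put, lock already dropped
    return inode.truncated_under_lock


print(unlink_before(Inode()))  # True
print(unlink_after(Inode()))   # False
```

Holding one extra reference across the locked region moves the final
release, and with it the wait on in-progress I/O, to a point where other
users of the directory are no longer blocked.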
* Re: 2.5.59-mm5
From: Alex Tomas @ 2003-01-24 12:05 UTC
To: Andrew Morton; +Cc: Alex Tomas, linux-kernel, linux-kernel, linux-mm

>>>>> Andrew Morton (AM) writes:

 AM> That's correct. Reads are usually synchronous and writes are
 AM> rarely synchronous.

 AM> The most common place where the kernel forces a user process to
 AM> wait on completion of a write is actually in unlink (truncate,
 AM> really). Because truncate must wait for in-progress I/O to
 AM> complete before allowing the filesystem to free (and potentially
 AM> reuse) the affected blocks.

looks like I miss something here.

why do wait for write completion in truncate?

getblk (blockmap);
getblk (bitmap);
set 0 in blockmap->b_data[N];
mark_buffer_dirty (blockmap);
clear_bit (N, &bitmap);
mark_buffer_dirty (bitmap);

isn't that enough?
* Re: 2.5.59-mm5
From: Andrew Morton @ 2003-01-24 19:12 UTC
To: Alex Tomas; +Cc: linux-kernel, linux-kernel, linux-mm

Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> >>>>> Andrew Morton (AM) writes:
>
>  AM> That's correct. Reads are usually synchronous and writes are
>  AM> rarely synchronous.
>
>  AM> The most common place where the kernel forces a user process to
>  AM> wait on completion of a write is actually in unlink (truncate,
>  AM> really). Because truncate must wait for in-progress I/O to
>  AM> complete before allowing the filesystem to free (and potentially
>  AM> reuse) the affected blocks.
>
> looks like I miss something here.
>
> why do wait for write completion in truncate?

We cannot free disk blocks until I/O against them has completed. Otherwise
the block could be reused for something else, then the old IO will scribble
on the new data.

What we _can_ do is to defer the waiting - only wait on the I/O when someone
reuses the disk blocks. So there are actually unused blocks with I/O in
flight against them.

We do that for metadata (the wait happens in unmap_underlying_metadata()) but
for file data blocks there is no mechanism in place to look them up.
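
The "defer the waiting" idea can be sketched as a toy allocator. This is a
hypothetical model only: as noted above, no such lookup mechanism exists
for file data blocks, and the class and method names here are invented for
illustration.

```python
class DeferredFreeAllocator:
    """Toy allocator: freeing never waits; reusing a block waits only
    if an old write against that block is still in flight."""

    def __init__(self):
        self.free_blocks = []
        self.in_flight = set()   # blocks with writes still queued to disk
        self.waited_on = []      # blocks whose old I/O we had to wait for

    def start_write(self, block):
        self.in_flight.add(block)

    def complete_write(self, block):
        self.in_flight.discard(block)

    def free(self, block):
        # Deferred policy: do not wait here, just mark the block reusable.
        self.free_blocks.append(block)

    def alloc(self):
        block = self.free_blocks.pop(0)
        if block in self.in_flight:
            # Stand-in for blocking until the old write has finished, so
            # that it cannot scribble on the new owner's data.
            self.waited_on.append(block)
            self.complete_write(block)
        return block


allocator = DeferredFreeAllocator()
allocator.start_write(7)
allocator.free(7)                 # returns at once, I/O still in flight
print(allocator.waited_on)        # []
print(allocator.alloc())          # 7
print(allocator.waited_on)        # [7]
```

The wait is paid by whoever reuses the block, and only if the old write has
not already finished by then.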
* Re: 2.5.59-mm5
From: Alex Tomas @ 2003-01-24 19:58 UTC
To: Andrew Morton; +Cc: Alex Tomas, linux-kernel, linux-kernel, linux-mm

>>>>> Andrew Morton (AM) writes:

 AM> We cannot free disk blocks until I/O against them has completed.
 AM> Otherwise the block could be reused for something else, then the
 AM> old IO will scribble on the new data.

 AM> What we _can_ do is to defer the waiting - only wait on the I/O
 AM> when someone reuses the disk blocks. So there are actually
 AM> unused blocks with I/O in flight against them.

 AM> We do that for metadata (the wait happens in
 AM> unmap_underlying_metadata()) but for file data blocks there is no
 AM> mechanism in place to look them up

yeah! indeed. my stupid mistake ...
* Re: 2.5.59-mm5
From: Ed Tomlinson @ 2003-01-25 17:32 UTC
To: Andrew Morton; +Cc: linux-mm

Hi Andrew,

I am seeing a strange problem with mm5. This occurs both with and without
the anticipatory scheduler changes. What happens is I see very high system
times and X responds very very slowly. I first noticed this when switching
between folders in kmail and have seen it rebuilding db files for squidguard.
Here is what happened during the db rebuild (no anticipatory ioscheduler):

oscar# readprofile -r; vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
6 0 348 15824 115900 183148 0 0 191 134 1064 770 22 6 66 6
5 4 348 15312 115936 183420 0 0 0 4392 1027 537 28 72 0 0
5 0 348 14872 115936 183956 0 0 0 422 1079 553 33 68 0 0
7 0 348 14552 115936 184316 0 0 0 0 1001 536 42 58 0 0
6 0 348 13912 116012 184900 0 0 0 126 1019 560 32 68 0 0
5 0 348 13272 116024 185468 0 0 4 0 1002 560 27 73 0 0
5 4 348 12696 116060 186052 0 0 0 86 1014 519 28 73 0 0
5 0 348 12368 116060 186356 0 0 0 0 1001 509 24 76 0 0
5 0 348 11920 116060 186772 0 0 0 34 1003 519 27 74 0 0
6 0 348 11672 116084 187044 0 0 0 88 1186 1199 29 71 0 0
8 1 348 8536 116276 188148 0 0 468 0 1118 761 39 61 0 0
5 5 348 5016 114468 188120 0 0 614 304 1118 811 59 41 0 0
6 0 348 5144 113336 186548 0 0 648 0 1036 770 54 46 0 0
7 0 348 5080 113252 185920 0 0 132 0 1013 707 42 58 0 0
6 0 348 4688 113188 185528 0 0 184 262 1049 784 64 36 0 0
6 0 348 6032 111292 185160 0 0 406 0 1038 725 39 62 0 0
6 0 348 5200 111392 185908 0 0 216 1096 1032 733 35 65 0 0
6 0 348 4312 111392 186744 0 0 166 0 1023 668 39 62 0 0
6 1 348 5096 111396 187196 0 0 10 0 1002 701 25 76 0 0
6 1 348 4328 111436 187692 0 0 16 3778 1207 755 24 76 0 0
6 1 348 6120 110460 186728 0 0 14 0 1201 841 30 70 0 0
7 2 348 5608 110548 187108 0 0 6 64 1083 753 23 77 0 0
6 1 348 4960 110548 187600 0 0 14 74 1105 783 24 77 0 0
6 1 348 4448 110548 187988 0 0 8 0 1122 700 25 75 0 0
6 1 348 5224 109732 187940 0 0 6 142 1066 813 42 59 0 0
6 1 348 4648 109740 188380 0 0 10 0 1003 682 25 76 0 0
8 1 348 4264 109740 188724 0 0 8 0 1110 740 27 73 0 0
6 1 348 6184 109000 187380 0 0 18 164 1026 727 23 78 0 0
7 1 348 5800 109000 187684 0 0 8 0 1002 694 25 76 0 0
6 1 348 5152 109056 188048 0 0 14 126 1022 743 25 75 0 0
7 1 348 4768 109060 188340 0 0 6 0 1002 699 24 76 0 0
6 0 348 4384 109060 188612 0 0 6 0 1002 681 26 74 0 0
8 1 348 5160 109032 187768 0 0 8 118 1018 709 23 78 0 0
7 1 348 4840 109032 188024 0 0 12 0 1004 655 23 78 0 0
6 1 348 5800 109076 186864 0 0 2 3246 1244 808 46 54 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
7 0 348 6184 109084 185740 0 0 304 0 1027 717 32 68 0 0
6 1 348 6440 109084 185988 0 0 4 0 1001 676 34 66 0 0
6 1 348 6112 109168 186304 0 0 12 4414 1242 813 23 77 0 0
6 1 348 5664 109172 186636 0 0 6 0 1005 727 24 76 0 0
6 0 348 5224 109216 186924 0 0 6 108 1173 838 24 76 0 0
6 1 348 4840 109216 187260 0 0 16 0 1099 686 25 76 0 0
6 1 348 4328 109220 187644 0 0 6 0 1002 637 24 76 0 0
7 1 1016 6248 108640 185376 0 0 14 108 1021 778 25 76 0 0
8 1 1016 5800 108644 185748 0 0 6 0 1002 627 21 79 0 0
6 1 1016 5344 108696 186012 0 0 8 158 1025 764 44 56 0 0
6 1 1016 4832 108696 186460 0 0 12 0 1003 735 27 73 0 0
6 1 1016 4384 108700 186888 0 0 4 0 1002 648 26 75 0 0
6 0 1016 4968 108152 186612 0 0 12 254 1047 764 25 76 0 0
6 1 1016 4392 108156 187116 0 0 16 0 1002 718 24 77 0 0
6 0 1016 7080 108080 184172 0 0 6 92 1014 720 30 71 0 0
6 1 1016 6760 108092 184584 0 0 12 0 1004 695 24 76 0 0
6 1 1016 6376 108096 184876 0 0 6 0 1002 675 21 79 0 0
6 1 1016 5536 108204 185256 0 0 90 4642 1250 838 26 75 0 0
6 1 1016 5088 108212 185628 0 0 10 36 1006 705 24 76 0 0
8 2 1016 4776 108244 185836 0 0 6 2900 1138 783 57 43 0 0
6 1 1016 5544 108316 184704 0 0 228 3294 1260 874 37 62 0 1
6 1 1016 5096 108316 185088 0 0 6 0 1008 658 24 76 0 0
6 1 1016 4192 108448 185424 0 0 18 276 1047 694 23 77 0 0
7 0 1016 6432 108080 183236 0 0 68 0 1057 742 26 74 0 0
6 1 1016 5848 108220 183744 0 0 126 236 1043 732 26 75 0 0
6 1 1016 5400 108220 184072 0 0 8 0 1056 698 24 76 0 0
7 0 1016 4824 108220 184448 0 0 16 0 1002 662 24 76 0 0
7 1 1016 4384 108280 184796 0 0 12 118 1019 721 25 76 0 0
6 1 1016 5728 108272 183268 0 0 4 0 1056 662 25 75 0 0
9 2 1016 4448 107924 183288 0 0 164 304 1062 796 28 72 0 0
6 2 1016 7512 106888 182268 0 0 8 32 1017 866 47 54 0 0
5 1 1016 5720 106892 183048 0 0 14 0 1045 700 43 57 0 0
2 1 1016 5776 105212 182628 0 0 24 386 1058 741 45 56 0 0
2 1 1016 5464 105216 182828 0 0 38 0 1061 753 20 80 0 0
3 2 1016 6112 105276 181404 0 0 234 1848 1114 774 32 68 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
3 1 1016 5416 105280 181852 0 0 150 3654 1292 848 24 76 0 0
2 1 1016 5040 105284 182112 0 0 36 0 1090 726 23 78 0 0
2 2 1016 6128 105344 180228 0 0 52 3724 1262 859 21 79 0 0
2 2 1016 5360 105344 180500 0 0 40 2782 1231 758 18 82 0 0
4 1 1016 4328 105424 180888 0 0 62 1018 1144 724 21 79 0 0
3 0 1016 5160 105408 180100 0 0 48 0 1087 849 38 62 0 0
2 1 1016 4776 105448 180388 0 0 36 0 1234 781 18 82 0 0
2 1 1016 4272 105596 180676 0 0 30 122 1025 706 17 83 0 0
0 2 1016 4656 105644 179832 0 0 104 1136 1077 761 24 70 0 6
0 2 1016 5616 105164 174620 0 0 422 3392 1394 933 43 12 0 44
0 2 1016 9200 105868 175916 0 0 532 1096 1152 852 50 26 0 23
3 1 1016 5496 104644 177692 0 0 1410 2 1157 936 37 14 0 50
0 3 1016 5336 103132 177448 0 334 292 3106 1244 784 74 13 0 14
2 1 1020 11096 100876 168948 0 0 566 1356 1118 752 82 18 0 0
1 1 1020 18088 100976 168120 0 0 616 0 1082 789 50 7 0 43
0 1 1020 10856 101660 169780 0 0 562 666 1150 841 59 8 0 33
0 1 1020 5040 101692 169428 0 2 568 1724 1112 727 43 6 0 50
0 1 1020 6024 101080 163120 0 0 588 1368 1180 779 48 9 0 44
0 1 1020 4360 97712 162408 0 0 568 472 1131 787 42 7 0 51
2 0 1020 4800 91872 161560 0 0 596 8 1090 784 46 7 0 47
1 0 1512 4608 87900 160428 0 246 548 686 1129 785 42 8 0 51
2 1 1512 4736 83968 157512 0 0 640 0 1093 807 45 8 0 48
1 1 1512 5320 76640 157896 0 0 604 0 1088 780 47 7 0 47
0 1 1512 5128 71820 157204 0 0 568 444 1127 766 40 7 0 53
1 1 1528 4808 65792 157160 0 8 600 8 1085 798 48 8 0 45
2 0 1536 4616 63268 157108 0 4 892 464 1136 810 76 6 0 19
1 0 2472 4488 62680 158428 0 452 890 744 1075 794 89 5 0 6
3 0 2916 4416 61812 159912 12 222 1148 222 1056 805 81 5 0 15
0 1 3048 5056 60108 159328 0 66 990 228 1122 858 46 5 0 50
0 1 3328 4744 55496 159560 0 140 584 140 1095 863 39 6 0 55
1 0 3704 4428 52604 158456 0 188 568 572 1126 801 36 6 0 58
1 0 4800 5396 51944 154448 0 548 556 554 1088 851 43 7 0 51
1 0 4948 5668 49528 151096 48 74 674 74 1091 793 45 8 0 48
2 0 5896 5648 49392 146584 0 474 598 794 1132 815 38 6 0 56
0 1 6748 6032 49364 142004 16 426 592 436 1085 765 47 8 0 46
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 7720 5376 48920 139236 0 486 554 800 1126 745 40 8 0 53
2 0 7720 4928 46020 137268 0 0 596 0 1095 774 65 7 0 28
0 1 12396 5440 45420 135712 16 2338 576 2628 1152 804 45 14 0 42
0 1 15264 5184 45556 134552 0 1434 456 1806 1130 759 36 7 0 58
1 0 17432 4864 43640 134168 0 1084 584 1084 1099 739 43 8 0 48
8 0 22028 4928 42256 133512 0 2298 528 2592 1231 810 39 9 0 51
0 1 24148 5940 40412 133016 0 982 524 982 1142 771 39 8 0 53
0 1 25916 4936 37448 133184 16 884 594 884 1100 740 44 9 0 48
1 1 28856 4892 36868 132172 0 1470 490 1766 1122 729 39 7 0 54
3 1 30236 4292 33800 130832 144 690 836 690 1116 812 46 9 0 45
0 0 32176 5408 33792 131384 32 970 690 1696 1180 1220 43 7 14 36
0 0 32176 5408 33792 131396 0 0 2 0 1001 553 4 1 94 0
0 0 32176 4896 33796 132032 16 0 46 0 1141 928 29 3 66 1
1 0 32176 4864 33904 132036 0 0 6 90 1017 532 4 1 93 2

55091 default_idle 1377.2750
62640 __copy_from_user_ll 1204.6154
33595 __copy_to_user_ll 646.0577
432 system_call 9.0000
100 ide_outb 8.3333
488 current_kernel_time 8.1333
167 block_commit_write 5.2188
119 delay_tsc 4.2500
38 syscall_call 3.4545
81 get_offset_tsc 3.3750
203 fget 3.1719
349 radix_tree_lookup 2.8145
1549 do_anonymous_page 2.7080
32 ide_inb 2.6667
548 reiserfs_copy_from_user_to_file_region 2.6346
156 mark_page_accessed 2.6000
46 fput 2.3000
131 unlock_page 2.1833
347 reiserfs_submit_file_region_for_write 2.1688
302 update_atime 2.0972
67 init_journal_hash 2.0938
422 find_lock_page 1.9906
126 reiserfs_can_fit_pages 1.7500
21 user_schedule 1.7500
100 kmem_cache_free 1.6667
193 unix_poll 1.5078
50 task_vsize 1.3889
105 handle_IRQ_event 1.3816
60 reiserfs_claim_blocks_to_be_allocated 1.2500
91 sys_pread64 1.1974
171 __block_commit_write 1.1875
56 pathrelse 1.1667
90 sys_pwrite64 1.1250
13 ide_outl 1.0833
101 atomic_dec_and_lock 1.0521
279 SHATransform 1.0257

This is on a K6-3 400, 512m debian, kernel built with gcc 2.95-4

Ideas?
Ed Tomlinson
* Re: 2.5.59-mm5
From: Andrew Morton @ 2003-01-25 17:41 UTC
To: Ed Tomlinson; +Cc: linux-mm

Ed Tomlinson <tomlins@cam.org> wrote:
>
> Hi Andrew,
>
> I am seeing a strange problem with mm5. This occurs both with and without
> the anticipatory scheduler changes. What happens is I see very high system
> times and X responds very very slowly. I first noticed this when switching
> between folders in kmail and have seen it rebuilding db files for squidguard.
> Here is what happened during the db rebuild (no anticipatory ioscheduler):

Could you please try reverting the reiserfs changes?

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/reiserfs-readpages.patch

and

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/reiserfs_file_write.patch
* Re: 2.5.59-mm5 2003-01-25 17:41 ` 2.5.59-mm5 Andrew Morton @ 2003-01-25 20:34 ` Ed Tomlinson 2003-01-25 22:33 ` 2.5.59-mm5 Andrew Morton 0 siblings, 1 reply; 32+ messages in thread From: Ed Tomlinson @ 2003-01-25 20:34 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm On January 25, 2003 12:41 pm, Andrew Morton wrote: > Ed Tomlinson <tomlins@cam.org> wrote: > > Hi Andrew, > > > > I am seeing a strange problem with mm5. This occurs both with and > > without the anticipatory scheduler changes. What happens is I see very > > high system times and X responds very very slowly. I first noticed this > > when switching between folders in kmail and have seen it rebuilding db > > files for squidguard. Here is what happened during the db rebuild (no > > anticipatory ioscheduler): > > Could you please try reverting the reiserfs changes? > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/ >reiserfs-readpages.patch > > and > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/ >reiserfs_file_write.patch Reverting reiserfs_file_write.patch seems to cure the interactivity problems. I still see the high system times but they in themselves are not a problem. Reverting the second patch does not change the situation. I am currently running with reiserfs_file_write.patch removed - so far so good. Thanks Ed Tomlinson -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5 2003-01-25 20:34 ` 2.5.59-mm5 Ed Tomlinson @ 2003-01-25 22:33 ` Andrew Morton 2003-01-26 1:43 ` 2.5.59-mm5 Ed Tomlinson 0 siblings, 1 reply; 32+ messages in thread From: Andrew Morton @ 2003-01-25 22:33 UTC (permalink / raw) To: Ed Tomlinson; +Cc: linux-mm, Oleg Drokin Ed Tomlinson <tomlins@cam.org> wrote: > > On January 25, 2003 12:41 pm, Andrew Morton wrote: > > Ed Tomlinson <tomlins@cam.org> wrote: > > > Hi Andrew, > > > > > > I am seeing a strange problem with mm5. This occurs both with and > > > without the anticipatory scheduler changes. What happens is I see very > > > high system times and X responds very very slowly. I first noticed this > > > when switching between folders in kmail and have seen it rebuilding db > > > files for squidguard. Here is what happened during the db rebuild (no > > > anticipatory ioscheduler): > > > > Could you please try reverting the reiserfs changes? > > > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/ > >reiserfs-readpages.patch > > > > and > > > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/ > >reiserfs_file_write.patch > > Reverting reiserfs_file_write.patch seems to cure the interactivity problems. > I still see the high system times but they in themselves are not a problem. > Reverting the second patch does not change the situation. I am currently > running with reiserfs_file_write.patch removed - so far so good. > Well, high system time _is_ a problem, isn't it? Do you always see that? Or perhaps userspace monitoring tools are confusing I/O wait with CPU busyness. Does a revert of http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/buffer-io-accounting.patch make the numbers look different? If so, then it's a procps bug... WRT the excessive copy_foo_user() times: I shall forward your initial email to Oleg, thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5
2003-01-25 22:33 ` 2.5.59-mm5 Andrew Morton
@ 2003-01-26 1:43 ` Ed Tomlinson
2003-01-26 2:17 ` 2.5.59-mm5 Andrew Morton
0 siblings, 1 reply; 32+ messages in thread
From: Ed Tomlinson @ 2003-01-26 1:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Oleg Drokin

On January 25, 2003 05:33 pm, Andrew Morton wrote:
> Ed Tomlinson <tomlins@cam.org> wrote:
> > On January 25, 2003 12:41 pm, Andrew Morton wrote:
> > > Ed Tomlinson <tomlins@cam.org> wrote:
> > > > Hi Andrew,
> > > >
> > > > I am seeing a strange problem with mm5. This occurs both with and
> > > > without the anticipatory scheduler changes. What happens is I see
> > > > very high system times and X responds very very slowly. I first
> > > > noticed this when switching between folders in kmail and have seen it
> > > > rebuilding db files for squidguard. Here is what happened during the
> > > > db rebuild (no anticipatory ioscheduler):
> > >
> > > Could you please try reverting the reiserfs changes?
> > >
> > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/reiserfs-readpages.patch
> > >
> > > and
> > >
> > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/reiserfs_file_write.patch
> >
> > Reverting reiserfs_file_write.patch seems to cure the interactivity
> > problems. I still see the high system times but they in themselves are
> > not a problem. Reverting the second patch does not change the situation.
> > I am currently running with reiserfs_file_write.patch removed - so far so
> > good.
>
> Well, high system time _is_ a problem, isn't it? Do you always see that?
>
> Or perhaps userspace monitoring tools are confusing I/O wait with CPU
> busyness. Does a revert of
>
> http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/broken-out/buffer-io-accounting.patch
>
> make the numbers look different? If so, then it's a procps bug...
>
> WRT the excessive copy_foo_user() times: I shall forward your initial email
> to Oleg, thanks.

The excessive copy_foo_user times are still there with Oleg (and Chris's)
patch removed. Here is what I see doing:

"apt-get install --reinstall squidguard chastity-list"

(with file_write from my first message)
55091 default_idle 1377.2750
62640 __copy_from_user_ll 1204.6154
33595 __copy_to_user_ll 646.0577

(without file_write)
40259 __copy_from_user_ll 774.2115
18735 default_idle 468.3750
21524 __copy_to_user_ll 413.9231
  386 system_call 8.0417
  428 current_kernel_time 7.1333
  988 established_get_next 6.8611
   60 ide_outb 5.0000
  509 reiserfs_prepare_write 4.2417
  100 get_offset_tsc 4.1667
   38 syscall_call 3.4545
  159 fget 2.4844
  279 radix_tree_lookup 2.2500
   61 init_journal_hash 1.9062
   68 task_vsize 1.8889
  105 mark_page_accessed 1.7500
  366 find_lock_page 1.7264
   48 delay_tsc 1.7143
   89 block_prepare_write 1.7115
  237 update_atime 1.6458
   32 fput 1.6000
   90 unlock_page 1.5000
  210 inode_update_time 1.3816
  108 sys_pwrite64 1.3500
   16 ide_inb 1.3333
   78 mark_buffer_dirty 1.3000
  192 reiserfs_wait_on_write_block 1.2632
   93 handle_IRQ_event 1.2237
   76 fault_in_pages_readable 1.1875
    4 reiserfs_check_lock_depth 1.0000

So removing file_write seems to have reduced the copy_foo_user() issue but
has not removed it.

Using a vmstat hacked to show iowait with the above running...

oscar% vmstat -a 5
  procs       memory (mB)       swap      io      system      cpu
 r  b  w swpd free inact  act  si   so   bi   bo   in    cs us sy io id
 3  0  0   42    6    13  434   0    3   36   69 1061    61 25  3  1 71
 5  0  0   42    4    15  434   0    0 1189  893 1184 18253 28 11 10 51
 4  0  0   42    5     8  440   0   66  353  274 1070  7874 74  7 10  9
 6  0  0   42    6     9  438   0    0  468  343 1081  2936 93  7  0  0
 5  0  0   46    4     5  444   0  714 1453  976 1147  8891 87 13  0  0
 4  0  0   51    5     1  447   0 1086  626 1877 1279 23445 57 43  0  0
 4  1  1   52    4     3  446   0  290  615 1206 1219 22018 68 32  0  0
 6  0  0   53    8    10  434   0   82  690 1020 1141 14962 59 41  0  0
10  0  0   53   36    14  403   0    0    2  599 1206  1988 85 15  0  0
 5  0  0   53   27     9  417   0    0   35   94 1072  1269 94  6  0  0
 5  0  0   53   31    11  411   0    0  188  761 1089  2401 88 12  0  0
 8  0  0   53   26    11  416   0    0    1  298 1052  9013 42 28  3 27
 7  0  0   53   25    11  417   0    0    0   22 1021   574 38 62  0  0
10  0  0   53   24    11  418   0    0    0   34 1014   546 53 47  0  0
11  0  0   53   23    11  419   0    0    0 1814 1142   634 43 57  0  0
 9  0  0   53   22    11  421   0    0    2   39 1019   556 40 60  0  0
13  0  0   53   20    10  423   0    0    0   32 1031  1183 51 47  0  2
 9  0  0   53   18    10  425   0    0    0 1946 1083   560 36 64  0  0
 9  0  0   53   17    10  426   0    0    0   28 1016   575 38 62  0  0
10  0  0   53   16    10  427   0    0    0   47 1022   560 52 48  0  0
 9  0  0   53   15    10  428   0    0    0   36 1015   540 28 72  0  0
 9  0  0   53   14    10  429   0    0    0   27 1023   603 48 52  0  0
 8  0  0   53   13    10  430   0    0    0   36 1019   536 48 52  0  0
 9  0  0   53   12    10  431   0    0    0  367 1029   539 36 64  0  0
11  0  0   53   11    10  432   0    0    0 1785 1112   587 32 68  0  0
10  0  0   53   11    10  433   0    0    0   58 1030   610 75 25  0  0
10  0  0   53   10    10  433   0    0    0   38 1037   599 67 33  0  0
12  0  0   53   10    10  434   0    0    0   34 1056   679 81 19  0  0
14  0  0   53   10    10  434  26    0   26   44 1059   647 42 58  0  0
13  0  0   53    9    10  435   0    0    0   45 1050   686 56 44  0  0
10  0  0   53    9    10  435   0    0    0  585 1083   678 59 41  0  0
  procs       memory (mB)       swap      io      system      cpu
 r  b  w swpd free inact  act  si   so   bi   bo   in    cs us sy io id
 9  0  1   53    8    10  435   0    0    0 2518 1200   727 48 52  0  0
10  0  0   53    8    10  436   0    0    0   43 1065   660 38 62  0  0
11  0  0   53    7    10  437   0    0    0   39 1044   661 29 71  0  0
 9  0  0   53    6     9  438   0    0    0  196 1063   676 44 56  0  0
 9  0  0   53    5    10  438   0    0    0  732 1169   681 27 73  0  0
 6  4  0   53    4    10  440   0    0    0  633 1121  1987 52 48  0  0
10  0  0   53   10    12  431   0    0    2 3294 1203  8145 54 46  0  0
11  0  0   53   24    17  412   0    0    0  806 1133   686 60 40  0  0

Unless it's an accounting error, it's not iowait (confirmed on a nonbusy
system too). There is no change with or without the io_schedule() changed
back to schedule().

Ed Tomlinson
^ permalink raw reply [flat|nested] 32+ messages in thread
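A rough way to compare the two profiles in the message above is to total the ticks charged to the user-copy routines in each run. The sketch below assumes the listings use a "ticks symbol load" layout, one entry per line (the thread does not name the profiling tool), and the helper names are illustrative only. The two runs may not cover the same wall-clock interval, so the ratio is only an indication.

```python
def parse_profile(text):
    """Map symbol -> tick count from 'ticks symbol load' lines."""
    ticks = {}
    for line in text.strip().splitlines():
        count, symbol, _load = line.split()
        ticks[symbol] = int(count)
    return ticks


def copy_ticks(ticks):
    """Ticks charged to the two user-copy routines named in the thread."""
    return ticks["__copy_from_user_ll"] + ticks["__copy_to_user_ll"]


# Top three entries of each listing, copied from the message above.
WITH_FILE_WRITE = """
55091 default_idle 1377.2750
62640 __copy_from_user_ll 1204.6154
33595 __copy_to_user_ll 646.0577
"""
WITHOUT_FILE_WRITE = """
40259 __copy_from_user_ll 774.2115
18735 default_idle 468.3750
21524 __copy_to_user_ll 413.9231
"""

before = copy_ticks(parse_profile(WITH_FILE_WRITE))
after = copy_ticks(parse_profile(WITHOUT_FILE_WRITE))
print(before, after, round(after / before, 2))  # 96235 61783 0.64
```

On these numbers the copy routines still account for roughly two thirds of their earlier tick count after the revert, which matches the "reduced but not removed" reading.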
* Re: 2.5.59-mm5 2003-01-26 1:43 ` 2.5.59-mm5 Ed Tomlinson @ 2003-01-26 2:17 ` Andrew Morton 2003-01-26 3:51 ` 2.5.59-mm5 Ed Tomlinson 0 siblings, 1 reply; 32+ messages in thread From: Andrew Morton @ 2003-01-26 2:17 UTC (permalink / raw) To: Ed Tomlinson; +Cc: linux-mm, green Ed Tomlinson <tomlins@cam.org> wrote: > > The excessive copy_foo_user times are still there with Oleg (and Chris's) patch > removed. Here is what I see doing: > > "apt-get install --reinstall squidguard chastity-list" > > (with file_write from my first message) > 55091 default_idle 1377.2750 > 62640 __copy_from_user_ll 1204.6154 > 33595 __copy_to_user_ll 646.0577 > > (without file_write) > 40259 __copy_from_user_ll 774.2115 > 18735 default_idle 468.3750 > 21524 __copy_to_user_ll 413.9231 > 386 system_call 8.0417 > 428 current_kernel_time 7.1333 Is this different from 2.5.59 base? It's beginning to look like copy_foo_user() itself has gone silly. I don't know what's causing this, Ed. Could you please dig into it a little more? Does it happen with a bare `dd'? Or is it networking? etcetera... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5
2003-01-26 2:17 ` 2.5.59-mm5 Andrew Morton
@ 2003-01-26 3:51 ` Ed Tomlinson
2003-01-26 4:04 ` 2.5.59-mm5 Andrew Morton
0 siblings, 1 reply; 32+ messages in thread
From: Ed Tomlinson @ 2003-01-26 3:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, green

On January 25, 2003 09:17 pm, Andrew Morton wrote:
> Is this different from 2.5.59 base?

Same in 59 and as far back as 51(ish) which is the oldest that I have
prebuilt here...

> It's beginning to look like copy_foo_user() itself has gone silly.
>
> I don't know what's causing this, Ed. Could you please dig into it a
> little more? Does it happen with a bare `dd'? Or is it networking?
> etcetera...

What I see is this.

apt installs squidguard
squidguard starts 5 processes
apt installs chastity-list

and the squidguard processes proceed to take most of the cpu. Each
of the squidguard processes takes about 17% of the cpu. These keep
running after apt finishes and the system time drops when they end...

I started a strace of one of the offending processes and saw lots like:

pread(6, "\0\0\0\0\1\0\0\0\325\0\0\0\243\0\0\0\267\0\0\0t\1@\16\1"..., 8192, 1744896) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\267\0\0\0\325\0\0\0\320\0\0\0000\2\270"..., 8192, 1499136) = 8192
pread(6, "\0\0\0\0\1\0\0\0\305\0\0\0\330\0\0\0\332\0\0\0d\1\f\16"..., 8192, 1613824) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\273\0\0\0\323\0\0\0\327\0\0\0n\1\210\r"..., 8192, 1531904) = 8192
pread(6, "\0\0\0\0\1\0\0\0\330\0\0\0\10\0\0\0\305\0\0\0j\1`\r\1\5"..., 8192, 1769472) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\342\0\0\0\303\0\0\0\262\0\0\0.\1\314\20"..., 8192, 1851392) = 8192
pread(6, "\0\0\0\0\1\0\0\0\346\0\0\0\310\0\0\0\266\0\0\0X\1$\20\1"..., 8192, 1884160) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\6\0\0\0\317\0\0\0\315\0\0\0\34\2d\4\1"..., 8192, 49152) = 8192
pread(6, "\0\0\0\0\1\0\0\0\5\0\0\0\363\0\0\0\362\0\0\0$\1\224\21"..., 8192, 40960) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\10\0\0\0\341\0\0\0\330\0\0\0\220\1\230"..., 8192, 65536) = 8192
pread(6, "\0\0\0\0\1\0\0\0\331\0\0\0\277\0\0\0\250\0\0\0b\1l\r\1"..., 8192, 1777664) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\350\0\0\0\37\0\0\0\303\0\0\0H\1`\20\1"..., 8192, 1900544) = 8192
pread(6, "\0\0\0\0\1\0\0\0\267\0\0\0\325\0\0\0\320\0\0\0000\2\270"..., 8192, 1499136) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\310\0\0\0\362\0\0\0\346\0\0\0B\1|\20\1"..., 8192, 1638400) = 8192
pread(6, "\0\0\0\0\1\0\0\0\302\0\0\0\326\0\0\0\335\0\0\0l\1\354\16"..., 8192, 1589248) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\313\0\0\0\356\0\0\0\270\0\0\0\26\2\254"..., 8192, 1662976) = 8192
pread(6, "\0\0\0\0\1\0\0\0\307\0\0\0\354\0\0\0\347\0\0\0N\1l\17\1"..., 8192, 1630208) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\314\0\0\0\361\0\0\0\265\0\0\0\24\2d\4"..., 8192, 1671168) = 8192
pread(6, "\0\0\0\0\1\0\0\0\10\0\0\0\341\0\0\0\330\0\0\0\220\1\230"..., 8192, 65536) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\321\0\0\0\266\0\0\0\243\0\0\0p\1\320\r"..., 8192, 1712128) = 8192
pread(6, "\0\0\0\0\1\0\0\0\336\0\0\0 \0\0\0\300\0\0\0>\0010\17\1"..., 8192, 1818624) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\322\0\0\0\272\0\0\0\244\0\0\0\274\1`\t"..., 8192, 1720320) = 8192
pread(6, "\0\0\0\0\1\0\0\0\4\0\0\0\344\0\0\0\361\0\0\0(\1\240\21"..., 8192, 32768) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\262\0\0\0\342\0\0\0\316\0\0\0\350\1\340"..., 8192, 1458176) = 8192
pread(6, "\0\0\0\0\1\0\0\0\324\0\0\0\274\0\0\0!\0\0\0\250\1\220\f"..., 8192, 1736704) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0!\0\0\0\324\0\0\0\356\0\0\0008\1p\21\1"..., 8192, 270336) = 8192
pread(6, "\0\0\0\0\1\0\0\0\310\0\0\0\362\0\0\0\346\0\0\0B\1|\20\1"..., 8192, 1638400) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\271\0\0\0\347\0\0\0\326\0\0\0\202\1\334"..., 8192, 1515520) = 8192
pread(6, "\0\0\0\0\1\0\0\0\266\0\0\0\346\0\0\0\321\0\0\0t\1\354\v"..., 8192, 1490944) = 8192
pwrite(6, "\0\0\0\0\1\0\0\0\3\0\0\0\351\0\0\0\357\0\0\0\"\1\270\20"..., 8192, 24576) = 8192

Does this help?
Ed
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5
2003-01-26 3:51 ` 2.5.59-mm5 Ed Tomlinson
@ 2003-01-26 4:04 ` Andrew Morton
0 siblings, 0 replies; 32+ messages in thread
From: Andrew Morton @ 2003-01-26 4:04 UTC (permalink / raw)
To: Ed Tomlinson; +Cc: linux-mm, green

Ed Tomlinson <tomlins@cam.org> wrote:
>
> and the squidguard processes proceed to take most of the cpu. Each
> of the squidguard processes takes about 17% of the cpu. These keep
> running after apt finishes and the system time drops when they end...
>
> ...
>
> Does this help?

Not a lot. Looks like squidguard has gone berserk reading lots of stuff
from pagecache. Could be that it has a bug which is triggered by subtly
altered kernel behaviour, or a subtle bug in the kernel broke it.

Do any other applications exhibit the same behaviour?

Can you generate a simple, standalone usage of squidguard which exhibits
this behaviour? Just starting them up??

You may need to build your own squidguard and attach gdb to one, see what
it's up to.
^ permalink raw reply [flat|nested] 32+ messages in thread
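One way to approach the "simple, standalone" test asked for above, without squidguard itself, is a small load generator that imitates the access pattern in the strace output: alternating 8192-byte pread/pwrite calls at 8 KiB-aligned offsets within a file of about 2 MB. This is only a sketch under those assumptions. The function name, file size, and iteration count are made up for illustration, and it is not known whether this pattern alone reproduces the high system time.

```python
import os
import random
import tempfile

PAGE = 8192  # request size seen in the strace output


def hammer(path, file_size=2 * 1024 * 1024, iterations=1000, seed=0):
    """Alternate pread/pwrite of PAGE bytes at random PAGE-aligned offsets.

    Returns (bytes_read, bytes_written).
    """
    rng = random.Random(seed)
    slots = file_size // PAGE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)  # zero-filled file to read from
        bytes_read = bytes_written = 0
        for _ in range(iterations):
            buf = os.pread(fd, PAGE, rng.randrange(slots) * PAGE)
            bytes_read += len(buf)
            bytes_written += os.pwrite(fd, buf, rng.randrange(slots) * PAGE)
        return bytes_read, bytes_written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(hammer(os.path.join(tmp, "fake.db")))
```

Running it under `time` on the affected and unaffected kernels and comparing the "sys" figure would show whether the copy cost follows the I/O pattern or is specific to squidguard.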
* Re: 2.5.59-mm5 2003-01-24 11:50 ` 2.5.59-mm5 Andrew Morton 2003-01-24 12:05 ` 2.5.59-mm5 Alex Tomas @ 2003-01-24 15:56 ` Oliver Xymoron 2003-01-24 16:04 ` 2.5.59-mm5 Nick Piggin 1 sibling, 1 reply; 32+ messages in thread From: Oliver Xymoron @ 2003-01-24 15:56 UTC (permalink / raw) To: Andrew Morton; +Cc: Alex Tomas, linux-kernel, linux-kernel, linux-mm On Fri, Jan 24, 2003 at 03:50:17AM -0800, Andrew Morton wrote: > Alex Tomas <bzzz@tmi.comex.ru> wrote: > > > > >>>>> Andrew Morton (AM) writes: > > > > AM> But writes are completely different. There is no dependency > > AM> between them and at any point in time we know where on-disk a lot > > AM> of writes will be placed. We don't know that for reads, which is > > AM> why we need to twiddle thumbs until the application or filesystem > > AM> makes up its mind. > > > > > > it's significant that application doesn't want to wait read completion > > long and doesn't wait for write completion in most cases. > > That's correct. Reads are usually synchronous and writes are rarely > synchronous. > > The most common place where the kernel forces a user process to wait on > completion of a write is actually in unlink (truncate, really). Because > truncate must wait for in-progress I/O to complete before allowing the > filesystem to free (and potentially reuse) the affected blocks. > > If there's a lot of writeout happening then truncate can take _ages_. Hence > this patch: An alternate approach might be to change the way the scheduler splits things. That is, rather than marking I/O read vs write and scheduling based on that, add a flag bit to mark them all sync vs async since that's the distinction we actually care about. The normal paths can all do read+sync and write+async, but you can now do things like marking your truncate writes sync and readahead async. And dependent/nondependent or stalling/nonstalling might be a clearer terminology. -- "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
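The sync/async proposal in the message above can be sketched as a request that carries both a direction and a separate "someone is stalled on this" flag. Everything here is hypothetical: the `Request` type, the `submit` helper, and the sector numbers are illustrative and are not taken from any kernel code.

```python
from dataclasses import dataclass

READ, WRITE = "read", "write"


@dataclass(frozen=True)
class Request:
    sector: int
    direction: str  # READ or WRITE: what the drive is asked to do
    sync: bool      # True if some task stalls until this completes


def submit(sector, direction, sync=None):
    """Default policy from the proposal: reads are sync, writes are async."""
    if sync is None:
        sync = direction == READ
    return Request(sector, direction, sync)


def split_by_urgency(queue):
    """Separate requests a task is waiting on from background work."""
    stalling = [r for r in queue if r.sync]
    background = [r for r in queue if not r.sync]
    return stalling, background


queue = [
    submit(100, READ),               # ordinary read: caller waits
    submit(108, READ, sync=False),   # readahead: nobody waits yet
    submit(9000, WRITE),             # ordinary writeback
    submit(9100, WRITE, sync=True),  # write that truncate must wait on
]
stalling, background = split_by_urgency(queue)
print([r.sector for r in stalling])    # [100, 9100]
print([r.sector for r in background])  # [108, 9000]
```

The direction is kept alongside the flag rather than replaced by it, so a scheduler could still batch by direction while prioritising by the flag.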
* Re: 2.5.59-mm5 2003-01-24 15:56 ` 2.5.59-mm5 Oliver Xymoron @ 2003-01-24 16:04 ` Nick Piggin 2003-01-24 17:09 ` 2.5.59-mm5 Giuliano Pochini 0 siblings, 1 reply; 32+ messages in thread From: Nick Piggin @ 2003-01-24 16:04 UTC (permalink / raw) To: Oliver Xymoron Cc: Andrew Morton, Alex Tomas, linux-kernel, linux-kernel, linux-mm Oliver Xymoron wrote: >On Fri, Jan 24, 2003 at 03:50:17AM -0800, Andrew Morton wrote: > >>Alex Tomas <bzzz@tmi.comex.ru> wrote: >> >>>>>>>>Andrew Morton (AM) writes: >>>>>>>> >>> AM> But writes are completely different. There is no dependency >>> AM> between them and at any point in time we know where on-disk a lot >>> AM> of writes will be placed. We don't know that for reads, which is >>> AM> why we need to twiddle thumbs until the application or filesystem >>> AM> makes up its mind. >>> >>> >>>it's significant that application doesn't want to wait read completion >>>long and doesn't wait for write completion in most cases. >>> >>That's correct. Reads are usually synchronous and writes are rarely >>synchronous. >> >>The most common place where the kernel forces a user process to wait on >>completion of a write is actually in unlink (truncate, really). Because >>truncate must wait for in-progress I/O to complete before allowing the >>filesystem to free (and potentially reuse) the affected blocks. >> >>If there's a lot of writeout happening then truncate can take _ages_. Hence >>this patch: >> > >An alternate approach might be to change the way the scheduler splits >things. That is, rather than marking I/O read vs write and scheduling >based on that, add a flag bit to mark them all sync vs async since >that's the distinction we actually care about. The normal paths can >all do read+sync and write+async, but you can now do things like >marking your truncate writes sync and readahead async. > >And dependent/nondependent or stalling/nonstalling might be a clearer >terminology. 
> That will be worth investigating to see if the complexity is worth it. I think from a disk point of view, we still want to split batches between reads and writes. Could be wrong. Nick -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5
2003-01-24 16:04 ` 2.5.59-mm5 Nick Piggin
@ 2003-01-24 17:09 ` Giuliano Pochini
2003-01-24 17:22 ` 2.5.59-mm5 Nick Piggin
0 siblings, 1 reply; 32+ messages in thread
From: Giuliano Pochini @ 2003-01-24 17:09 UTC (permalink / raw)
To: Nick Piggin
Cc: linux-mm, linux-kernel, linux-kernel, Alex Tomas, Andrew Morton, Oliver Xymoron

>>An alternate approach might be to change the way the scheduler splits
>>things. That is, rather than marking I/O read vs write and scheduling
>>based on that, add a flag bit to mark them all sync vs async since
>>that's the distinction we actually care about. The normal paths can
>>all do read+sync and write+async, but you can now do things like
>>marking your truncate writes sync and readahead async.

> That will be worth investigating to see if the complexity is worth it.
> I think from a disk point of view, we still want to split batches between
> reads and writes. Could be wrong.

Yes, sync vs async is a better way to classify io requests than
read vs write and it's more correct from the OS point of view. IMHO
it's not more complex than now. Just replace r/w with sy/as and
it will work.

Bye.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5 2003-01-24 17:09 ` 2.5.59-mm5 Giuliano Pochini @ 2003-01-24 17:22 ` Nick Piggin 2003-01-24 19:34 ` 2.5.59-mm5 Valdis.Kletnieks 0 siblings, 1 reply; 32+ messages in thread From: Nick Piggin @ 2003-01-24 17:22 UTC (permalink / raw) To: Giuliano Pochini Cc: linux-mm, linux-kernel, linux-kernel, Alex Tomas, Andrew Morton, Oliver Xymoron Giuliano Pochini wrote: >>>An alternate approach might be to change the way the scheduler splits >>>things. That is, rather than marking I/O read vs write and scheduling >>>based on that, add a flag bit to mark them all sync vs async since >>>that's the distinction we actually care about. The normal paths can >>>all do read+sync and write+async, but you can now do things like >>>marking your truncate writes sync and readahead async. >>> > >>That will be worth investigating to see if the complexity is worth it. >>I think from a disk point of view, we still want to split batches between >>reads and writes. Could be wrong. >> > >Yes, sync vs async is a better way to classify io requests than >read vs write and it's more correct from OS point of view. IMHO >it's not more complex then now. Just replace r/w with sy/as and >it will work. > We probably wouldn't want to go that far as you obviously can only merge reads with reads and writes with writes, a flag would be fine. We have to get the basics working first though ;) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
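The point made above, that a sync flag can sit next to the direction but merging still has to respect direction, can be illustrated with a toy back-merge check. The `Request` type and `try_back_merge` are hypothetical names for this sketch, and the rule for the merged request's sync flag is an assumption, not something stated in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    start: int         # first sector
    nr_sectors: int    # length in sectors
    is_write: bool     # direction
    sync: bool = False # extra flag, independent of direction


def try_back_merge(a: Request, b: Request) -> Optional[Request]:
    """Return one request covering a then b, or None if they cannot merge."""
    if a.is_write != b.is_write:
        return None  # reads merge only with reads, writes with writes
    if a.start + a.nr_sectors != b.start:
        return None  # not contiguous on disk
    # Assumption: the merged request is sync if either half is.
    return Request(a.start, a.nr_sectors + b.nr_sectors,
                   a.is_write, a.sync or b.sync)


r1 = Request(100, 8, is_write=False, sync=True)
r2 = Request(108, 8, is_write=False)
w1 = Request(108, 8, is_write=True)

print(try_back_merge(r1, r2))  # merged read covering sectors 100..115
print(try_back_merge(r1, w1))  # None: directions differ
```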
* Re: 2.5.59-mm5 2003-01-24 17:22 ` 2.5.59-mm5 Nick Piggin @ 2003-01-24 19:34 ` Valdis.Kletnieks 2003-01-24 20:04 ` 2.5.59-mm5 Jens Axboe 0 siblings, 1 reply; 32+ messages in thread From: Valdis.Kletnieks @ 2003-01-24 19:34 UTC (permalink / raw) To: Nick Piggin Cc: Giuliano Pochini, linux-mm, linux-kernel, linux-kernel, Alex Tomas, Andrew Morton, Oliver Xymoron [-- Attachment #1: Type: text/plain, Size: 541 bytes --] On Sat, 25 Jan 2003 04:22:39 +1100, Nick Piggin said: > We probably wouldn't want to go that far as you obviously can > only merge reads with reads and writes with writes, a flag would > be fine. We have to get the basics working first though ;) "obviously can only"? Admittedly, merging reads and writes is a lot trickier, and probably "too hairy to bother", but I'm not aware of a fundamental "cant" that applies across IDE/SCSI/USB/1394/fiberchannel/etc. -- Valdis Kletnieks Computer Systems Senior Engineer Virginia Tech [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5
2003-01-24 19:34 ` 2.5.59-mm5 Valdis.Kletnieks
@ 2003-01-24 20:04 ` Jens Axboe
2003-01-24 22:02 ` 2.5.59-mm5 Valdis.Kletnieks
0 siblings, 1 reply; 32+ messages in thread
From: Jens Axboe @ 2003-01-24 20:04 UTC (permalink / raw)
To: Valdis.Kletnieks, Nick Piggin
Cc: Giuliano Pochini, linux-mm, linux-kernel, linux-kernel, Alex Tomas, Andrew Morton, Oliver Xymoron

On Fri, Jan 24 2003, Valdis.Kletnieks@vt.edu wrote:
> On Sat, 25 Jan 2003 04:22:39 +1100, Nick Piggin said:
> > We probably wouldn't want to go that far as you obviously can
> > only merge reads with reads and writes with writes, a flag would
> > be fine. We have to get the basics working first though ;)
>
> "obviously can only"? Admittedly, merging reads and writes is a lot
> trickier, and probably "too hairy to bother", but I'm not aware of a
> fundamental "cant" that applies across IDE/SCSI/USB/1394/fiberchannel/etc.

Nick's comment refers to the block layer situation, we obviously cannot
merge reads and writes there. You would basically have to rewrite the
entire request submission structure and break all drivers. And for zero
benefit. Face it, it would be stupid to even attempt such a maneuver.

Since you bring it up, you must know of a device which can take a single
command that says "read blocks a to b, and write blocks x to z"? Even if
such a thing existed, it would be much better implemented by the driver
as pulling more requests off the queue and constructing these weirdo
commands itself. Something as ugly as that would never invade the Linux
block layer, at least not as long as I have any input on the design of
it.

So I quite agree with the "obviously".

--
Jens Axboe
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5 2003-01-24 20:04 ` 2.5.59-mm5 Jens Axboe @ 2003-01-24 22:02 ` Valdis.Kletnieks 2003-01-25 12:28 ` 2.5.59-mm5 Jens Axboe 0 siblings, 1 reply; 32+ messages in thread From: Valdis.Kletnieks @ 2003-01-24 22:02 UTC (permalink / raw) To: Jens Axboe Cc: Nick Piggin, Giuliano Pochini, linux-mm, linux-kernel, linux-kernel, Alex Tomas, Andrew Morton, Oliver Xymoron [-- Attachment #1: Type: text/plain, Size: 1309 bytes --] On Fri, 24 Jan 2003 21:04:34 +0100, Jens Axboe said: > Nicks comment refers to the block layer situation, we obviously cannot > merge reads and writes there. You would basically have to rewrite the > entire request submission structure and break all drivers. And for zero > benefit. Face it, it would be stupid to even attempt such a manuever. As I *said* - "hairy beyond benefit", not "cant". > Since you bring it up, you must know if a device which can take a single > command that says "read blocks a to b, and write blocks x to z"? Even > such thing existed, They do exist. IBM mainframe disks (the 3330/50/80 series) are able to do much more than that in one CCW chain So it was *quite* possible to even express things like "Go to this cylinder/track, search for each record that has value XYZ in the 'key' field, and if found, write value ABC in the data field". (In fact, the DASD I/O opcodes for CCW chains are Turing-complete). > it would be much better implemented by the driver > as pulling more requests of the queue and constructing these weirdo The only operating system I'm aware of that actually uses that stuff is MVS. > So I quite agree with the "obviously". My complaint was the confusion of "obviously cant" with "we have decided we don't want to". /Valdis [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5 2003-01-24 22:02 ` 2.5.59-mm5 Valdis.Kletnieks @ 2003-01-25 12:28 ` Jens Axboe 0 siblings, 0 replies; 32+ messages in thread From: Jens Axboe @ 2003-01-25 12:28 UTC (permalink / raw) To: Valdis.Kletnieks Cc: Nick Piggin, Giuliano Pochini, linux-mm, linux-kernel, linux-kernel, Alex Tomas, Andrew Morton, Oliver Xymoron On Fri, Jan 24 2003, Valdis.Kletnieks@vt.edu wrote: > On Fri, 24 Jan 2003 21:04:34 +0100, Jens Axboe said: > > > Nicks comment refers to the block layer situation, we obviously cannot > > merge reads and writes there. You would basically have to rewrite the > > entire request submission structure and break all drivers. And for zero > > benefit. Face it, it would be stupid to even attempt such a manuever. > > As I *said* - "hairy beyond benefit", not "cant". Hairy is ok as long as it provides substantial benefit in some way, and this does definitely not qualify. > > Since you bring it up, you must know if a device which can take a single > > command that says "read blocks a to b, and write blocks x to z"? Even > > such thing existed, > > They do exist. > > IBM mainframe disks (the 3330/50/80 series) are able to do much more > than that in one CCW chain So it was *quite* possible to even express > things like "Go to this cylinder/track, search for each record that > has value XYZ in the 'key' field, and if found, write value ABC in the > data field". (In fact, the DASD I/O > opcodes for CCW chains are Turing-complete). Well as interesting as that is, it is still an obscurity that will not be generally supported. As I said, if you wanted to do such a thing you can do it in the driver. Complicating the block layer in this way is totally unacceptable, and is just bound to be an endless source of data corrupting driver bugs. > > So I quite agree with the "obviously". > > My complaint was the confusion of "obviously cant" with "we have decided we > don't want to". Ok fair enough, make that a strong "obviously wont" instead then. 
-- Jens Axboe -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5 2003-01-24 11:16 ` 2.5.59-mm5 Andrew Morton 2003-01-24 11:23 ` 2.5.59-mm5 Alex Tomas @ 2003-01-24 12:14 ` Nikita Danilov 2003-01-24 16:00 ` 2.5.59-mm5 Nick Piggin 1 sibling, 1 reply; 32+ messages in thread From: Nikita Danilov @ 2003-01-24 12:14 UTC (permalink / raw) To: Andrew Morton; +Cc: Alex Bligh - linux-kernel, linux-kernel, linux-mm Andrew Morton writes: [...] > > In this very common scenario, the only way we'll ever get "lumps" of reads is > if some other processes come in and happen to want to read nearby sectors. Or if you have read-ahead for meta-data, which is quite useful. Isn't read ahead targeting the same problem as this anticipatory scheduling? > In the best case, the size of the lump is proportional to the number of > processes which are concurrently trying to read something. This just doesn't > happen enough to be significant or interesting. > > But writes are completely different. There is no dependency between them and > at any point in time we know where on-disk a lot of writes will be placed. > We don't know that for reads, which is why we need to twiddle thumbs until the > application or filesystem makes up its mind. > Nikita. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5 2003-01-24 12:14 ` 2.5.59-mm5 Nikita Danilov @ 2003-01-24 16:00 ` Nick Piggin 0 siblings, 0 replies; 32+ messages in thread From: Nick Piggin @ 2003-01-24 16:00 UTC (permalink / raw) To: Nikita Danilov Cc: Andrew Morton, Alex Bligh - linux-kernel, linux-kernel, linux-mm Nikita Danilov wrote: >Andrew Morton writes: > >[...] > > > > > In this very common scenario, the only way we'll ever get "lumps" of reads is > > if some other processes come in and happen to want to read nearby sectors. > >Or if you have read-ahead for meta-data, which is quite useful. Isn't >read ahead targeting the same problem as this anticipatory scheduling? > Finesse vs brute force. A bit of readahead is good. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.5.59-mm5
2003-01-24 11:03 ` 2.5.59-mm5 Alex Bligh - linux-kernel
2003-01-24 11:16 ` 2.5.59-mm5 Andrew Morton
@ 2003-01-24 11:23 ` Jens Axboe
1 sibling, 0 replies; 32+ messages in thread
From: Jens Axboe @ 2003-01-24 11:23 UTC (permalink / raw)
To: Alex Bligh - linux-kernel, Andrew Morton, linux-kernel, linux-mm

On Fri, Jan 24 2003, Alex Bligh - linux-kernel wrote:
>
> --On 23 January 2003 19:50 -0800 Andrew Morton <akpm@digeo.com> wrote:
>
> > So what anticipatory scheduling does is very simple: if an application
> > has performed a read, do *nothing at all* for a few milliseconds. Just
> > return to userspace (or to the filesystem) in the expectation that the
> > application or filesystem will quickly submit another read which is
> > closeby.
>
> I'm sure this is a really dumb question, as I've never played
> with this subsystem, in which case I apologize in advance.
>
> Why not follow (by default) the old system where you put the reads
> effectively at the back of the queue. Then rather than doing nothing
> for a few milliseconds, you carry on with doing the writes. However,
> promote the reads to the front of the queue when you have a "good
> lump" of them. If you get further reads while you are processing
> a lump of them, put them behind the lump. Switch back to the putting
> reads at the end when we have done "a few lumps worth" of
> reads, or exhausted the reads at the start of the queue (or
> perhaps are short of memory).

The whole point of anticipatory disk scheduling is that the one process
that submits a read is not going to do anything before that read
completes. However, maybe it will issue a _new_ read right after the
first one completes. The anticipation being that the same process will
submit a close read immediately.

--
Jens Axboe
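The decision Jens describes can be sketched in a few lines of C. This is only a toy model under stated assumptions, not the code in the -mm5 patch: the names `toy_request`, `toy_action` and `toy_anticipate`, the millisecond window and the "nearby" threshold in sectors are all hypothetical and chosen for illustration. It shows the three outcomes after a read completes: the same process submits a close read (service it), the window runs out (go back to the queued work), or neither has happened yet (keep the disk idle).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified request descriptor; not a kernel structure. */
struct toy_request {
	int  pid;      /* process that submitted the request */
	long sector;   /* starting sector on disk */
	bool is_read;  /* true for a read, false for a write */
};

enum toy_action {
	TOY_DISPATCH_READ,  /* the anticipated read arrived: service it now */
	TOY_DISPATCH_QUEUE, /* stop anticipating, return to the queued work */
	TOY_WAIT            /* keep the disk idle a little longer */
};

/*
 * Decide what to do after `last` has completed.
 * `next` is a newly arrived request, or NULL if nothing has arrived.
 * `waited_ms` is how long the disk has idled so far, `max_wait_ms` the
 * (illustrative) anticipation window, and `near_sectors` the
 * (illustrative) distance that still counts as "close".
 */
static enum toy_action toy_anticipate(const struct toy_request *last,
				      const struct toy_request *next,
				      int waited_ms, int max_wait_ms,
				      long near_sectors)
{
	assert(last != NULL && waited_ms >= 0 && near_sectors >= 0);

	/* Only reads are dependent, so only a read is worth waiting after. */
	if (!last->is_read || max_wait_ms <= 0)
		return TOY_DISPATCH_QUEUE;

	/* The same process came back with a read close to the last one. */
	if (next != NULL && next->is_read && next->pid == last->pid &&
	    labs(next->sector - last->sector) <= near_sectors)
		return TOY_DISPATCH_READ;

	/* The window has expired without the anticipated read showing up. */
	if (waited_ms >= max_wait_ms)
		return TOY_DISPATCH_QUEUE;

	return TOY_WAIT;
}
```

In this sketch a read from a different process does not end the wait; a real scheduler may well treat that case differently.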
* Re: 2.5.59-mm5 got stuck during boot
2003-01-24 3:50 2.5.59-mm5 Andrew Morton
2003-01-24 11:03 ` 2.5.59-mm5 Alex Bligh - linux-kernel
@ 2003-01-24 13:59 ` Helge Hafting
2003-01-24 17:44 ` Ed Tomlinson
2003-01-25 8:33 ` 2.5.59-mm5 Andres Salomon
2 siblings, 1 reply; 32+ messages in thread
From: Helge Hafting @ 2003-01-24 13:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm

Andrew Morton wrote:
> . -mm5 has the first cut of Nick Piggin's anticipatory I/O scheduler.

Interesting, but it didn't boot completely.
It came all the way to mount root from /dev/md0 (dirty raid1)
freed 316k of kernel memory, and then nothing happened.
numloc and capslock worked, and so did sysrq.
It was as if the kernel "forgot" to run init.
Nothing happened, but it wasn't hanging either.

sysrq "show pc" told me something about default idle.
I noticed that the root raid-1 came up dirty. (2.5.X
seems unable to shut down a raid-1 device "clean" if
it happens to be the root fs. So there's _always_
a bootup resync that starts as soon as the raid
is autodetected. (Before mounting root)

This is a UP P4, preempt, no module support,
compiled with gcc 2.95.4 from debian.

Stock 2.5.59 works, the only config change is to enable
that new CONFIG_HANGCHECK_TIMER.

Helge Hafting
* Re: 2.5.59-mm5 got stuck during boot
2003-01-24 13:59 ` 2.5.59-mm5 got stuck during boot Helge Hafting
@ 2003-01-24 17:44 ` Ed Tomlinson
2003-01-24 17:56 ` Nick Piggin
0 siblings, 1 reply; 32+ messages in thread
From: Ed Tomlinson @ 2003-01-24 17:44 UTC (permalink / raw)
To: Andrew Morton, Nick Piggin; +Cc: linux-mm

On January 24, 2003 08:59 am, Helge Hafting wrote:
> Andrew Morton wrote:
> > . -mm5 has the first cut of Nick Piggin's anticipatory I/O scheduler.
>
> Interesting, but it didn't boot completely.
> It came all the way to mount root from /dev/md0 (dirty raid1)
> freed 316k of kernel memory, and then nothing happened.
> numloc and capslock worked, and so did sysrq.
> It was as if the kernel "forgot" to run init.
> Nothing happened, but it wasn't hanging either.
>
> sysrq "show pc" told me something about default idle.
> I noticed that the root raid-1 came up dirty. (2.5.X
> seems unable to shut down a raid-1 device "clean" if
> it happens to be the root fs. So there's _always_
> a bootup resync that starts as soon as the raid
> is autodetected. (Before mounting root)
>
>
> This is a UP P4, preempt, no module support,
> compiled with gcc 2.95.4 from debian.
>
> Stock 2.5.59 works, the only config change is to enable
> that new CONFIG_HANGCHECK_TIMER.

Same story here - almost. No raid, using debian and the same
compiler along with multiple disks and fs(es).
Following are the messages and a sysrq+T:

Hope this helps,
Ed Tomlinson
---------
Linux version 2.5.59-mm5 (ed@oscar) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 Fri Jan 24 12:09:29 EST 2003
Video mode to be used for restore is f00
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
511MB LOWMEM available.
On node 0 totalpages: 131056
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 126960 pages, LIFO batch:16
HighMem zone: 0 pages, LIFO batch:1
Building zonelist for node : 0
Kernel command line: auto BOOT_IMAGE=Linux ro root=2103 console=tty0 console=ttyS0,38400 vga=ask idebus=33 profile=1
ide_setup: idebus=33
kernel profiling enabled
Initializing CPU#0
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 400.850 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 790.52 BogoMIPS
Memory: 513308k/524224k available (1336k kernel code, 10184k reserved, 713k data, 80k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
-> /dev
-> /dev/console
-> /root
Enabling new style K6 write allocation for 511 Mb
CPU: L1 I Cache: 32K (32 bytes/line), D cache 32K (32 bytes/line)
CPU: L2 Cache: 256K (32 bytes/line)
CPU: AMD-K6(tm) 3D+ Processor stepping 01
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
mtrr: v2.0 (20020519)
PCI: PCI BIOS revision 2.10 entry at 0xfb520, last bus=1
PCI: Using configuration type 1
BIO: pool of 256 setup, 15Kb (60 bytes/bio)
biovec pool[0]: 1 bvecs: 256 entries (12 bytes)
biovec pool[1]: 4 bvecs: 256 entries (48 bytes)
biovec pool[2]: 16 bvecs: 256 entries (192 bytes)
biovec pool[3]: 64 bvecs: 256 entries (768 bytes)
biovec pool[4]: 128 bvecs: 256 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 256 entries (3072 bytes)
Linux Plug and Play Support v0.94 (c) Adam Belay
pnp: Enabling Plug and Play Card Services.
PnPBIOS: Found PnP BIOS installation structure at 0xc00fc160
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xc188, dseg 0xf0000
PnPBIOS: 14 nodes reported by PnP BIOS; 14 recorded by driver
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
block request queues:
128 requests per read queue
128 requests per write queue
8 requests per batch
enter congestion at 15
exit congestion at 17
drivers/usb/core/usb.c: registered new driver usbfs
drivers/usb/core/usb.c: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router VIA [1106/0586] at 00:07.0
aio_setup: sizeof(struct page) = 40
Journalled Block Device driver loaded
Initializing Cryptographic API
Activating ISA DMA hang workarounds.
Serial: 8250/16550 driver $Revision: 1.90 $ IRQn sharing disablttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS2 at I/O 0x3e8 (irq = 4) is a 16550A
pty: 256 Unix98 ptys configured
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected VIA MVP3 chipset
agpgart: Maximum main memory to use for agp memory: 439M
agpgart: AGP aperture is 64M @ 0xe0000000
[drm] Initialized mga 3.1.0 20021029 on minor 0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes
VP_IDE: IDE controller at PCI slot 00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c586b (rev 47) IDE UDMA33 controller on pci00:07.1
ide0: BM-DMA at 0xa000-0xa007, BIOS settings: hda:DMA, hdb:DMA
ide1: eBM-DMA at 0xa00-0xa00f, BIOS settings: hdc:DMA, hdd:DMA
hda: QUANTUM FIREBALLP KA13.6, ATA DISK drive
hda: DMA disabled
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: AOPEN 16XDVD-ROM/AMH 20020328, ATAPI CD/DVD-ROM drive
hdd: HP COLORADO 20GB, ATAPI TAPE drive
hdc: DMA disabled
hdd: DMA disabled
ide1 at 0x170-0x177,0x376 on irq 15
PDC20267: IDE controller at PCI slot 00:09.0
PCI: Found IRQ 12 for device 00:09.0
PDC20267: chipset revision 2
PDC20267: not 100% native mode: will probe irqs later
PDC20267: ROM enabled at 0xeb000000
PDC20267: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode.
ide2: BM-DMA at 0xbc00-0xbc07, BIOS settings: hde:DMA, hdf:pio
ide3: BM-DMA at 0xbc08-0xbc0f, BIOS settings: hdg:DMA, hdh:DMA
hde: QUANTUM FIREBALLP AS40.0, ATA DISK drive
ide2 at 0xac00-0xac07,0xb002 on irq 12
hdg: QUANTUM FIREBALLP AS40.0, ATA DISK drive
ide3 at 0xb400-0xb407,0xb802 on irq 12
hda: host protected area => 1
hda: 27067824 sectors (13859 MB) w/371KiB Cache, CHS=26853/16/63, UDMA(33)
hda: hda1 hda2 hda3 hda4 < hda5 >
hde: host protected area => 1
hde: 78177792 sectors (40027 MB) w/1902KiB Cache, CHS=77557/16/63, UDMA(100)
hde: hde1 hde2 hde3 hde4 < hde5 >
hdg: host protected area => 1
hdg: 78177792 sectors (40027 MB) w/1902KiB Cache, CHS=77557/16/63, UDMA(100)
hdg: hdg1 hdg2 hdg3 hdg4 < hdg5 >
drivers/usb/host/uhci-hcd.c: USB Universal Host Controller Interface driver v2.0
uhci-hcd 00:07.2: VIA Technologies, In USB
uhci-hcd 00:07.2: irq 10, io base 0000a400
Please use the 'usbfs' filetype instead, the 'usbdevfs' name is deprecated.
uhci-hcd 00:07.2: new USB bus registered, assigned bus number 1
hub 1-0:0: USB hub found
hub 1-0:0: 2 ports detected
mice: PS/2 mouse device common for all mice
input: AT Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
found reiserfs format "3.6" with standard journal
hub 1-0:0: debounce: port 1: delay 100ms stable 4 status 0x101
hub 1-0:0: new USB device on port 1, assigned address 2
hub 1-1:0: USB hub found
Reiserfs journal params: device ide2(33,3), size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
reiserfs: checking transaction log (ide2(33,3)) for (ide2(33,3))
hub 1-1:0: 4 ports detected
Using r5 hash to sort names
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 80k freed
hub 1-0:0: debounce: port 2: delay 100ms stable 4 status 0x301
hub 1-0:0: new USB device on port 2, assigned address 3
SysRq : Show State

free sibling
task PC stack pid father child younger older
init D 00000086 12112 1 0 2 (NOTLB)
Call Trace:
 [<c0113f5a>] io_schedule+0xe/0x18
 [<c0127654>] __lock_page+0x90/0xac
 [<c0114694>] autoremove_wake_function+0x0/0x38
 [<c0114694>] autoremove_wake_function+0x0/0x38
 [<c01284cb>] filemap_nopage+0x16b/0x2ac
 [<c01322d4>] do_no_page+0x78/0x2b4
 [<c013257d>e] handle_mm_fau+0x6d/0x10c
 [<c0111cb7>] do_page_fault+0x137/0x414
 [<c0111b80>] do_page_fault+0x0/0x414
 [<c013e9aa>] __fput+0xe6/0x108
 [<c0133f01>] unmap_vma+0x69/0x70
 [<c0133f1c>] unmap_vma_list+0x14/0x20
 [<c013423b>] do_munmap+0x127/0x134
 [<c013428c>] sys_munmap+0x44/0x60
 [<c0108cbd>] error_code+0x2d/0x40

ksoftirqd/0 S 00000046 4294963856 2 1 3 (L-TLB)
Call Trace:
 [<c01196e9>] ksoftirqd+0x59/0xc8
 [<c0119711>] ksoftirqd+0x81/0xc8
 [<c0119690>] ksoftirqd+0x0/0xc8
 [<c0106e45>] kernel_thread_helper+0x5/0xc

events/0 D 00000046 4294953780 3 1 12 4 2 (L-TLB)
Call Trace:
 [<c0113463>] wait_for_completion+0x1b/0xe0
 [<c01134e5>] wait_for_completion+0x9d/0xe0
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c0115cba>] do_fork+0x10e/0x130
 [<c0106ec5>] kernel_thread+0x79/0x94
 [<c0121758>] ____call_usermodehelper+0x0/0x3c
 [<c0106e40>] kernel_thread_helper+0x0/0xc
 [<c01217a9>] __call_usermodehelper+0x15/0x28
 [<c0121758>] ____call_usermodehelper+0x0/0x3c
 [<c0121cf2>] worker_thread+0x1fa/0x2dc
 [<c0121af8>] worker_thread+0x0/0x2dc
 [<c0121794>] __call_usermodehelper+0x0/0x28
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c0106e45>] kernel_thread_helper+0x5/0xc

khubd D 00000046 4292756256 4 1 5 3 (L-TLB)
Call Trace:
 [<c0113463>] wait_for_completion+0x1b/0xe0
 [<c01134e5>] wait_for_completion+0x9d/0xe0
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c0121903>] call_usermodehelper+0x147/0x15c
 [<c01ec6d0>] usb_hotplug+0x0/0x1d8
 [<c0121794>] __call_usermodehelper+0x0/0x28
 [<c0121794>] __call_usermodehelper+0x0/0x28
 [<c01b0fc9>] do_hotplug+0x1e9/0x21c
 [<c01b102c>] dev_hotplug+0x30/0x3c
 [<c01ec6d0>] usb_hotplug+0x0/0x1d8
 [<c01af34e>] device_add+0x112/0x148
 [<c01ed112>] usb_new_device+0x366/0x4c4
 [<c0116a26>] printk+0x11e/0x140
 [<c01eec0f>] usb_hub_port_connect_change+0x24f/0x2e4
 [<c01eeddb>] usb_hub_events+0x137/0x2c4
 [<c01eef98>] usb_hub_thread+0x30/0xd8
 [<c01eef68>] usb_hub_thread+0x0/0xd8
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c0106e45>] kernel_thread_helper+0x5/0xc

pdflush S 00000046 4292616332 5 1 6 4 (L-TLB)
Call Trace:
 [<c012ba65>] __pdflush+0xf5/0x1f8
 [<c012bb68>] pdflush+0x0/0x14
 [<c012bb73>] pdflush+0xb/0x14
 [<c0106e45>] kernel_thread_helper+0x5/0xc

pdflush S 00000046 14412 6 1 7 5 (L-TLB)
Call Trace:
 [<c012ba65>] __pdflush+0xf5/0x1f8
 [<c012bb68>] pdflush+0x0/0x14
 [<c012bb73>] pdflush+0xb/0x14
 [<c0106e45>] kernel_thread_helper+0x5/0xc

kswapd0 S 00000046 4294958936 7 1 8 6 (L-TLB)
Call Trace:
 [<c012fb7a>] kswapd+0xea/0x10c
 [<c012fa90>] kswapd+0x0/0x10c
 [<c0109c3b>] math_state_restore+0x27/0x38
 [<c0108d15>] device_not_available+0x25/0x2a
 [<c010e170>] save_init_fpu+0x1c/0x38
 [<c01132b0>] preempt_schedule+0x28/0x40
 [<c0112b7c>] schedule_tail+0x1c/0x4c
 [<c0108915>] ret_from_fork+0x5/0x20
 [<c012fa90>] kswapd+0x0/0x10c
 [<c0114694>] autoremove_wake_function+0x0/0x38
 [<c0114694>] autoremove_wake_function+0x0/0x38
 [<c0106e45>] kernel_thre<ad_helper+0x5/0

aio/0 S 00000046 429488[6880 8 9 7 (L-TLB)
Call Trace:
 [<c0121c49>] worker_thread+0x151/0x2dc
 [<c0121af8>] worker_thread+0x0/0x2dc
 [<c0108915>] ret_from_fork+0x5/0x20
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c0106e45>] kernel_thread_helper+0x5/0xc

kpnpbiosd T 00000046 4294880228 9 1 10 8 (L-TLB)
Call Trace:
 [<c011820c>] do_exit+0x3c4/0x3d4
 [<c0118232>] complete_and_exit+0x16/0x18
 [<c01a769d>] pnp_dock_thread+0x99/0xf4
 [<c01a7604>] pnp_dock_thread+0x0/0xf4
 [<c0106e45>] kernel_thread_helper+0x5/0xc

kseriod S 00000046 4294112016 10 1 11 9 (L-TLB)
Call Trace:
 [<c01ff629>] serio_thread+0x9d/0x124
 [<c01ff58c>] serio_thread+0x0/0x124
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c0106e45>] kernel_thread_helper+0x5/0xc

reiserfs/0 S 00000046 8096 11 1 10 (L-TLB)
Call Trace:
 [<c0121c49>] worker_thread+0x151/0x2dc
 [<c0121af8>] worker_thread+0x0/0x2dc
 [<c0108915>] ret_from_fork+0x5/0x20
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c01132c8>] default_wake_function+0x0/0x2c
 [<c0106e45>] kernel_thread_helper+0x5/0xc

events/0 D 00000046 4294304092 12 3 (L-TLB)
Call Trace:
 [<c0113f5a>] io_schedule+0xe/0x18
 [<c013ec50>] __wait_on_buffer+0x78/0x94
 [<c0114694>] autoremove_wake_function+0x0/0x38
 [<c0114694>] autoremove_wake_function+0x0/0x38
 [<c013fbfc>] __bread_slow+0x6c/0x94
 [<c013fe4c>] __bread+0x28/0x30
 [<c018d5c9>] search_by_key+0x65/0xd64
 [<c01792a4>] search_by_entry_key+0x20/0x1b4
 [<c01797e9>] reiserfs_find_entry+0x7d/0x134
 [<c0179919>] reiserfs_lookup+0x79/0x168
 [<c012d14e>] kmem_cache_alloc+0x22/0x5c
 [<c01515ef>] d_alloc+0x1b/0x18c
 [<c0148b5f>] real_lookup+0x5f/0xcc
 [<c0148dfe>] do_lookup+0xb2/0x1fc
 [<c01494c7>] link_path_walk+0x57f/0x8c4
 [<c0149af4>] path_lookup+0x128/0x12c
 [<c014640b>] open_exec+0x1b/0xb8
 [<c01471ca>] do_execve+0x1e/0x204
 [<c012d14e>] kmem_cache_alloc+0x22/0x5c
 [<c014887e>] getname+0x5e/0x9c
 [<c0107584>] sys_execve+0x2c/0x64
 [<c0108a57>] syscall_call+0x7/0xb
 [<c01214e3>] exec_usermodehelper+0x333/0x360
 [<c0121785>] ____call_usermodehelper+0x2d/0x3c
 [<c0121758>] ____call_usermodehelper+0x0/0x3c
 [<c0106e45>] kernel_thread_helper+0x5/0xc

SysRq : Emergency Sync
Syncing device ide2(33,3) ... OK
Done.
SysRq : Emergency Remount R/O
Remounting device ide2(33,3) ... R/O
Done.
SysRq : Resetting
* Re: 2.5.59-mm5 got stuck during boot
2003-01-24 17:44 ` Ed Tomlinson
@ 2003-01-24 17:56 ` Nick Piggin
2003-01-24 19:18 ` Ed Tomlinson
0 siblings, 1 reply; 32+ messages in thread
From: Nick Piggin @ 2003-01-24 17:56 UTC (permalink / raw)
To: Ed Tomlinson; +Cc: Andrew Morton, linux-mm

Ed Tomlinson wrote:

>On January 24, 2003 08:59 am, Helge Hafting wrote:
>
>>Andrew Morton wrote:
>>
>>>. -mm5 has the first cut of Nick Piggin's anticipatory I/O scheduler.
>>>
>>Interesting, but it didn't boot completely.
>>It came all the way to mount root from /dev/md0 (dirty raid1)
>>freed 316k of kernel memory, and then nothing happened.
>>numloc and capslock worked, and so did sysrq.
>>It was as if the kernel "forgot" to run init.
>>Nothing happened, but it wasn't hanging either.
>>
>>sysrq "show pc" told me something about default idle.
>>I noticed that the root raid-1 came up dirty. (2.5.X
>>seems unable to shut down a raid-1 device "clean" if
>>it happens to be the root fs. So there's _always_
>>a bootup resync that starts as soon as the raid
>>is autodetected. (Before mounting root)
>>
>>
>>This is a UP P4, preempt, no module support,
>>compiled with gcc 2.95.4 from debian.
>>
>>Stock 2.5.59 works, the only config change is to enable
>>that new CONFIG_HANGCHECK_TIMER.
>>
>
>Same story here - almost. No raid, using debian and the same
>compiler along with multiple disks and fs(es).
>
>Following are the messages and a sysrq+T:
>
>Hope this helps,
>
Yes thanks for the nice report.
>
> free sibling
> task PC stack pid father child younger older
>init D 00000086 12112 1 0 2 (NOTLB)
>Call Trace:
> [<c0113f5a>] io_schedule+0xe/0x18
> [<c0127654>] __lock_page+0x90/0xac
> [<c0114694>] autoremove_wake_function+0x0/0x38
> [<c0114694>] autoremove_wake_function+0x0/0x38
> [<c01284cb>] filemap_nopage+0x16b/0x2ac
> [<c01322d4>] do_no_page+0x78/0x2b4
> [<c013257d>e] handle_mm_fau+0x6d/0x10c
> [<c0111cb7>] do_page_fault+0x137/0x414
> [<c0111b80>] do_page_fault+0x0/0x414
> [<c013e9aa>] __fput+0xe6/0x108
> [<c0133f01>] unmap_vma+0x69/0x70
> [<c0133f1c>] unmap_vma_list+0x14/0x20
> [<c013423b>] do_munmap+0x127/0x134
> [<c013428c>] sys_munmap+0x44/0x60
> [<c0108cbd>] error_code+0x2d/0x40
>
Processes get sleep waiting for a page and never wake up.
It doesn't seem to be an anticipatory scheduling problem but
if you have time, try changing drivers/block/deadline-iosched.c

static int antic_expire = HZ / 25;
to
static int antic_expire = 0;

And see if you can reproduce.

Nick
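For readers unsure what the two values of `antic_expire` in Nick's message amount to, here is a small worked sketch in C. It assumes, as is usual for kernel timeouts written in terms of `HZ`, that the value is a count of timer ticks (jiffies); the helper names `toy_antic_expire` and `toy_jiffies_to_ms` are hypothetical and the tick rates 1000 and 100 are just example values. Under that assumption `HZ / 25` is one twenty-fifth of a second, about 40 ms, while `0` leaves no window at all, which is presumably why it serves as a quick way to take anticipation out of the picture.

```c
#include <assert.h>

/* Hypothetical helper: the default window from the message, in ticks. */
static unsigned int toy_antic_expire(unsigned int hz)
{
	return hz / 25;
}

/* Hypothetical helper: convert a tick count to milliseconds. */
static unsigned int toy_jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
	assert(hz > 0);
	/* Widen before multiplying so the product cannot overflow. */
	return (unsigned int)((unsigned long long)jiffies * 1000ULL / hz);
}
```

With a tick rate of 1000 the default is 40 ticks, and with a tick rate of 100 it is 4 ticks; both correspond to 40 ms.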
* Re: 2.5.59-mm5 got stuck during boot
2003-01-24 17:56 ` Nick Piggin
@ 2003-01-24 19:18 ` Ed Tomlinson
0 siblings, 0 replies; 32+ messages in thread
From: Ed Tomlinson @ 2003-01-24 19:18 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-mm

On January 24, 2003 12:56 pm, Nick Piggin wrote:
> Processes get sleep waiting for a page and never wake up.
> It doesn't seem to be an anticipatory scheduling problem but
> if you have time, try changing drivers/block/deadline-iosched.c
>
> static int antic_expire = HZ / 25;
> to
> static int antic_expire = 0;
>
> And see if you can reproduce.

It boots with this change.

Ed
* Re: 2.5.59-mm5
2003-01-24 3:50 2.5.59-mm5 Andrew Morton
2003-01-24 11:03 ` 2.5.59-mm5 Alex Bligh - linux-kernel
2003-01-24 13:59 ` 2.5.59-mm5 got stuck during boot Helge Hafting
@ 2003-01-25 8:33 ` Andres Salomon
2 siblings, 0 replies; 32+ messages in thread
From: Andres Salomon @ 2003-01-25 8:33 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel

My atyfb_base.c compile fix (from 2.5.54) still hasn't found its way into
any of the main kernel trees. The original patch generates a reject
against 2.5.59-mm5, so here's an updated patch.

On Thu, 23 Jan 2003 19:50:44 -0800, Andrew Morton wrote:

> http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm5/
>
> . -mm3 and -mm4 were not announced - they were sync-up patches as we
> worked on the I/O scheduler.
>
> . -mm5 has the first cut of Nick Piggin's anticipatory I/O scheduler.
> Here's the scoop:
>
[...]
>
> anticipatory_io_scheduling-2_5_59-mm3.patch
> Subject: [PATCH] 2.5.59-mm3 antic io sched
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/

--- a/drivers/video/aty/atyfb_base.c	2003-01-25 03:02:35.000000000 -0500
+++ b/drivers/video/aty/atyfb_base.c	2003-01-25 03:21:48.000000000 -0500
@@ -2587,12 +2587,12 @@
 	if (info->screen_base)
 		iounmap((void *) info->screen_base);
 #ifdef __BIG_ENDIAN
-	if (info->cursor && par->cursor->ram)
+	if (par->cursor && par->cursor->ram)
 		iounmap(par->cursor->ram);
 #endif
 #endif
-	if (info->cursor)
-		kfree(info->cursor);
+	if (par->cursor)
+		kfree(par->cursor);
 #ifdef __sparc__
 	if (par->mmap_map)
 		kfree(par->mmap_map);