* 2.5.34-mm4
From: Andrew Morton @ 2002-09-14 4:06 UTC
To: lkml, linux-mm, lse-tech

url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/

Some additional work has been performed on the new, faster sleep/wakeup
facilities.

I have converted TCP/IPV4 over to use the faster wakeups.  It would be
appreciated if the people who are interested in (and set up for testing)
high performance networking could test this out.

Note however that there is no benefit to select()/poll().  That's quite a
large change.  So please bear in mind that this code will only help if
applications are generally sleeping in accept(), connect(), etc.

At this stage I'd like to know whether this work is generally something
which should be pursued further - let's be careful that the measurements
are not swamped by select()/poll() wakeups.

The individual patches are:

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/broken-out/wake-speedup.patch
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/broken-out/tcp-wakeups.patch

These apply against 2.5.26 and possibly earlier, and testing against
earlier kernels would be valid.  Thanks.

Changes have been made to /proc/stat which break top(1) and vmstat(1).
New versions are available at
http://www.zip.com.au/~akpm/linux/patches/procps-2.5.34-mm4.tar.gz
and newer versions will appear at http://surriel.com/procps/


+aio-sync-iocb.patch

  Ben's AIO patch conflicted with the readv/writev patch.  This is Ben's
  patch reworked to fit on top of readv-writev.patch

+pagevec_lru_add.patch

  Fix a bogon which broke reiserfs4

+taka-writev.patch

  Hirokazu Takahashi's writev() speedup.

+vm-wakeups.patch

  Use the auto waitqueues in the VM and block layers.  Broken out of
  the wake-speedup patch.

+per-node-kswapd.patch

  David Hansen's per-NUMA-node kswapd patch.

+topology-api.patch

  Matthew Dobson's topology API.

+kswapd-reclaim-stats.patch

  Add `kswapd_steal' and `pgrefill' to /proc/vmstat.  The former indicates
  that, on a quick test, 99% of page reclaim is being performed by kswapd.

+iowait.patch

  Instrumentation to show how much time is spent in disk wait.  (Doesn't
  appear to come out in the new top(1) though?)

+tcp-wakeups.patch

  Use auto-waitqueues in TCP/IPV4


linus.patch
  cset-1.568.19.4-to-1.661.txt.gz

scsi_hack.patch
  Fix block-highmem for scsi

ext3-htree.patch
  Indexed directories for ext3

spin-lock-check.patch
  spinlock/rwlock checking infrastructure

rd-cleanup.patch
  Cleanup and fix the ramdisk driver (doesn't work right yet)

readv-writev.patch
  O_DIRECT support for readv/writev

aio-sync-iocb.patch
  Use a sync iocb for generic_file_read

llzpr.patch
  Reduce scheduling latency across zap_page_range

buffermem.patch
  Resurrect buffermem accounting

lpp.patch
  ia32 huge tlb pages

lpp-update.patch
  hugetlbpage fixes

reversemaps-leak.patch
  Fix reverse map accounting leak

sharedmem.patch
  Add /proc/meminfo:Mapped - the amount of memory which is mapped into pagetables

ext3-sb.patch
  u.ext3_sb -> generic_sbp

pagevec_lru_add.patch
  Run readpage before dropping the page refcount

oom-fix.patch
  Fix an OOM condition on big highmem machines

tlb-cleanup.patch
  Clean up the tlb gather code

dump-stack.patch
  arch-neutral dump_stack() function

wli-cleanup.patch
  random cleanups

madvise-move.patch
  move madvise implementation into mm/madvise.c

split-vma.patch
  VMA splitting patch

mmap-fixes.patch
  mmap.c cleanup and lock ranking fixes

buffer-ops-move.patch
  Move submit_bh() and ll_rw_block() into fs/buffer.c

slab-stats.patch
  Display total slab memory in /proc/meminfo

writeback-control.patch
  Cleanup and extension of the writeback paths

free_area_init-cleanup.patch
  free_area_init() code cleanup

alloc_pages-cleanup.patch
  alloc_pages cleanup and optimisation

statm_pgd_range-sucks.patch
  Remove the pagetable walk from /proc/stat

remove-sync_thresh.patch
  Remove /proc/sys/vm/dirty_sync_thresh

taka-writev.patch
  Speed up writev

pf_nowarn.patch
  Fix up the handling of PF_NOWARN

jeremy.patch
  Spel Jermy's naim wright

queue-congestion.patch
  Infrastructure for communicating request queue congestion to the VM

nonblocking-ext2-preread.patch
  avoid ext2 inode prereads if the queue is congested

nonblocking-pdflush.patch
  non-blocking writeback infrastructure, use it for pdflush

nonblocking-vm.patch
  Non-blocking page reclaim

wake-speedup.patch
  Faster wakeup code

vm-wakeups.patch
  Use the faster wakeups in the VM and block layers

sync-helper.patch
  Speed up sys_sync() against multiple spindles

slabasap.patch
  Early and smarter shrinking of slabs

write-deadlock.patch
  Fix the generic_file_write-from-same-mmapped-page deadlock

buddyinfo.patch
  Add /proc/buddyinfo - stats on the free pages pool

free_area.patch
  Remove struct free_area_struct and free_area_t, use `struct free_area'

per-node-kswapd.patch
  Per-node kswapd instance

topology-api.patch
  NUMA topology API

radix_tree_gang_lookup.patch
  radix tree gang lookup

truncate_inode_pages.patch
  truncate/invalidate_inode_pages rewrite

proc_vmstat.patch
  Move the vm accounting out of /proc/stat

kswapd-reclaim-stats.patch
  Add kswapd_steal to /proc/vmstat

iowait.patch
  I/O wait statistics

tcp-wakeups.patch
  Use fast wakeups in TCP/IPV4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/
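For readers who want to reproduce the "99% of page reclaim is being performed by kswapd" style of figure from the announcement above, here is a minimal sketch in C. It assumes /proc/vmstat is a list of "name value" lines and that a total-reclaim counter named `pgsteal` is available next to `kswapd_steal`; only `kswapd_steal` and `pgrefill` are named in the announcement, so `pgsteal`, the helper names and the sample numbers are all illustrative assumptions rather than anything taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Made-up sample in "name value" form; not measurements from this thread. */
static const char vmstat_sample[] =
	"pgsteal 20000\n"
	"pgrefill 51234\n"
	"kswapd_steal 19800\n";

/*
 * Look up one counter in /proc/vmstat-style text.
 * Returns 1 and stores the value in *out, or 0 if the name is absent.
 */
static int vmstat_lookup(const char *text, const char *name, unsigned long *out)
{
	size_t len;
	const char *line = text;

	assert(text != NULL && name != NULL && out != NULL);
	len = strlen(name);

	while (line && *line) {
		/* Require a space after the name so "pgsteal" does not match "pgsteal_x". */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%lu", out) == 1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 0;
}

/* Share of reclaimed pages that kswapd reclaimed, as a whole percentage. */
static unsigned long kswapd_share_percent(unsigned long kswapd_steal,
					  unsigned long pgsteal)
{
	return pgsteal ? (100 * kswapd_steal) / pgsteal : 0;
}
```

To use it on a real system, read /proc/vmstat into a buffer, call vmstat_lookup() once per counter and pass the two values to kswapd_share_percent(). Whether the ratio is meaningful depends on which counters the running kernel actually exports.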
* Re: 2.5.34-mm4
From: Rik van Riel @ 2002-09-14 4:01 UTC
To: Andrew Morton; +Cc: lkml, linux-mm, lse-tech

On Fri, 13 Sep 2002, Andrew Morton wrote:

> +iowait.patch
>
>   Instrumentation to show how much time is spent in disk wait.  (Doesn't
>   appear to come out in the new top(1) though?)

Will add it now that you're shipping it again.

Note that this will be available as patches on my home page
and from my bk tree only for now.  I'll merge the needed
patches into the main procps tree once this stuff gets merged
into the kernel.

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org
* Re: 2.5.34-mm4
From: Axel Siebenwirth @ 2002-09-15 10:50 UTC
To: Andrew Morton; +Cc: lkml, linux-mm, lse-tech

Hi Andrew!

On Fri, 13 Sep 2002, Andrew Morton wrote:

> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/

With changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
quite unresponsive behaviour. For example I am building X which at that
special moment causes pretty heavy disk load and the system doesn't respond
at all. I was using X and was not able to switch consoles or move mouse only
extremely sluggish.
I have seen that it used more swap than usual.

             total       used       free     shared    buffers     cached
Mem:        191096     159340      31756          0      10568      94100
-/+ buffers/cache:      54672     136424
Swap:       289160          0     289160

This is how it looks like under normal circumstances and when building X I
had 20M in swap usage which seemed quite a lot to me. Maybe I'm just wrong.
Unfortunately I was not able to start vmstat, first because I can't start
vmstat when system is not responding and second it doesn't work anyway
because of your changes.

Best regards,
Axel
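The `free` figures in the message above can be checked by hand: the "-/+ buffers/cache" row is the "Mem:" row with buffers and page cache moved from "used" to "free". A small sketch of that arithmetic in C follows; the struct and function names are made up for illustration, and the values in the usage note are the ones Axel posted (in kilobytes).

```c
#include <assert.h>

/* One "Mem:" row of free(1) output, in kilobytes. Illustrative type only. */
struct mem_row {
	unsigned long total, used, free, shared, buffers, cached;
};

/* "used" column of the "-/+ buffers/cache" row. */
static unsigned long used_minus_cache(const struct mem_row *m)
{
	/* Buffers and cache are counted inside "used", so this cannot underflow. */
	assert(m->used >= m->buffers + m->cached);
	return m->used - m->buffers - m->cached;
}

/* "free" column of the "-/+ buffers/cache" row. */
static unsigned long free_plus_cache(const struct mem_row *m)
{
	return m->free + m->buffers + m->cached;
}
```

With the posted row { 191096, 159340, 31756, 0, 10568, 94100 } the two helpers give 54672 and 136424, matching the second line of the output: about 53 MB is in use by programs and the rest is reclaimable cache or free.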
* Re: 2.5.34-mm4
From: Rik van Riel @ 2002-09-15 14:31 UTC
To: Axel Siebenwirth; +Cc: Andrew Morton, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Axel Siebenwirth wrote:
> On Fri, 13 Sep 2002, Andrew Morton wrote:
>
> > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
>
> With changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> quite unresponsive behaviour.

Don't worry, it's supposed to do that.  You can't measure desktop
interactivity, so it doesn't exist ;)

Rik
* Re: 2.5.34-mm4
From: Bill Davidsen @ 2002-09-16 18:33 UTC
To: Rik van Riel; +Cc: lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Rik van Riel wrote:

> On Sun, 15 Sep 2002, Axel Siebenwirth wrote:
> > On Fri, 13 Sep 2002, Andrew Morton wrote:
> >
> > > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
> >
> > With changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> > quite unresponsive behaviour.
>
> Don't worry, it's supposed to do that.  You can't measure desktop
> interactivity, so it doesn't exist ;)

But now we have `contest' and we can, so it does.

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: 2.5.34-mm4
From: Andrew Morton @ 2002-09-15 17:41 UTC
To: Axel Siebenwirth, Con Kolivas; +Cc: lkml, linux-mm, lse-tech

Axel Siebenwirth wrote:
>
> Hi Andrew!
>
> On Fri, 13 Sep 2002, Andrew Morton wrote:
>
> > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
>
> With changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> quite unresponsive behaviour. For example I am building X which at that
> special moment causes pretty heavy disk load and the system doesn't respond
> at all. I was using X and was not able to switch consoles or move mouse only
> extremely sluggish.

There are large IDE updates in -mm4, and this is consistent with a disk
which isn't doing DMA any more.

Could you (and Con) please double-check with `hdparm -i' and `hdparm -t'
that the disk subsystem is behaving properly?

Yes, it could well be a VM bug, but I wouldn't want to run round in
confused circles all day ;)  Thanks.

> I have seen that it used more swap than usual.

2.5 is much more swaphappy than 2.4.  I believe that this is actually
correct behaviour for optimum throughput.  But it just happens that
people (me included) hate it.  We don't notice the improved runtimes
for the pagecache-intensive operations but we do notice the time it takes
to get the xterms working again.

We have not yet sat down and worked out what to do about this.

>              total       used       free     shared    buffers     cached
> Mem:        191096     159340      31756          0      10568      94100
> -/+ buffers/cache:      54672     136424
> Swap:       289160          0     289160
>
> This is how it looks like under normal circumstances and when building X I
> had 20M in swap usage which seemed quite a lot to me. Maybe I'm just wrong.
> Unfortunately I was not able to start vmstat, first because I can't start
> vmstat when system is not responding and second it doesn't work anyway
> because of your changes.
>

Yeah, sorry.  The burden of back-compatibility weighed too heavy and
Rik decided that we just have to fix userspace to follow kernel changes.

There will be breakage for a while; updates are at
http://surriel.com/procps/.

Unfortunately, those updates cause odd-but-not-serious things to
happen to Red Hat initscripts.  This happens when you install standard
util-linux as well.  It is due to the initscripts passing in arguments
which the standard tools do not understand.
* Re: 2.5.34-mm4
From: Rik van Riel @ 2002-09-15 17:36 UTC
To: Andrew Morton; +Cc: Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andrew Morton wrote:

> Unfortunately, those updates cause odd-but-not-serious things to
> happen to Red Hat initscripts.  This happens when you install standard
> util-linux as well.  It is due to the initscripts passing in arguments
> which the standard tools do not understand.

I'm about to add all patches from the RH procps rpm to the
procps cvs tree, so this should go away soon.

Rik
* Re: 2.5.34-mm4
From: Rik van Riel @ 2002-09-15 17:39 UTC
To: Andrew Morton; +Cc: Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andrew Morton wrote:
> Axel Siebenwirth wrote:

> > I have seen that it used more swap than usual.
>
> 2.5 is much more swaphappy than 2.4.  I believe that this is actually
> correct behaviour for optimum throughput.  But it just happens that
> people (me included) hate it.

Time for a corollary to "if you can't measure it, it doesn't exist":

"If you can't measure desktop performance, our method
 of development will ensure it won't exist"

cheers,

Rik
* RE: 2.5.34-mm4
From: M. Edward Borasky @ 2002-09-15 17:49 UTC
To: Rik van Riel, Andrew Morton
Cc: Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
cheapest solution may still be to buy more memory, more disks or a
faster processor.

Borasky's Corollary 2: When you try to measure the performance of people
the way you measure performance of computers, you need psychological
help.

M. Edward (Ed) Borasky
mailto: znmeb@borasky-research.net
http://www.pdxneurosemantics.com
http://www.meta-trading-coach.com
http://www.borasky-research.net

Coaching: It's Not Just for Athletes and Executives Any More!

-----Original Message-----
From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org]On
Behalf Of Rik van Riel
Sent: Sunday, September 15, 2002 10:39 AM
To: Andrew Morton
Cc: Axel Siebenwirth; Con Kolivas; lkml; linux-mm@kvack.org;
lse-tech@lists.sourceforge.net
Subject: Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Andrew Morton wrote:
> Axel Siebenwirth wrote:

> > I have seen that it used more swap than usual.
>
> 2.5 is much more swaphappy than 2.4.  I believe that this is actually
> correct behaviour for optimum throughput.  But it just happens that
> people (me included) hate it.

Time for a corollary to "if you can't measure it, it doesn't exist":

"If you can't measure desktop performance, our method
 of development will ensure it won't exist"

cheers,

Rik
* RE: 2.5.34-mm4
From: Rik van Riel @ 2002-09-15 17:54 UTC
To: M. Edward Borasky
Cc: Andrew Morton, Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, M. Edward Borasky wrote:

> Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
> cheapest solution may still be to buy more memory, more disks or a
> faster processor.

Current 2.5 is sluggish on systems with a fast CPU and 768 MB
of RAM, whereas current -ac runs the same workload smoothly
with 128 MB of RAM.

Now tell me, what's your point ?

Rik
* Re: 2.5.34-mm4
From: Andrew Morton @ 2002-09-15 18:55 UTC
To: Rik van Riel
Cc: M. Edward Borasky, Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

Rik van Riel wrote:
>
> On Sun, 15 Sep 2002, M. Edward Borasky wrote:
>
> > Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
> > cheapest solution may still be to buy more memory, more disks or a
> > faster processor.
>
> Current 2.5 is sluggish on systems with a fast CPU and 768 MB
> of RAM, whereas current -ac runs the same workload smoothly
> with 128 MB of RAM.
>

I've been running 2.5 on my desktop at work (800MHz/256M UP) since
2.5.26 and on the machine at home (Dual 850MHz/768M) on-and-off
(recent freizures sent that machine back to Marcelo; need to try again).

I also ran 2.4.19-ac-something for a couple of weeks.

Impressions are:

- 2.5 swaps a lot in response to heavy pagecache activity.

  SEGQ didn't change that, actually.  And this is correct,
  as-designed behaviour.  We'll need some "don't be irritating"
  knob to prevent this.  Or speculative pagein when the load
  has subsided, which would be a fair-sized project.

- In both -ac and 2.5 the scheduler is prone to starving interactive
  applications (netscape 4, gkrellm, command-line gdb, others) when
  there is a compilation happening.

  This is very, very noticeable; and it affects applications which
  do not use sched_yield().  Ingo has put some extra stuff in since
  then and I need to retest.

- In -ac, there are noticeable stalls during heavy writeout.  This
  may be an ext3 thing, but I can't think of any IO scheduling
  differences in -ac ext3.  I'd be guessing that it is due to
  bdflush/kupdate lumpiness.

Overall I find Marcelo kernels to be the most comfortable, followed
by 2.5.  Alan's kernels I find to be the least comfortable in a
"developer's desktop" situation.
* Re: 2.5.34-mm4
From: Rik van Riel @ 2002-09-15 18:56 UTC
To: Andrew Morton
Cc: M. Edward Borasky, Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andrew Morton wrote:

> - In -ac, there are noticeable stalls during heavy writeout.  This
>   may be an ext3 thing, but I can't think of any IO scheduling
>   differences in -ac ext3.  I'd be guessing that it is due to
>   bdflush/kupdate lumpiness.

This is also due to the fact that -ac has an older -rmap
VM.  As in current 2.5, rmap can write out all inactive
pages ... and it did in some worst case situations.

This is fixed in rmap14.

(I hope Alan is done playing with IDE soon so I can push
him a VM update)

regards,

Rik
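The rmap14 patch posted later in this thread adds a `maxlaunder` budget to page_launder_zone(), which looks like the mechanism that stops a single pass from writing out every inactive dirty page; that this is the exact fix Rik means here is an assumption. The sketch below restates the budget calculation as a plain C function. The function name, the parameter names and the 4 KiB page size are illustrative; only the constants 512 and 1000000 and the min/max logic come from the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages, as on ia32 */

/* The patch comment speaks of "2 MB worth of dirty pages": 512 pages of 4 KiB. */
static_assert(512 * SKETCH_PAGE_SIZE == 2UL * 1024 * 1024,
	      "512 pages of 4 KiB are 2 MiB");

/*
 * Number of pages one launder pass may start writeout on.
 * Mirrors the patch: unlimited (1000000) for a full flush, otherwise
 * min(512, inactive_dirty / 4), but never less than the free_plenty target.
 */
static int launder_limit(int full_flush, int inactive_dirty_pages,
			 int free_plenty_pages)
{
	int limit;

	if (full_flush)
		return 1000000;

	limit = inactive_dirty_pages / 4;
	if (limit > 512)
		limit = 512;
	if (limit < free_plenty_pages)
		limit = free_plenty_pages;
	return limit;
}
```

For example, a zone with 10000 inactive dirty pages and a free_plenty target of 100 gets a budget of 512 pages, while a zone with only 400 dirty pages and the same kind of small target gets 100. Once the budget is used up, the patch clears __GFP_IO and __GFP_FS from gfp_mask so the rest of the scan does no further I/O.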
* Re: 2.5.34-mm4
From: Alan Cox @ 2002-09-16 1:33 UTC
To: Rik van Riel
Cc: Andrew Morton, M. Edward Borasky, Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

On Sun, 2002-09-15 at 19:56, Rik van Riel wrote:
> On Sun, 15 Sep 2002, Andrew Morton wrote:
>
> > - In -ac, there are noticeable stalls during heavy writeout.  This
> >   may be an ext3 thing, but I can't think of any IO scheduling
> >   differences in -ac ext3.  I'd be guessing that it is due to
> >   bdflush/kupdate lumpiness.

I think so. I've always been conservative, I need rmap to pass cerberus
still. But the rmap in -ac is out of date a little with the 2.5 tuning

> This is also due to the fact that -ac has an older -rmap
> VM.  As in current 2.5, rmap can write out all inactive
> pages ... and it did in some worst case situations.
>
> This is fixed in rmap14.
>
> (I hope Alan is done playing with IDE soon so I can push
> him a VM update)

The big one left to fix is the simplex device bug - which is an "I know
why". The great mystery is the affair of taskfile pio write. Other than
that it's annoying glitches, not big problems now. So send me rmap-14a
patches by all means
* [PATCH](1/2) rmap14 for ac (was: Re: 2.5.34-mm4) 2002-09-16 1:33 ` 2.5.34-mm4 Alan Cox @ 2002-09-16 2:32 ` Rik van Riel 0 siblings, 0 replies; 18+ messages in thread From: Rik van Riel @ 2002-09-16 2:32 UTC (permalink / raw) To: Alan Cox; +Cc: linux-mm, lkml On 16 Sep 2002, Alan Cox wrote: > So send me rmap-14a patches by all means Here they come. This first patch updates 2.4.20-pre5-ac6 to rmap14. An incremental patch to rmap14a + misc bugfixes will be in your mailbox in a few minutes... Rik -- Bravely reimplemented by the knights who say "NIH". http://www.surriel.com/ http://distro.conectiva.com/ Spamtraps of the month: september@surriel.com trac@trac.org --- linux-2.4.19-pre2-ac3/mm/filemap.c.rmap13b 2002-08-15 23:53:06.000000000 -0300 +++ linux-2.4.19-pre2-ac3/mm/filemap.c 2002-08-15 23:56:37.000000000 -0300 @@ -237,12 +237,11 @@ static void truncate_complete_page(struct page *page) { - /* Page has already been removed from processes, by vmtruncate() */ - if (page->pte_chain) - BUG(); - - /* Leave it on the LRU if it gets converted into anonymous buffers */ - if (!page->buffers || do_flushpage(page, 0)) + /* + * Leave it on the LRU if it gets converted into anonymous buffers + * or anonymous process memory. + */ + if ((!page->buffers || do_flushpage(page, 0)) && !page->pte_chain) lru_cache_del(page); /* --- linux-2.4.19-pre2-ac3/mm/memory.c.rmap13b 2002-08-15 23:53:14.000000000 -0300 +++ linux-2.4.19-pre2-ac3/mm/memory.c 2002-08-15 23:59:04.000000000 -0300 @@ -380,49 +380,65 @@ return freed; } -/* - * remove user pages in a given range. 
+#define ZAP_BLOCK_SIZE (256 * PAGE_SIZE) + +/** + * zap_page_range - remove user pages in a given range + * @mm: mm_struct containing the applicable pages + * @address: starting address of pages to zap + * @size: number of bytes to zap */ void zap_page_range(struct mm_struct *mm, unsigned long address, unsigned long size) { mmu_gather_t *tlb; pgd_t * dir; - unsigned long start = address, end = address + size; - int freed = 0; - - dir = pgd_offset(mm, address); - + unsigned long start, end, addr, block; + int freed; + /* - * This is a long-lived spinlock. That's fine. - * There's no contention, because the page table - * lock only protects against kswapd anyway, and - * even if kswapd happened to be looking at this - * process we _want_ it to get stuck. + * Break the work up into blocks of ZAP_BLOCK_SIZE pages: + * this decreases lock-hold time for the page_table_lock + * dramatically, which could otherwise be held for a very + * long time. This decreases lock contention and increases + * periods of preemptibility. */ - if (address >= end) - BUG(); - spin_lock(&mm->page_table_lock); - flush_cache_range(mm, address, end); - tlb = tlb_gather_mmu(mm); + while (size) { + if (size > ZAP_BLOCK_SIZE) + block = ZAP_BLOCK_SIZE; + else + block = size; + + freed = 0; + start = addr = address; + end = address + block; + dir = pgd_offset(mm, address); - do { - freed += zap_pmd_range(tlb, dir, address, end - address); - address = (address + PGDIR_SIZE) & PGDIR_MASK; - dir++; - } while (address && (address < end)); + BUG_ON(address >= end); - /* this will flush any remaining tlb entries */ - tlb_finish_mmu(tlb, start, end); + spin_lock(&mm->page_table_lock); + flush_cache_range(mm, start, end); + tlb = tlb_gather_mmu(mm); - /* - * Update rss for the mm_struct (not necessarily current->mm) - * Notice that rss is an unsigned long. 
- */ - if (mm->rss > freed) - mm->rss -= freed; - else - mm->rss = 0; - spin_unlock(&mm->page_table_lock); + do { + freed += zap_pmd_range(tlb, dir, addr, end - addr); + addr = (addr + PGDIR_SIZE) & PGDIR_MASK; + dir++; + } while (addr && (addr < end)); + + /* this will flush any remaining tlb entries */ + tlb_finish_mmu(tlb, start, end); + + /* Update rss for the mm_struct (need not be current->mm) */ + if (mm->rss > freed) + mm->rss -= freed; + else + mm->rss = 0; + + spin_unlock(&mm->page_table_lock); + + address += block; + size -= block; + } } /* @@ -873,18 +889,19 @@ static inline int remap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address, unsigned long size, unsigned long phys_addr, pgprot_t prot) { - unsigned long end; + unsigned long base, end; + base = address & PGDIR_MASK; address &= ~PGDIR_MASK; end = address + size; if (end > PGDIR_SIZE) end = PGDIR_SIZE; phys_addr -= address; do { - pte_t * pte = pte_alloc(mm, pmd, address); + pte_t * pte = pte_alloc(mm, pmd, address + base); if (!pte) return -ENOMEM; - remap_pte_range(pte, address, end - address, address + phys_addr, prot); + remap_pte_range(pte, base + address, end - address, address + phys_addr, prot); address = (address + PMD_SIZE) & PMD_MASK; pmd++; } while (address && (address < end)); --- linux-2.4.19-pre2-ac3/mm/vmscan.c.rmap13b 2002-08-15 23:53:26.000000000 -0300 +++ linux-2.4.19-pre2-ac3/mm/vmscan.c 2002-08-15 23:59:04.000000000 -0300 @@ -195,6 +195,7 @@ * page_launder_zone - clean dirty inactive pages, move to inactive_clean list * @zone: zone to free pages in * @gfp_mask: what operations we are allowed to do + * @full_flush: full-out page flushing, if we couldn't get enough clean pages * * This function is called when we are low on free / inactive_clean * pages, its purpose is to refill the free/clean list as efficiently @@ -208,19 +209,30 @@ * This code is heavily inspired by the FreeBSD source code. Thanks * go out to Matthew Dillon. 
  */
-#define CAN_DO_FS ((gfp_mask & __GFP_FS) && should_write)
-int page_launder_zone(zone_t * zone, int gfp_mask, int priority)
+int page_launder_zone(zone_t * zone, int gfp_mask, int full_flush)
 {
-	int maxscan, cleaned_pages, target;
-	struct list_head * entry;
+	int maxscan, cleaned_pages, target, maxlaunder, iopages;
+	struct list_head * entry, * next;

 	target = free_plenty(zone);
-	cleaned_pages = 0;
+	cleaned_pages = iopages = 0;
+
+	/* If we can get away with it, only flush 2 MB worth of dirty pages */
+	if (full_flush)
+		maxlaunder = 1000000;
+	else {
+		maxlaunder = min_t(int, 512, zone->inactive_dirty_pages / 4);
+		maxlaunder = max(maxlaunder, free_plenty(zone));
+	}

 	/* The main launder loop. */
+rescan:
 	spin_lock(&pagemap_lru_lock);
-	maxscan = zone->inactive_dirty_pages >> priority;
-	while (maxscan-- && !list_empty(&zone->inactive_dirty_list)) {
+	maxscan = zone->inactive_dirty_pages;
+	entry = zone->inactive_dirty_list.prev;
+	next = entry->prev;
+	while (maxscan-- && !list_empty(&zone->inactive_dirty_list) &&
+			next != &zone->inactive_dirty_list) {
 		struct page * page;

 		/* Low latency reschedule point */
@@ -231,14 +243,20 @@
 			continue;
 		}

-		entry = zone->inactive_dirty_list.prev;
+		entry = next;
+		next = entry->prev;
 		page = list_entry(entry, struct page, lru);

+		/* This page was removed while we looked the other way. */
+		if (!PageInactiveDirty(page))
+			goto rescan;
+
 		if (cleaned_pages > target)
 			break;

-		list_del(entry);
-		list_add(entry, &zone->inactive_dirty_list);
+		/* Stop doing IO if we've laundered too many pages already. */
+		if (maxlaunder < 0)
+			gfp_mask &= ~(__GFP_IO|__GFP_FS);

 		/* Wrong page on list?! (list corruption, should not happen) */
 		if (!PageInactiveDirty(page)) {
@@ -257,7 +275,6 @@

 		/*
 		 * The page is locked. IO in progress?
-		 * Move it to the back of the list.
 		 * Acquire PG_locked early in order to safely
 		 * access page->mapping.
 		 */
@@ -341,10 +358,16 @@
 				spin_unlock(&pagemap_lru_lock);

 				writepage(page);
+				maxlaunder--;
 				page_cache_release(page);

 				spin_lock(&pagemap_lru_lock);
 				continue;
+			} else {
+				UnlockPage(page);
+				list_del(entry);
+				list_add(entry, &zone->inactive_dirty_list);
+				continue;
 			}
 		}

@@ -391,6 +414,7 @@
 				/* failed to drop the buffers so stop here */
 				UnlockPage(page);
 				page_cache_release(page);
+				maxlaunder--;

 				spin_lock(&pagemap_lru_lock);
 				continue;
@@ -443,21 +467,19 @@
  */
 int page_launder(int gfp_mask)
 {
-	int maxtry = 1 << DEF_PRIORITY;
 	struct zone_struct * zone;
 	int freed = 0;

 	/* Global balancing while we have a global shortage. */
-	while (maxtry-- && free_high(ALL_ZONES) >= 0) {
+	if (free_high(ALL_ZONES) >= 0)
 		for_each_zone(zone)
 			if (free_plenty(zone) >= 0)
-				freed += page_launder_zone(zone, gfp_mask, 6);
-	}
+				freed += page_launder_zone(zone, gfp_mask, 0);

 	/* Clean up the remaining zones with a serious shortage, if any. */
 	for_each_zone(zone)
 		if (free_min(zone) >= 0)
-			freed += page_launder_zone(zone, gfp_mask, 0);
+			freed += page_launder_zone(zone, gfp_mask, 1);

 	return freed;
 }
@@ -814,6 +836,7 @@
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule_timeout(HZ / 4);
 		kswapd_overloaded = 0;
+		wmb();
 		return;
 	}

--- linux-2.4.19-pre2-ac3/include/linux/mm.h.rmap13b	2002-08-15 23:52:54.000000000 -0300
+++ linux-2.4.19-pre2-ac3/include/linux/mm.h	2002-08-16 00:01:31.000000000 -0300
@@ -344,15 +344,19 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
+#ifdef CONFIG_SMP
 	while (test_and_set_bit(PG_chainlock, &page->flags)) {
 		while (test_bit(PG_chainlock, &page->flags))
 			cpu_relax();
 	}
+#endif
 }

 static inline void pte_chain_unlock(struct page *page)
 {
+#ifdef CONFIG_SMP
 	clear_bit(PG_chainlock, &page->flags);
+#endif
 }

 /*
--- linux-2.4.19-pre2-ac3/include/linux/mmzone.h.rmap13b	2002-08-15 23:53:00.000000000 -0300
+++ linux-2.4.19-pre2-ac3/include/linux/mmzone.h	2002-08-16 00:01:31.000000000 -0300
@@ -27,8 +27,6 @@
 struct pglist_data;
 struct pte_chain;

-#define MAX_CHUNKS_PER_NODE 8
-
 /*
  * On machines where it is needed (eg PCs) we divide physical memory
  * into multiple physical zones. On a PC we have 3 zones:
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
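As a reading aid for the page_launder_zone() hunk in the patch above, here is a minimal userspace sketch of how the write budget (maxlaunder) appears to be seeded. The function and constant names (launder_budget, min_int, max_int, FULL_FLUSH_BUDGET, PARTIAL_FLUSH_CAP, partial_flush_cap_kib) are hypothetical and do not exist in the kernel; the free_plenty(zone) result is passed in as a plain integer, and the 4 KiB page size used for the "2 MB" arithmetic is an assumption that holds on ia32 but not on every architecture.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's min_t()/max() macros. */
static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

enum {
	FULL_FLUSH_BUDGET = 1000000,	/* value the patch uses when full_flush is set */
	PARTIAL_FLUSH_CAP = 512		/* page cap the patch uses otherwise */
};

/*
 * Model of the maxlaunder setup in the patch (illustrative only):
 *   full flush    -> effectively unlimited budget
 *   partial flush -> a quarter of the inactive dirty pages, capped at
 *                    512 pages, but never below the zone's shortage
 *                    (the value free_plenty(zone) returns in the patch).
 */
static int launder_budget(int full_flush, int inactive_dirty_pages,
			  int free_plenty_shortage)
{
	int budget;

	assert(inactive_dirty_pages >= 0);

	if (full_flush)
		return FULL_FLUSH_BUDGET;

	budget = min_int(PARTIAL_FLUSH_CAP, inactive_dirty_pages / 4);
	return max_int(budget, free_plenty_shortage);
}

/* Size of the capped budget in KiB for a given page size in KiB. */
static int partial_flush_cap_kib(int page_size_kib)
{
	return PARTIAL_FLUSH_CAP * page_size_kib;
}
```

With 4 KiB pages the cap works out to 512 * 4 KiB = 2048 KiB, which matches the "2 MB" in the patch comment. Note that the patch only masks off __GFP_IO|__GFP_FS once maxlaunder has gone negative, so the budget is a soft limit rather than an exact page count.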
* Re: [Lse-tech] Re: 2.5.34-mm4
  2002-09-15 18:55 ` 2.5.34-mm4 Andrew Morton
  2002-09-15 18:56 ` 2.5.34-mm4 Rik van Riel
@ 2002-09-15 19:10 ` Andi Kleen
  2002-09-16 18:51 ` Bill Davidsen
  2002-09-16 18:48 ` 2.5.34-mm4 Bill Davidsen
  2 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2002-09-15 19:10 UTC
To: Andrew Morton
Cc: Rik van Riel, M. Edward Borasky, Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

> Overall I find Marcelo kernels to be the most comfortable, followed
> by 2.5. Alan's kernels I find to be the least comfortable in a

... and -aa kernels are marcelo kernels, just with the corner
cases fixed too. Works very nicely here.

-Andi
* Re: [Lse-tech] Re: 2.5.34-mm4
  2002-09-15 19:10 ` [Lse-tech] Re: 2.5.34-mm4 Andi Kleen
@ 2002-09-16 18:51 ` Bill Davidsen
  2002-09-19 9:01 ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Bill Davidsen @ 2002-09-16 18:51 UTC
To: Andi Kleen; +Cc: lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andi Kleen wrote:

> > Overall I find Marcelo kernels to be the most comfortable, followed
> > by 2.5. Alan's kernels I find to be the least comfortable in a
>
> ... and -aa kernels are marcelo kernels, just with the corner
> cases fixed too. Works very nicely here.

Corner cases? The IDE, VM and scheduler are different...

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [Lse-tech] Re: 2.5.34-mm4
  2002-09-16 18:51 ` Bill Davidsen
@ 2002-09-19 9:01 ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2002-09-19 9:01 UTC
To: Bill Davidsen; +Cc: Andi Kleen, lkml, linux-mm, lse-tech

On Mon, Sep 16 2002, Bill Davidsen wrote:
> On Sun, 15 Sep 2002, Andi Kleen wrote:
>
> > > Overall I find Marcelo kernels to be the most comfortable, followed
> > > by 2.5. Alan's kernels I find to be the least comfortable in a
> >
> > ... and -aa kernels are marcelo kernels, just with the corner
> > cases fixed too. Works very nicely here.
>
> Corner cases? The IDE, VM and scheduler are different...

The IDE is the same, I'll refrain from commenting on the rest. There's
just an adjustment to the read ahead, which makes sense.

--
Jens Axboe
* Re: 2.5.34-mm4
  2002-09-15 18:55 ` 2.5.34-mm4 Andrew Morton
  2002-09-15 18:56 ` 2.5.34-mm4 Rik van Riel
  2002-09-15 19:10 ` [Lse-tech] Re: 2.5.34-mm4 Andi Kleen
@ 2002-09-16 18:48 ` Bill Davidsen
  2 siblings, 0 replies; 18+ messages in thread
From: Bill Davidsen @ 2002-09-16 18:48 UTC
To: Andrew Morton; +Cc: Rik van Riel, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andrew Morton wrote:

> Impressions are:
>
> - 2.5 swaps a lot in response to heavy pagecache activity.
>
>   SEGQ didn't change that, actually. And this is correct,
>   as-designed behaviour. We'll need some "don't be irritating"
>   knob to prevent this. Or speculative pagein when the load
>   has subsided, which would be a fair-sized project.

It would be nice to have a knob in /proc/sys which could be tuned for
response or throughput, preferably not a boolean ;-) I suspect that we
would have lack of agreement on what that would do, but it sure would be
nice!

> - In both -ac and 2.5 the scheduler is prone to starving interactive
>   applications (netscape 4, gkrellm, command-line gdb, others) when
>   there is a compilation happening.
>
>   This is very, very noticeable; and it affects applications which
>   do not use sched_yield(). Ingo has put some extra stuff in since
>   then and I need to retest.
>
> - In -ac, there are noticeable stalls during heavy writeout. This
>   may be an ext3 thing, but I can't think of any IO scheduling
>   differences in -ac ext3. I'd be guessing that it is due to
>   bdflush/kupdate lumpiness.

I have the feeling that 2.5 is less good about noting that a file is open
for write only and no seeks have been done. I haven't measured it, but it
would seem that writes to such a file would be better on the disk and not
taking buffers, since they're probably not going to be read.

This is just based on running mkisofs on 2.4.19 and 2.5.34, and watching
"no disk activity" followed by a heavy burst. I haven't made any careful
measurement, so take this as you will, but I agree that heavy write bogs
the system. Clearly with big memory I can/do get the whole ~700MB in
memory if writes don't start quickly. Yes, that could be tuning, I know
that.

> Overall I find Marcelo kernels to be the most comfortable, followed
> by 2.5. Alan's kernels I find to be the least comfortable in a
> "developer's desktop" situation.

On small memory machines I don't see as much to choose, and the -ck series
has been very nice to me. I don't run 2.5 on any but test machines, and
both are big memory (1+GB) machines.

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
end of thread

Thread overview: 18+ messages
2002-09-14 4:06 2.5.34-mm4 Andrew Morton
2002-09-14 4:01 ` 2.5.34-mm4 Rik van Riel
2002-09-15 10:50 ` 2.5.34-mm4 Axel Siebenwirth
2002-09-15 14:31 ` 2.5.34-mm4 Rik van Riel
2002-09-16 18:33 ` 2.5.34-mm4 Bill Davidsen
2002-09-15 17:41 ` 2.5.34-mm4 Andrew Morton
2002-09-15 17:36 ` 2.5.34-mm4 Rik van Riel
2002-09-15 17:39 ` 2.5.34-mm4 Rik van Riel
2002-09-15 17:49 ` 2.5.34-mm4 M. Edward Borasky
2002-09-15 17:54 ` 2.5.34-mm4 Rik van Riel
2002-09-15 18:55 ` 2.5.34-mm4 Andrew Morton
2002-09-15 18:56 ` 2.5.34-mm4 Rik van Riel
2002-09-16 1:33 ` 2.5.34-mm4 Alan Cox
2002-09-16 2:32 ` [PATCH](1/2) rmap14 for ac (was: Re: 2.5.34-mm4) Rik van Riel
2002-09-15 19:10 ` [Lse-tech] Re: 2.5.34-mm4 Andi Kleen
2002-09-16 18:51 ` Bill Davidsen
2002-09-19 9:01 ` Jens Axboe
2002-09-16 18:48 ` 2.5.34-mm4 Bill Davidsen