Re: 2.5.34-mm4


* Re: 2.5.34-mm4
  2002-09-14  4:06 2.5.34-mm4 Andrew Morton
@ 2002-09-14  4:01 ` Rik van Riel
  2002-09-15 10:50 ` 2.5.34-mm4 Axel Siebenwirth
  1 sibling, 0 replies; 18+ messages in thread
From: Rik van Riel @ 2002-09-14  4:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm, lse-tech

On Fri, 13 Sep 2002, Andrew Morton wrote:

> +iowait.patch
>
>  Instrumentation to show how much time is spent in disk wait.  (Doesn't
>  appear to come out in the new top(1) though?)

Will add it now that you're shipping it again.  Note that this
will be available as patches on my home page and from my bk
tree only for now.  I'll merge the needed patches into the main
procps tree once this stuff gets merged into the kernel.

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* 2.5.34-mm4
@ 2002-09-14  4:06 Andrew Morton
  2002-09-14  4:01 ` 2.5.34-mm4 Rik van Riel
  2002-09-15 10:50 ` 2.5.34-mm4 Axel Siebenwirth
  0 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2002-09-14  4:06 UTC (permalink / raw)
  To: lkml, linux-mm, lse-tech

url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/

Some additional work has been performed on the new, faster
sleep/wakeup facilities.

I have converted TCP/IPV4 over to use the faster wakeups.  It would
be appreciated if the people who are interested in (and set up for
testing) high performance networking could test this out.  Note,
however, that there is no benefit to select()/poll(); converting
those would be quite a large change.

So please bear in mind that this code will only help if applications
are generally sleeping in accept(), connect(), etc.  At this stage
I'd like to know whether this work is generally something which should be
pursued further - let's be careful that the measurements are not
swamped by select()/poll() wakeups.

The individual patches are:

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/broken-out/wake-speedup.patch
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/broken-out/tcp-wakeups.patch

These apply against 2.5.26 and possibly earlier, and testing against
earlier kernels would be valid.  Thanks.



Changes have been made to /proc/stat which break top(1) and vmstat(1).
New versions are available at
http://www.zip.com.au/~akpm/linux/patches/procps-2.5.34-mm4.tar.gz
and newer versions will appear at
http://surriel.com/procps/
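
For readers wondering what tools such as top(1) and vmstat(1) have to
parse, here is a minimal sketch of reading the aggregate "cpu" line.
The field order (user, nice, system, idle, iowait) is an assumption
taken from later kernels; this thread does not spell out the exact
2.5.34-mm4 layout.  The names parse_cpu_line() and iowait_percent()
are hypothetical helpers, not procps functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Tick counters from the aggregate "cpu" line (assumed field order). */
struct cpu_times {
	unsigned long long user, nice, system, idle, iowait;
};

/* Parse an aggregate "cpu ..." line; returns 0 on success, -1 otherwise. */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
	assert(line && t);

	/* Reject per-CPU lines such as "cpu0 ..." and unrelated lines. */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;

	int n = sscanf(line + 4, "%llu %llu %llu %llu %llu",
		       &t->user, &t->nice, &t->system, &t->idle, &t->iowait);
	return n == 5 ? 0 : -1;
}

/* Share of all accounted ticks spent in I/O wait, in percent, rounded down. */
unsigned int iowait_percent(const struct cpu_times *t)
{
	assert(t);
	unsigned long long total =
		t->user + t->nice + t->system + t->idle + t->iowait;

	return total ? (unsigned int)(t->iowait * 100 / total) : 0;
}
```

A parser written like this returns -1 on a line with only four
counters, which is one way a format change in /proc/stat can show up
as a broken tool.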

+aio-sync-iocb.patch

 Ben's AIO patch conflicted with the readv/writev patch.  This is
 Ben's patch reworked to fit on top of readv-writev.patch

+pagevec_lru_add.patch

 Fix a bogon which broke reiserfs4

+taka-writev.patch

 Hirokazu Takahashi's writev() speedup.

+vm-wakeups.patch

 Use the auto waitqueues in the VM and block layers.  Broken out of
 the wake-speedup patch.

+per-node-kswapd.patch

 David Hansen's per-NUMA-node kswapd patch.

+topology-api.patch

 Matthew Dobson's topology API.

+kswapd-reclaim-stats.patch

 Add `kswapd_steal' and `pgrefill' to /proc/vmstat.  The former indicates
 that, on a quick test, 99% of page reclaim is being performed by kswapd.

+iowait.patch

 Instrumentation to show how much time is spent in disk wait.  (Doesn't
 appear to come out in the new top(1) though?)

+tcp-wakeups.patch

 Use auto-waitqueues in TCP/IPV4
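
The kswapd-reclaim-stats entry above quotes a figure of 99% of page
reclaim being done by kswapd.  A rough sketch of how such a figure
could be derived from /proc/vmstat-style "name value" lines follows.
The counter kswapd_steal is named in the entry; the total-reclaim
counter (called pgsteal here) and the helper names are assumptions
for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "name value" in a buffer of newline-separated counters.
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
int vmstat_lookup(const char *buf, const char *name, unsigned long long *value)
{
	assert(buf && name && value);
	size_t len = strlen(name);
	const char *p = buf;

	while (*p) {
		/* Match the whole counter name at the start of a line. */
		if (strncmp(p, name, len) == 0 && p[len] == ' ')
			return sscanf(p + len, "%llu", value) == 1 ? 0 : -1;

		/* Advance to the start of the next line, if any. */
		p = strchr(p, '\n');
		if (!p)
			break;
		p++;
	}
	return -1;
}

/* Percentage of reclaimed pages that were reclaimed by kswapd. */
unsigned int kswapd_share_percent(unsigned long long kswapd_steal,
				  unsigned long long total_steal)
{
	return total_steal ? (unsigned int)(kswapd_steal * 100 / total_steal) : 0;
}
```

Sampling the counters twice and feeding the differences to
kswapd_share_percent() gives the share over an interval rather than
since boot.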




linus.patch
  cset-1.568.19.4-to-1.661.txt.gz

scsi_hack.patch
  Fix block-highmem for scsi

ext3-htree.patch
  Indexed directories for ext3

spin-lock-check.patch
  spinlock/rwlock checking infrastructure

rd-cleanup.patch
  Cleanup and fix the ramdisk driver (doesn't work right yet)

readv-writev.patch
  O_DIRECT support for readv/writev

aio-sync-iocb.patch
  Use a sync iocb for generic_file_read

llzpr.patch
  Reduce scheduling latency across zap_page_range

buffermem.patch
  Resurrect buffermem accounting

lpp.patch
  ia32 huge tlb pages

lpp-update.patch
  hugetlbpage fixes

reversemaps-leak.patch
  Fix reverse map accounting leak

sharedmem.patch
  Add /proc/meminfo:Mapped - the amount of memory which is mapped into pagetables

ext3-sb.patch
  u.ext3_sb -> generic_sbp

pagevec_lru_add.patch
  Run readpage before dropping the page refcount

oom-fix.patch
  Fix an OOM condition on big highmem machines

tlb-cleanup.patch
  Clean up the tlb gather code

dump-stack.patch
  arch-neutral dump_stack() function

wli-cleanup.patch
  random cleanups

madvise-move.patch
  move madvise implementation into mm/madvise.c

split-vma.patch
  VMA splitting patch

mmap-fixes.patch
  mmap.c cleanup and lock ranking fixes

buffer-ops-move.patch
  Move submit_bh() and ll_rw_block() into fs/buffer.c

slab-stats.patch
  Display total slab memory in /proc/meminfo

writeback-control.patch
  Cleanup and extension of the writeback paths

free_area_init-cleanup.patch
  free_area_init() code cleanup

alloc_pages-cleanup.patch
  alloc_pages cleanup and optimisation

statm_pgd_range-sucks.patch
  Remove the pagetable walk from /proc/<pid>/statm

remove-sync_thresh.patch
  Remove /proc/sys/vm/dirty_sync_thresh

taka-writev.patch
  Speed up writev

pf_nowarn.patch
  Fix up the handling of PF_NOWARN

jeremy.patch
  Spel Jermy's naim wright

queue-congestion.patch
  Infrastructure for communicating request queue congestion to the VM

nonblocking-ext2-preread.patch
  avoid ext2 inode prereads if the queue is congested

nonblocking-pdflush.patch
  non-blocking writeback infrastructure, use it for pdflush

nonblocking-vm.patch
  Non-blocking page reclaim

wake-speedup.patch
  Faster wakeup code

vm-wakeups.patch
  Use the faster wakeups in the VM and block layers

sync-helper.patch
  Speed up sys_sync() against multiple spindles

slabasap.patch
  Early and smarter shrinking of slabs

write-deadlock.patch
  Fix the generic_file_write-from-same-mmapped-page deadlock

buddyinfo.patch
  Add /proc/buddyinfo - stats on the free pages pool

free_area.patch
  Remove struct free_area_struct and free_area_t, use `struct free_area'

per-node-kswapd.patch
  Per-node kswapd instance

topology-api.patch
  NUMA topology API

radix_tree_gang_lookup.patch
  radix tree gang lookup

truncate_inode_pages.patch
  truncate/invalidate_inode_pages rewrite

proc_vmstat.patch
  Move the vm accounting out of /proc/stat

kswapd-reclaim-stats.patch
  Add kswapd_steal to /proc/vmstat

iowait.patch
  I/O wait statistics

tcp-wakeups.patch
  Use fast wakeups in TCP/IPV4

* Re: 2.5.34-mm4
  2002-09-14  4:06 2.5.34-mm4 Andrew Morton
  2002-09-14  4:01 ` 2.5.34-mm4 Rik van Riel
@ 2002-09-15 10:50 ` Axel Siebenwirth
  2002-09-15 14:31   ` 2.5.34-mm4 Rik van Riel
  2002-09-15 17:41   ` 2.5.34-mm4 Andrew Morton
  1 sibling, 2 replies; 18+ messages in thread
From: Axel Siebenwirth @ 2002-09-15 10:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm, lse-tech

Hi Andrew!

On Fri, 13 Sep 2002, Andrew Morton wrote:

> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/

Since changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
quite unresponsive behaviour. For example, I am building X, which at certain
moments causes pretty heavy disk load, and the system doesn't respond
at all. I was using X and was not able to switch consoles, and could move
the mouse only extremely sluggishly.
I have seen that it used more swap than usual.

             total       used       free     shared    buffers     cached
Mem:        191096     159340      31756          0      10568      94100
-/+ buffers/cache:      54672     136424
Swap:       289160          0     289160

This is what it looks like under normal circumstances; when building X I
had 20M in swap usage, which seemed quite a lot to me. Maybe I'm just wrong.
Unfortunately I was not able to start vmstat: first, because I can't start
vmstat when the system is not responding, and second, because it doesn't
work anyway because of your changes.
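
For reference, the "-/+ buffers/cache" row in the free output above is
plain arithmetic on the "Mem:" row.  A small sketch (the struct and
function names are made up for illustration):

```c
#include <assert.h>

/* One "Mem:" row of free(1) output, in kilobytes. */
struct mem_row {
	unsigned long total, used, free, buffers, cached;
};

/* "used" column of the -/+ buffers/cache row. */
unsigned long used_minus_cache(const struct mem_row *m)
{
	assert(m && m->used >= m->buffers + m->cached);
	return m->used - m->buffers - m->cached;
}

/* "free" column of the -/+ buffers/cache row. */
unsigned long free_plus_cache(const struct mem_row *m)
{
	assert(m);
	return m->free + m->buffers + m->cached;
}
```

With the numbers above: 159340 - 10568 - 94100 = 54672 and
31756 + 10568 + 94100 = 136424, and the two results add up to the
total of 191096.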

Best regards,
Axel

* Re: 2.5.34-mm4
  2002-09-15 10:50 ` 2.5.34-mm4 Axel Siebenwirth
@ 2002-09-15 14:31   ` Rik van Riel
  2002-09-16 18:33     ` 2.5.34-mm4 Bill Davidsen
  2002-09-15 17:41   ` 2.5.34-mm4 Andrew Morton
  1 sibling, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2002-09-15 14:31 UTC (permalink / raw)
  To: Axel Siebenwirth; +Cc: Andrew Morton, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Axel Siebenwirth wrote:
> On Fri, 13 Sep 2002, Andrew Morton wrote:
>
> > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
>
> Since changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> quite unresponsive behaviour.

Don't worry, it's supposed to do that. You can't measure desktop
interactivity, so it doesn't exist ;)


Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


* Re: 2.5.34-mm4
  2002-09-15 17:41   ` 2.5.34-mm4 Andrew Morton
@ 2002-09-15 17:36     ` Rik van Riel
  2002-09-15 17:39     ` 2.5.34-mm4 Rik van Riel
  1 sibling, 0 replies; 18+ messages in thread
From: Rik van Riel @ 2002-09-15 17:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andrew Morton wrote:

> Unfortunately, those updates cause odd-but-not-serious things to
> happen to Red Hat initscripts.  This happens when you install standard
> util-linux as well.  It is due to the initscripts passing in arguments
> which the standard tools do not understand.

I'm about to add all patches from the RH procps rpm to the
procps cvs tree, so this should go away soon.

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


* Re: 2.5.34-mm4
  2002-09-15 17:41   ` 2.5.34-mm4 Andrew Morton
  2002-09-15 17:36     ` 2.5.34-mm4 Rik van Riel
@ 2002-09-15 17:39     ` Rik van Riel
  2002-09-15 17:49       ` 2.5.34-mm4 M. Edward Borasky
  1 sibling, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2002-09-15 17:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andrew Morton wrote:
> Axel Siebenwirth wrote:

> > I have seen that it used more swap than usual.
>
> 2.5 is much more swaphappy than 2.4.  I believe that this is actually
> correct behaviour for optimum throughput.  But it just happens that
> people (me included) hate it.

Time for a corollary to "if you can't measure it, it doesn't exist":

"If you can't measure desktop performance, our method of development
 will ensure it won't exist"

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


* Re: 2.5.34-mm4
  2002-09-15 10:50 ` 2.5.34-mm4 Axel Siebenwirth
  2002-09-15 14:31   ` 2.5.34-mm4 Rik van Riel
@ 2002-09-15 17:41   ` Andrew Morton
  2002-09-15 17:36     ` 2.5.34-mm4 Rik van Riel
  2002-09-15 17:39     ` 2.5.34-mm4 Rik van Riel
  1 sibling, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2002-09-15 17:41 UTC (permalink / raw)
  To: Axel Siebenwirth, Con Kolivas; +Cc: lkml, linux-mm, lse-tech

Axel Siebenwirth wrote:
> 
> Hi Andrew!
> 
> On Fri, 13 Sep 2002, Andrew Morton wrote:
> 
> > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
> 
> Since changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> quite unresponsive behaviour. For example, I am building X, which at certain
> moments causes pretty heavy disk load, and the system doesn't respond
> at all. I was using X and was not able to switch consoles, and could move
> the mouse only extremely sluggishly.

There are large IDE updates in -mm4, and this is consistent with
a disk which isn't doing DMA any more.  Could you (and Con) please
double-check with `hdparm -i' and `hdparm -t' that the disk subsystem
is behaving properly?

Yes, it could well be a VM bug, but I wouldn't want to run round in
confused circles all day ;)  Thanks.

> I have seen that it used more swap than usual.

2.5 is much more swaphappy than 2.4.  I believe that this is actually
correct behaviour for optimum throughput.  But it just happens that
people (me included) hate it.  We don't notice the improved runtimes
for the pagecache-intensive operations but we do notice the time it
takes to get the xterms working again.

We have not yet sat down and worked out what to do about this.

>              total       used       free     shared    buffers     cached
> Mem:        191096     159340      31756          0      10568      94100
> -/+ buffers/cache:      54672     136424
> Swap:       289160          0     289160
> 
> This is what it looks like under normal circumstances; when building X I
> had 20M in swap usage, which seemed quite a lot to me. Maybe I'm just wrong.
> Unfortunately I was not able to start vmstat: first, because I can't start
> vmstat when the system is not responding, and second, because it doesn't
> work anyway because of your changes.
> 

Yeah, sorry.  The burden of back-compatibility weighed too heavy and
Rik decided that we just have to fix userspace to follow kernel
changes.  There will be breakage for a while;  updates are at
http://surriel.com/procps/.

Unfortunately, those updates cause odd-but-not-serious things to
happen to Red Hat initscripts.  This happens when you install standard
util-linux as well.  It is due to the initscripts passing in arguments
which the standard tools do not understand.

* RE: 2.5.34-mm4
  2002-09-15 17:39     ` 2.5.34-mm4 Rik van Riel
@ 2002-09-15 17:49       ` M. Edward Borasky
  2002-09-15 17:54         ` 2.5.34-mm4 Rik van Riel
  0 siblings, 1 reply; 18+ messages in thread
From: M. Edward Borasky @ 2002-09-15 17:49 UTC (permalink / raw)
  To: Rik van Riel, Andrew Morton
  Cc: Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
cheapest solution may still be to buy more memory, more disks or a faster
processor.

Borasky's Corollary 2: When you try to measure the performance of people the
way you measure performance of computers, you need psychological help.

M. Edward (Ed) Borasky
mailto: znmeb@borasky-research.net
http://www.pdxneurosemantics.com
http://www.meta-trading-coach.com
http://www.borasky-research.net

Coaching: It's Not Just for Athletes and Executives Any More!

-----Original Message-----
From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org]On Behalf Of
Rik van Riel
Sent: Sunday, September 15, 2002 10:39 AM
To: Andrew Morton
Cc: Axel Siebenwirth; Con Kolivas; lkml; linux-mm@kvack.org;
lse-tech@lists.sourceforge.net
Subject: Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Andrew Morton wrote:
> Axel Siebenwirth wrote:

> > I have seen that it used more swap than usual.
>
> 2.5 is much more swaphappy than 2.4.  I believe that this is actually
> correct behaviour for optimum throughput.  But it just happens that
> people (me included) hate it.

Time for a corollary to "if you can't measure it, it doesn't exist":

"If you can't measure desktop performance, our method of development
 will ensure it won't exist"

cheers,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/         http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


* RE: 2.5.34-mm4
  2002-09-15 17:49       ` 2.5.34-mm4 M. Edward Borasky
@ 2002-09-15 17:54         ` Rik van Riel
  2002-09-15 18:55           ` 2.5.34-mm4 Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2002-09-15 17:54 UTC (permalink / raw)
  To: M. Edward Borasky
  Cc: Andrew Morton, Axel Siebenwirth, Con Kolivas, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, M. Edward Borasky wrote:

> Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
> cheapest solution may still be to buy more memory, more disks or a
> faster processor.

Current 2.5 is sluggish on systems with a fast CPU and 768 MB
of RAM, whereas current -ac runs the same workload smoothly
with 128 MB of RAM.

Now tell me, what's your point ?

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


* Re: 2.5.34-mm4
  2002-09-15 17:54         ` 2.5.34-mm4 Rik van Riel
@ 2002-09-15 18:55           ` Andrew Morton
  2002-09-15 18:56             ` 2.5.34-mm4 Rik van Riel
                               ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Andrew Morton @ 2002-09-15 18:55 UTC (permalink / raw)
  To: Rik van Riel
  Cc: M. Edward Borasky, Axel Siebenwirth, Con Kolivas, lkml, linux-mm,
	lse-tech

Rik van Riel wrote:
> 
> On Sun, 15 Sep 2002, M. Edward Borasky wrote:
> 
> > Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
> > cheapest solution may still be to buy more memory, more disks or a
> > faster processor.
> 
> Current 2.5 is sluggish on systems with a fast CPU and 768 MB
> of RAM, whereas current -ac runs the same workload smoothly
> with 128 MB of RAM.
> 

I've been running 2.5 on my desktop at work (800MHz/256M UP) since
2.5.26 and on the machine at home (Dual 850MHz/768M) on-and-off
(recent freizures sent that machine back to Marcelo; need to try
again).  I also ran 2.4.19-ac-something for a couple of weeks.

Impressions are:

- 2.5 swaps a lot in response to heavy pagecache activity.

  SEGQ didn't change that, actually.  And this is correct,
  as-designed behaviour.  We'll need some "don't be irritating"
  knob to prevent this.  Or speculative pagein when the load
  has subsided, which would be a fair-sized project.

- In both -ac and 2.5 the scheduler is prone to starving interactive
  applications (netscape 4, gkrellm, command-line gdb, others) when
  there is a compilation happening.

  This is very, very noticeable; and it affects applications which
  do not use sched_yield().  Ingo has put some extra stuff in since
  then and I need to retest.

- In -ac, there are noticeable stalls during heavy writeout.  This
  may be an ext3 thing, but I can't think of any IO scheduling
  differences in -ac ext3.  I'd be guessing that it is due to
  bdflush/kupdate lumpiness.

Overall I find Marcelo kernels to be the most comfortable, followed
by 2.5.  Alan's kernels I find to be the least comfortable in a
"developer's desktop" situation.

* Re: 2.5.34-mm4
  2002-09-15 18:55           ` 2.5.34-mm4 Andrew Morton
@ 2002-09-15 18:56             ` Rik van Riel
  2002-09-16  1:33               ` 2.5.34-mm4 Alan Cox
  2002-09-15 19:10             ` [Lse-tech] Re: 2.5.34-mm4 Andi Kleen
  2002-09-16 18:48             ` 2.5.34-mm4 Bill Davidsen
  2 siblings, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2002-09-15 18:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: M. Edward Borasky, Axel Siebenwirth, Con Kolivas, lkml, linux-mm,
	lse-tech

On Sun, 15 Sep 2002, Andrew Morton wrote:

> - In -ac, there are noticeable stalls during heavy writeout.  This
>   may be an ext3 thing, but I can't think of any IO scheduling
>   differences in -ac ext3.  I'd be guessing that it is due to
>   bdflush/kupdate lumpiness.

This is also due to the fact that -ac has an older -rmap
VM. As in current 2.5, rmap can write out all inactive
pages ... and it did in some worst case situations.

This is fixed in rmap14.
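
The rmap14 patch posted later in this thread bounds the writeout with
a per-call budget (maxlaunder in page_launder_zone()).  The budget
calculation can be sketched in userspace as below; launder_budget()
and its parameters are stand-ins for the kernel's zone fields and
free_plenty(), not kernel API.

```c
#include <assert.h>

/* Effectively unlimited budget used for a full flush, as in the patch. */
#define FULL_FLUSH_BUDGET 1000000

/*
 * Number of pages that may be written out in one laundering pass.
 * inactive_dirty_pages: size of the zone's inactive dirty list.
 * free_plenty_shortfall: how many pages the zone is short of "plenty".
 */
int launder_budget(int full_flush, int inactive_dirty_pages,
		   int free_plenty_shortfall)
{
	assert(inactive_dirty_pages >= 0);

	if (full_flush)
		return FULL_FLUSH_BUDGET;

	/* At most a quarter of the dirty list, capped at 512 pages. */
	int budget = inactive_dirty_pages / 4;
	if (budget > 512)
		budget = 512;

	/* But always enough to cover the current shortfall. */
	if (budget < free_plenty_shortfall)
		budget = free_plenty_shortfall;

	return budget;
}
```

Assuming 4 KB pages, the 512-page cap corresponds to the "2 MB worth
of dirty pages" mentioned in the patch comment.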

(I hope Alan is done playing with IDE soon so I can push
him a VM update)

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


* Re: [Lse-tech] Re: 2.5.34-mm4
  2002-09-15 18:55           ` 2.5.34-mm4 Andrew Morton
  2002-09-15 18:56             ` 2.5.34-mm4 Rik van Riel
@ 2002-09-15 19:10             ` Andi Kleen
  2002-09-16 18:51               ` Bill Davidsen
  2002-09-16 18:48             ` 2.5.34-mm4 Bill Davidsen
  2 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2002-09-15 19:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, M. Edward Borasky, Axel Siebenwirth, Con Kolivas,
	lkml, linux-mm, lse-tech

> Overall I find Marcelo kernels to be the most comfortable, followed
> by 2.5.  Alan's kernels I find to be the least comfortable in a

... and -aa kernels are marcelo kernels, just with the corner
cases fixed too. Works very nicely here.

-Andi

* Re: 2.5.34-mm4
  2002-09-15 18:56             ` 2.5.34-mm4 Rik van Riel
@ 2002-09-16  1:33               ` Alan Cox
  2002-09-16  2:32                 ` [PATCH](1/2) rmap14 for ac (was: Re: 2.5.34-mm4) Rik van Riel
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Cox @ 2002-09-16  1:33 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, M. Edward Borasky, Axel Siebenwirth, Con Kolivas,
	lkml, linux-mm, lse-tech

On Sun, 2002-09-15 at 19:56, Rik van Riel wrote:
> On Sun, 15 Sep 2002, Andrew Morton wrote:
> 
> > - In -ac, there are noticeable stalls during heavy writeout.  This
> >   may be an ext3 thing, but I can't think of any IO scheduling
> >   differences in -ac ext3.  I'd be guessing that it is due to
> >   bdflush/kupdate lumpiness.

I think so. I've always been conservative, I need rmap to pass cerberus
still. But the rmap in -ac is out of date a little with the 2.5 tuning

> This is also due to the fact that -ac has an older -rmap
> VM. As in current 2.5, rmap can write out all inactive
> pages ... and it did in some worst case situations.
> 
> This is fixed in rmap14.
> 
> (I hope Alan is done playing with IDE soon so I can push
> him a VM update)

The big one left to fix is the simplex device bug - which is an "I know
why". The great mystery is the affair of taskfile pio write. Other than
that it's annoying glitches, not big problems, now.

So send me rmap-14a patches by all means


* [PATCH](1/2) rmap14 for ac  (was: Re: 2.5.34-mm4)
  2002-09-16  1:33               ` 2.5.34-mm4 Alan Cox
@ 2002-09-16  2:32                 ` Rik van Riel
  0 siblings, 0 replies; 18+ messages in thread
From: Rik van Riel @ 2002-09-16  2:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-mm, lkml

On 16 Sep 2002, Alan Cox wrote:

> So send me rmap-14a patches by all means

Here they come.  This first patch updates 2.4.20-pre5-ac6 to
rmap14. An incremental patch to rmap14a + misc bugfixes will
be in your mailbox in a few minutes...
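
As a reading aid for the zap_page_range() change in the patch below:
the new loop splits the range into blocks of ZAP_BLOCK_SIZE bytes and
takes and drops the lock once per block.  Here is a userspace sketch
of just that chunking.  The 4 KB page size, for_each_block() and
count_bytes() are assumptions for illustration; the locking is only
marked by comments.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL			/* assumed page size */
#define ZAP_BLOCK_SIZE	 (256 * SKETCH_PAGE_SIZE)	/* as in the patch */

typedef void (*block_fn)(unsigned long start, unsigned long end, void *ctx);

/*
 * Call fn once per block of at most ZAP_BLOCK_SIZE bytes.
 * Returns the number of blocks processed.
 */
unsigned long for_each_block(unsigned long address, unsigned long size,
			     block_fn fn, void *ctx)
{
	unsigned long blocks = 0;

	assert(fn);
	while (size) {
		unsigned long block =
			size > ZAP_BLOCK_SIZE ? ZAP_BLOCK_SIZE : size;

		/* The kernel code takes page_table_lock here ... */
		fn(address, address + block, ctx);
		/* ... and releases it here, before the next block. */

		address += block;
		size -= block;
		blocks++;
	}
	return blocks;
}

/* Example callback: adds each block's length to the counter behind ctx. */
void count_bytes(unsigned long start, unsigned long end, void *ctx)
{
	*(unsigned long *)ctx += end - start;
}
```

Bounding the work done per lock acquisition is what shortens the
lock-hold time; the total amount of work is unchanged.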

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Spamtraps of the month:  september@surriel.com trac@trac.org


--- linux-2.4.19-pre2-ac3/mm/filemap.c.rmap13b	2002-08-15 23:53:06.000000000 -0300
+++ linux-2.4.19-pre2-ac3/mm/filemap.c	2002-08-15 23:56:37.000000000 -0300
@@ -237,12 +237,11 @@

 static void truncate_complete_page(struct page *page)
 {
-	/* Page has already been removed from processes, by vmtruncate()  */
-	if (page->pte_chain)
-		BUG();
-
-	/* Leave it on the LRU if it gets converted into anonymous buffers */
-	if (!page->buffers || do_flushpage(page, 0))
+	/*
+	 * Leave it on the LRU if it gets converted into anonymous buffers
+	 * or anonymous process memory.
+	 */
+	if ((!page->buffers || do_flushpage(page, 0)) && !page->pte_chain)
 		lru_cache_del(page);

 	/*
--- linux-2.4.19-pre2-ac3/mm/memory.c.rmap13b	2002-08-15 23:53:14.000000000 -0300
+++ linux-2.4.19-pre2-ac3/mm/memory.c	2002-08-15 23:59:04.000000000 -0300
@@ -380,49 +380,65 @@
 	return freed;
 }

-/*
- * remove user pages in a given range.
+#define ZAP_BLOCK_SIZE	(256 * PAGE_SIZE)
+
+/**
+ * zap_page_range - remove user pages in a given range
+ * @mm: mm_struct containing the applicable pages
+ * @address: starting address of pages to zap
+ * @size: number of bytes to zap
  */
 void zap_page_range(struct mm_struct *mm, unsigned long address, unsigned long size)
 {
 	mmu_gather_t *tlb;
 	pgd_t * dir;
-	unsigned long start = address, end = address + size;
-	int freed = 0;
-
-	dir = pgd_offset(mm, address);
-
+	unsigned long start, end, addr, block;
+	int freed;
+
 	/*
-	 * This is a long-lived spinlock. That's fine.
-	 * There's no contention, because the page table
-	 * lock only protects against kswapd anyway, and
-	 * even if kswapd happened to be looking at this
-	 * process we _want_ it to get stuck.
+	 * Break the work up into blocks of ZAP_BLOCK_SIZE pages:
+	 * this decreases lock-hold time for the page_table_lock
+	 * dramatically, which could otherwise be held for a very
+	 * long time.  This decreases lock contention and increases
+	 * periods of preemptibility.
 	 */
-	if (address >= end)
-		BUG();
-	spin_lock(&mm->page_table_lock);
-	flush_cache_range(mm, address, end);
-	tlb = tlb_gather_mmu(mm);
+	while (size) {
+		if (size > ZAP_BLOCK_SIZE)
+			block = ZAP_BLOCK_SIZE;
+		else
+			block = size;
+
+		freed = 0;
+		start = addr = address;
+		end = address + block;
+		dir = pgd_offset(mm, address);

-	do {
-		freed += zap_pmd_range(tlb, dir, address, end - address);
-		address = (address + PGDIR_SIZE) & PGDIR_MASK;
-		dir++;
-	} while (address && (address < end));
+		BUG_ON(address >= end);

-	/* this will flush any remaining tlb entries */
-	tlb_finish_mmu(tlb, start, end);
+		spin_lock(&mm->page_table_lock);
+		flush_cache_range(mm, start, end);
+		tlb = tlb_gather_mmu(mm);

-	/*
-	 * Update rss for the mm_struct (not necessarily current->mm)
-	 * Notice that rss is an unsigned long.
-	 */
-	if (mm->rss > freed)
-		mm->rss -= freed;
-	else
-		mm->rss = 0;
-	spin_unlock(&mm->page_table_lock);
+		do {
+			freed += zap_pmd_range(tlb, dir, addr, end - addr);
+			addr = (addr + PGDIR_SIZE) & PGDIR_MASK;
+			dir++;
+		} while (addr && (addr < end));
+
+		/* this will flush any remaining tlb entries */
+		tlb_finish_mmu(tlb, start, end);
+
+		/* Update rss for the mm_struct (need not be current->mm) */
+		if (mm->rss > freed)
+			mm->rss -= freed;
+		else
+			mm->rss = 0;
+
+		spin_unlock(&mm->page_table_lock);
+
+		address += block;
+		size -= block;
+	}
 }

 /*
@@ -873,18 +889,19 @@
 static inline int remap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address, unsigned long size,
 	unsigned long phys_addr, pgprot_t prot)
 {
-	unsigned long end;
+	unsigned long base, end;

+	base = address & PGDIR_MASK;
 	address &= ~PGDIR_MASK;
 	end = address + size;
 	if (end > PGDIR_SIZE)
 		end = PGDIR_SIZE;
 	phys_addr -= address;
 	do {
-		pte_t * pte = pte_alloc(mm, pmd, address);
+		pte_t * pte = pte_alloc(mm, pmd, address + base);
 		if (!pte)
 			return -ENOMEM;
-		remap_pte_range(pte, address, end - address, address + phys_addr, prot);
+		remap_pte_range(pte, base + address, end - address, address + phys_addr, prot);
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address && (address < end));
--- linux-2.4.19-pre2-ac3/mm/vmscan.c.rmap13b	2002-08-15 23:53:26.000000000 -0300
+++ linux-2.4.19-pre2-ac3/mm/vmscan.c	2002-08-15 23:59:04.000000000 -0300
@@ -195,6 +195,7 @@
  * page_launder_zone - clean dirty inactive pages, move to inactive_clean list
  * @zone: zone to free pages in
  * @gfp_mask: what operations we are allowed to do
+ * @full_flush: full-out page flushing, if we couldn't get enough clean pages
  *
  * This function is called when we are low on free / inactive_clean
  * pages, its purpose is to refill the free/clean list as efficiently
@@ -208,19 +209,30 @@
  * This code is heavily inspired by the FreeBSD source code. Thanks
  * go out to Matthew Dillon.
  */
-#define	CAN_DO_FS	((gfp_mask & __GFP_FS) && should_write)
-int page_launder_zone(zone_t * zone, int gfp_mask, int priority)
+int page_launder_zone(zone_t * zone, int gfp_mask, int full_flush)
 {
-	int maxscan, cleaned_pages, target;
-	struct list_head * entry;
+	int maxscan, cleaned_pages, target, maxlaunder, iopages;
+	struct list_head * entry, * next;

 	target = free_plenty(zone);
-	cleaned_pages = 0;
+	cleaned_pages = iopages = 0;
+
+	/* If we can get away with it, only flush 2 MB worth of dirty pages */
+	if (full_flush)
+		maxlaunder = 1000000;
+	else {
+		maxlaunder = min_t(int, 512, zone->inactive_dirty_pages / 4);
+		maxlaunder = max(maxlaunder, free_plenty(zone));
+	}

 	/* The main launder loop. */
+rescan:
 	spin_lock(&pagemap_lru_lock);
-	maxscan = zone->inactive_dirty_pages >> priority;
-	while (maxscan-- && !list_empty(&zone->inactive_dirty_list)) {
+	maxscan = zone->inactive_dirty_pages;
+	entry = zone->inactive_dirty_list.prev;
+	next = entry->prev;
+	while (maxscan-- && !list_empty(&zone->inactive_dirty_list) &&
+			next != &zone->inactive_dirty_list) {
 		struct page * page;

 		/* Low latency reschedule point */
@@ -231,14 +243,20 @@
 			continue;
 		}

-		entry = zone->inactive_dirty_list.prev;
+		entry = next;
+		next = entry->prev;
 		page = list_entry(entry, struct page, lru);

+		/* This page was removed while we looked the other way. */
+		if (!PageInactiveDirty(page))
+			goto rescan;
+
 		if (cleaned_pages > target)
 			break;

-		list_del(entry);
-		list_add(entry, &zone->inactive_dirty_list);
+		/* Stop doing IO if we've laundered too many pages already. */
+		if (maxlaunder < 0)
+			gfp_mask &= ~(__GFP_IO|__GFP_FS);

 		/* Wrong page on list?! (list corruption, should not happen) */
 		if (!PageInactiveDirty(page)) {
@@ -257,7 +275,6 @@

 		/*
 		 * The page is locked. IO in progress?
-		 * Move it to the back of the list.
 		 * Acquire PG_locked early in order to safely
 		 * access page->mapping.
 		 */
@@ -341,10 +358,16 @@
 				spin_unlock(&pagemap_lru_lock);

 				writepage(page);
+				maxlaunder--;
 				page_cache_release(page);

 				spin_lock(&pagemap_lru_lock);
 				continue;
+			} else {
+				UnlockPage(page);
+				list_del(entry);
+				list_add(entry, &zone->inactive_dirty_list);
+				continue;
 			}
 		}

@@ -391,6 +414,7 @@
 				/* failed to drop the buffers so stop here */
 				UnlockPage(page);
 				page_cache_release(page);
+				maxlaunder--;

 				spin_lock(&pagemap_lru_lock);
 				continue;
@@ -443,21 +467,19 @@
  */
 int page_launder(int gfp_mask)
 {
-	int maxtry = 1 << DEF_PRIORITY;
 	struct zone_struct * zone;
 	int freed = 0;

 	/* Global balancing while we have a global shortage. */
-	while (maxtry-- && free_high(ALL_ZONES) >= 0) {
+	if (free_high(ALL_ZONES) >= 0)
 		for_each_zone(zone)
 			if (free_plenty(zone) >= 0)
-				freed += page_launder_zone(zone, gfp_mask, 6);
-	}
+				freed += page_launder_zone(zone, gfp_mask, 0);

 	/* Clean up the remaining zones with a serious shortage, if any. */
 	for_each_zone(zone)
 		if (free_min(zone) >= 0)
-			freed += page_launder_zone(zone, gfp_mask, 0);
+			freed += page_launder_zone(zone, gfp_mask, 1);

 	return freed;
 }
@@ -814,6 +836,7 @@
 	set_current_state(TASK_UNINTERRUPTIBLE);
 	schedule_timeout(HZ / 4);
 	kswapd_overloaded = 0;
+	wmb();
 	return;
 }

--- linux-2.4.19-pre2-ac3/include/linux/mm.h.rmap13b	2002-08-15 23:52:54.000000000 -0300
+++ linux-2.4.19-pre2-ac3/include/linux/mm.h	2002-08-16 00:01:31.000000000 -0300
@@ -344,15 +344,19 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
+#ifdef CONFIG_SMP
 	while (test_and_set_bit(PG_chainlock, &page->flags)) {
 		while (test_bit(PG_chainlock, &page->flags))
 			cpu_relax();
 	}
+#endif
 }

 static inline void pte_chain_unlock(struct page *page)
 {
+#ifdef CONFIG_SMP
 	clear_bit(PG_chainlock, &page->flags);
+#endif
 }

 /*
--- linux-2.4.19-pre2-ac3/include/linux/mmzone.h.rmap13b	2002-08-15 23:53:00.000000000 -0300
+++ linux-2.4.19-pre2-ac3/include/linux/mmzone.h	2002-08-16 00:01:31.000000000 -0300
@@ -27,8 +27,6 @@
 struct pglist_data;
 struct pte_chain;

-#define MAX_CHUNKS_PER_NODE 8
-
 /*
  * On machines where it is needed (eg PCs) we divide physical memory
  * into multiple physical zones. On a PC we have 3 zones:
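
The launder budget chosen at the top of page_launder_zone() in the vmscan.c hunk above can be checked with a small userspace model. This is only a sketch: launder_budget(), min_int() and max_int() are hypothetical names, the two integer arguments stand in for zone->inactive_dirty_pages and free_plenty(zone), and the "2 MB" figure in the patch comment is assumed to mean 512 pages of 4 KB each (512 * 4 KB = 2048 KB).

```c
#include <assert.h>

/* Hypothetical helpers; the kernel code uses min_t() and max(). */
static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/*
 * Userspace model of the maxlaunder calculation in the patch.
 *   full_flush     - nonzero when the caller asks for full-out flushing
 *   inactive_dirty - stand-in for zone->inactive_dirty_pages
 *   plenty         - stand-in for the value of free_plenty(zone)
 */
int launder_budget(int full_flush, int inactive_dirty, int plenty)
{
	int maxlaunder;

	assert(inactive_dirty >= 0);

	/* Full flush: effectively no limit on the pages written out. */
	if (full_flush)
		return 1000000;

	/* Otherwise cap at 512 pages or a quarter of the dirty list... */
	maxlaunder = min_int(512, inactive_dirty / 4);
	/* ...but never below the zone's free_plenty() value. */
	return max_int(maxlaunder, plenty);
}
```

With 10000 inactive dirty pages and a free_plenty() value of 100, the model gives a budget of 512 pages; with 400 dirty pages and a value of 300, the max() term wins and the budget is 300.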

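The pte_chain_lock() loop that the mm.h hunk wraps in CONFIG_SMP is a test-and-test-and-set spin: it tries to set the bit atomically and, if that fails, spins on plain reads until the bit looks clear. A rough userspace analogue using C11 atomics might look as follows. Everything here is an assumption made for illustration: struct fake_page, CHAIN_LOCK_BIT and the chain_*() functions are hypothetical names, and the empty inner loop stands in for the kernel's cpu_relax().

```c
#include <assert.h>
#include <stdatomic.h>

#define CHAIN_LOCK_BIT 1u	/* hypothetical stand-in for PG_chainlock */

struct fake_page {
	atomic_uint flags;	/* hypothetical stand-in for page->flags */
};

void chain_lock(struct fake_page *page)
{
	/* atomic_fetch_or() returns the old flags: bit set means lock held. */
	while (atomic_fetch_or(&page->flags, CHAIN_LOCK_BIT) & CHAIN_LOCK_BIT) {
		/* Spin on plain loads so waiters do not keep writing the line. */
		while (atomic_load(&page->flags) & CHAIN_LOCK_BIT)
			;	/* the kernel calls cpu_relax() here */
	}
}

void chain_unlock(struct fake_page *page)
{
	unsigned int old = atomic_fetch_and(&page->flags, ~CHAIN_LOCK_BIT);

	/* Unlocking a lock that is not held would be a caller bug. */
	assert(old & CHAIN_LOCK_BIT);
}

int chain_is_locked(struct fake_page *page)
{
	return (atomic_load(&page->flags) & CHAIN_LOCK_BIT) != 0;
}
```

On a uniprocessor build the patch compiles both functions down to nothing, presumably because no other CPU can contend for the bit.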

* Re: 2.5.34-mm4
  2002-09-15 14:31   ` 2.5.34-mm4 Rik van Riel
@ 2002-09-16 18:33     ` Bill Davidsen
  0 siblings, 0 replies; 18+ messages in thread
From: Bill Davidsen @ 2002-09-16 18:33 UTC (permalink / raw)
  To: Rik van Riel; +Cc: lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Rik van Riel wrote:

> On Sun, 15 Sep 2002, Axel Siebenwirth wrote:
> > On Fri, 13 Sep 2002, Andrew Morton wrote:
> >
> > > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
> >
> > With changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> > quite unresponsive behaviour.
> 
> Don't worry, it's supposed to do that. You can't measure desktop
> interactivity, so it doesn't exist ;)

But now we have `contest' and we can, so it does.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: 2.5.34-mm4
  2002-09-15 18:55           ` 2.5.34-mm4 Andrew Morton
  2002-09-15 18:56             ` 2.5.34-mm4 Rik van Riel
  2002-09-15 19:10             ` [Lse-tech] Re: 2.5.34-mm4 Andi Kleen
@ 2002-09-16 18:48             ` Bill Davidsen
  2 siblings, 0 replies; 18+ messages in thread
From: Bill Davidsen @ 2002-09-16 18:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rik van Riel, lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andrew Morton wrote:

> Impressions are:
> 
> - 2.5 swaps a lot in response to heavy pagecache activity.
> 
>   SEGQ didn't change that, actually.  And this is correct,
>   as-designed behaviour.  We'll need some "don't be irritating"
>   knob to prevent this.  Or speculative pagein when the load
>   has subsided, which would be a fair-sized project.

It would be nice to have a knob in /proc/sys which could be tuned for
response or throughput, preferably not a boolean ;-) I suspect that we
would not agree on what it should do, but it sure would be
nice!

> - In both -ac and 2.5 the scheduler is prone to starving interactive
>   applications (netscape 4, gkrellm, command-line gdb, others) when
>   there is a compilation happening.
> 
>   This is very, very noticeable; and it affects applications which
>   do not use sched_yield().  Ingo has put some extra stuff in since
>   then and I need to retest.
> 
> - In -ac, there are noticeable stalls during heavy writeout.  This
>   may be an ext3 thing, but I can't think of any IO scheduling
>   differences in -ac ext3.  I'd be guessing that it is due to
>   bdflush/kupdate lumpiness.

I have the feeling that 2.5 is less good at noticing that a file is open
for write only and that no seeks have been done. I haven't measured it, but it
would seem that writes to such a file would be better off going to disk
promptly rather than taking up buffers, since they're probably not going
to be read.

This is just based on running mkisofs on 2.4.19 and 2.5.34, and watching "no
disk activity" followed by a heavy burst. I haven't made any careful
measurements, so take this as you will, but I agree that heavy writes bog
down the system. Clearly with big memory I can (and do) get the whole ~700MB
in memory if writes don't start quickly.

Yes, that could be tuning, I know that.

> Overall I find Marcelo kernels to be the most comfortable, followed
> by 2.5.  Alan's kernels I find to be the least comfortable in a
> "developer's desktop" situation.

On small memory machines I don't see as much to choose between them, and
the -ck series has been very nice to me. I don't run 2.5 on any but test
machines, and both are big memory (1+GB) machines.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: [Lse-tech] Re: 2.5.34-mm4
  2002-09-15 19:10             ` [Lse-tech] Re: 2.5.34-mm4 Andi Kleen
@ 2002-09-16 18:51               ` Bill Davidsen
  2002-09-19  9:01                 ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Bill Davidsen @ 2002-09-16 18:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml, linux-mm, lse-tech

On Sun, 15 Sep 2002, Andi Kleen wrote:

> > Overall I find Marcelo kernels to be the most comfortable, followed
> > by 2.5.  Alan's kernels I find to be the least comfortable in a
> 
> ... and -aa kernels are marcelo kernels, just with the corner
> cases fixed too. Works very nicely here.

Corner cases? The IDE, VM and scheduler are different...

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: [Lse-tech] Re: 2.5.34-mm4
  2002-09-16 18:51               ` Bill Davidsen
@ 2002-09-19  9:01                 ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2002-09-19  9:01 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Andi Kleen, lkml, linux-mm, lse-tech

On Mon, Sep 16 2002, Bill Davidsen wrote:
> On Sun, 15 Sep 2002, Andi Kleen wrote:
> 
> > > Overall I find Marcelo kernels to be the most comfortable, followed
> > > by 2.5.  Alan's kernels I find to be the least comfortable in a
> > 
> > ... and -aa kernels are marcelo kernels, just with the corner
> > cases fixed too. Works very nicely here.
> 
> Corner cases? The IDE, VM and scheduler are different...

The IDE is the same, I'll refrain from commenting on the rest. There's
just an adjustment to the read ahead, which makes sense.

-- 
Jens Axboe


end of thread, other threads:[~2002-09-19  9:01 UTC | newest]

Thread overview: 18+ messages
2002-09-14  4:06 2.5.34-mm4 Andrew Morton
2002-09-14  4:01 ` 2.5.34-mm4 Rik van Riel
2002-09-15 10:50 ` 2.5.34-mm4 Axel Siebenwirth
2002-09-15 14:31   ` 2.5.34-mm4 Rik van Riel
2002-09-16 18:33     ` 2.5.34-mm4 Bill Davidsen
2002-09-15 17:41   ` 2.5.34-mm4 Andrew Morton
2002-09-15 17:36     ` 2.5.34-mm4 Rik van Riel
2002-09-15 17:39     ` 2.5.34-mm4 Rik van Riel
2002-09-15 17:49       ` 2.5.34-mm4 M. Edward Borasky
2002-09-15 17:54         ` 2.5.34-mm4 Rik van Riel
2002-09-15 18:55           ` 2.5.34-mm4 Andrew Morton
2002-09-15 18:56             ` 2.5.34-mm4 Rik van Riel
2002-09-16  1:33               ` 2.5.34-mm4 Alan Cox
2002-09-16  2:32                 ` [PATCH](1/2) rmap14 for ac (was: Re: 2.5.34-mm4) Rik van Riel
2002-09-15 19:10             ` [Lse-tech] Re: 2.5.34-mm4 Andi Kleen
2002-09-16 18:51               ` Bill Davidsen
2002-09-19  9:01                 ` Jens Axboe
2002-09-16 18:48             ` 2.5.34-mm4 Bill Davidsen
