* 2.5.40-mm1
From: Andrew Morton @ 2002-10-01 9:32 UTC (permalink / raw)
To: lkml, linux-mm
url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.40/2.5.40-mm1/
Mainly a resync.
- A few minor problems in the per-cpu-pages code have been fixed.
- Updated dcache RCU code.
- Significant brain surgery on the SARD patch.
- Decreased the disk scheduling tunable `fifo_batch' from 32 to 16 to
improve disk read latency.
- Updated ext3 htree patch from Ted.
- Included a patch from Mala Anand which _should_ speed up kernel<->userspace
memory copies for Intel ia32 hardware. But I can't measure any difference
with poorly-aligned pagecache copies.
-scsi_hack.patch
-might_sleep-2.patch
-slab-fix.patch
-hugetlb-doc.patch
-get_user_pages-PG_reserved.patch
-move_one_page_fix.patch
-zab-list_heads.patch
-remove-gfp_nfs.patch
-buddyinfo.patch
-free_area.patch
-per-node-kswapd.patch
-topology-api.patch
-topology_fixes.patch
Merged
+misc.patch
Trivia
+ioperm-fix.patch
Fix the sys_ioperm() might-sleep-while-atomic bug
-sard.patch
+bd-sard.patch
Somewhat rewritten to not key everything off minors and majors - use
pointers instead.
+bio-get-nr-vecs.patch
use bio_get_nr_vecs in fs/mpage.c
+dio-nr-segs.patch
use bio_get_nr_vecs in fs/direct-io.c
-per-node-zone_normal.patch
+per-node-mem_map.patch
Renamed
+free_area_init-cleanup.patch
Clean up some mm init code.
+intel-user-copy.patch
Supposedly faster copy_*_user.
ext3-dxdir.patch
ext3 htree
spin-lock-check.patch
spinlock/rwlock checking infrastructure
rd-cleanup.patch
Cleanup and fix the ramdisk driver (doesn't work right yet)
misc.patch
misc
write-deadlock.patch
Fix the generic_file_write-from-same-mmapped-page deadlock
ioperm-fix.patch
sys_ioperm() atomicity fix
radix_tree_gang_lookup.patch
radix tree gang lookup
truncate_inode_pages.patch
truncate/invalidate_inode_pages rewrite
proc_vmstat.patch
Move the vm accounting out of /proc/stat
kswapd-reclaim-stats.patch
Add kswapd_steal to /proc/vmstat
iowait.patch
I/O wait statistics
bd-sard.patch
dio-bio-add-page.patch
Use bio_add_page() in direct-io.c
tcp-wakeups.patch
Use fast wakeups in TCP/IPV4
swapoff-deadlock.patch
Fix a tmpfs swapoff deadlock
dirty-and-uptodate.patch
page state cleanup
shmem_rename.patch
shmem_rename() directory link count fix
dirent-size.patch
tmpfs: show a non-zero size for directories
tmpfs-trivia.patch
tmpfs: small fixlets
per-zone-vm.patch
separate the kswapd and direct reclaim code paths
swsusp-feature.patch
add shrink_all_memory() for swsusp
bio-get-nr-vecs.patch
use bio_get_nr_vecs() in fs/mpage.c
dio-nr-segs.patch
Use bio_get_nr_vecs() in direct-io.c
remove-page-virtual.patch
remove page->virtual for !WANT_PAGE_VIRTUAL
dirty-memory-clamp.patch
sterner dirty-memory clamping
mempool-wakeup-fix.patch
Fix for stuck tasks in mempool_alloc()
remove-write_mapping_buffers.patch
Remove write_mapping_buffers
buffer_boundary-scheduling.patch
IO scheduling for indirect blocks
ll_rw_block-cleanup.patch
cleanup ll_rw_block()
lseek-ext2_readdir.patch
remove lock_kernel() from ext2_readdir()
discontig-no-contig_page_data.patch
undefine contig_page_data for discontigmem
per-node-mem_map.patch
ia32 NUMA: per-node ZONE_NORMAL
alloc_pages_node-cleanup.patch
alloc_pages_node cleanup
free_area_init-cleanup.patch
free_area_init_node cleanup
batched-slab-asap.patch
batched slab shrinking
akpm-deadline.patch
deadline scheduler tweaks
rmqueue_bulk.patch
bulk page allocator
free_pages_bulk.patch
Bulk page freeing function
hot_cold_pages.patch
Hot/Cold pages and zone->lock amortisation
readahead-cold-pages.patch
Use cache-cold pages for pagecache reads.
pagevec-hot-cold-hint.patch
hot/cold hints for truncate and page reclaim
intel-user-copy.patch
read_barrier_depends.patch
extended barrier primitives
rcu_ltimer.patch
RCU core
dcache_rcu.patch
Use RCU for dcache
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: 2.5.40-mm1
From: Andrew Morton @ 2002-10-08 6:46 UTC (permalink / raw)
To: lkml, linux-mm, Mala Anand
Andrew Morton wrote:
>
> ...
> - Included a patch from Mala Anand which _should_ speed up kernel<->userspace
> memory copies for Intel ia32 hardware. But I can't measure any difference
> with poorly-aligned pagecache copies.
>
Well Mala, I have to take that back. I must have forgotten to
turn on my computer or brain or something. Your patch kicks
butt.
In this test I timed how long it took to read a fully-cached
1 gigabyte file into an 8192-byte userspace buffer. The alignment
of the user buffer was incremented by one byte between runs.
for i in $(seq 0 32)
do
time time-read -a $i -b 8192 -h 8192 foo
done
time-read.c is in http://www.zip.com.au/~akpm/linux/patches/2.5/ext3-tools.tar.gz
The CPU is "Pentium III (Katmai)"
All times are in seconds:
User buffer    2.5.41    2.5.41+patch    2.5.41+patch++
0x804c000 4.373 4.387 6.063
0x804c001 10.024 6.410
0x804c002 10.002 6.411
0x804c003 10.013 6.408
0x804c004 10.105 6.343
0x804c005 10.184 6.394
0x804c006 10.179 6.398
0x804c007 10.185 6.408
0x804c008 9.725 9.724 6.347
0x804c009 9.780 6.436
0x804c00a 9.779 6.421
0x804c00b 9.778 6.433
0x804c00c 9.723 6.402
0x804c00d 9.790 6.382
0x804c00e 9.790 6.381
0x804c00f 9.785 6.380
0x804c010 9.727 9.723 6.277
0x804c011 9.779 6.360
0x804c012 9.783 6.345
0x804c013 9.786 6.341
0x804c014 9.772 6.133
0x804c015 9.919 6.327
0x804c016 9.920 6.319
0x804c017 9.918 6.319
0x804c018 9.846 9.857 6.372
0x804c019 10.060 6.443
0x804c01a 10.049 6.436
0x804c01b 10.041 6.432
0x804c01c 9.931 6.356
0x804c01d 10.013 6.432
0x804c01e 10.020 6.425
0x804c01f 10.016 6.444
0x804c020 4.442 4.423 6.380
So the patch is a 30% win at all alignments except for 32-byte-aligned
destination addresses.
Now, in the patch++ I modified things so we use the copy_user_int()
function for _all_ alignments. Look at the 0x804c008 alignment.
We sped up the copies by 30% by using copy_user_int() instead of
rep;movsl.
This is important, because glibc malloc() returns addresses which
are N+8 aligned. I would expect that this alignment is common.
So. Patch is a huge win as-is. For the PIII it looks like we need
to enable it at all alignments except mod32. And we need to test
with aligned dest, unaligned source.
Can you please do some P4 testing?
* Re: 2.5.40-mm1
From: Andrew Morton @ 2002-10-09 23:32 UTC (permalink / raw)
To: Mala Anand; +Cc: lkml, linux-mm, Bill Hartner
Mala Anand wrote:
>
> ...
> P4 Xeon CPU 1.50 GHz 4-way - hyperthreading disabled
> Src is aligned and dst is misaligned as follows:
>
> Dst 2.5.40 2.5.40+patch 2.5.40+patch++
> Align throughput throughput throughput
> (bytes) KB/sec KB/sec KB/sec
> 0 1360071 1314783 912359
> 1 323674 340447
> 2 329202 336425
> 4 512955 693170
> 8 523223 615097 506641
> 12 517184 558701 553700
> 16 966598 872080 932736
> 32 846937 838514 845178
Note the tremendous slowdown which the P4 suffers when you're not
cacheline aligned. Even 32-byte-aligned is down a lot.
> I see too much variance in the test results, so I ran
> each test 3 times. I tried increasing the iterations,
> but it did not reduce the variance.
>
> Dst is aligned and src is misaligned as follows:
>
> Dst 2.5.40 2.5.40+patch
> Align throughput throughput
> (bytes) KB/sec KB/sec
> 0 1275372 1029815
> 1 529907 511815
> 2 534811 530850
> 4 643196 627013
> 8 568000 626676
> 12 574468 658793
> 16 631707 635979
> 32 741485 592938
This differs a little from my P4 testing - the rep;movsl approach
seemed OK for 8,16,32 alignment.
But still, that's something we can tune later.
>
> However, I have seen that using floating point registers instead of
> integer registers on Pentium IV improves performance to a greater
> extent at some alignments. I need to do more testing, and then I will
> create a patch for Pentium IV.
I believe there are "issues" using those registers in-kernel. Related
to the need to save/restore them, or errata; not too sure about that.
* Re: 2.5.40-mm1
From: Mala Anand @ 2002-10-09 23:20 UTC (permalink / raw)
To: akpm, lkml, linux-mm; +Cc: Bill Hartner
>Andrew Morton wrote:
>So. Patch is a huge win as-is. For the PIII it looks like we need
>to enable it at all alignments except mod32. And we need to test
>with aligned dest, unaligned source.
Pentium III (Coppermine) 997 MHz 2-way
Read from pagecache to user buffer misaligning the source
Size of copy is 262144 and the number of iterations copied for
each test is 16384.
Patch++ - uses copy_user_int whenever size > 64
Patch   - uses copy_user_int when size > 64 and src or dst is not
aligned on an 8-byte boundary
dst aligned on 4k and src misaligned
2.5.40 2.5.40+patch 2.5.40+patch++
Align throughput throughput throughput
(bytes) KB/sec KB/sec KB/sec
0 275592 281356 285567
1 124266 197361
2 120157 200270
4 125935 197558
8 157244 156655 162189
16 167296 173202 173702
32 283731 285222 290810
Looks like the patch can be used for all the above tested
alignments on Pentium III.
>Can you please do some P4 testing?
P4 Xeon CPU 1.50 GHz 4-way - hyperthreading disabled
Src is aligned and dst is misaligned as follows:
Dst 2.5.40 2.5.40+patch 2.5.40+patch++
Align throughput throughput throughput
(bytes) KB/sec KB/sec KB/sec
0 1360071 1314783 912359
1 323674 340447
2 329202 336425
4 512955 693170
8 523223 615097 506641
12 517184 558701 553700
16 966598 872080 932736
32 846937 838514 845178
I see too much variance in the test results, so I ran
each test 3 times. I tried increasing the iterations,
but it did not reduce the variance.
Dst is aligned and src is misaligned as follows:
Dst 2.5.40 2.5.40+patch
Align throughput throughput
(bytes) KB/sec KB/sec
0 1275372 1029815
1 529907 511815
2 534811 530850
4 643196 627013
8 568000 626676
12 574468 658793
16 631707 635979
32 741485 592938
Since there is 5-10% variance in these tests' results, I am not
sure whether we can use this data for validation. I will try
to run this on another Pentium 4 machine.
However, I have seen that using floating point registers instead of
integer registers on Pentium IV improves performance to a greater
extent at some alignments. I need to do more testing, and then I will
create a patch for Pentium IV.
Regards,
Mala
Mala Anand
IBM Linux Technology Center - Kernel Performance
E-mail:manand@us.ibm.com
http://www-124.ibm.com/developerworks/opensource/linuxperf
http://www-124.ibm.com/developerworks/projects/linuxperf
Phone:838-8088; Tie-line:678-8088