* 2.5.67-mm3
@ 2003-04-14 8:53 Andrew Morton
2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Andrew Morton @ 2003-04-14 8:53 UTC (permalink / raw)
To: linux-kernel, linux-mm
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm3/
A bunch of new fixes, and a framebuffer update. This should work a bit
better than -mm2.
Changes since 2.5.67-mm2:
-kobject-leak-fix.patch
-radix_tree_delete-api-cleanup.patch
-gen_rtc-compile-fix.patch
-warn-null-wakeup.patch
-remove-flush_page_to_ram.patch
-ext3-quota-deadlock-fix.patch
-dont-clear-PG_uptodate-on-ENOSPC.patch
-stack-protection-fix.patch
-sparc-PTE_FILE_MAX_BITS-fix.patch
-file_lock-spinlock.patch
-bootmem-speedup.patch
-mem_map-init-arch-hooks.patch
-tty-modem-control-api.patch
-kmalloc_sizes-fix.patch
-proc-interrupts-kmalloc-size.patch
-vmalloc-stats.patch
-meminfo-doc.patch
-percpu_counter.patch
-blockgroup_lock.patch
-ext2-no-lock_super-ng.patch
-ext2-ialloc-no-lock_super-ng.patch
-init-sections-in-kallsyms.patch
Merged
+devclass-oops-workaround.patch
Work around a kobject oops-on-boot
+ipip_err-compile-fix.patch
Fix a build problem
p4-oprofile-fix.patch
This works now.
+genrtc-jiffies-fix.patch
Compile warning fix (bugfix on 64-bit machines)
+export-kernel_fpu_begin.patch
Fix modular RAID build
+tasklist_lock-dcache_lock-inversion-fix.patch
Fix a deadlock
+vsyscall-unwinding.patch
Stack unwinding code for vsyscalls
+mce-workqueue-startup-fix.patch
Should fix a startup oops
+1394-compile-fix.patch
Fix 1394 build
+nfs-resource-management.patch
Fix NFS VM conniptions.
+fremap-all-mappings.patch
Bring this back: it prefaults executable mmaps and speeds up
application launching.
+lockmeter-fixes.patch
Fix CONFIG_PREEMPT
+fbdev.patch
Framebuffer update
All 94 patches
linus.patch
mm.patch
add -mmN to EXTRAVERSION
kgdb-ga.patch
kgdb stub for ia32 (George Anzinger's one)
kgdb-ga-warning-fix.patch
-mm traps.c warning
kgdb-ga-up-warning-fix.patch
devclass-oops-workaround.patch
work around oops in devclass_add_driver()
ipip_err-compile-fix.patch
p4-oprofile-fix.patch
Fix oprofile on hyperthreaded P4's
flush_workqueue-hang-fix.patch
flush_work_queue() fixes
tty-shutdown-race-fix.patch
fix tty shutdown race
ppa-null-pointer-fix.patch
dmfe-kfree_skb-fix.patch
dmfe: don't free skb with local interrupts disabled
config_spinline.patch
uninline spinlocks for profiling accuracy.
ppc64-reloc_hide.patch
ppc64-pci-patch.patch
Subject: pci patch
ppc64-aio-32bit-emulation.patch
32/64bit emulation for aio
ppc64-scruffiness.patch
Fix some PPC64 compile warnings
ppc64-update.patch
ppc64 update
ppc64-update-fixes.patch
sym-do-160.patch
make the SYM driver do 160 MB/sec
config-PAGE_OFFSET.patch
Configurable kernel/user memory split
buffer-debug.patch
buffer.c debugging
genrtc-jiffies-fix.patch
genrtc: jiffies type fix
export-kernel_fpu_begin.patch
export kernel_fpu_begin() to modules
ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages
reiserfs_file_write-5.patch
sched_idle-typo-fix.patch
fix sched_idle typo
rcu-stats.patch
RCU statistics reporting
ext3-journalled-data-assertion-fix.patch
Remove incorrect assertion from ext3
nfs-speedup.patch
nfs-oom-fix.patch
nfs oom fix
sk-allocation.patch
Subject: Re: nfs oom
nfs-more-oom-fix.patch
rpciod-atomic-allocations.patch
Make rpciod use atomic allocations
linux-isp.patch
isp-update-1.patch
posix-timer-hang-fix-2.patch
Posix timer hang fix
tasklist_lock-dcache_lock-inversion-fix.patch
Fix deadlock between tasklist_lock and dcache_lock
vsyscall-unwinding.patch
Subject: unwinding for vsyscall code
mce-workqueue-startup-fix.patch
fix MCE startup ordering problems
setserial-fix.patch
Subject: [PATCH 2.5] Minor fix for driver/serial/core.c
1394-compile-fix.patch
Fix ieee1394 nodemgr.c compile
nfs-resource-management.patch
Subject: Resource management for NFS...
kblockd.patch
Create `kblockd' workqueue
as-iosched.patch
anticipatory I/O scheduler
as-np-reads-1.patch
AS: read-vs-read fixes
as-np-reads-2.patch
AS: more read-vs-read fixes
as-predict-data-direction.patch
as: predict direction of next IO
as-remove-frontmerge.patch
AS: remove frontmerge tunable
as-misc-cleanups.patch
AS: misc cleanups
as-minor-tweaks.patch
AS: tuning and tweaks
as-remove-stats.patch
AS: remove statistics
as-locking-fix.patch
AS: Fix minor race
as-disable-thinktime.patch
as-use-queue_empty.patch
AS: Use the queue_empty API
cfq-2.patch
CFQ scheduler, #2
unplug-use-kblockd.patch
Use kblockd for running request queues
objrmap.patch
object-based rmap
objrmap-sort-vma-list.patch
objrmap: optimise per-mapping vma searches
objrmap-vma-sorting-fix.patch
fix obj vma sorting
fremap-all-mappings.patch
Make all executable mappings be nonlinear
32bit-dev_t-nfs-export-fix.patch
Fix nfsd exports with big dev_t
sched-2.5.64-D3.patch
sched-2.5.64-D3, more interactivity changes
scheduler-tunables.patch
scheduler tunables
show_task-free-stack-fix.patch
show_task() fix and cleanup
yellowfin-set_bit-fix.patch
yellowfin driver set_bit fix
htree-nfs-fix.patch
Fix ext3 htree / NFS compatibility problems
task_prio-fix.patch
simple task_prio() fix
i8042-share-irqs.patch
allow i8042 interrupt sharing
gfp_repeat.patch
implement __GFP_REPEAT
alloc_buffer_head-take-gfp.patch
make alloc_buffer_head take gfp_flags
pte_alloc_one-use-gfp_repeat.patch
use __GFP_REPEAT in pte_alloc_one()
pmd_alloc_one-use-gfp_repeat.patch
use __GFP_REPEAT in pmd_alloc_one()
dynamic-hd_struct-allocation.patch
Allocate hd_structs dynamically
dynamic-hd_struct-devfs-fix.patch
Fix dynamic hd_struct allocation for devfs
slab_store_user-large-objects.patch
slab debug: perform redzoning against larger objects
htree-nfs-fix-2.patch
htree nfs fix
put_task_struct-debug.patch
dev_t-32-bit.patch
[for playing only] change type of dev_t
dev_t-remove-B_FREE.patch
dev_t: eliminate B_FREE
sg-dev_t-fix.patch
32-bit dev_t fix for sg
xfs-dev_t-warning-fix.patch
xfs dev_t printk warning fix
aggregated-disk-stats.patch
Aggregated disk statistics
oops-dump-preceding-code.patch
i386 oops output: dump preceding code
usb-disconnect-crash-fix.patch
Subject: Re: [linux-usb-devel] timer hang with current 2.5 BK
lockmeter.patch
lockmeter-fixes.patch
ext3-no-bkl.patch
journal_dirty_metadata-speedup.patch
journal_get_write_access-speedup.patch
ext3-concurrent-block-inode-allocation.patch
Subject: [PATCH] concurrent block/inode allocation for EXT3
ext3-concurrent-block-allocation-fix-1.patch
ext3-concurrent-block-allocation-hashed.patch
Subject: Re: [PATCH] concurrent block/inode allocation for EXT3
fbdev.patch
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-14 8:53 2.5.67-mm3 Andrew Morton
@ 2003-04-14 11:03 ` Bill Huey
2003-04-14 15:13 ` 2.5.67-mm3 Rudmer van Dijk
2003-04-15 2:00 ` 2.5.67-mm3 William Lee Irwin III
2003-04-16 2:21 ` 2.5.67-mm3 William Lee Irwin III
2 siblings, 1 reply; 21+ messages in thread
From: Bill Huey @ 2003-04-14 11:03 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Bill Huey (Hui)
On Mon, Apr 14, 2003 at 01:53:13AM -0700, Andrew Morton wrote:
> A bunch of new fixes, and a framebuffer update. This should work a bit
> better than -mm2.
make -f scripts/Makefile.build obj=arch/i386/boot arch/i386/boot/bzImage
ld -m elf_i386 -Ttext 0x0 -s --oformat binary -e begtext arch/i386/boot/setup.o -o arch/i386/boot/setup
arch/i386/boot/setup.o(.text+0x9a4): In function `video':
/tmp/ccyhvWWu.s:2925: undefined reference to `store_edid'
make[1]: *** [arch/i386/boot/setup] Error 1
make: *** [bzImage] Error 2
---------------------------------------
Not sure what's triggering this here.
bill
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
@ 2003-04-14 15:13 ` Rudmer van Dijk
2003-04-15 1:03 ` 2.5.67-mm3 Bill Huey
0 siblings, 1 reply; 21+ messages in thread
From: Rudmer van Dijk @ 2003-04-14 15:13 UTC (permalink / raw)
To: Bill Huey (Hui), Andrew Morton; +Cc: linux-kernel, linux-mm
On Monday 14 April 2003 13:03, Bill Huey (Hui) wrote:
> On Mon, Apr 14, 2003 at 01:53:13AM -0700, Andrew Morton wrote:
> > A bunch of new fixes, and a framebuffer update. This should work a bit
> > better than -mm2.
>
> make -f scripts/Makefile.build obj=arch/i386/boot arch/i386/boot/bzImage
> ld -m elf_i386 -Ttext 0x0 -s --oformat binary -e begtext
> arch/i386/boot/setup.o -o arch/i386/boot/setup
> arch/i386/boot/setup.o(.text+0x9a4): In function `video':
> /tmp/ccyhvWWu.s:2925: undefined reference to `store_edid'
> make[1]: *** [arch/i386/boot/setup] Error 1
> make: *** [bzImage] Error 2
>
> ---------------------------------------
got this also.
store_edid is only used when CONFIG_VIDEO_SELECT is set but the call to it is
outside the #ifdef...
this patch fixes it. Maybe it is better to move the call to store_edid up
inside the already available #ifdef but I'm not sure if that is possible
Rudmer
--- linux-2.5.67-mm3/arch/i386/boot/video.S.orig 2003-04-14 17:07:24.000000000 +0200
+++ linux-2.5.67-mm3/arch/i386/boot/video.S 2003-04-14 17:03:08.000000000 +0200
@@ -135,7 +135,9 @@
 #endif /* CONFIG_VIDEO_RETAIN */
 #endif /* CONFIG_VIDEO_SELECT */
 	call	mode_params			# Store mode parameters
+#ifdef CONFIG_VIDEO_SELECT
 	call	store_edid
+#endif /* CONFIG_VIDEO_SELECT */
 	popw	%ds				# Restore original DS
 	ret
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-14 15:13 ` 2.5.67-mm3 Rudmer van Dijk
@ 2003-04-15 1:03 ` Bill Huey
2003-04-15 1:13 ` 2.5.67-mm3 Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: Bill Huey @ 2003-04-15 1:03 UTC (permalink / raw)
To: Rudmer van Dijk; +Cc: Andrew Morton, linux-kernel, linux-mm, Bill Huey (Hui)
On Mon, Apr 14, 2003 at 05:13:05PM +0200, Rudmer van Dijk wrote:
> this patch fixes it. Maybe it is better to move the call to store_edid up
> inside the already available #ifdef but I'm not sure if that is possible
Now I'm getting a console warning "anticipatory scheduler" at boot time
and then having it freeze after mounting root read-only.
bill
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-15 1:03 ` 2.5.67-mm3 Bill Huey
@ 2003-04-15 1:13 ` Andrew Morton
2003-04-15 1:34 ` 2.5.67-mm3 Bill Huey
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2003-04-15 1:13 UTC (permalink / raw)
To: Bill Huey (Hui); +Cc: rudmer, linux-kernel, linux-mm
Bill Huey (Hui) <billh@gnuppy.monkey.org> wrote:
>
> On Mon, Apr 14, 2003 at 05:13:05PM +0200, Rudmer van Dijk wrote:
> > this patch fixes it. Maybe it is better to move the call to store_edid up
> > inside the already available #ifdef but I'm not sure if that is possible
>
> Now I'm getting a console warning "anticipatory scheduler" at boot time
> and then having it freeze after mounting root read-only.
>
Could be anything. Does sysrq not work?
If not, please send me your .config.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-15 1:13 ` 2.5.67-mm3 Andrew Morton
@ 2003-04-15 1:34 ` Bill Huey
2003-04-15 9:38 ` 2.5.67-mm3 Rudmer van Dijk
0 siblings, 1 reply; 21+ messages in thread
From: Bill Huey @ 2003-04-15 1:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: rudmer, linux-kernel, linux-mm, Bill Huey (Hui)
On Mon, Apr 14, 2003 at 06:13:02PM -0700, Andrew Morton wrote:
> Could be anything. Does sysrq not work?
>
> If not, please send me your .config.
It does it with elevator=deadline too. I'll see if I can get you a better
dump.
bill
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-15 1:34 ` 2.5.67-mm3 Bill Huey
@ 2003-04-15 9:38 ` Rudmer van Dijk
0 siblings, 0 replies; 21+ messages in thread
From: Rudmer van Dijk @ 2003-04-15 9:38 UTC (permalink / raw)
To: Bill Huey (Hui), Andrew Morton; +Cc: linux-kernel, linux-mm
On Tuesday 15 April 2003 03:34, Bill Huey (Hui) wrote:
> On Mon, Apr 14, 2003 at 06:13:02PM -0700, Andrew Morton wrote:
> > Could be anything. Does sysrq not work?
> >
> > If not, please send me your .config.
>
> It does it with elevator=deadline too. I'll see if I can get you a better
> dump.
no problems here (running with anticipatory scheduling elevator), current
uptime is 18h.
the only problem I have is that kmod does not seem to work, modules have to be
inserted manually... probably a problem with module-init-tools.
Rudmer
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-14 8:53 2.5.67-mm3 Andrew Morton
2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
@ 2003-04-15 2:00 ` William Lee Irwin III
2003-04-15 4:17 ` 2.5.67-mm3 William Lee Irwin III
2003-04-16 2:21 ` 2.5.67-mm3 William Lee Irwin III
2 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15 2:00 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
On Mon, Apr 14, 2003 at 01:53:13AM -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm3/
>
> A bunch of new fixes, and a framebuffer update. This should work a bit
> better than -mm2.
If one's goal is to free highmem pages, shrink_slab() is an ineffective
method of recovering them, as slab pages are all ZONE_NORMAL or ZONE_DMA.
Hence, this "FIXME: do not do for zone highmem". Presumably this is a
question of policy, as highmem allocations may be satisfied by reaping
slab pages and handing them back; but the FIXME says what we should do.
diff -urpN mm3-2.5.67-1/mm/vmscan.c mm3-2.5.67-2/mm/vmscan.c
--- mm3-2.5.67-1/mm/vmscan.c 2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-2/mm/vmscan.c 2003-04-14 18:16:41.000000000 -0700
@@ -134,11 +134,9 @@ void remove_shrinker(struct shrinker *sh
  * If the vm encounted mapped pages on the LRU it increase the pressure on
  * slab to avoid swapping.
  *
- * FIXME: do not do for zone highmem
- *
  * We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
  */
-static int shrink_slab(long scanned, unsigned int gfp_mask)
+static int shrink_slab(long scanned, unsigned int gfp_mask)
 {
 	struct shrinker *shrinker;
 	long pages;
@@ -835,7 +833,8 @@ try_to_free_pages(struct zone *classzone
 		/* Take a nap, wait for some writeback to complete */
 		blk_congestion_wait(WRITE, HZ/10);
-		shrink_slab(total_scanned, gfp_mask);
+		if (classzone - classzone->zone_pgdat->node_zones < ZONE_HIGHMEM)
+			shrink_slab(total_scanned, gfp_mask);
 	}
 	if (gfp_mask & __GFP_FS)
 		out_of_memory();
@@ -895,7 +894,8 @@ static int balance_pgdat(pg_data_t *pgda
 				max_scan = SWAP_CLUSTER_MAX;
 			to_free -= shrink_zone(zone, max_scan, GFP_KERNEL,
 					to_reclaim, &nr_mapped, ps, priority);
-			shrink_slab(max_scan + nr_mapped, GFP_KERNEL);
+			if (i < ZONE_HIGHMEM)
+				shrink_slab(max_scan + nr_mapped, GFP_KERNEL);
 			if (zone->all_unreclaimable)
 				continue;
 			if (zone->pages_scanned > zone->present_pages * 2)
^ permalink raw reply [flat|nested] 21+ messages in thread
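The zone test in the try_to_free_pages() hunk above works by pointer subtraction: a zone's position inside its node's node_zones[] array is its zone index, so anything below ZONE_HIGHMEM is a zone that slab pages can live in. A minimal userspace sketch of that check follows, assuming heavily simplified stand-ins for struct zone and struct pglist_data. The helper names zone_index(), should_shrink_slab() and init_node() are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Zone indices in the order used on i386 at the time of the thread. */
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

struct pglist_data;

/* Simplified stand-in: only the back-pointer the check needs. */
struct zone {
	struct pglist_data *zone_pgdat;	/* node that owns this zone */
};

/* Simplified stand-in: a node is just its array of zones. */
struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];
};

/* Index of a zone within its node, obtained by pointer subtraction. */
static ptrdiff_t zone_index(const struct zone *z)
{
	assert(z != NULL && z->zone_pgdat != NULL);
	return z - z->zone_pgdat->node_zones;
}

/* Nonzero when reclaim for this classzone can gain from shrinking slab. */
static int should_shrink_slab(const struct zone *classzone)
{
	return zone_index(classzone) < ZONE_HIGHMEM;
}

/* Hypothetical helper: wire up the back-pointers of one node. */
static void init_node(struct pglist_data *pgdat)
{
	for (int i = 0; i < MAX_NR_ZONES; i++)
		pgdat->node_zones[i].zone_pgdat = pgdat;
}
```

In balance_pgdat() the loop variable i already is the zone index, which is why that hunk can test i < ZONE_HIGHMEM directly.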
* Re: 2.5.67-mm3
2003-04-15 2:00 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15 4:17 ` William Lee Irwin III
2003-04-15 4:31 ` 2.5.67-mm3 Andrew Morton
2003-04-15 5:52 ` 2.5.67-mm3 Antonio Vargas
0 siblings, 2 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15 4:17 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, linux-mm
On Mon, Apr 14, 2003 at 07:00:57PM -0700, William Lee Irwin III wrote:
> Hence, this "FIXME: do not do for zone highmem". Presumably this is a
Another FIXME patch:
It's a bit of an open question as to how much of a difference this one
makes now, but it says "FIXME".
fault_in_pages_writeable() and fault_in_pages_readable() have a limited
"range" with respect to the size of the region they can prefault; as
they are now, they are only meant to handle spanning a page boundary.
This converts them to iterate over the virtual address range specified
and so touch each virtual page within it once as specified. As per the
comment within the "FIXME", this is only an issue if PAGE_SIZE <
PAGE_CACHE_SIZE.
diff -urpN mm3-2.5.67-3/include/linux/pagemap.h mm3-2.5.67-4/include/linux/pagemap.h
--- mm3-2.5.67-3/include/linux/pagemap.h 2003-04-07 10:30:34.000000000 -0700
+++ mm3-2.5.67-4/include/linux/pagemap.h 2003-04-14 19:10:17.000000000 -0700
@@ -168,46 +168,32 @@ extern void end_page_writeback(struct pa
 /*
  * Fault a userspace page into pagetables. Return non-zero on a fault.
- *
- * FIXME: this assumes that two userspace pages are always sufficient. That's
- * not true if PAGE_CACHE_SIZE > PAGE_SIZE.
+ * Both walk with stride PAGE_SIZE touching the start of the affected
+ * pages until faulting or at the start of a page beyond size from uaddr.
  */
 static inline int fault_in_pages_writeable(char *uaddr, int size)
 {
-	int ret;
+	int ret = 0;
+	unsigned long addr = (unsigned long)uaddr & PAGE_MASK;
 	/*
 	 * Writing zeroes into userspace here is OK, because we know that if
 	 * the zero gets there, we'll be overwriting it.
 	 */
-	ret = __put_user(0, uaddr);
-	if (ret == 0) {
-		char *end = uaddr + size - 1;
-
-		/*
-		 * If the page was already mapped, this will get a cache miss
-		 * for sure, so try to avoid doing it.
-		 */
-		if (((unsigned long)uaddr & PAGE_MASK) !=
-				((unsigned long)end & PAGE_MASK))
-			ret = __put_user(0, end);
-	}
+	for (; addr < (unsigned long)uaddr + size && !ret; addr += PAGE_SIZE)
+		ret = __put_user(0, (char *)max(addr, (unsigned long)uaddr));
+
 	return ret;
 }
 static inline void fault_in_pages_readable(const char *uaddr, int size)
 {
 	volatile char c;
-	int ret;
+	int ret = 0;
+	unsigned long addr = (unsigned long)uaddr & PAGE_MASK;
-	ret = __get_user(c, (char *)uaddr);
-	if (ret == 0) {
-		const char *end = uaddr + size - 1;
-
-		if (((unsigned long)uaddr & PAGE_MASK) !=
-				((unsigned long)end & PAGE_MASK))
-			__get_user(c, (char *)end);
-	}
+	for (; addr < (unsigned long)uaddr + size && !ret; addr += PAGE_SIZE)
+		ret = __get_user(c, (char *)addr);
 }
 #endif /* _LINUX_PAGEMAP_H */
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-15 4:17 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15 4:31 ` Andrew Morton
2003-04-15 4:39 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 5:52 ` 2.5.67-mm3 Antonio Vargas
1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2003-04-15 4:31 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, linux-mm
William Lee Irwin III <wli@holomorphy.com> wrote:
>
> + for (; addr < (unsigned long)uaddr + size && !ret; addr += PAGE_SIZE)
> + ret = __put_user(0, (char *)max(addr, (unsigned long)uaddr));
This hurts my brain. If anything, it should be formulated as a do-while loop.
But I'm not sure we should really bother, because relatively large amounts of
stuff is broken for PAGE_SIZE != PAGE_CACHE_SIZE anyway. tmpfs comes to
mind...
If page clustering needs to redo this code (and I assume it does) then that
would be an argument in favour.
^ permalink raw reply [flat|nested] 21+ messages in thread
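One possible reading of the do-while suggestion above, written as a userspace sketch rather than kernel code: touch the first byte at uaddr, then the start of every following page, stopping at the first fault or once past the last byte of the range. Everything here is an assumption made for illustration. fault_in_range(), touch_fn, record_touch() and the SKETCH_* constants are hypothetical names, and the callback merely stands in for __put_user()/__get_user(). Nothing in the thread says this is the form anyone had in mind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative page size; a real kernel takes PAGE_SIZE from the arch. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Stand-in for __put_user()/__get_user(): 0 on success, nonzero on fault. */
typedef int (*touch_fn)(uintptr_t addr, void *ctx);

/*
 * Touch uaddr itself, then the first byte of each later page that
 * overlaps [uaddr, uaddr + size). Stops at the first failing touch.
 * Wraparound at the top of the address space is not handled.
 */
static int fault_in_range(uintptr_t uaddr, size_t size,
			  touch_fn touch, void *ctx)
{
	uintptr_t addr = uaddr;
	uintptr_t end;
	int ret;

	assert(touch != NULL);
	if (size == 0)
		return 0;
	end = uaddr + size - 1;		/* last byte of the range */

	do {
		ret = touch(addr, ctx);
		/* advance to the start of the next page */
		addr = (addr & SKETCH_PAGE_MASK) + SKETCH_PAGE_SIZE;
	} while (ret == 0 && addr <= end);

	return ret;
}

/* Usage example: record which addresses would be touched. */
#define SKETCH_LOG_MAX 8

struct touch_log {
	uintptr_t addrs[SKETCH_LOG_MAX];
	int count;
};

static int record_touch(uintptr_t addr, void *ctx)
{
	struct touch_log *log = ctx;

	if (log->count < SKETCH_LOG_MAX)
		log->addrs[log->count] = addr;
	log->count++;
	return 0;	/* never faults in this sketch */
}
```

Starting the walk at uaddr and only rounding afterwards avoids the max(addr, uaddr) expression from the quoted patch, since the first touch can never land below the range.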
* Re: 2.5.67-mm3
2003-04-15 4:31 ` 2.5.67-mm3 Andrew Morton
@ 2003-04-15 4:39 ` William Lee Irwin III
2003-04-15 4:55 ` 2.5.67-mm3 Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15 4:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
William Lee Irwin III <wli@holomorphy.com> wrote:
>> + for (; addr < (unsigned long)uaddr + size && !ret; addr += PAGE_SIZE)
>> + ret = __put_user(0, (char *)max(addr, (unsigned long)uaddr));
On Mon, Apr 14, 2003 at 09:31:14PM -0700, Andrew Morton wrote:
> This hurts my brain. If anything, it should be formulated as a do-while loop.
> But I'm not sure we should really bother, because relatively large amounts of
> stuff is broken for PAGE_SIZE != PAGE_CACHE_SIZE anyway. tmpfs comes to
> mind...
> If page clustering needs to redo this code (and I assume it does) then that
> would be an argument in favour.
Page clustering wants something similar but slightly different. The
unit it wants as its stride (MMUPAGE_SIZE) isn't present so this doesn't
really help or hurt it. I believe I actually dodged this bullet by
ensuring (or incorrectly assuming) the callers used sizes <= MMUPAGE_SIZE
and left it either unaltered and suboptimal or (worst-case) buggy.
I'm just going down the list of FIXME's in the VM I turned up by grepping.
Should we do the following instead?
-- wli
diff -urpN mm3-2.5.67-2/include/linux/pagemap.h mm3-2.5.67-2A/include/linux/pagemap.h
--- mm3-2.5.67-2/include/linux/pagemap.h 2003-04-07 10:30:34.000000000 -0700
+++ mm3-2.5.67-2A/include/linux/pagemap.h 2003-04-14 21:24:52.000000000 -0700
@@ -169,7 +169,7 @@ extern void end_page_writeback(struct pa
 /*
  * Fault a userspace page into pagetables. Return non-zero on a fault.
  *
- * FIXME: this assumes that two userspace pages are always sufficient. That's
+ * This assumes that two userspace pages are always sufficient. That's
  * not true if PAGE_CACHE_SIZE > PAGE_SIZE.
  */
 static inline int fault_in_pages_writeable(char *uaddr, int size)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-15 4:39 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15 4:55 ` Andrew Morton
2003-04-15 5:15 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 6:09 ` 2.5.67-mm3 William Lee Irwin III
0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2003-04-15 4:55 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, linux-mm
William Lee Irwin III <wli@holomorphy.com> wrote:
>
> Page clustering wants something similar but slightly different. The
> unit it wants as its stride (MMUPAGE_SIZE) isn't present so this doesn't
> really help or hurt it. I believe I actually dodged this bullet by
> ensuring (or incorrectly assuming) the callers used sizes <= MMUPAGE_SIZE
> and left it either unaltered and suboptimal or (worst-case) buggy.
Callers will use sizes between 1 and PAGE_CACHE_SIZE, with arbitrary
alignment. So you may need to fault in up to
(PAGE_CACHE_SIZE / MMUPAGE_SIZE) + 1
pte's. And up to two PAGE_CACHE_SIZE pages.
Sort-of. The code is doing two things.
a) Make sure that all the relevant pte's are established in the correct
state so we don't take a fault while holding the subsequent atomic kmap.
This is just an optimisation. If we _do_ take the fault while holding
an atomic kmap, we fall back to sleeping kmap, and do the whole copy
again. It almost never happens.
b) Making sure that the pagecache page is present before we lock it. This
is to handle the icky deadlock which occurs when someone is doing a
write() into a MAP_SHARED region of the file, where the source and dest of
the copy are the same physical page. If we take a fault and then try to
bring the page uptodate in the fault handler we deadlock because the page
is already locked.
The fault-by-hand-before-locking-the-page is racy - if the VM steals
the page again before we lock it (rare), the deadlock can still occur.
I've been able to trigger the fault which causes fallback to kmap()
occasionally, under heavy load. But never the deadlock.
We don't know how to fix this for real. I had a patch for a while which
added current->locked_page, and filemap_nopage() would compare that with
the to-be-locked page and say "ah-hah!" and take avoiding action.
But then Hugh rudely pointed out that the deadlock was still present if
two tasks were involved, each trying to fault in the other's locked page.
> I'm just going down the list of FIXME's in the VM I turned up by grepping.
> Should we do the following instead?
OK ;)
^ permalink raw reply [flat|nested] 21+ messages in thread
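The (PAGE_CACHE_SIZE / MMUPAGE_SIZE) + 1 bound above can be checked with a little arithmetic: a range of up to PAGE_CACHE_SIZE bytes covers PAGE_CACHE_SIZE / MMUPAGE_SIZE MMU pages when it is aligned, and one more when it starts partway into an MMU page. A small sketch under stated assumptions follows. The two sizes are made up for illustration (neither value comes from the thread), and mmupages_spanned() is a hypothetical helper, not something from the page clustering patches.

```c
#include <assert.h>

/* Hypothetical sizes for illustration only; not taken from the thread. */
#define SKETCH_MMUPAGE_SIZE	4096UL
#define SKETCH_PAGE_CACHE_SIZE	(4 * SKETCH_MMUPAGE_SIZE)

/*
 * Number of MMU pages (and so pte's) overlapped by the byte range
 * [uaddr, uaddr + size), for 1 <= size <= SKETCH_PAGE_CACHE_SIZE.
 */
static unsigned long mmupages_spanned(unsigned long uaddr, unsigned long size)
{
	unsigned long first, last;

	assert(size >= 1 && size <= SKETCH_PAGE_CACHE_SIZE);
	first = uaddr / SKETCH_MMUPAGE_SIZE;		  /* page of first byte */
	last = (uaddr + size - 1) / SKETCH_MMUPAGE_SIZE; /* page of last byte */
	return last - first + 1;
}
```

With these sizes an aligned full-size range spans 4 MMU pages, and the same range shifted by one byte spans 5, which is the "+ 1" in the bound.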
* Re: 2.5.67-mm3
2003-04-15 4:55 ` 2.5.67-mm3 Andrew Morton
@ 2003-04-15 5:15 ` William Lee Irwin III
2003-04-15 5:35 ` 2.5.67-mm3 Andrew Morton
2003-04-15 6:09 ` 2.5.67-mm3 William Lee Irwin III
1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15 5:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
William Lee Irwin III <wli@holomorphy.com> wrote:
> > Page clustering wants something similar but slightly different. The
> > unit it wants as its stride (MMUPAGE_SIZE) isn't present so this doesn't
> > really help or hurt it. I believe I actually dodged this bullet by
> > ensuring (or incorrectly assuming) the callers used sizes <= MMUPAGE_SIZE
> > and left it either unaltered and suboptimal or (worst-case) buggy.
On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> Callers will use sizes between 1 and PAGE_CACHE_SIZE, with arbitrary
> alignment. So you may need to fault in up to
> (PAGE_CACHE_SIZE / MMUPAGE_SIZE) + 1
> pte's. And up to two PAGE_CACHE_SIZE pages.
I checked on what pgcl was doing, and it's buggy. Thanks, I'll fix it
up and keep it rolling for when the time is right to send pieces in
(if ever for 2.6).
On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> Sort-of. The code is doing two things.
> a) Make sure that all the relevant pte's are established in the correct
> state so we don't take a fault while holding the subsequent atomic kmap.
> This is just an optimisation. If we _do_ take the fault while holding
> an atomic kmap, we fall back to sleeping kmap, and do the whole copy
> again. It almost never happens.
This is the easy part; we're basically just prefaulting.
On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> b) Making sure that the pagecache page is present before we lock it. This
> is to handle the icky deadlock which occurs when someone is doing a
> write() into a MAP_SHARED region of the file, where the source and dest of
> the copy are the same physical page. If we take a fault and then try to
> bring the page uptodate in the fault handler we deadlock because the page
> is already locked.
> The fault-by-hand-before-locking-the-page is racy - if the VM steals
> the page again before we lock it (rare), the deadlock can still occur.
> I've been able to trigger the fault which causes fallback to kmap()
> occasionally, under heavy load. But never the deadlock.
> We don't know how to fix this for real. I had a patch for a while which
> added current->locked_page, and filemap_nopage() would compare that with
> the to-be-locked page and say "ah-hah!" and take avoiding action.
> But then Hugh rudely pointed out that the deadlock was still present if
> two tasks were involved, each trying to fault in the other's locked page.
This sounds more serious. My first thought is address-ordering the
locking, but it's not obvious how to do that with the current control
flow structure (and it sounds oversimplified). On my list of things to
think about.
-- wli
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-15 5:15 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15 5:35 ` Andrew Morton
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2003-04-15 5:35 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, linux-mm
William Lee Irwin III <wli@holomorphy.com> wrote:
>
> On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> > Sort-of. The code is doing two things.
> > a) Make sure that all the relevant pte's are established in the correct
> > state so we don't take a fault while holding the subsequent atomic kmap.
> > This is just an optimisation. If we _do_ take the fault while holding
> > an atomic kmap, we fall back to sleeping kmap, and do the whole copy
> > again. It almost never happens.
>
> This is the easy part; we're basically just prefaulting.
btw, this may sound like a lot of futzing about, but the other day I timed
four concurrent instances of
dd if=/dev/zero of=$i bs=1 count=1M
on the four-way. 2.5 ran eight times faster than 2.4. 2.4's kmap_lock
contention was astonishing.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.5.67-mm3
2003-04-15 4:55 ` 2.5.67-mm3 Andrew Morton
2003-04-15 5:15 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15 6:09 ` William Lee Irwin III
2003-04-15 6:10 ` 2.5.67-mm3 William Lee Irwin III
1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15 6:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>> I'm just going down the list of FIXME's in the VM I turned up by grepping.
>> Should we do the following instead?

On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> OK ;)

Okay, these don't get us all the way there, but at least it gets us
closer: there are "FIXME" things associated with intrusions of the
old buffer_cache (as opposed to the new buffer cache a.k.a. pagecache)
into the core VM.

The first is simply:
Move __set_page_dirty_buffers() to fs/buffer.c, as per the FIXME.


diff -urpN mm3-2.5.67-1/fs/buffer.c mm3-2.5.67-2B/fs/buffer.c
--- mm3-2.5.67-1/fs/buffer.c	2003-04-14 18:08:14.000000000 -0700
+++ mm3-2.5.67-2B/fs/buffer.c	2003-04-14 22:21:19.000000000 -0700
@@ -779,6 +779,85 @@ void mark_buffer_dirty_inode(struct buff
 EXPORT_SYMBOL(mark_buffer_dirty_inode);
 
 /*
+ * Add a page to the dirty page list.
+ *
+ * It is a sad fact of life that this function is called from several places
+ * deeply under spinlocking. It may not sleep.
+ *
+ * If the page has buffers, the uptodate buffers are set dirty, to preserve
+ * dirty-state coherency between the page and the buffers. It the page does
+ * not have buffers then when they are later attached they will all be set
+ * dirty.
+ *
+ * The buffers are dirtied before the page is dirtied. There's a small race
+ * window in which a writepage caller may see the page cleanness but not the
+ * buffer dirtiness. That's fine. If this code were to set the page dirty
+ * before the buffers, a concurrent writepage caller could clear the page dirty
+ * bit, see a bunch of clean buffers and we'd end up with dirty buffers/clean
+ * page on the dirty page list.
+ *
+ * There is also a small window where the page is dirty, and not on dirty_pages.
+ * Also a possibility that by the time the page is added to dirty_pages, it has
+ * been set clean. The page lists are somewhat approximate in this regard.
+ * It's better to have clean pages accidentally attached to dirty_pages than to
+ * leave dirty pages attached to clean_pages.
+ *
+ * We use private_lock to lock against try_to_free_buffers while using the
+ * page's buffer list. Also use this to protect against clean buffers being
+ * added to the page after it was set dirty.
+ *
+ * FIXME: may need to call ->reservepage here as well. That's rather up to the
+ * address_space though.
+ *
+ * For now, we treat swapper_space specially. It doesn't use the normal
+ * block a_ops.
+ */
+int __set_page_dirty_buffers(struct page *page)
+{
+	struct address_space * const mapping = page->mapping;
+	int ret = 0;
+
+	if (mapping == NULL) {
+		SetPageDirty(page);
+		goto out;
+	}
+
+	if (!PageUptodate(page))
+		buffer_error();
+
+	spin_lock(&mapping->private_lock);
+	if (page_has_buffers(page)) {
+		struct buffer_head *head = page_buffers(page);
+		struct buffer_head *bh = head;
+
+		do {
+			if (buffer_uptodate(bh))
+				set_buffer_dirty(bh);
+			else
+				buffer_error();
+			bh = bh->b_this_page;
+		} while (bh != head);
+	}
+	spin_unlock(&mapping->private_lock);
+
+	if (!TestSetPageDirty(page)) {
+		spin_lock(&mapping->page_lock);
+		if (page->mapping) {	/* Race with truncate? */
+			if (!mapping->backing_dev_info->memory_backed)
+				inc_page_state(nr_dirty);
+			list_del(&page->list);
+			list_add(&page->list, &mapping->dirty_pages);
+		}
+		spin_unlock(&mapping->page_lock);
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	}
+
+out:
+	return ret;
+}
+EXPORT_SYMBOL(__set_page_dirty_buffers);
+
+/*
  * Write out and wait upon a list of buffers.
  *
  * We have conflicting pressures: we want to make sure that all
diff -urpN mm3-2.5.67-1/mm/page-writeback.c mm3-2.5.67-2B/mm/page-writeback.c
--- mm3-2.5.67-1/mm/page-writeback.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-2B/mm/page-writeback.c	2003-04-14 22:20:23.000000000 -0700
@@ -462,88 +462,6 @@ int write_one_page(struct page *page, in
 EXPORT_SYMBOL(write_one_page);
 
 /*
- * Add a page to the dirty page list.
- *
- * It is a sad fact of life that this function is called from several places
- * deeply under spinlocking. It may not sleep.
- *
- * If the page has buffers, the uptodate buffers are set dirty, to preserve
- * dirty-state coherency between the page and the buffers. It the page does
- * not have buffers then when they are later attached they will all be set
- * dirty.
- *
- * The buffers are dirtied before the page is dirtied. There's a small race
- * window in which a writepage caller may see the page cleanness but not the
- * buffer dirtiness. That's fine. If this code were to set the page dirty
- * before the buffers, a concurrent writepage caller could clear the page dirty
- * bit, see a bunch of clean buffers and we'd end up with dirty buffers/clean
- * page on the dirty page list.
- *
- * There is also a small window where the page is dirty, and not on dirty_pages.
- * Also a possibility that by the time the page is added to dirty_pages, it has
- * been set clean. The page lists are somewhat approximate in this regard.
- * It's better to have clean pages accidentally attached to dirty_pages than to
- * leave dirty pages attached to clean_pages.
- *
- * We use private_lock to lock against try_to_free_buffers while using the
- * page's buffer list. Also use this to protect against clean buffers being
- * added to the page after it was set dirty.
- *
- * FIXME: may need to call ->reservepage here as well. That's rather up to the
- * address_space though.
- *
- * For now, we treat swapper_space specially. It doesn't use the normal
- * block a_ops.
- *
- * FIXME: this should move over to fs/buffer.c - buffer_heads have no business in mm/
- */
-#include <linux/buffer_head.h>
-int __set_page_dirty_buffers(struct page *page)
-{
-	struct address_space * const mapping = page->mapping;
-	int ret = 0;
-
-	if (mapping == NULL) {
-		SetPageDirty(page);
-		goto out;
-	}
-
-	if (!PageUptodate(page))
-		buffer_error();
-
-	spin_lock(&mapping->private_lock);
-	if (page_has_buffers(page)) {
-		struct buffer_head *head = page_buffers(page);
-		struct buffer_head *bh = head;
-
-		do {
-			if (buffer_uptodate(bh))
-				set_buffer_dirty(bh);
-			else
-				buffer_error();
-			bh = bh->b_this_page;
-		} while (bh != head);
-	}
-	spin_unlock(&mapping->private_lock);
-
-	if (!TestSetPageDirty(page)) {
-		spin_lock(&mapping->page_lock);
-		if (page->mapping) {	/* Race with truncate? */
-			if (!mapping->backing_dev_info->memory_backed)
-				inc_page_state(nr_dirty);
-			list_del(&page->list);
-			list_add(&page->list, &mapping->dirty_pages);
-		}
-		spin_unlock(&mapping->page_lock);
-		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-	}
-
-out:
-	return ret;
-}
-EXPORT_SYMBOL(__set_page_dirty_buffers);
-
-/*
  * For address_spaces which do not use buffers. Just set the page's dirty bit
  * and move it to the dirty_pages list. Also perform space reservation if
  * required.
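The ordering that the moved comment describes (dirty the uptodate buffers
first, then test-and-set the page's dirty bit) can be modelled in userspace.
This is only a sketch under stated assumptions: the demo_* types and the
function below are hypothetical stand-ins, not the kernel's struct page or
struct buffer_head, no locking is modelled, and the return value (whether this
call made the page dirty) is an illustration choice rather than what
__set_page_dirty_buffers() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: a page's buffers form a ring. */
struct demo_buffer {
	bool uptodate;
	bool dirty;
	struct demo_buffer *this_page;	/* next buffer in the ring */
};

/* Hypothetical stand-in for struct page. */
struct demo_page {
	bool dirty;
	bool on_dirty_list;
	struct demo_buffer *buffers;	/* NULL when no buffers are attached */
};

/*
 * Dirty the uptodate buffers first, then the page.
 * Returns true if this call changed the page from clean to dirty.
 */
static bool demo_set_page_dirty_buffers(struct demo_page *page)
{
	struct demo_buffer *head;
	bool was_dirty;

	assert(page != NULL);
	head = page->buffers;

	if (head != NULL) {
		struct demo_buffer *bh = head;

		/* Walk the circular list exactly once, starting at head. */
		do {
			if (bh->uptodate)
				bh->dirty = true;
			bh = bh->this_page;
		} while (bh != head);
	}

	/* Model of test-and-set: only the clean->dirty transition lists the page. */
	was_dirty = page->dirty;
	page->dirty = true;
	if (!was_dirty)
		page->on_dirty_list = true;

	return !was_dirty;
}
```

Calling the function twice on the same page shows the point of the
test-and-set: only the first call moves the page onto the (modelled) dirty
list, while buffers that are not uptodate are left clean.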
* Re: 2.5.67-mm3
2003-04-15 6:09 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15 6:10 ` William Lee Irwin III
0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15 6:10 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 11:09:07PM -0700, William Lee Irwin III wrote:
> Okay, these don't get us all the way there, but at least it gets us
> closer: there are "FIXME" things associated with intrusions of the
> old buffer_cache (as opposed to the new buffer cache a.k.a. pagecache)
> into the core VM.
> The first is simply:
> Move __set_page_dirty_buffers() to fs/buffer.c, as per the FIXME.

On top of the __set_page_dirty_buffers() patch:

Remove page_has_buffers() from various functions, document the
dependencies on buffer_head.h from other files besides filemap.c,
and s/this file/core VM/ in filemap.c


diff -urpN mm3-2.5.67-2B/mm/filemap.c mm3-2.5.67-3B/mm/filemap.c
--- mm3-2.5.67-2B/mm/filemap.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-3B/mm/filemap.c	2003-04-14 22:43:08.000000000 -0700
@@ -31,12 +31,11 @@
  * This is needed for the following functions:
  * - try_to_release_page
  * - block_invalidatepage
- * - page_has_buffers
  * - generic_osync_inode
  *
- * FIXME: remove all knowledge of the buffer layer from this file
+ * FIXME: remove all knowledge of the buffer layer from the core VM
  */
-#include <linux/buffer_head.h>
+#include <linux/buffer_head.h> /* for generic_osync_inode */
 
 #include <asm/uaccess.h>
 #include <asm/mman.h>
diff -urpN mm3-2.5.67-2B/mm/swap.c mm3-2.5.67-3B/mm/swap.c
--- mm3-2.5.67-2B/mm/swap.c	2003-04-07 10:31:05.000000000 -0700
+++ mm3-2.5.67-3B/mm/swap.c	2003-04-14 22:43:49.000000000 -0700
@@ -21,7 +21,7 @@
 #include <linux/pagevec.h>
 #include <linux/init.h>
 #include <linux/mm_inline.h>
-#include <linux/buffer_head.h>
+#include <linux/buffer_head.h>	/* for try_to_release_page() */
 #include <linux/percpu.h>
 
 /* How many pages do we try to swap or page in/out together? */
diff -urpN mm3-2.5.67-2B/mm/swap_state.c mm3-2.5.67-3B/mm/swap_state.c
--- mm3-2.5.67-2B/mm/swap_state.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-3B/mm/swap_state.c	2003-04-14 22:28:20.000000000 -0700
@@ -13,7 +13,6 @@
 #include <linux/init.h>
 #include <linux/pagemap.h>
 #include <linux/backing-dev.h>
-#include <linux/buffer_head.h>	/* block_sync_page() */
 
 #include <asm/pgtable.h>
 
@@ -187,7 +186,7 @@ void delete_from_swap_cache(struct page
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(PageWriteback(page));
-	BUG_ON(page_has_buffers(page));
+	BUG_ON(PagePrivate(page));
 
 	entry.val = page->index;
 
@@ -236,7 +235,7 @@ int move_from_swap_cache(struct page *pa
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(PageWriteback(page));
-	BUG_ON(page_has_buffers(page));
+	BUG_ON(PagePrivate(page));
 
 	entry.val = page->index;
 
diff -urpN mm3-2.5.67-2B/mm/swapfile.c mm3-2.5.67-3B/mm/swapfile.c
--- mm3-2.5.67-2B/mm/swapfile.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-3B/mm/swapfile.c	2003-04-14 22:27:57.000000000 -0700
@@ -15,7 +15,6 @@
 #include <linux/namei.h>
 #include <linux/shm.h>
 #include <linux/blkdev.h>
-#include <linux/buffer_head.h>
 #include <linux/writeback.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -300,7 +299,7 @@ int remove_exclusive_swap_page(struct pa
 	struct swap_info_struct * p;
 	swp_entry_t entry;
 
-	BUG_ON(page_has_buffers(page));
+	BUG_ON(PagePrivate(page));
 	BUG_ON(!PageLocked(page));
 
 	if (!PageSwapCache(page))
@@ -355,7 +354,7 @@ void free_swap_and_cache(swp_entry_t ent
 	if (page) {
 		int one_user;
 
-		BUG_ON(page_has_buffers(page));
+		BUG_ON(PagePrivate(page));
 		page_cache_get(page);
 		one_user = (page_count(page) == 2);
 		/* Only cache user (+us), or swap space full? Free it! */
diff -urpN mm3-2.5.67-2B/mm/vmscan.c mm3-2.5.67-3B/mm/vmscan.c
--- mm3-2.5.67-2B/mm/vmscan.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-3B/mm/vmscan.c	2003-04-14 22:45:19.000000000 -0700
@@ -22,7 +22,8 @@
 #include <linux/writeback.h>
 #include <linux/suspend.h>
 #include <linux/blkdev.h>
-#include <linux/buffer_head.h>	/* for try_to_release_page() */
+#include <linux/buffer_head.h>	/* for try_to_release_page(),
+					buffer_heads_over_limit */
 #include <linux/mm_inline.h>
 #include <linux/pagevec.h>
 #include <linux/backing-dev.h>
* Re: 2.5.67-mm3
2003-04-15 4:17 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 4:31 ` 2.5.67-mm3 Andrew Morton
@ 2003-04-15 5:52 ` Antonio Vargas
2003-04-15 5:52 ` 2.5.67-mm3 William Lee Irwin III
1 sibling, 1 reply; 21+ messages in thread
From: Antonio Vargas @ 2003-04-15 5:52 UTC (permalink / raw)
To: William Lee Irwin III, Andrew Morton, linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 09:17:59PM -0700, William Lee Irwin III wrote:
> On Mon, Apr 14, 2003 at 07:00:57PM -0700, William Lee Irwin III wrote:
> > Hence, this "FIXME: do not do for zone highmem". Presumably this is a
>
> Another FIXME patch:
>
>
> It's a bit of an open question as to how much of a difference this one
> makes now, but it says "FIXME". fault_in_pages_writeable() and
> fault_in_pages_readable() have a limited "range" with respect to the
> size of the region they can prefault; as they are now, they are only
> meant to handle spanning a page boundary. This converts them to iterate
> over the virtual address range specified and so touch each virtual page
> within it once as specified. As per the comment within the "FIXME",
> this is only an issue if PAGE_SIZE < PAGE_CACHE_SIZE.
>
> [patch snip]

Page clustering? I did a simple patch yesterday called "cow-ahead", which
may be related: on a write to a COW page, it breaks the COW from several pages
at the same time. The implementation survived a complete debian 2.2 boot
and a fork bomb. Please have a look. The idea came from a discussion with
Martin J. Bligh... we liked the name too much not to implement it.

Greets, Antonio.
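The prefaulting change quoted above ("iterate over the virtual address range
specified and so touch each virtual page within it once") can be sketched in
userspace. The snipped patch itself is not reproduced here; everything named
demo_* below is hypothetical, and the callback stands in for whatever the real
code does to touch a page. The caller is assumed to pass a power-of-two page
size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: "touch" one address, e.g. read or write one byte. */
typedef void (*demo_touch_fn)(uintptr_t addr, void *ctx);

/* Small log so a caller can see which addresses were touched. */
struct demo_log {
	uintptr_t addr[8];
	size_t n;
};

static void demo_record(uintptr_t addr, void *ctx)
{
	struct demo_log *log = ctx;

	if (log->n < 8)
		log->addr[log->n++] = addr;
}

/*
 * Touch the first byte of the range, then the first byte of every further
 * page the range overlaps.  Returns the number of pages touched.
 */
static size_t demo_touch_each_page(uintptr_t start, size_t size,
				   size_t page_size,
				   demo_touch_fn touch, void *ctx)
{
	uintptr_t addr, last;
	size_t touched = 0;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	if (size == 0)
		return 0;

	last = start + size - 1;	/* last byte of the range */

	touch(start, ctx);
	touched++;

	/* First page boundary after start. */
	addr = (start & ~(uintptr_t)(page_size - 1)) + page_size;

	/* "addr > start" stops the loop if addr wrapped past the top. */
	while (addr > start && addr <= last) {
		touch(addr, ctx);
		touched++;
		addr += page_size;
	}
	return touched;
}
```

A 0x20-byte range starting at 0x1ff0 with 4KB pages touches two pages (at
0x1ff0 and 0x2000), which is the "spanning a page boundary" case; a range
covering three whole pages touches three.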
* Re: 2.5.67-mm3
2003-04-15 5:52 ` 2.5.67-mm3 Antonio Vargas
@ 2003-04-15 5:52 ` William Lee Irwin III
2003-04-15 15:09 ` 2.5.67-mm3 Antonio Vargas
0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15 5:52 UTC (permalink / raw)
To: Antonio Vargas; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 09:17:59PM -0700, William Lee Irwin III wrote:
>> It's a bit of an open question as to how much of a difference this one
>> makes now, but it says "FIXME". fault_in_pages_writeable() and
>> fault_in_pages_readable() have a limited "range" with respect to the
>> size of the region they can prefault; as they are now, they are only
>> meant to handle spanning a page boundary. This converts them to iterate
>> over the virtual address range specified and so touch each virtual page
>> within it once as specified. As per the comment within the "FIXME",
>> this is only an issue if PAGE_SIZE < PAGE_CACHE_SIZE.
>> [patch snip]

On Tue, Apr 15, 2003 at 07:52:29AM +0200, Antonio Vargas wrote:
> Page clustering? I did a simple patch yesterday called "cow-ahead", which
> may be related: on a write to a COW page, it breaks the COW from several pages
> at the same time. The implementation survived a complete debian 2.2 boot
> and a fork bomb. Please have a look. The idea came from a discussion with
> Martin J. Bligh... we liked the name too much not to implement it.

I apologize if the name is deceiving, but it's conventional. I saw your
patch and it could very well be valuable, but it would be called
"prefaulting" or "faultahead". Page clustering is divorcing the TLB
mapping unit from the kernel's internal allocation unit, specifically,
enlarging the kernel's allocation unit for reductions in the size of
certain data structures (for PAE, the most important of these is the
mem_map[] array but the pagecache radix trees also see good reductions),
and for physical contiguity benefits in things like io as they are
applicable (it is not applicable to workloads with many small files or
for workloads with predominantly small io sizes).

The article on kerneltrap.org on the subject should have more pointers
to explanatory posts etc. to get a better idea of what's going on.

Also important is to properly credit Hugh Dickins with the original
2.4 implementation of page clustering, which for optimality and
correctness and cleanliness is superior to the current state of my own
for 2.5, and is the source base from which my implementation is derived.


-- wli
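The mem_map[] saving described above is easy to put numbers on. The figures
used here are illustrative assumptions, not values taken from the thread:
64GB of RAM (the most PAE can address), a 40-byte per-entry cost, and a 32KB
clustered allocation unit against the usual 4KB. The helper name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for one descriptor per allocation unit.
 * ram_bytes:   total physical memory tracked
 * unit_bytes:  size of the kernel's allocation unit
 * entry_bytes: assumed size of one mem_map[] entry
 */
static uint64_t demo_mem_map_bytes(uint64_t ram_bytes, uint64_t unit_bytes,
				   uint64_t entry_bytes)
{
	assert(unit_bytes != 0);
	return (ram_bytes / unit_bytes) * entry_bytes;
}
```

Under those assumptions, demo_mem_map_bytes(64ULL << 30, 4096, 40) comes to
640MB of descriptors, while demo_mem_map_bytes(64ULL << 30, 32768, 40) comes
to 80MB: the array shrinks by the same factor by which the allocation unit
grows.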
* Re: 2.5.67-mm3
2003-04-15 5:52 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15 15:09 ` Antonio Vargas
0 siblings, 0 replies; 21+ messages in thread
From: Antonio Vargas @ 2003-04-15 15:09 UTC (permalink / raw)
To: William Lee Irwin III, Antonio Vargas, Andrew Morton, linux-kernel, linux-mm, mbligh

On Mon, Apr 14, 2003 at 10:52:56PM -0700, William Lee Irwin III wrote:
> On Mon, Apr 14, 2003 at 09:17:59PM -0700, William Lee Irwin III wrote:
> >> It's a bit of an open question as to how much of a difference this one
> >> makes now, but it says "FIXME". fault_in_pages_writeable() and
> >> fault_in_pages_readable() have a limited "range" with respect to the
> >> size of the region they can prefault; as they are now, they are only
> >> meant to handle spanning a page boundary. This converts them to iterate
> >> over the virtual address range specified and so touch each virtual page
> >> within it once as specified. As per the comment within the "FIXME",
> >> this is only an issue if PAGE_SIZE < PAGE_CACHE_SIZE.
> >> [patch snip]
>
> On Tue, Apr 15, 2003 at 07:52:29AM +0200, Antonio Vargas wrote:
> > Page clustering? I did a simple patch yesterday called "cow-ahead", which
> > may be related: on a write to a COW page, it breaks the COW from several pages
> > at the same time. The implementation survived a complete debian 2.2 boot
> > and a fork bomb. Please have a look. The idea came from a discussion with
> > Martin J. Bligh... we liked the name too much not to implement it.
>
> I apologize if the name is deceiving, but it's conventional. I saw your
> patch and it could very well be valuable, but it would be called
> "prefaulting" or "faultahead". Page clustering is divorcing the TLB
> mapping unit from the kernel's internal allocation unit, specifically,
> enlarging the kernel's allocation unit for reductions in the size of
> certain data structures (for PAE, the most important of these is the
> mem_map[] array but the pagecache radix trees also see good reductions),
> and for physical contiguity benefits in things like io as they are
> applicable (it is not applicable to workloads with many small files or
> for workloads with predominantly small io sizes).
>
> The article on kerneltrap.org on the subject should have more pointers
> to explanatory posts etc. to get a better idea of what's going on.
>
> Also important is to properly credit Hugh Dickins with the original
> 2.4 implementation of page clustering, which for optimality and
> correctness and cleanliness is superior to the current state of my own
> for 2.5, and is the source base from which my implementation is derived.
>
>
> -- wli

I mentioned page clustering instead of page-fault clustering, so it was
my confusion. I recall seeing lkml posts which talked about "the pgcl patch"
and wondered what it was about until I saw the kerneltrap article.

Looks like a nice way to speed up the system :) My trusty m68030@50 AmigaOS
machine was faster with 32kb blocksize on the fs for some tasks; dunno about
Linux, with its much more advanced memory and file manager with readahead and
related improvements.

My patch booted fine on UML. I'll try to boot a physical machine with it and
try to time a kernel compile.

Besides, if you look closer at the patch, you can see it's doing the
"faultahead" only for cow pages, thus giving it the name "cowahead".

Since taking a page fault has to be very expensive on today's machines
due to deep pipelines, I wonder if we should keep a record of where each vma
has last cow-faulted and dynamically adjust the cowahead window, just like
the file access code manages its readahead window.

Greets, Antonio.
* Re: 2.5.67-mm3
2003-04-14 8:53 2.5.67-mm3 Andrew Morton
2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
2003-04-15 2:00 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-16 2:21 ` William Lee Irwin III
2003-04-16 2:40 ` 2.5.67-mm3 William Lee Irwin III
2 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-16 2:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 01:53:13AM -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm3/
>
> A bunch of new fixes, and a framebuffer update. This should work a bit
> better than -mm2.

follow_hugetlb_page() behaved improperly if its starting address was
not hugepage-aligned. It looked a bit unclean too, so I rewrote it.
This fixes a bug, and more importantly, makes the thing readable by
something other than a compiler (e.g. programmers).


diff -urpN linux-2.5.67-bk6/arch/i386/mm/hugetlbpage.c htlb-2.5.67-bk6-1/arch/i386/mm/hugetlbpage.c
--- linux-2.5.67-bk6/arch/i386/mm/hugetlbpage.c	2003-04-07 10:32:49.000000000 -0700
+++ htlb-2.5.67-bk6-1/arch/i386/mm/hugetlbpage.c	2003-04-15 18:58:07.000000000 -0700
@@ -129,37 +129,45 @@ nomem:
 int
 follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		    struct page **pages, struct vm_area_struct **vmas,
-		    unsigned long *st, int *length, int i)
+		    unsigned long *position, int *length, int i)
 {
-	pte_t *ptep, pte;
-	unsigned long start = *st;
-	unsigned long pstart;
-	int len = *length;
-	struct page *page;
+	unsigned long vpfn, vaddr = *position;
+	int remainder = *length;
+
+	WARN_ON(!is_vm_hugetlb_page(vma));
 
-	do {
-		pstart = start;
-		ptep = huge_pte_offset(mm, start);
-		pte = *ptep;
+	vpfn = vaddr/PAGE_SIZE;
+	while (vaddr < vma->vm_end && remainder) {
 
-back1:
-		page = pte_page(pte);
 		if (pages) {
-			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			pte_t *pte;
+			struct page *page;
+
+			pte = huge_pte_offset(mm, vaddr);
+
+			/* hugetlb should be locked, and hence, prefaulted */
+			WARN_ON(!pte || pte_none(*pte));
+
+			page = &pte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)];
+
+			WARN_ON(!PageCompound(page));
+
 			get_page(page);
 			pages[i] = page;
 		}
+
 		if (vmas)
 			vmas[i] = vma;
-		i++;
-		len--;
-		start += PAGE_SIZE;
-		if (((start & HPAGE_MASK) == pstart) && len &&
-				(start < vma->vm_end))
-			goto back1;
-	} while (len && start < vma->vm_end);
-	*length = len;
-	*st = start;
+
+		vaddr += PAGE_SIZE;
+		++vpfn;
+		--remainder;
+		++i;
+	}
+
+	*length = remainder;
+	*position = vaddr;
+
 	return i;
 }
 
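The indexing used by the rewritten loop (vpfn % (HPAGE_SIZE/PAGE_SIZE) selects
the small page inside the huge page, whatever the alignment of the starting
address) can be checked with a userspace model. This is a sketch only: the
DEMO_* constants assume 4KB base pages and 4MB huge pages, demo_walk() is a
hypothetical stand-in that records indices instead of taking page references,
and no page tables are involved.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT		12	/* assumed 4KB base pages */
#define DEMO_HPAGE_SHIFT	22	/* assumed 4MB huge pages */
#define DEMO_PAGE_SIZE		(1UL << DEMO_PAGE_SHIFT)
#define DEMO_HPAGE_SIZE		(1UL << DEMO_HPAGE_SHIFT)

/* Index of the base page that contains vaddr within its huge page. */
static unsigned long demo_subpage_index(unsigned long vaddr)
{
	unsigned long vpfn = vaddr / DEMO_PAGE_SIZE;

	return vpfn % (DEMO_HPAGE_SIZE / DEMO_PAGE_SIZE);
}

/*
 * Same loop shape as the rewritten function: advance one base page at a
 * time until the request is satisfied or the end of the region is reached,
 * recording the sub-page index for each step.  Returns the updated slot
 * count, and writes back the remaining length and the next address.
 */
static int demo_walk(unsigned long vm_end, unsigned long *position,
		     int *length, unsigned long *index_out, int i)
{
	unsigned long vaddr, vpfn;
	int remainder;

	assert(position != NULL && length != NULL);
	vaddr = *position;
	remainder = *length;
	vpfn = vaddr / DEMO_PAGE_SIZE;

	while (vaddr < vm_end && remainder) {
		if (index_out)
			index_out[i] = vpfn % (DEMO_HPAGE_SIZE / DEMO_PAGE_SIZE);

		vaddr += DEMO_PAGE_SIZE;
		++vpfn;
		--remainder;
		++i;
	}

	*length = remainder;
	*position = vaddr;
	return i;
}
```

Starting two base pages before a huge page boundary (0x7fe000) and asking for
four pages yields indices 1022, 1023, 0, 1: the walk crosses into the next
huge page without any special casing for the unaligned start.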
* Re: 2.5.67-mm3
2003-04-16 2:21 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-16 2:40 ` William Lee Irwin III
0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-16 2:40 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, linux-mm

On Tue, Apr 15, 2003 at 07:21:54PM -0700, William Lee Irwin III wrote:
> follow_hugetlb_page() behaved improperly if its starting address was
> not hugepage-aligned. It looked a bit unclean too, so I rewrote it.
> This fixes a bug, and more importantly, makes the thing readable by
> something other than a compiler (e.g. programmers).

And this one fixes an overflow when there is more than 4GB of hugetlb:


diff -urpN htlb-2.5.67-bk6-1/arch/i386/mm/hugetlbpage.c htlb-2.5.67-bk6-2/arch/i386/mm/hugetlbpage.c
--- htlb-2.5.67-bk6-1/arch/i386/mm/hugetlbpage.c	2003-04-15 18:58:07.000000000 -0700
+++ htlb-2.5.67-bk6-2/arch/i386/mm/hugetlbpage.c	2003-04-15 19:25:30.000000000 -0700
@@ -482,9 +482,7 @@ int hugetlb_report_meminfo(char *buf)
 
 int is_hugepage_mem_enough(size_t size)
 {
-	if (size > (htlbpagemem << HPAGE_SHIFT))
-		return 0;
-	return 1;
+	return (size + ~HPAGE_MASK)/HPAGE_SIZE <= htlbpagemem;
 }
 
 /*
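The overflow fixed above can be reproduced in userspace by modelling the
32-bit arithmetic explicitly. This is a sketch under assumptions: uint32_t
stands in for the 32-bit types involved on i386, 4MB huge pages are assumed,
and the demo_* names are hypothetical rather than kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HPAGE_SHIFT	22u	/* assumed 4MB huge pages */
#define DEMO_HPAGE_SIZE		((uint32_t)1 << DEMO_HPAGE_SHIFT)
#define DEMO_HPAGE_MASK		(~(DEMO_HPAGE_SIZE - 1))

/* Old check: converts the page count to bytes, which wraps at 4GB. */
static int demo_enough_old(uint32_t size, uint32_t nr_hugepages)
{
	uint32_t pool_bytes = nr_hugepages << DEMO_HPAGE_SHIFT;

	return size <= pool_bytes;
}

/* New check: rounds the request up to huge pages and compares counts. */
static int demo_enough_new(uint32_t size, uint32_t nr_hugepages)
{
	/* The round-up itself must not wrap; callers keep size below this. */
	assert(size <= UINT32_MAX - (DEMO_HPAGE_SIZE - 1));

	return (size + ~DEMO_HPAGE_MASK) / DEMO_HPAGE_SIZE <= nr_hugepages;
}
```

With 1024 huge pages (4GB in total) the byte count in the old check wraps to
zero, so even a single 4MB request is refused; comparing in units of huge
pages avoids forming the byte count at all. For small pools both checks agree.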
end of thread, other threads:[~2003-04-16 2:40 UTC | newest]

Thread overview: 21+ messages
2003-04-14 8:53 2.5.67-mm3 Andrew Morton
2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
2003-04-14 15:13 ` 2.5.67-mm3 Rudmer van Dijk
2003-04-15 1:03 ` 2.5.67-mm3 Bill Huey
2003-04-15 1:13 ` 2.5.67-mm3 Andrew Morton
2003-04-15 1:34 ` 2.5.67-mm3 Bill Huey
2003-04-15 9:38 ` 2.5.67-mm3 Rudmer van Dijk
2003-04-15 2:00 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 4:17 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 4:31 ` 2.5.67-mm3 Andrew Morton
2003-04-15 4:39 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 4:55 ` 2.5.67-mm3 Andrew Morton
2003-04-15 5:15 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 5:35 ` 2.5.67-mm3 Andrew Morton
2003-04-15 6:09 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 6:10 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 5:52 ` 2.5.67-mm3 Antonio Vargas
2003-04-15 5:52 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 15:09 ` 2.5.67-mm3 Antonio Vargas
2003-04-16 2:21 ` 2.5.67-mm3 William Lee Irwin III
2003-04-16 2:40 ` 2.5.67-mm3 William Lee Irwin III